From Laurence.Amilhat at toulouse.inra.fr  Thu Jan  3 09:29:09 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Thu, 03 Jan 2008 15:29:09 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
Message-ID: <477CF135.9060104@toulouse.inra.fr>

Dear all,

I am trying to convert a Newick tree into an NHX tree so that I can add
the taxid tag to each leaf.

I am using the modules Bio::TreeIO and Bio::Tree::NodeNHX.
The idea is to:
1) read the Newick tree
2) get each leaf, and get the corresponding taxid for it
3) add the NHX species tag
4) write the NHX tree

I was able to do the first 2 steps, and I could create a NodeNHX object
and add the tag T, but I don't know how to write an NHX tree with the
NodeNHX objects I created...

Does anyone have an idea? Any help is welcome.

Thanks,

laurence.


Here are my code and the sample files for better understanding:
newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt

_newick2nhx.pl:_
use strict;
use Bio::TreeIO;
use Bio::Tree::NodeNHX;
use Getopt::Long;


my $tree_file;
my $outfile;
my $codefile;
my %corresp;

GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile, 'c|code:s' =>\$codefile);

open (CODE, "< $codefile") or die "Cannot open $codefile: $!";
while (<CODE>)
{
    chomp;
    my($a, $b)=split (/\t/);
    $corresp{$a}=$b;
}


my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");

while (my $tree= $treeio->next_tree)
{
    my @nodes=$tree->get_nodes();
    foreach my $nd(@nodes)
    {
        if ($nd->is_Leaf())
        {
            my $id=$nd->id();
            print "$id TAXID ",$corresp{$id},"\n";
           
            my $nodenhx=new Bio::Tree::NodeNHX();
            $nodenhx->nhx_tag({T=>$corresp{$id}});
        }
    }
    $treeout->write_tree($tree);
}


_test_tree.nwk_:
(((((42558930:100.0,42558943:100.0):100.0,(42558969:100.0,(42558981:100.0,
42558942:100.0):100.0):72.0):81.0,(((((90185247:100.0,56405380:100.0):100.0,
(42558987:100.0,148887393:100.0):100.0):90.0,66774197:100.0):100.0,AAEL015662:100.0):100.0,
42558970:100.0):82.0):100.0,(42558929:100.0,42558958:100.0):79.0):100.0,
42558941:100.0);

_seq_taxid.txt:_
AAEL015662      7159
42558969        9606
42558981        10090
42558942        9606
42558970        6239
42558929        10116
42558987        9606
42558930        10116
42558943        9606
148887393       10090
42558958        10090
42558941        9606
56405380        10090
90185247        9606
66774197        6239


_And the tata resulting file:_
(((((42558930:100.0,42558943:100.0):100.0[&&NHX],(42558969:100.0,(42558981:100.0,42558942:100.0):100.0[&&NHX]):72.0[&&NHX]):81.0[&&NHX],(((((
90185247:100.0,56405380:100.0):100.0[&&NHX],(42558987:100.0,148887393:100.0):100.0[&&NHX]):90.0[&&NHX],66774197:100.0):100.0[&&NHX],
AAEL015662:100.0):100.0[&&NHX],42558970:100.0):82.0[&&NHX]):100.0[&&NHX],(42558929:100.0,42558958:100.0):79.0[&&NHX]):100.0[&&NHX],42558941:100.0);




-- 
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan     	   = 
= Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================




From aaron.j.mackey at gsk.com  Thu Jan  3 10:12:22 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Thu, 3 Jan 2008 10:12:22 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <477CF135.9060104@toulouse.inra.fr>
Message-ID: 

Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that 
way, your tree's nodes are already NodeNHX's.  Instead of creating a new 
$nodenhx, you can use the $node variable directly from the tree ...

-Aaron
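Aaron's fix keeps everything inside BioPerl's object model. To make the target output concrete, here is a minimal sketch of the same annotation done as a plain string rewrite, without BioPerl. It is not how Bio::TreeIO::nhx works; it only illustrates the syntax each tagged leaf should end up with (label:length[&&NHX:T=taxid]). The subroutine name annotate_leaves and the three-argument command line are hypothetical, and the sketch assumes plain Newick input with unquoted labels and no existing [...] comments. Passing 'S' instead of 'T' writes species names, which is what Laurence later switched to for ATV.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append an NHX comment such as [&&NHX:T=9606] after
# every leaf whose label is a key of %$value_for. $tag is the NHX key,
# e.g. 'T' (taxonomy ID) or 'S' (species name). Assumes plain Newick
# input with unquoted labels and no existing [...] comments.
sub annotate_leaves {
    my ($newick, $value_for, $tag) = @_;
    $newick =~ s{
        ( [(,] \s* )           # a leaf starts right after an open paren or comma
        ( [^(),:;\[\]\s]+ )    # the leaf label
        ( :[-+0-9.eE]+ )?      # optional branch length
    }{
        my ($lead, $label, $len) = ($1, $2, defined $3 ? $3 : '');
        my $nhx = exists $value_for->{$label}
            ? "[&&NHX:$tag=$value_for->{$label}]"
            : '';
        $lead . $label . $len . $nhx;
    }gex;
    return $newick;
}

# Usage (hypothetical): perl annotate_nhx.pl test_tree.nwk seq_taxid.txt T > tata
if (@ARGV == 3) {
    my ($tree_file, $map_file, $tag) = @ARGV;

    # Two-column map: leaf ID, then value (tab- or space-separated).
    my %value_for;
    open my $map_fh, '<', $map_file or die "Cannot open $map_file: $!";
    while (my $line = <$map_fh>) {
        chomp $line;
        my ($id, $value) = split /\s+/, $line, 2;
        $value_for{$id} = $value if defined $value;
    }
    close $map_fh;

    # Slurp the whole tree, including its line breaks.
    open my $tree_fh, '<', $tree_file or die "Cannot open $tree_file: $!";
    my $newick = do { local $/; <$tree_fh> };
    close $tree_fh;

    my $out = annotate_leaves($newick, \%value_for, $tag);
    print $out;
}
```

Run on the posted test_tree.nwk and seq_taxid.txt, every leaf listed in the map should gain a tag, while internal nodes are left untouched (unlike the empty [&&NHX] blocks in the tata output above).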




From Laurence.Amilhat at toulouse.inra.fr  Fri Jan  4 03:33:22 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Fri, 04 Jan 2008 09:33:22 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: 
References: 
Message-ID: <477DEF52.20802@toulouse.inra.fr>

Thank you Aaron,

it's working now. I've switched to species names instead of taxids, so I can
color the species on my tree in the ATV viewer.
thanks again,

Regards,

Laurence.



aaron.j.mackey at gsk.com wrote:
> Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that 
> way, your tree's nodes are already NodeNHX's.  Instead of creating a new 
> $nodenhx, you can use the $node variable directly from the tree ...
>
> -Aaron






From hlapp at gmx.net  Sun Jan  6 22:02:32 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 6 Jan 2008 22:02:32 -0500
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To: 
References: 
Message-ID: <640890C9-2D34-4C70-9179-26A9EAB397D2@gmx.net>

Hi Zhihua, you didn't ever respond to Marc's link to the Persistent  
Bioperl slides - did that help?

	-hilmar

On Dec 6, 2007, at 11:25 PM, zhihuali wrote:

>
> Hi netters,
>
> I've installed BioSQL and bioperl-db, and successfully created and  
> stored a persistent object:
>
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::DB::BioDB;
>
> my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
>                                 -user     => 'annoymous',
>                                 -dbname   => 'bioseqdb');
>
> my $seqobj = Bio::Seq->new(-accession_number => "test",
>                            -id               => "test1",
>                            -seq              => "AGCTAGCT",
>                            -version          => 1);
> my $dbobj = $dbadp->create_persistent($seqobj);
> $dbobj->create;
> $dbobj->commit;
>
> It's successful because I found corresponding rows in the bioseqdb  
> tables.
>
> Now I want to retrieve the object back from the database. There isn't
> much documentation available, and I've tried find_by_unique_key and
> find_by_primary_key, but both failed. Maybe I didn't use them correctly.
> Could anyone give me an example of how to retrieve the stored
> Bio::Seq object?
>
> Thanks a lot!
>
> Zhihua Li
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From cain.cshl at gmail.com  Mon Jan  7 12:24:02 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:24:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
Message-ID: <1199726642.6374.10.camel@frissell>

Hello,

I was trying to get bioperl-live this morning from either cvs or svn and
failed.  I was wondering if something was going on with the server.

Here are the things I tried:

  cvs -d:ext:scain at dev.open-bio.org:/home/repository/bioperl co bioperl-live

which resulted in this:

cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
cvs checkout: Updating bioperl-live
cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
cvs [checkout aborted]: read lock failed - giving up

Then I thought I'd try the suggested svn checkout method from the
bioperl wiki:

  svn co svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live

which resulted in

svn: No repository found in 'svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live'

Finally, after looking at the open-bio server, I thought I'd try this:

   svn co svn+ssh://scain at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live

which resulted in repeated requests for my password (which I supplied
correctly at least once out of the several requests).

So, what's up?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hlapp at gmx.net  Mon Jan  7 12:36:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 7 Jan 2008 12:36:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: 

I think we are still migrating to svn. It's probably better to wait  
for the announcement that everything is ready to go. (And then cvs  
won't work anymore except for anonymous checkout - which should  
actually continue to work while this is in progress. Have you tried  
that?)

	-hilmar








From jason at bioperl.org  Mon Jan  7 12:43:18 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Jan 2008 09:43:18 -0800
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>

CVS r/w is locked because we are transitioning to SVN - you can still  
checkout via anonymous CVS on code.open-bio.org.

The SVN repository is going to be in /home/svn-repositories/bioperl, not
George's directory, but we are still monkeying around with the directory
structure. You can try a checkout, but be warned it may change a few
more times if we add another directory layer in there.

You will get asked for your password at least three times. I strongly
suggest you use SSH keys to avoid being prompted each time. I don't know
exactly why you get asked three times; it's an SVN thing, and I assume it
has to make three separate requests to do a checkout.

That's what is up for now.  We'll report when the final SVN migration  
is done.

-jason


From cain.cshl at gmail.com  Mon Jan  7 12:57:38 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:57:38 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
References: <1199726642.6374.10.camel@frissell>
	<5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
Message-ID: <1199728658.6374.12.camel@frissell>

Hi Hilmar and Jason,

Thanks--for some reason, I thought svn was done.  I'll remain anonymous
for right now (Kind of difficult to do when you announce it publicly :-)

Thanks,
Scott




From cain.cshl at gmail.com  Mon Jan  7 13:34:25 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 13:34:25 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
Message-ID: <1199730865.6374.18.camel@frissell>

Hello,

I was planning to implement this myself (and probably still will,
assuming it's not already there...) but I am not a Module::Build guru.
Here's what I'd like to do: add a parameter I can pass when invoking
perl Build.PL so that the default answers are used wherever it would
normally ask me a question, something like this:

  perl Build.PL --yes

Is this sort of thing already built into Module::Build and I can't see
it?  Or can somebody suggest the best way of going about this?
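One way to picture the requested switch, as a minimal sketch only: strip a --yes argument from @ARGV before Module::Build sees it, and route every question through a helper that returns the default when the switch (or the PERL_MM_USE_DEFAULT environment variable) is set. The helper name ask and the --yes switch are assumptions, not existing Module::Build or BioPerl options. ExtUtils::MakeMaker's prompt() already honours PERL_MM_USE_DEFAULT; whether the installed Module::Build's prompt() does the same is worth checking before patching anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical: detect a --yes switch and remove it from @ARGV so that
# it is not passed on to Module::Build.
my $ACCEPT_DEFAULTS = grep { $_ eq '--yes' } @ARGV;
@ARGV = grep { $_ ne '--yes' } @ARGV;

# Ask a question, but return $default immediately when --yes was given
# or PERL_MM_USE_DEFAULT is true. An empty answer or EOF also yields
# the default.
sub ask {
    my ($question, $default) = @_;
    return $default if $ACCEPT_DEFAULTS || $ENV{PERL_MM_USE_DEFAULT};
    print "$question [$default] ";
    my $answer = <STDIN>;
    return $default unless defined $answer;
    chomp $answer;
    return length $answer ? $answer : $default;
}

# Example use inside a Build.PL (question text is made up):
# my $install_scripts = ask('Install scripts?', 'n');
```

A real patch to ModuleBuildBioperl.pm would put the check wherever that file currently asks its questions, rather than in a separate helper.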

Thanks much,
Scott




From cjfields at uiuc.edu  Mon Jan  7 17:22:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Jan 2008 16:22:35 -0600
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <31AD254B-DABA-488D-BDA8-D690F949CC39@uiuc.edu>

I agree it would be nice. Not sure how hard it would be to implement;
maybe it would be best to have installation modes, e.g. 'minimal'
(no optional module installation, no scripts), 'full', 'dev' (assume
a minimal install but don't run tests), and so on, falling back to the
query-based approach if no mode is given.
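The install-mode suggestion could be expressed as a table of canned answers, with anything not covered falling back to the interactive question. The mode names come from the message; the question keys (optional_modules, scripts, run_tests), the answers assigned to 'full', and the prompt_user function in the usage comment are illustrative assumptions, not options ModuleBuildBioperl.pm actually has.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical canned answers per install mode. 'minimal' skips optional
# modules and scripts; 'dev' is a minimal install without tests.
my %MODE_ANSWERS = (
    minimal => { optional_modules => 'n', scripts => 'n', run_tests => 'y' },
    full    => { optional_modules => 'y', scripts => 'y', run_tests => 'y' },
    dev     => { optional_modules => 'n', scripts => 'n', run_tests => 'n' },
);

# Return the canned answer for $key in $mode, or undef when there is no
# mode, an unknown mode, or no answer for that key (caller should then ask).
sub canned_answer {
    my ($mode, $key) = @_;
    return undef unless defined $mode && exists $MODE_ANSWERS{$mode};
    return $MODE_ANSWERS{$mode}{$key};
}

# Usage sketch: prefer the canned answer, otherwise prompt as before.
# my $ans = canned_answer($mode, 'scripts');
# $ans = prompt_user('Install scripts?', 'n') unless defined $ans;
```

Keeping the answers in one hash means adding a new mode is a data change rather than new branching in the installer.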

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From bix at sendu.me.uk  Mon Jan  7 17:37:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 07 Jan 2008 22:37:36 +0000
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <4782A9B0.60203@sendu.me.uk>

Scott Cain wrote:
> Hello,
> 
> I was wanting to implement this myself (and probably still will,
> assuming it's not already there...) but I am not a Module::Build guru.
> Here's what I'd like to do: add a parameter that I can add when evoking
> perl Build.PL so that the default answers will be used when it would
> normally ask me a question while running perl Build.PL, something like
> this:
> 
>   perl Build.PL --yes
> 
> Is this sort of thing already built into Module::Build and I can't see
> it?  Or can somebody suggest the best way of going about this?

You should ask on the Module::Build mailing list. If it already exists I 
don't think it is obvious, however.

If your question is BioPerl related, and you're looking for a fast way 
of installing BioPerl without the annoying questions, I'm sure I could 
hack something into ModuleBuildBioperl.pm.

From cain.cshl at gmail.com  Mon Jan  7 22:04:19 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 22:04:19 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <4782A9B0.60203@sendu.me.uk>
References: <1199730865.6374.18.camel@frissell> <4782A9B0.60203@sendu.me.uk>
Message-ID: <1199761459.6017.1.camel@frissell>

Hi Sendu,

I just hacked something up (I only needed to change a few lines--once I
figured out where everything was).  I like Chris' idea though; before I
commit it back (Ha, no rush there), I'll flesh it out a little more to
give more options.

Scott




From granjeau at tagc.univ-mrs.fr  Wed Jan  9 03:30:17 2008
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 09 Jan 2008 09:30:17 +0100
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
Message-ID: <47848619.40109@tagc.univ-mrs.fr>

Hello,

I would like to retrieve the manually reviewed annotation of Swiss-Prot
entries; this information is in the comment (CC) section of the sequence
file. Here is an example:

CC   -!- FUNCTION: Actins are highly conserved proteins that are involved
CC       in various types of cell motility and are ubiquitously expressed
CC       in all eukaryotic cells.
CC   -!- SUBUNIT: Polymerization of globular actin (G-actin) leads to a
CC       structural filament (F-actin) in the form of a two-stranded helix.
CC       Each actin can bind to 4 others. Found in a complex with XPO6,
CC       Ran, ACTB and PFN1. Component of a complex composed at least of
CC       ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with XPO6.
CC   -!- INTERACTION:
CC       Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668;
CC       P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161;
CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.

Is there a specific method to do such a job?

Thanks much,
Samuel

-- 

Samuel GRANJEAUD                   granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC               Tel: +33  (0)491 82 87 24
http://tagc.univ-mrs.fr            Fax: +33  (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique


From robfsouza at gmail.com  Wed Jan  9 08:20:08 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 11:20:08 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs
Message-ID: 

Hello All!

Greetings to everybody, and happy new year to those following a
Western calendar!

I'm starting a new project to store and analyze distinct sets of
sequence annotation data which are related in a way suitable for
representation in a directed (e.g. transcript splicing) or undirected
(e.g. gene product interaction) graph. Analysis will require frequent
queries based on interval overlaps, feature neighbourhood, annotation
and, most importantly, feature relationships and stored paths.

At first, I thought of building an entirely new database structure to store
project-specific data (e.g. alternative splicing or protein interaction),
but as I have some experience with Lincoln's
Bio::DB::SeqFeature::Store, I'm now considering extending it for the
purpose of storing graphs describing relationships among features.

I'm aware that some other bioperl-related databases, specifically
BioSQL and Chado, do have components which might be suitable for
storing all or some of these data but, since Lincoln's feature storage
and interval binning implementations in
Bio::DB::SeqFeature::Store::mysql are clean, simple and very fast,
perhaps extending it in a modular way is desirable. A good
extension to Lincoln's database could include tables like
feature_relationship and feature_path, for edges and transitive
closures (just like in BioSQL), and feature_stored_path, for exclusion
of biologically irrelevant paths in DAGs, like certain splicing
isoforms. These tables could be used to store sequence assemblies or
EST alignments efficiently, including scaffolds inferred by connecting
contigs.
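For illustration only: a minimal Perl sketch of how rows for a feature_path (transitive closure) table might be derived from feature_relationship edges. The table names come from the proposal above; the transitive_closure subroutine, the [ancestor, descendant, distance] row layout and the choice to keep only the shortest distance are assumptions, not how BioSQL or Chado actually fill their path tables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: turn feature_relationship edges [parent, child] into
# feature_path rows [ancestor, descendant, distance]. Only the shortest
# distance between two features is kept.
sub transitive_closure {
    my (@edges) = @_;
    my %children;
    push @{ $children{ $_->[0] } }, $_->[1] for @edges;

    my @rows;
    for my $start (sort keys %children) {
        # Breadth-first search from $start; %dist also marks visited nodes,
        # so a cycle cannot cause an endless loop.
        my %dist  = ($start => 0);
        my @queue = ($start);
        while (@queue) {
            my $node = shift @queue;
            for my $child (@{ $children{$node} || [] }) {
                next if exists $dist{$child};
                $dist{$child} = $dist{$node} + 1;
                push @queue, $child;
            }
        }
        push @rows, [ $start, $_, $dist{$_} ]
            for sort { $a cmp $b } grep { $_ ne $start } keys %dist;
    }
    return @rows;
}

# Toy splicing graph: exon1 -> exon2, then exon2 -> exon3 or exon2 -> exon4.
my @edges = ([qw(exon1 exon2)], [qw(exon2 exon3)], [qw(exon2 exon4)]);
print join("\t", @$_), "\n" for transitive_closure(@edges);
```

With the toy edges this prints one tab-separated row per ancestor/descendant pair, such as exon1, exon3, 2. Storing every path length rather than the shortest one would make the table larger.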

Before starting, I would like to know if the BioSQL and Chado schemata
do have accelerators for querying intervals among billions of features
and feature relationships (some examples using these databases would
also help, if they show that these databases are efficient for such tasks).
If these or other databases are not as suitable as Bio::DB::SeqFeature
for feature retrieval based on interval overlap and attributes, then
again I might consider extending Bio::DB::SeqFeature
and contributing such extensions back to bioperl...
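As background for the interval-query question, here is a sketch of the UCSC-style hierarchical binning scheme (0-based half-open coordinates, five levels, smallest bins of 128 kb). It is a generic illustration, not necessarily the binning Bio::DB::SeqFeature::Store::mysql uses, and the subroutine names are made up. Each feature is stored with the smallest bin that contains it; an overlap query first restricts candidates to the bins returned by bins_overlapping and then compares coordinates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# UCSC-style hierarchical bins for 0-based, half-open [start, end) intervals:
# level offsets for 128 kb, 1 Mb, 8 Mb, 64 Mb and 512 Mb bins.
my @OFFSETS     = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0);
my $FIRST_SHIFT = 17;    # smallest bin size is 2**17 = 131072 bp
my $NEXT_SHIFT  = 3;     # each level up is 2**3 = 8 times larger

# Smallest bin that fully contains [start, end); stored with each feature.
sub bin_for_range {
    my ($start, $end) = @_;
    my $sbin = $start >> $FIRST_SHIFT;
    my $ebin = ($end - 1) >> $FIRST_SHIFT;
    for my $offset (@OFFSETS) {
        return $offset + $sbin if $sbin == $ebin;
        $sbin >>= $NEXT_SHIFT;
        $ebin >>= $NEXT_SHIFT;
    }
    die "range [$start, $end) is too large for this binning scheme\n";
}

# Every bin that could hold a feature overlapping [start, end). A query would
# use "bin IN (...)" with these values, then compare start/end coordinates.
sub bins_overlapping {
    my ($start, $end) = @_;
    my $sbin = $start >> $FIRST_SHIFT;
    my $ebin = ($end - 1) >> $FIRST_SHIFT;
    my @bins;
    for my $offset (@OFFSETS) {
        push @bins, $offset + $sbin .. $offset + $ebin;
        $sbin >>= $NEXT_SHIFT;
        $ebin >>= $NEXT_SHIFT;
    }
    return @bins;
}

printf "bin for [0, 1000): %d\n", bin_for_range(0, 1000);
print join(' ', bins_overlapping(150000, 150001)), "\n";
```

The candidate list has one entry per level for a short query range, so the coordinate comparison only has to scan a handful of bins.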

Any thoughts?

Best regards,
Robson

PS: sorry if anyone gets two copies of this post, but it took me some
time to realize my new e-mail wasn't subscribed to bioperl-l...

From bix at sendu.me.uk  Wed Jan  9 08:59:08 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 09 Jan 2008 13:59:08 +0000
Subject: [Bioperl-l] bioperl based database infrastucture for directed
 graphs
In-Reply-To: 
References: 
Message-ID: <4784D32C.9070807@sendu.me.uk>

Robson Francisco de Souza wrote:
> Before starting, I would like to know if the BioSQL and Chado schemata
> do have accelerators for querying intervals among billions of features
> and feature relationships (some examples using these databases would
> also help, if they show that these databases are efficient for such tasks).
> If these or other databases are not as suitable as Bio::DB::SeqFeature
> for feature retrieval based on interval overlap and attributes,

I'm using Bio::DB::SeqFeature for that purpose, but just a warning: I 
found that with millions of features it made a db that was too large in 
terms of disc space and too slow in terms of query time. I had to hack 
out its storage of feature objects in the db, instead generating feature 
objects on request from the stored attributes. Doing this turned out to 
be faster than simply unfreezing certain kinds of feature objects!

(I also had to hack in support for retrieval by source, a patch that 
Lincoln hasn't gotten back to me about yet.)

While I can't answer your main questions, I wish you good luck with your 
project and request that you keep us posted with what you achieve.

From bosborne11 at verizon.net  Wed Jan  9 09:46:42 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 09 Jan 2008 09:46:42 -0500
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
In-Reply-To: <47848619.40109@tagc.univ-mrs.fr>
References: <47848619.40109@tagc.univ-mrs.fr>
Message-ID: <3DAEDA67-B9A5-47A4-8108-0915659F1052@verizon.net>

Samuel,

The Feature-Annotation HOWTO addresses this specifically:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation


Brian O.


> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From alexanderptok at web.de  Wed Jan  9 10:34:56 2008
From: alexanderptok at web.de (Alexander Ptok)
Date: Wed, 09 Jan 2008 16:34:56 +0100
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN]
Message-ID: <2011210591@web.de>

Hi,

I am a beginner to BioPerl and am working through the Beginners HOWTO.

My version of BioPerl is 1.4-1, running on Debian etch.

In the HOWTO everything worked fine until the section

Retrieving multiple sequences from a database

from which I copied the following script:

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;
 
$query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
$query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',  -query => $query );
 
$gb_obj = Bio::DB::GenBank->new;
 
$stream_obj = $gb_obj->get_Stream_by_query($query_obj);
 
while ($seq_obj = $stream_obj->next_seq) {    
    # do something with the sequence object    
    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}


If I remove the 0:3000[SLEN] clause, the query works and returns a lot of sequences. When I alter it to e.g. 1830[SLEN], it
finds the one sequence that has length 1830, but I was not able to query a range of lengths.

Does anyone know what I am doing wrong?
Greetings
A. Ptok


From cjm at fruitfly.org  Wed Jan  9 11:52:21 2008
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed, 9 Jan 2008 08:52:21 -0800
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: 
References: 
Message-ID: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>

[cc-d to gmod-schema]

Chado does have some views and pg functions for interval-based  
retrieval. AFAIK there are no accelerators for deep feature graphs,  
as most chado users have relatively shallow gene-model/SO feature  
graphs. It may not be so hard to extend cvterm code for doing this,  
depending on the characteristics of your graphs (the closure of  
feature neighbourhood graphs may be particularly large).



From cjfields at uiuc.edu  Wed Jan  9 10:00:38 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 Jan 2008 09:00:38 -0600
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: <4784D32C.9070807@sendu.me.uk>
References: 
	<4784D32C.9070807@sendu.me.uk>
Message-ID: 


On Jan 9, 2008, at 7:59 AM, Sendu Bala wrote:

> Robson Francisco de Souza wrote:
>> Before starting, I would like to know if the BioSQL and Chado schemata
>> do have accelerators for querying intervals among billions of features
>> and feature relationships (some examples using these databases would
>> also help, if they show that these databases are efficient for such tasks).
>> If these or other databases are not as suitable as Bio::DB::SeqFeature
>> for feature retrieval based on interval overlap and attributes,
>
> I'm using Bio::DB::SeqFeature for that purpose, but just a warning:  
> I found that with millions of features it made a db that was too  
> large in terms of disc space and too slow in terms of query time. I  
> had to hack out its storage of feature objects in the db, instead  
> generating feature objects on request from the stored attributes.  
> Doing this turned out to be faster than simply unfreezing certain  
> kinds of feature objects!

Would this be Bio::SF::Annotated objects? If so I bet Storable is  
storing the OntologyStore object information along with the SF (which  
argues for refactoring the FeatureIO/Bio::SF::Annotated stuff in 1.7).

Not sure what can be done about that beyond your hack, though it might  
be worth exploring whether one can optionally set the DB::Store to  
store the object instance.

> (I also had to hack in support for retrieval by source, a patch that  
> Lincoln hasn't gotten back to me about yet.)
>
> While I can't answer your main questions, I wish you good luck with  
> your project and request that you keep us posted with what you  
> achieve.

You can always try Lincoln on the GBrowse list as well.  I would say  
go ahead and commit the patch if it isn't a big deal.

chris

From cjfields at uiuc.edu  Wed Jan  9 13:12:55 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 Jan 2008 12:12:55 -0600
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: 
References: 
Message-ID: <128517E8-3A2A-45DD-83A0-0014863A25BC@uiuc.edu>

cc'ing the gbrowse list in case Lincoln hasn't seen this.

I believe the primary intent for Bio::DB::SeqFeature::Store was as a  
more GFF3-compatible replacement for Bio::DB::GFF (unlimited feature  
nesting, uses any SeqFeatureI, etc) and was streamlined for faster  
lookups by GBrowse.  I don't think adding tables would affect  
performance dramatically, though maybe Lincoln would have a better idea.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From bosborne11 at verizon.net  Wed Jan  9 13:29:15 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 09 Jan 2008 13:29:15 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths
	0:3000[SLEN]
In-Reply-To: <2011210591@web.de>
References: <2011210591@web.de>
Message-ID: <0EB96131-7931-4FC3-802F-A8152B474A99@verizon.net>

Alexander,

I don't understand. By using the clause "0:3000[SLEN] " you are  
querying for sequences in the length range of 0 to 3000.


Brian O.


On Jan 9, 2008, at 10:34 AM, Alexander Ptok wrote:

> If i cut the 0:3000[SLEN] query it works and returns a lot of  
> sequences, when i alter the query to e.g. 1830[SLEN] it
> finds the one sequence that has the length 1830, but i was not able  
> to query a range of lengths.


From stefan.kirov at bms.com  Wed Jan  9 14:54:07 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 09 Jan 2008 14:54:07 -0500
Subject: [Bioperl-l] pairwise_kaks.PLS: verbose rquired by PAML
Message-ID: <4785265F.6020500@bms.com>

Jason,
Even with this last fix, I still had problems with bp_pairwise_kaks.pl. It
turns out that verbose needs to be switched on by default for codeml in order
for the sequences to appear in the mlc file.
That means we need, instead of:
    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        (-verbose => $verbose,
         -params => { 'runmode' => -2,
                      'seqtype' => 1,
                  }
         );
this:

    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        (-verbose => $verbose,
         -params => { 'runmode' => -2,
                      'seqtype' => 1,
                      'verbose' => 1,
                  }
         );

verbose can be 2 as well. I just got this clarification from Ziheng. He
also offers to change the output so it becomes easier for us. I plan to
ask him to put the sequences in the mlc header by default.
Stefan


From robfsouza at gmail.com  Wed Jan  9 19:28:25 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 22:28:25 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
References: 
	<199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
Message-ID: 

Hi,

2008/1/9, Chris Mungall :
> [cc-d to gmod-schema]
>
> Chado does have some views and pg functions for interval-based
> retrieval. AFAIK there are no accelerators for deep feature graphs,
> as most chado users have relatively shallow gene-model/SO feature
> graphs. It may not be so hard to extend cvterm code for doing this,
> depending on the characteristics of your graphs (the closure of
> feature neighbourhood graphs may be particularly large)

Great! I'm studying Chado and I will have a look at the interval optimizations.
Have any of you compared BioSQL and Chado for huge feature and feature
graph storage/retrieval efficiency? As Sendu pointed out limitations in
Bio::DB::SeqFeature's schema, I'm wondering which of these platforms
(or maybe another one?) would be best suited for these tasks... For
the moment, I will either extend Sendu's hack of Lincoln's modules or
adapt the binning algorithm of Bio::DB::SeqFeature::DBI::mysql to
Chado, if it turns out to be more efficient than the pg functions.

Best,
Robson

PS: I could not find the most recent version of gmod by following the
Download link to gmod(Chado) from GMOD's site to the Sourceforge
download page. Did I miss the right link on the download site or is
this unexpected? Is the version available at IUBio's mirror (0.003-10)
the most recent one?

From cain.cshl at gmail.com  Wed Jan  9 22:15:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 09 Jan 2008 22:15:29 -0500
Subject: [Bioperl-l] bioperl based database infrastucture for
	directed	graphs
In-Reply-To: 
References: 
	<199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
	
Message-ID: <1199934929.6229.44.camel@frissell>

Hi Robson,

I seem to be perennially working on the 1.0 release of Chado.  The
schema itself is quite stable but I'm always working on the tools to
make them handle more cases and be as stable as possible.  For the time
being, you need to get Chado from cvs; see 

  http://www.gmod.org/wiki/index.php/Chado_-_Getting_Started#Chado_From_CVS

I removed the 0.003 release from the SourceForge site because the schema
in it is out of date relative to what we've been working on for the last
year.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From bosborne11 at verizon.net  Thu Jan 10 09:16:16 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 10 Jan 2008 09:16:16 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths
	0:3000[SLEN]
In-Reply-To: <2013325230@web.de>
References: <2013325230@web.de>
Message-ID: <932550FF-8414-4B3E-92BB-1895FD9658AE@verizon.net>

Alexander,

OK, that is odd (meaning, this did work a while back but it's not  
clear to me what could have changed).

First thing to do: upgrade to Bioperl version 1.5.2. Can you do this?
Version 1.4 is very old and you could run into other problems using it.


Brian O.



On Jan 10, 2008, at 8:54 AM, Alexander Ptok wrote:

> Hello Brian,
>
> thanks for your answer. The principle is clear, but it doesn't work
> as it should on my computer, so maybe I should repeat what I did
> step by step.
>
> 1. I took the following script:
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
> $query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',  -query => $query );
>
> $gb_obj = Bio::DB::GenBank->new;
>
> $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
>
> while ($seq_obj = $stream_obj->next_seq) {
>    # do something with the sequence object
>    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
> }
>
> and then on the terminal
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script1.pl
> sv1494 at r04102:~/Desktop/bioperl$
>
> 2. I took out the 0:3000[SLEN] clause:
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL]";
>
> and then on the terminal
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script2.pl
> NM_128760       2775
> NM_125788       2874
> NM_124913       3068
> NM_124912       3117
> NM_124775       871
> NM_120360       1655
> NM_111862       2199
> NM_001036386    2734
> NM_119270       3996
> NM_105072       1656
> NM_113294       4824
> NM_180431       1673
> NM_120495       2515
> NM_120493       2050
> NM_112156       1089
> .
> .
> and a lot more hits, and one can clearly see that there are some with
> a length between 0 and 3000
>
> 3. To have a look at [SLEN], I tried another script with e.g. 2199[SLEN]:
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 2199[SLEN]";
>
> on the terminal:
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script3.pl
> NM_111862       2199
> sv1494 at r04102:~/Desktop/bioperl$
>
>
>
> I think everything works fine, except that bioperl or maybe GenBank
> doesn't understand the range clause 0:3000, but all the documentation
> says I have to do it that way. Did I misunderstand something, or is it
> just a problem with my computer/bioperl installation? Maybe you can
> tell me if the script does what it is supposed to do on your computer?
>
> Thanks and greetings
>
> Alexander Ptok
>>
>> Alexander,
>>
>> I don't understand. By using the clause "0:3000[SLEN] " you are
>> querying for sequences in the length range of 0 to 3000.
>>
>
>
>



From pmiguel at purdue.edu  Fri Jan 11 11:22:38 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 11:22:38 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
Message-ID: <478797CE.9050202@purdue.edu>

No problem getting sequence from genbank via a myriad of methods. But as 
the volume of non-finished sequence in genbank increases the importance 
of also obtaining quality values for a given sequence increases. Some 
records include quality values.

I typically use bp_fetch.pl to grab a sequence from genbank:

bp_fetch.pl -fmt fasta net::genbank:AC207960

sends the fasta sequence to STDOUT. But bp_fetch.pl evidently wasn't designed 
to pull down quals:

bp_fetch.pl -fmt qual net::genbank:AC207960

gives:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual object 
to write_seq() as a parameter named "source"
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
STACK: /usr/local/perl/bin/bp_fetch.pl:313
-----------------------------------------------------------

(running under bioperl 1.5.2)

The quality values for this accession are in genbank as these URLs 
demonstrate:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual

What is the best way to pull down these qual values? They aren't present 
in "GenBank(Full)" format. They are present in an ASN.1 format.

Advice would be appreciated.

-- 
Phillip
Purdue Genomics Core Facility





From cjfields at uiuc.edu  Fri Jan 11 12:09:40 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 Jan 2008 11:09:40 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <478797CE.9050202@purdue.edu>
References: <478797CE.9050202@purdue.edu>
Message-ID: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>

I don't think this is possible with the current setup for  
Bio::DB::GenBank (which the script uses).  We'll have to investigate  
whether it is possible to retrieve this data via NCBI's eutils; if so  
we can try adding it in.  If you want you can submit this as an  
enhancement request via bugzilla for tracking:

http://bugzilla.open-bio.org/

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From MEC at stowers-institute.org  Fri Jan 11 14:14:10 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 11 Jan 2008 13:14:10 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID: 

Indeed, eutils is capable of this.

The following use of my ncbi_eutil script (attached) yields what you
want:

ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

It depends on the version of NCBI_PowerScripting.pm, such as is
included in 

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Chris Fields
> Sent: Friday, January 11, 2008 11:10 AM
> To: Phillip San Miguel
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual 
> files from Genbank?
> 
> I don't think this is possible with the current setup for 
> Bio::DB::GenBank (which the script uses).  We'll have to 
> investigate whether it is possible to retrieve this data via 
> NCBI's eutils; if so we can try adding it in.  If you want 
> you can submit this as an enhancement request via bugzilla 
> for tracking:
> 
> http://bugzilla.open-bio.org/
> 
> chris
> 
> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> 
> > No problem getting sequence from genbank via a myriad of methods.  
> > But as the volume of non-finished sequence in genbank increases the 
> > importance of also obtaining quality values for a given sequence 
> > increases. Some records include quality values.
> >
> > I typically use bp_fetch.pl to grab a sequence from genbank:
> >
> > bp_fetch.pl -fmt fasta net::genbank:AC207960
> >
> > sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't 
> > designed to pull down quals evidently:
> >
> > bp_fetch.pl -fmt qual net::genbank:AC207960
> >
> > gives:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual 
> > object to write_seq() as a parameter named "source"
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> > STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> > STACK: /usr/local/perl/bin/bp_fetch.pl:313
> > -----------------------------------------------------------
> >
> > (running under bioperl 1.5.2)
> >
> > The quality values for this accession are in genbank as these URLs
> > demonstrate:
> >
> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
> >
> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
> >
> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
> >
> > What is the best way to pull down these qual values? They aren't 
> > present in "GenBank(Full)" format. They are present in an ASN.1 
> > format.
> >
> > Advice would be appreciated.
> >
> > --
> > Phillip
> > Purdue Genomics Core Facility
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign


From pmiguel at purdue.edu  Fri Jan 11 14:33:13 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 14:33:13 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: 
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
Message-ID: <4787C479.8070600@purdue.edu>

Hi Malcolm,
    Looks like your email was (inadvertently?) redacted in some way. (No 
attachment and last sentence truncated.) Would it be possible to get a 
complete version so I can be sure I'm following you?
Thanks,
Phillip



From pmiguel at purdue.edu  Fri Jan 11 14:37:24 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 14:37:24 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID: <4787C574.8020003@purdue.edu>

Hi Chris,
Thanks. I have submitted this as an enhancement request to bugzilla.
Phillip



From pmiguel at purdue.edu  Fri Jan 11 15:46:59 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 15:46:59 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: 
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
	<4787C479.8070600@purdue.edu>
	
Message-ID: <4787D5C3.1030308@purdue.edu>

Hi Malcolm,
Yes that works great!
Well, one caveat:
If you download both the fasta and the qual files:

ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=fasta > AC207960.fasta
ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.fasta.qual

The "primary IDs" don't match. The fasta comes out:
 >gi|154937460|gb|AC207960.1|

and the qual comes out:
 >AC207960.1

which seems to choke most programs that use seq and qual files together
(e.g. cross_match), because they expect the primary IDs in the seq and
qual files to match.
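One possible workaround, sketched here under the assumption that the FASTA deflines always follow the `gi|NUMBER|gb|ACCESSION.VERSION|` layout shown above, is to rewrite each FASTA header to the bare accession.version that the qual file uses. `normalize_defline` and `rewrite_fasta` are hypothetical helpers, not part of BioPerl or ncbi_eutil.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an NCBI-style defline such as
# ">gi|154937460|gb|AC207960.1|" into ">AC207960.1" so it matches the
# identifier used in the corresponding qual file. Any description text
# after the final "|" is kept. Other deflines are returned unchanged.
sub normalize_defline {
    my ($line) = @_;
    if ($line =~ /^>gi\|\d+\|[a-z]+\|([^|\s]+)\|(.*)$/) {
        return ">$1$2";
    }
    return $line;
}

# Copy FASTA from $in to $out, rewriting only the header lines.
sub rewrite_fasta {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        $line = normalize_defline($line) if $line =~ /^>/;
        print {$out} "$line\n";
    }
}
```

In a standalone filter script this would be driven with `rewrite_fasta(\*STDIN, \*STDOUT);`, e.g. `perl fixids.pl < AC207960.fasta > AC207960.fixed.fasta` (the script and file names are illustrative).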

Otherwise fine, though.
Thanks,
Phillip

Cook, Malcolm wrote:
> Phillip:
>
> Of course - mea culpa - here's the full monty....
>
> Indeed NCBI's eutils can do this:
>
>> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
>
> which uses my script (attached) to wrap NCBI's eutils.
>
> It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip
> by NCBI in their "Jul 24-27, 2007" course found at
> http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html
>
> I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
> very beginning so that trace messages are not printed on STDOUT, such as
> this echoed header:
> 	 Retrieving 1 records from nucleotide...
> ... and footer:
> 	Received records 1 - 1.
> 	Wrote data to -.
>
> (otherwise they are interspersed with downloaded qual files)
>
> It also depends on a recent version of Getopt::Long.
>
> Hope it helps.
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri


From MEC at stowers-institute.org  Fri Jan 11 14:40:14 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 11 Jan 2008 13:40:14 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <4787C479.8070600@purdue.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
	<4787C479.8070600@purdue.edu>
Message-ID: 

Phillip:

Of course - mea culpa - here's the full monty....

Indeed NCBI's eutils can do this:

> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.
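The attached ncbi_eutil script was scrubbed from the archive, so its code is not available here. As a rough sketch of the kind of request such a wrapper ends up making, the snippet below builds an efetch URL asking for `rettype=qual`, using the GI 154937460 from the URLs earlier in the thread. The endpoint URL and the `retmode=text` parameter are assumptions to check against NCBI's current E-utilities documentation, and the helper names are hypothetical; `fetch_qual` needs the LWP::Simple module.

```perl
use strict;
use warnings;

# Assumed E-utilities efetch endpoint (verify against NCBI's docs).
my $EFETCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Percent-encode a query-string component, keeping RFC 3986 unreserved chars.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build an efetch URL from key/value pairs, preserving their order.
sub efetch_url {
    my @pairs = @_;
    my @parts;
    while (my ($k, $v) = splice(@pairs, 0, 2)) {
        push @parts, uri_escape_simple($k) . '=' . uri_escape_simple($v);
    }
    return $EFETCH_BASE . '?' . join('&', @parts);
}

# Download qual records for one or more GI numbers (loads LWP::Simple lazily).
sub fetch_qual {
    my (@ids) = @_;
    require LWP::Simple;
    my $url = efetch_url(db => 'nucleotide', id => join(',', @ids),
                         rettype => 'qual', retmode => 'text');
    my $text = LWP::Simple::get($url);
    die "efetch request failed for $url\n" unless defined $text;
    return $text;
}
```

For example, `print fetch_qual('154937460');` would write the qual record to STDOUT, assuming the endpoint accepts this request.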

It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip
by NCBI in their "Jul 24-27, 2007" course found at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html

I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
very beginning so that trace messages are not printed on STDOUT, such as
this echoed header:
	 Retrieving 1 records from nucleotide...
... and footer:
	Received records 1 - 1.
	Wrote data to -.

(otherwise they are interspersed with downloaded qual files)
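The `select STDERR` edit works because Perl's one-argument `select` changes the default output filehandle used by a bare `print`. The sketch below shows a scoped variant of the same idea, which restores the previous default afterwards; `with_default_output` and `trace` are hypothetical names, with `trace` standing in for the module's progress messages.

```perl
use strict;
use warnings;

# Run $code with the default output handle switched to $fh, then restore
# the previous default handle, even if $code dies.
sub with_default_output {
    my ($fh, $code) = @_;
    my $previous = select($fh);    # select returns the old default handle
    my @result = eval { $code->() };
    my $err = $@;
    select($previous);
    die $err if $err;
    return @result;
}

# Hypothetical trace routine: a bare print goes to the current default handle.
sub trace { print "Retrieving 1 records from nucleotide...\n" }
```

Calling `with_default_output(\*STDERR, \&trace);` sends the trace line to STDERR, leaving STDOUT free for the downloaded qual data.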

It also depends on a recent version of Getopt::Long.

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ncbi_eutil
Type: application/octet-stream
Size: 1854 bytes
Desc: ncbi_eutil
URL: 

From cain.cshl at gmail.com  Mon Jan 14 13:46:39 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 14 Jan 2008 13:46:39 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
Message-ID: <1200336399.6056.12.camel@frissell>

Hi all,

Last month, I got a bug report on the GBrowse bug tracker:

  http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291

about a problem with dumping invalid GenBank files.  GBrowse uses
Bio::SeqIO::genbank to create these dumps.  

In his bug report, he claims that feature names over 15 characters long
are invalid, and provided an example GenBank file where a feature is
named 'BAC_cloned_genomic_insert', which is over 15 characters.  What I
want to know is this: is this truly a restriction on the GenBank format,
or is it a software problem with some other package?  Do we need to fix
genbank.pm?  I'm perfectly willing to do it; I'm just hesitant to
believe this is really a bug.

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From lstein at cshl.edu  Mon Jan 14 13:53:15 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 14 Jan 2008 13:53:15 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <1200336399.6056.12.camel@frissell>
References: <1200336399.6056.12.camel@frissell>
Message-ID: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>

Hi Scott,

He is correct about the limitation, but we deliberately relaxed it because
we were running into situations where we lost information during
roundtripping from other formats into genbank.

Lincoln




-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu  Mon Jan 14 14:35:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 14 Jan 2008 13:35:46 -0600
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
Message-ID: 

It looks like the keys in the feature table run into the location  
string w/o intervening space, which would probably cause havoc with  
roundtripping from this output.  A few examples:

      BAC_cloned_genomic_insert<1..>1000
      combined_genscanjoin(<1..347,400..498,794..>1000)
      splign_na_dbEST_ncbi<1..>1000

I would think at least a space in between the location and the key  
would be required for round-tripping out of genbank format.
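For reference, the INSDC feature table layout puts the feature key in columns 6-20 and the location from column 22. A minimal sketch of a formatter that keeps that layout for normal keys and still leaves one separating space when a key is longer than 15 characters (the behaviour Lincoln describes for the earlier version) could look like this; `feature_key_line` is a hypothetical name, not the actual code in Bio::SeqIO::genbank.

```perl
use strict;
use warnings;

# Hypothetical formatter for the first line of a feature-table entry:
# five blanks, the key left-justified in a 15-character field, one space,
# then the location (so the location starts in column 22). Keys longer than
# 15 characters are not truncated; they still get one space before the
# location so the two fields remain separable on re-parsing.
sub feature_key_line {
    my ($key, $location) = @_;
    return sprintf('%5s%-15s %s', '', $key, $location);
}
```

With this layout the examples above would come out as `BAC_cloned_genomic_insert <1..>1000` rather than running together.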

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From lstein at cshl.edu  Mon Jan 14 14:46:20 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 14 Jan 2008 14:46:20 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: 
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
	
Message-ID: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>

That's a new bug. The version I worked on inserted a space after the name.

Lincoln

On Jan 14, 2008 2:35 PM, Chris Fields  wrote:

> It looks like the keys in the feature table run into the location
> string w/o intervening space, which would probably cause havoc with
> roundtripping from this output.  A few examples:
>
>      BAC_cloned_genomic_insert<1..>1000
>      combined_genscanjoin(<1..347,400..498,794..>1000)
>      splign_na_dbEST_ncbi<1..>1000
>
> I would think at least a space in between the location and the key
> would be required for round-tripping out of genbank format.
>
> chris
>
> On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote:
>
> > Hi Scott,
> >
> > He is correct about the limitation, but we deliberately relaxed it
> > because
> > we were running into situations where we lost information during
> > roundtripping from other formats into genbank.
> >
> > Lincoln
> >
> > On Jan 14, 2008 1:46 PM, Scott Cain  wrote:
> >
> >> Hi all,
> >>
> >> Last month, I got a bug report on the GBrowse bug tracker:
> >>
> >>
> >>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291
> >>
> >> about a problem with dumping invalid GenBank files.  GBrowse uses
> >> Bio::SeqIO::genbank to create these dumps.
> >>
> >> In his bug report, he claims that feature names over 15 characters
> >> long
> >> are invalid, and provided and example GenBank file where a feature is
> >> named 'BAC_cloned_genomic_insert', which is over 15 characters.
> >> What I
> >> want to know is this: is this truly a restriction on the GenBank
> >> format,
> >> or is it a software problem with some other package?  Do we need to
> >> fix
> >> genbank.pm?  I'm perfectly willing to do it; I'm just hesitant to
> >> believe this is really a bug.
> >>
> >> Thanks,
> >> Scott
> >>
> >> --
> >>
> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D.
> cain.cshl at gmail.com
> >> GMOD Coordinator (http://www.gmod.org/)
> >> 216-392-3087
> >> Cold Spring Harbor Laboratory
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From diogoat at gmail.com  Tue Jan 15 08:40:10 2008
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 Jan 2008 11:40:10 -0200
Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS
Message-ID: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>

Hello,

I want to extract the protein_id and translation from the CDS features of a
genome in GenBank format, but I have one problem: when a feature in the file
doesn't have a protein_id or translation, the script gives me this error:

------------- EXCEPTION  -------------
MSG: asking for tag value that does not exist protein_id
STACK Bio::SeqFeature::Generic::get_tag_values
/usr/share/perl5/Bio/SeqFeature/Generic.pm:504
STACK toplevel parser_cds.pl:25
--------------------------------------

Below is the script:

##############################################
use Bio::SeqIO;
use warnings;

my $infile  = $ARGV[0];
my $outfile = "$infile.out";
open (OUT, ">>$outfile");

my $seq_in = Bio::SeqIO->new('-file'   => "<$infile",
                             '-format' => 'Genbank');

while (my $inseq = $seq_in->next_seq) {
    for my $feat_object ($inseq->get_SeqFeatures) {
        if ($feat_object->primary_tag eq "CDS") {
            print OUT $feat_object->get_tag_values('protein_id'), " ";
            print OUT $feat_object->get_tag_values('translation'), "\n";
        }
    }
}
###############################################

Can somebody help me?

Thanks

Diogo Tschoeke

From Marc.Logghe at ablynx.com  Tue Jan 15 09:44:54 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Tue, 15 Jan 2008 15:44:54 +0100
Subject: [Bioperl-l] Problem to extract protein_id and transcript from
	CDS
In-Reply-To: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>
Message-ID: <03C512635899144083CADB0EE2220189013E2BEC@alpaca.lan.ablynx.com>

Hi,
Try testing for existence first using the has_tag() method.
It is provided by Bio::AnnotatableI.

print OUT $feat_object->get_tag_values('protein_id'), " "
    if $feat_object->has_tag('protein_id');


HTH,
Marc
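Putting the has_tag() check into the complete loop gives something like the following minimal sketch. It assumes BioPerl is installed (Bio::SeqIO is loaded only when a file is given on the command line), the helper name `tag_or_empty` is hypothetical, and it writes tab-separated protein_id/translation pairs, leaving a field empty when a CDS lacks that tag instead of calling get_tag_values() on a missing tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the values of $tag joined with ',', or ''
# when the feature lacks the tag, so get_tag_values() is never called
# for a missing tag. Works with any object providing has_tag() and
# get_tag_values(), e.g. Bio::SeqFeature::Generic.
sub tag_or_empty {
    my ($feat, $tag) = @_;
    return '' unless $feat->has_tag($tag);
    return join(',', $feat->get_tag_values($tag));
}

# Process a GenBank file only when one is given on the command line;
# Bio::SeqIO is loaded at run time so tag_or_empty() can be reused alone.
if (@ARGV) {
    require Bio::SeqIO;
    my $infile = $ARGV[0];
    my $seq_in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
    open(my $out, '>>', "$infile.out") or die "Cannot open $infile.out: $!";
    while (my $seq = $seq_in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            # Tab-separated: protein_id, then translation (empty if absent)
            print {$out} tag_or_empty($feat, 'protein_id'), "\t",
                         tag_or_empty($feat, 'translation'), "\n";
        }
    }
    close $out;
}
```

Run it as `perl parser_cds.pl genome.gbk`; as in the original script, output is appended to genome.gbk.out.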

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Diogo Tschoeke
> Sent: dinsdag 15 januari 2008 14:40
> To: Bioperl-list
> Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS
> 
> Hello,
> 
> I want to extract protein_id and transcript from a CDS tag, from genome in
> genbak format but i have one problem, when the sequence in the file don't
> have the protein_id or the transcript the script gives me this error:
> 
> ------------- EXCEPTION  -------------
> MSG: asking for tag value that does not exist protein_id
> STACK Bio::SeqFeature::Generic::get_tag_values
> /usr/share/perl5/Bio/SeqFeature/Generic.pm:504
> STACK toplevel parser_cds.pl:25
> --------------------------------------
> 
> Bellow I past the script
> 
> ##############################################
> use Bio::SeqIO;
> use warnings;
> 
> my $infile = $ARGV[0];
> my $outfile = "$infile.out";
> open (OUT, ">>$outfile");
> 
>           my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                                       '-format' => 'Genbank');
> 
>          while (my $inseq = $seq_in->next_seq) {
> 
>         for my $feat_object ($inseq->get_SeqFeatures){
>             if ($feat_object->primary_tag eq "CDS"){
>                 print OUT $feat_object->get_tag_values('protein_id')," ";
>             print OUT $feat_object->get_tag_values('translation'),"\n";
>         }
>     }
> }
> ###############################################
> 
> Somebody can helps me?
> 
> Thank
> 
> Diogo Tschoeke


From cuiw at ncbi.nlm.nih.gov  Tue Jan 15 11:50:53 2008
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Tue, 15 Jan 2008 11:50:53 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
References: <478797CE.9050202@purdue.edu><14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu><4787C479.8070600@purdue.edu>
	
Message-ID: <18C407FD4FFB424292D769FBD68C1987048E95CC@NIHCESMLBX8.nih.gov>

There is an alternative if you can download and compile the NCBI C++ Toolkit (ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/2007/Aug_27_2007/). Then simply call the binary like this:
 
id1_fetch -fmt quality -gi 13508865
 
Wenwu Cui

________________________________

From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Fri 1/11/2008 2:40 PM
To: Phillip San Miguel
Cc: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?



Phillip:

Of course - mea culpa - here's the full monty....

Indeed NCBI's eutils can do this:

> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.

It depends upon NCBI_PowerScripting.pm, distributed by NCBI in PowerFiles_0707.zip
as part of their "Jul 24-27, 2007" course, found at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html

I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
very beginning so that trace messages are not printed on STDOUT, such as
this echoed header:
         Retrieving 1 records from nucleotide...
... and footer:
        Received records 1 - 1.
        Wrote data to -.

(otherwise they are interspersed with downloaded qual files)

It also depends on a recent version of Getopt::Long.

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
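For anyone without Malcolm's wrapper, the same EFetch request (rettype=qual) can be made directly. This is a minimal sketch, not Malcolm's script: it assumes LWP::Simple is installed, the E-utilities base URL shown is an assumption to check against NCBI's current documentation, and `qual_url` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build an EFetch URL requesting quality scores
# (rettype=qual, the same rettype passed by the ncbi_eutil wrapper) for one
# nucleotide accession or GI. The base URL is an assumption; check NCBI docs.
sub qual_url {
    my ($id) = @_;
    return 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
         . "?db=nucleotide&id=$id&rettype=qual&retmode=text";
}

# Usage: perl fetch_qual.pl AC207960 [more accessions...]
# Writes each result to <accession>.qual in the current directory.
if (@ARGV) {
    require LWP::Simple;
    for my $id (@ARGV) {
        my $qual = LWP::Simple::get(qual_url($id));
        die "EFetch request failed for $id\n" unless defined $qual;
        open(my $out, '>', "$id.qual") or die "Cannot write $id.qual: $!";
        print {$out} $qual;
        close $out;
    }
}
```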
 

> -----Original Message-----
> From: Phillip San Miguel [mailto:pmiguel at purdue.edu]
> Sent: Friday, January 11, 2008 1:33 PM
> To: Cook, Malcolm
> Cc: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual
> files from Genbank?
>
> Hi Malcolm,
>     Looks like your email was (inadvertently?) redacted in
> some way (no attachment, and the last sentence is truncated). Would
> it be possible to get a complete version so I can be sure I'm
> following you?
> Thanks,
> Phillip
>
> Cook, Malcolm wrote:
> > Indeed eutil is capable of this
> >
> > The following use of my ncbi_eutil script (attached) yields what you
> > want:
> >
> > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
> >
> > It depends on the version of NCBI_PowerScripting.pm , such as is
> > included in
> >
> > Malcolm Cook
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >  
> >
> >  
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris
> >> Fields
> >> Sent: Friday, January 11, 2008 11:10 AM
> >> To: Phillip San Miguel
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?
> >>
> >> I don't think this is possible with the current setup for
> >> Bio::DB::GenBank (which the script uses).  We'll have to investigate
> >> whether it is possible to retrieve this data via NCBI's eutils; if so
> >> we can try adding it in.  If you want you can submit this as an
> >> enhancement request via bugzilla for tracking:
> >>
> >> http://bugzilla.open-bio.org/
> >>
> >> chris
> >>
> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> >>
> >>    
> >>> No problem getting sequence from genbank via a myriad of methods.
> >>> But as the volume of non-finished sequence in genbank increases the
> >>> importance of also obtaining quality values for a given sequence
> >>> increases. Some records include quality values.
> >>>
> >>> I typically use bp_fetch.pl to grab a sequence from genbank:
> >>>
> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960
> >>>
> >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't
> >>> designed to pull down quals evidently:
> >>>
> >>> bp_fetch.pl -fmt qual net::genbank:AC207960
> >>>
> >>> gives:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual
> >>> object to write_seq() as a parameter named "source"
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> >>> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
> >>> -----------------------------------------------------------
> >>>
> >>> (running under bioperl 1.5.2)
> >>>
> >>> The quality values for this accession are in genbank as these URLs
> >>> demonstrate:
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
> >>>
> >>> What is the best way to pull down these qual values? They aren't
> >>> present in "GenBank(Full)" format. They are present in an ASN.1
> >>> format.
> >>>
> >>> Advice would be appreciated.
> >>>
> >>> --
> >>> Phillip
> >>> Purdue Genomics Core Facility
> >>>
> >>>
> >>>
> >>>
> >>>      
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >>
> >>    
> >
> >  
>
>
>




From singhal at berkeley.edu  Tue Jan 15 17:50:12 2008
From: singhal at berkeley.edu (Sonal Singhal)
Date: Tue, 15 Jan 2008 14:50:12 -0800
Subject: [Bioperl-l] redundant sequences
Message-ID: 

Hi all,

I am mining a few genomes to find all the genes in a gene family, and
of course multiple BLAST searches of different paralogs are returning
a lot of redundant hits.   I have searched the BioPerl documentation,
and I cannot find an easy way to cluster and then purge redundant
sequences.  Any ideas?

Cheers,
sonal

From MEC at stowers-institute.org  Tue Jan 15 18:21:00 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 15 Jan 2008 17:21:00 -0600
Subject: [Bioperl-l] redundant sequences
In-Reply-To: 
References: 
Message-ID: 

Cd-hit: http://bioinformatics.burnham.org/cd-hi/
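If the redundancy is only exact duplicates (the same hit returned by several paralog queries), a few lines of plain Perl are enough; for near-identical sequences a clustering tool such as CD-HIT is still the right choice. A minimal sketch, assuming all hits are in one FASTA file; `read_fasta` and `dedupe_exact` are hypothetical helper names, not BioPerl functions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical minimal FASTA reader (no BioPerl needed): returns a list
# of [id, sequence] pairs with whitespace stripped from the sequence.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            push @records, [$1, ''];
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Drop records whose sequence is identical (ignoring case) to one seen
# earlier, keeping the first ID. This removes exact duplicates only; it
# does not cluster near-identical hits the way CD-HIT does.
sub dedupe_exact {
    my @records = @_;
    my %seen;
    return grep { !$seen{ uc $_->[1] }++ } @records;
}

# Usage: perl dedupe.pl hits.fasta > nonredundant.fasta
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    for my $rec (dedupe_exact(read_fasta($fh))) {
        print ">$rec->[0]\n$rec->[1]\n";
    }
    close $fh;
}
```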

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Sonal Singhal
> Sent: Tuesday, January 15, 2008 4:50 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] redundant sequences
> 
> Hi all,
> 
> I am mining a few genomes to find all the genes in a gene 
> family, and of course multiple BLAST searches of different 
> paralogs are returning
> a lot of redundant hits.   I have searched the BioPerl documentation,
> and I cannot find an easy way to cluster and then purge 
> redundant sequences.  Any ideas?
> 
> Cheers,
> sonal
> 


From cain.cshl at gmail.com  Tue Jan 15 21:24:50 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 15 Jan 2008 21:24:50 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
	
	<6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
Message-ID: <1200450290.7276.3.camel@frissell>

Hi Chris and Lincoln,

I've attached my suggested patch.  So, can I use svn to check it in?  It
only adds a space after the feature type name; I suspect that will be
enough to fix the file format for most uses.

Scott
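A minimal sketch of the layout under discussion, assuming the usual GenBank feature-table columns (key starting in column 6, location in column 22). It is not Scott's patch, which was scrubbed from the archive; `feature_line` is a hypothetical helper showing why a literal space after the padded key keeps long keys such as 'BAC_cloned_genomic_insert' separated from the location.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format the first line of a feature-table entry.
# The key starts in column 6; when it fits in 15 characters the location
# starts in column 22. %-15s pads short keys but never truncates long ones,
# and the literal space after it guarantees a separator for long keys.
sub feature_line {
    my ($key, $location) = @_;
    return sprintf("     %-15s %s", $key, $location);
}

print feature_line('CDS', 'join(1..100,200..300)'), "\n";
print feature_line('BAC_cloned_genomic_insert', '<1..>1000'), "\n";
```

The second call produces `     BAC_cloned_genomic_insert <1..>1000`, which a whitespace-splitting parser can read back, unlike the run-together `BAC_cloned_genomic_insert<1..>1000` line quoted below.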

On Mon, 2008-01-14 at 14:46 -0500, Lincoln Stein wrote:
> That's a new bug. The version I worked on inserted a space after the name.
> 
> Lincoln
> 
> On Jan 14, 2008 2:35 PM, Chris Fields  wrote:
> 
> > It looks like the keys in the feature table run into the location
> > string w/o intervening space, which would probably cause havoc with
> > roundtripping from this output.  A few examples:
> >
> >      BAC_cloned_genomic_insert<1..>1000
> >      combined_genscanjoin(<1..347,400..498,794..>1000)
> >      splign_na_dbEST_ncbi<1..>1000
> >
> > I would think at least a space in between the location and the key
> > would be required for round-tripping out of genbank format.
> >
> > chris
> >
> > On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote:
> >
> > > Hi Scott,
> > >
> > > He is correct about the limitation, but we deliberately relaxed it
> > > because we were running into situations where we lost information
> > > during roundtripping from other formats into genbank.
> > >
> > > Lincoln
> > >
> > > On Jan 14, 2008 1:46 PM, Scott Cain  wrote:
> > >
> > >> Hi all,
> > >>
> > >> Last month, I got a bug report on the GBrowse bug tracker:
> > >>
> > >> http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291
> > >>
> > >> about a problem with dumping invalid GenBank files.  GBrowse uses
> > >> Bio::SeqIO::genbank to create these dumps.
> > >>
> > >> In his bug report, he claims that feature names over 15 characters
> > >> long are invalid, and provided an example GenBank file where a
> > >> feature is named 'BAC_cloned_genomic_insert', which is over 15
> > >> characters. What I want to know is this: is this truly a restriction
> > >> on the GenBank format, or is it a software problem with some other
> > >> package? Do we need to fix genbank.pm? I'm perfectly willing to do
> > >> it; I'm just hesitant to believe this is really a bug.
> > >>
> > >> Thanks,
> > >> Scott
> > >>
> > >> --
> > >> ------------------------------------------------------------------------
> > >> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > >> GMOD Coordinator (http://www.gmod.org/)
> > >> 216-392-3087
> > >> Cold Spring Harbor Laboratory
> > >>
> > >>
> > >>
> > >
> > >
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> >
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: genbank.pm.patch
Type: text/x-patch
Size: 1110 bytes
Desc: not available
URL: 

From cjfields at uiuc.edu  Tue Jan 15 22:15:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 Jan 2008 21:15:51 -0600
Subject: [Bioperl-l] Subversion migration complete
Message-ID: 

On behalf of the BioPerl core developers, I am proud to announce that  
the BioPerl SVN migration has been completed.  We would like to thank  
everyone who helped, in particular George Hartzell and Chris
Dagdigian, both of whom played instrumental roles in the CVS->SVN
conversion and anonymous SVN setup for BioPerl.

Anonymous SVN checkouts for bioperl-live are now possible using:
svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

Developers can obtain a checkout from:
svn co svn+ssh://USER at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live/trunk bioperl-live

Browsable repository:
http://code.open-bio.org/svnweb/index.cgi/bioperl/

Basic instructions:
http://www.bioperl.org/wiki/Using_Subversion

We are still in the midst of implementing a few extra details related  
to SVN migration; the status on these can be viewed here:
http://www.bioperl.org/wiki/CVS_to_SVN_Migration

Enjoy!

chris


From bug-bioperl at rt.cpan.org  Wed Jan 16 22:35:30 2008
From: bug-bioperl at rt.cpan.org (Chris Fields via RT)
Date: Wed, 16 Jan 2008 22:35:30 -0500
Subject: [Bioperl-l] [rt.cpan.org #29533] Bio::SeqIO::interpro depends on
	XML::DOM::XPath
In-Reply-To: 
References:   
	
Message-ID: 


       Queue: bioperl
 Ticket 

On Fri Sep 21 10:28:52 2007, support at helpdesk.open-bio.org wrote:
> Hi Mike,
> 
> The proper place to submit this fix is the bioperl-l at lists.open-bio.org
> mailing list or the OBF Bugzilla queue at:
> http://bugzilla.open-bio.org/, this RT system is mainly for sysadmin
> activities rather than for tracking code changes. Would you be so kind
> to re-send your request to one of the places above? Thanks for the heads
> up! :)
> 
> Regards,
> Mauricio.

This has been fixed.  I'll get the CPAN maintainer to close this out.

From vipingjo at gmail.com  Thu Jan 17 03:48:36 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 16:48:36 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via package
	"Bio::Tree::Tree"
Message-ID: <200801171648332965577@gmail.com>

Hi everyone,

I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + Windows XP SP2.
When running the example code from Bio\Tree\Compatible.pm (attached below as t.pl), I got this error:

Can't locate object method "is_compatible" via package "Bio::Tree::Tree"

I replaced "$t1->is_compatible($t2)" with "is_compatible Bio::Tree::Compatible ($t1,$t2)", and the error changed to:
Can't locate object method "get_nodes" via package "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252,  line 1.

I modified Compatible.pm, changing the code for "get_nodes" to "get_nodes Bio::Tree::Tree($self);", and a new error arose:
Can't use string ("Bio::Tree::Tree") as a HASH ref while "strict refs" in use at i:/Perl/site/lib/Bio\Tree\Tree.pm line 198,  line 1.

I gave up. Any help will be deeply appreciated.




# this is the example script from Bio::Tree::Compatible: t.pl
  use Bio::Tree::Compatible;
  use Bio::TreeIO;
  my $input = new Bio::TreeIO('-format' => 'newick',
                              '-file'   => 'input.tre');
  my $t1 = $input->next_tree;
  my $t2 = $input->next_tree;

  my ($incompat, $ilabels, $inodes) = $t1->is_compatible($t2);
  if ($incompat) {
    my %cluster1 = %{ $t1->cluster_representation };
    my %cluster2 = %{ $t2->cluster_representation };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
      foreach my $label (@$ilabels) {
        my $node1 = $t1->find_node(-id => $label);
        my $node2 = $t2->find_node(-id => $label);
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "label $label";
        print " cluster"; map { print " ",$_ } @c1;
        print " cluster"; map { print " ",$_ } @c2; print "\n";
      }
    }
    if (scalar(@$inodes)) {
      while (@$inodes) {
        my $node1 = shift @$inodes;
        my $node2 = shift @$inodes;
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "cluster"; map { print " ",$_ } @c1;
        print " properly intersects cluster";
        map { print " ",$_ } @c2; print "\n";
      }
    }
  } else {
    print "compatible trees\n";
  }

__END__;

# this is the file 'input.tre':
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);

# this is the full messages I got running like this: "perl.exe -w t.pl"
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
Can't locate object method "is_compatible" via package "Bio::Tree::Tree" at Z:\bp\t.pl line 8,  line 2.



From bix at sendu.me.uk  Thu Jan 17 06:18:56 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:18:56 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
 package "Bio::Tree::Tree"
In-Reply-To: <200801171648332965577@gmail.com>
References: <200801171648332965577@gmail.com>
Message-ID: <478F39A0.2030508@sendu.me.uk>

viping wrote:
> Hi everyone,
> 
> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + 
> Windows XP SP2. When running example codes(attched below as t.pl) 
> within Bio\Tree\Compatible.pm , I got this error:
> 
> Can't locate object method "is_compatible" via package 
> "Bio::Tree::Tree"
> 
> I replaced "$t1->is_compatible($t2)" with "is_compatible 
> Bio::Tree::Compatible ($t1,$t2)",

Yup, you had the right idea; unfortunately the synopsis code for
Bio::Tree::Compatible is wrong.
I've now fixed it in svn.


> the error changed: Can't locate object method "get_nodes" via package
>  "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm 
> line 252,  line 1.

I didn't get quite that error; instead I had an issue with TreeIO: for
whatever reason it is only returning one tree from your input file (ie.
$t2 is undefined).

I therefore got "Can't call method "get_nodes" on an undefined value [...]"

Can someone look into/confirm that?


From bix at sendu.me.uk  Thu Jan 17 06:35:57 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:35:57 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
 package "Bio::Tree::Tree"
In-Reply-To: <478F39A0.2030508@sendu.me.uk>
References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk>
Message-ID: <478F3D9D.6050306@sendu.me.uk>

Sendu Bala wrote:
>> the error changed: Can't locate object method "get_nodes" via
>> package "Bio::Tree::Compatible" at
>> i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252,  line 1.
> 
> I didn't get quite that error; instead I had an issue with TreeIO:
> for whatever reason it is only returning one tree from your input
> file (ie. $t2 is undefined).
> 
> I therefore got "Can't call method "get_nodes" on an undefined value
> [...]"
> 
> Can someone look into/confirm that?

... Yeah, I think I'm losing my mind. The code below is 'ok' using the
commented out -fh input for TreeIO, but is 'not ok' using the -file
input, where the specified file contains the exact same data as
__DATA__. Huh?


#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::Tree::Compatible;
use Bio::TreeIO;
my $input = new Bio::TreeIO('-format' => 'newick',
                             #-fh      => \*DATA,
                             -file    => 'input.tre'
                             );
my $t1 = $input->next_tree;
my $t2 = $input->next_tree;

if ($t2) {
    print "ok\n";
}
else {
    print "not ok\n";
}

__DATA__
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);



From vipingjo at gmail.com  Thu Jan 17 08:23:14 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 21:23:14 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
	package"Bio::Tree::Tree"
References: <200801171648332965577@gmail.com>, <478F39A0.2030508@sendu.me.uk>
Message-ID: <200801172123112184046@gmail.com>

I got the latest code modified by Sendu Bala via SVN. It works well when "input.tre" and "t.pl" are in the same directory. Thank you, Sendu Bala.

This is the output:
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
incompatible trees
label G cluster G cluster G K
cluster A B C properly intersects cluster A B H
cluster A B C properly intersects cluster A B E G H I J K
cluster A B C D properly intersects cluster A B H
cluster A B C D properly intersects cluster A B E G H I J K
cluster E F G properly intersects cluster G K
cluster E F G properly intersects cluster G I J K
cluster E F G properly intersects cluster A B E G H I J K
cluster A B C D E F G properly intersects cluster A B H
cluster A B C D E F G properly intersects cluster G K
cluster A B C D E F G properly intersects cluster G I J K
cluster A B C D E F G properly intersects cluster A B E G H I J K

# this is the latest code:
  use Bio::Tree::Compatible;
  use Bio::TreeIO;
  my $input = Bio::TreeIO->new('-format' => 'newick',
                               '-file'   => 'input.tre');
  my $t1 = $input->next_tree;
  my $t2 = $input->next_tree;

  my ($incompat, $ilabels, $inodes) = Bio::Tree::Compatible::is_compatible($t1,$t2);
  if ($incompat) {
    my %cluster1 = %{ Bio::Tree::Compatible::cluster_representation($t1) };
    my %cluster2 = %{ Bio::Tree::Compatible::cluster_representation($t2) };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
      foreach my $label (@$ilabels) {
        my $node1 = $t1->find_node(-id => $label);
        my $node2 = $t2->find_node(-id => $label);
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "label $label";
        print " cluster"; map { print " ",$_ } @c1;
        print " cluster"; map { print " ",$_ } @c2; print "\n";
      }
    }
    if (scalar(@$inodes)) {
      while (@$inodes) {
        my $node1 = shift @$inodes;
        my $node2 = shift @$inodes;
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "cluster"; map { print " ",$_ } @c1;
        print " properly intersects cluster";
        map { print " ",$_ } @c2; print "\n";
      }
    }
  } else {
    print "compatible trees\n";
  }


------------------				 
viping
2008-01-17

-------------------------------------------------------------
From: Sendu Bala
Date: 2008-01-17 19:19:30
To: viping
Cc: bioperl-l
Subject: Re: [Bioperl-l] Can't locate object method "is_compatible" via package"Bio::Tree::Tree"

viping wrote:
> Hi Everyone,
> 
> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + 
> Windows XP SP2. When running the example code (attached below as t.pl) 
> from Bio\Tree\Compatible.pm, I got this error:
> 
> Can't locate object method "is_compatible" via package 
> "Bio::Tree::Tree"
> 
> I replaced "$t1->is_compatible($t2)" with "is_compatible 
> Bio::Tree::Compatible ($t1,$t2)",

Yup, you had the right idea; unfortunately the synopsis code for
Bio::Tree::Compatible is wrong.
I've now fixed it in svn.


> the error changed: Can't locate object method "get_nodes" via package
>  "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm 
> line 252,  line 1.

I didn't get quite that error; instead I had an issue with TreeIO: for
whatever reason it is only returning one tree from your input file (ie.
$t2 is undefined).

I therefore got "Can't call method "get_nodes" on an undefined value [...]"

Can someone look into/confirm that?


From cjfields at uiuc.edu  Thu Jan 17 08:25:41 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Jan 2008 07:25:41 -0600
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
	package "Bio::Tree::Tree"
In-Reply-To: <478F39A0.2030508@sendu.me.uk>
References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk>
Message-ID: <7BF3650B-F1D4-4F21-9C59-3AC13CA35945@uiuc.edu>

Probably need to file this as a bug.  There is a similar issue with  
Bio::TreeIO::nexus, but it probably isn't related unless it is using  
the same parsing logic:

http://bugzilla.open-bio.org/show_bug.cgi?id=2356

chris

On Jan 17, 2008, at 5:18 AM, Sendu Bala wrote:

> viping wrote:
>> Hi Everyone,
>>
>> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 +
>> Windows XP SP2. When running the example code (attached below as t.pl)
>> from Bio\Tree\Compatible.pm, I got this error:
>>
>> Can't locate object method "is_compatible" via package
>> "Bio::Tree::Tree"
>>
>> I replaced "$t1->is_compatible($t2)" with "is_compatible
>> Bio::Tree::Compatible ($t1,$t2)",
>
> Yup, you had the right idea; unfortunately the synopsis code for
> Bio::Tree::Compatible is wrong.
> I've now fixed it in svn.
>
>
>> the error changed: Can't locate object method "get_nodes" via package
>> "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm
>> line 252,  line 1.
>
> I didn't get quite that error; instead I had an issue with TreeIO: for
> whatever reason it is only returning one tree from your input file  
> (ie.
> $t2 is undefined).
>
> I therefore got "Can't call method "get_nodes" on an undefined value  
> [...]"
>
> Can someone look into/confirm that?
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From N.Haigh at sheffield.ac.uk  Fri Jan 18 07:47:48 2008
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 18 Jan 2008 12:47:48 +0000
Subject: [Bioperl-l] Parsing Primer3 output
Message-ID: <1200660468.47909ff498dd0@webmail.shef.ac.uk>

I might be overlooking something, but is it possible to parse primer3 output?

Cheers
Nath


From cjfields at uiuc.edu  Fri Jan 18 08:27:47 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 Jan 2008 07:27:47 -0600
Subject: [Bioperl-l] Parsing Primer3 output
In-Reply-To: <1200660468.47909ff498dd0@webmail.shef.ac.uk>
References: <1200660468.47909ff498dd0@webmail.shef.ac.uk>
Message-ID: <8C8BF818-FC04-42E3-9210-3FE23F92EA8F@uiuc.edu>

Bio::Tools::Primer3.

chris



From hangsyin at gmail.com  Sat Jan 19 13:25:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sat, 19 Jan 2008 10:25:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined
 value at BIO::DB::GFF.pl
Message-ID: <14971922.post@talk.nabble.com>


Hi, everyone,

I ran into this problem when running the script below to extract features
overlapping 4:20,000..25,000. It always fails with "Can't call method
"features" on an undefined value at BIO::DB::GFF.pl line XX".
==============================================================
use Bio::DB::GFF;
use Bio::Tools::GFF;
my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
                           -dsn     => 'dbi:mysql:dmel_gff:localhost',
                           -user    => 'XXXX',
                           -pass    => 'XXXX') || die "database open failed";

my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
my @features = $segment->features(-types => ['gene', 'exon', 'intron',
                                             'five_prime_UTR', 'three_prime_UTR', 'CDS'])
    or die "no features";
print(scalar(@features)."\n");

================================================================
I am using ActivePerl 5.8.8 and BioPerl 1.5.2 under Win32. I loaded
dmel_5.4.gff into a MySQL database with bulk_load_gff.pl without any errors.
Other methods fail as well. 

Any help will be deeply appreciated!

Best,
Jon

-- 
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14971922.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cain.cshl at gmail.com  Sat Jan 19 22:36:44 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Sat, 19 Jan 2008 22:36:44 -0500
Subject: [Bioperl-l] Problem: Can't call method "features" on
	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <14971922.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com>
Message-ID: <1200800204.6069.5.camel@frissell>

Hi Jon,

I think it's funny that you have "or die" on the database opening line,
"or die" on the @features line, but you didn't put one on the $segment
line.  Try adding "or die $!" to the $segment line to see what it says,
and also add a 'print $segment' after you create it and before you try to
get the features from it.  

Clearly, the problem is that $segment is not defined (that is, nothing
is in it, not that the wrong thing is in it).  The next trick is to find
out why.  My first guess, without looking at the data set, is that the
arm is not really named '4'.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hangsyin at gmail.com  Sat Jan 19 22:49:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sat, 19 Jan 2008 19:49:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on
	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <1200800204.6069.5.camel@frissell>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
Message-ID: <14978241.post@talk.nabble.com>


Hi, Scott,

After adding die $!, I can see that something goes wrong at this line:
"my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"

my gff file is like this:
##gff-version 3
##sequence-region 4 1 1351857
4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
...
...
I'm really confused. Any further suggestions? Thank you!

Jon



From cain.cshl at gmail.com  Sat Jan 19 23:08:04 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Sat, 19 Jan 2008 23:08:04 -0500
Subject: [Bioperl-l] Problem: Can't call method "features"
	on	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <14978241.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com>
	<1200800204.6069.5.camel@frissell>  <14978241.post@talk.nabble.com>
Message-ID: <1200802084.6069.11.camel@frissell>

Hi Jon,

Well, seeing the error message would be helpful, but without it my first
guess is that there are a few things you can try:

  * removing the "sequence-region" line from the GFF file, adding a line
like this:

  4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4

and then reloading the database.

  * Or, you may want to consider using Bio::DB::SeqFeature::Store, since
Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that
is, GFF3 with three levels of features, like gene, mRNA and CDS).

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hangsyin at gmail.com  Sun Jan 20 10:08:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sun, 20 Jan 2008 07:08:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features"
	on	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <1200802084.6069.11.camel@frissell>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
Message-ID: <14982665.post@talk.nabble.com>


Hi, Scott,
I tried replacing the sequence-region line with
"4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4", but it doesn't
work. "$!" didn't say anything except "died at line 12".

So, I went ahead with Bio::DB::SeqFeature::Store. Here is my code to
load dmel-all-r5.4.gff (from FlyBase) into a test database:
=============================================================
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   =>  1 );
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store    => $db,
                                                         -verbose  => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================
I got a bunch of errors like this:
"DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
Line 1316 of mysql.pm reads: $sth->execute($name) or die $sth->errstr;
When I checked the test database after the failed load, only one table,
called 'meta', had been created. I also tried 'grant all on test to
XXX at localhost' and loaded the GFF with that -user and -pass, but it didn't
work either.

Jon



From cain at cshl.edu  Sun Jan 20 10:25:16 2008
From: cain at cshl.edu (Scott Cain)
Date: Sun, 20 Jan 2008 10:25:16 -0500 (EST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an
 undefined value at BIO::DB::GFF.pl
In-Reply-To: <14982665.post@talk.nabble.com>
Message-ID: 

Jon,

There is a script for loading a SeqFeature database just like the one for
the GFF database, though I don't know what it's called offhand (I'm not at my
normal computer right now).  Be sure to read the documentation; you
will probably want to use the 'fast' option (I don't remember what it is
called either).

Scott


----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain at cshl.edu
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------




From cjfields at uiuc.edu  Sun Jan 20 12:10:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 20 Jan 2008 11:10:27 -0600
Subject: [Bioperl-l] Problem: Can't call method "features" on an
	undefined value at BIO::DB::GFF.pl
In-Reply-To: 
References: 
Message-ID: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>

It's bp_seqfeature_load.pl (if you have the full bioperl core  
distribution, it's in script/Bio-SeqFeature/Store).  I had some  
problems with the fast-loading option but it was likely just my gff  
formatting; example data loaded just fine.

As for the error, you need to use the '-create' flag when initializing  
a database (or wiping data from a current one):

=============================================================
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   => 1,
                                         -create  => 1);
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
                                                         -verbose => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================

chris
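The comma after `-write => 1` is easy to lose in a named-argument list, and Perl does not report it as a syntax error: `1 -create => 1` parses as the subtraction `1 - 'create'`, so the constructor never sees a `-create` key. This small pure-Perl check (no BioPerl or database needed) shows how the two argument lists are actually built:

```perl
use strict;
use warnings;

# The same named-argument list written with and without the comma that
# separates "-write => 1" from "-create => 1".
my @with_comma = (-write => 1, -create => 1);

my @missing_comma;
{
    # Without the comma, "1 -create" is parsed as the subtraction 1 - 'create',
    # which evaluates to 1 (and normally warns that "create" isn't numeric).
    no warnings 'numeric';
    @missing_comma = (-write => 1 -create => 1);
}

printf "with comma:    %d elements: %s\n",
    scalar(@with_comma), join(', ', @with_comma);
printf "missing comma: %d elements: %s\n",
    scalar(@missing_comma), join(', ', @missing_comma);

# The broken list contains no '-create' key at all.
my $has_create = grep { $_ eq '-create' } @missing_comma;
print "missing-comma list contains '-create': ", ($has_create ? 'yes' : 'no'), "\n";
```

In a real script with `use warnings` enabled, the broken form also emits an "isn't numeric in subtraction" warning, which is often the only visible hint.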

On Jan 20, 2008, at 9:25 AM, Scott Cain wrote:

> Jon,
>
> There is a script for loading a SeqFeature database just like the GFF
> database, though I don't know what it's called offhand (I'm not at my
> normal computer right now).  Be sure to read the documentation; you
> will probably want to use the 'fast' option (I don't remember what it
> is called either).
>
> Scott
>
>
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain at cshl.edu
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
>
>
> On Sun, 20 Jan 2008, Hang wrote:
>
>>
>> Hi, Scott,
>> I tried changing the sequence-region line to "4   FlyBase   chromosome_arm  1
>> 1351857 .  .  .  ID=4;Name=4", but it didn't work. "$!" didn't say anything
>> but "died at line 12".
>>
>> So I went ahead with Bio::DB::SeqFeature::Store. Here is my code to load
>> dmel-all-r5.4.gff (from FlyBase) into a test database:
>> =============================================================
>> use Bio::DB::SeqFeature::Store;
>> use Bio::DB::SeqFeature::Store::GFF3Loader;
>> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>>                                         -dsn     => 'dbi:mysql:test',
>>                                         -user    => 'root',
>>                                         -pass    => 'XXXXX',
>>                                         -write   =>  1 );
>> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
>>                                                         -verbose => 1);
>> $loader->load('./dmel-all-r5.4.gff');
>> =============================================================
>> I got a bunch of errors like this:
>> "DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
>> C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
>> Line 1316 in mysql.pm looks like this:
>> $sth->execute($name) or die $sth->errstr;
>> I checked the test database after the failed load. Only one table had
>> been created, called 'meta'. I also tried 'grant all on test to
>> XXX at localhost' and used that -user and -pass to load the GFF, but it
>> didn't work either.
>>
>> Jon
>>
>>
>> Scott Cain-3 wrote:
>>>
>>> Hi Jon,
>>>
>>> Well, seeing the error message would be helpful, but my first guess
>>> without it is that there are a few things you can try:
>>>
>>>  * removing the "sequence-region" line from the GFF file, adding a line
>>> like this:
>>>
>>>  4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4
>>>
>>> and then reloading the database.
>>>
>>>  * Or, you may want to consider using Bio::DB::SeqFeature::Store, since
>>> Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that
>>> is, with three levels of features, like gene, mRNA and CDS).
>>>
>>> Scott
>>>
>>> On Sat, 2008-01-19 at 19:49 -0800, Hang wrote:
>>>> Hi, Scott,
>>>>
>>>> After adding die $!, I found that something is wrong at this line:
>>>> "my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"
>>>>
>>>> My GFF file looks like this:
>>>> ##gff-version 3
>>>> ##sequence-region 4 1 1351857
>>>> 4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
>>>> 4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
>>>> 4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
>>>> ...
>>>> ...
>>>> I'm really confused. Any further suggestions? Thank you!
>>>>
>>>> Jon
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Scott Cain-3 wrote:
>>>>>
>>>>> Hi Jon,
>>>>>
>>>>> I think it's funny that you have "or die" on the database opening line,
>>>>> "or die" on the @features line, but you didn't put one on the $segment
>>>>> line.  Try adding "or die $!" to the $segment line to see what it says,
>>>>> and also add a 'print $segment' after you create it and before you try
>>>>> to get the features from it.
>>>>>
>>>>> Clearly, the problem is that $segment is not defined (that is, nothing
>>>>> is in it, not that the wrong thing is in it).  The next trick is to find
>>>>> out why.  My first guess, without looking at the data set, is that the
>>>>> arm is not really named '4'.
>>>>>
>>>>> Scott
>>>>>
>>>>> On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
>>>>>> Hi, everyone,
>>>>>>
>>>>>> I ran into this problem when running this script to extract features
>>>>>> that overlap 4:20,000..25,000. It always responds with "Can't call method
>>>>>> "features" on an undefined value at BIO::DB::GFF.pl line XX".
>>>>>> ==============================================================
>>>>>> use Bio::DB::GFF;
>>>>>> use Bio::Tools::GFF;
>>>>>> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>>>>>>                            -dsn     => 'dbi:mysql:dmel_gff:localhost',
>>>>>>                            -user    => 'XXXX',
>>>>>>                            -pass    => 'XXXX') || die "database open failed";
>>>>>>
>>>>>> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
>>>>>> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
>>>>>>                                               'five_prime_UTR', 'three_prime_UTR', 'CDS'])
>>>>>>     or die "no features";
>>>>>> print(scalar(@features)."\n");
>>>>>>
>>>>>> ================================================================
>>>>>> I am using ActivePerl 5.8.8 and BioPerl 1.5.2 under Win32. I loaded
>>>>>> dmel_5.4.gff into the MySQL database with bulk_load_gff.pl without any
>>>>>> errors. Other methods failed as well.
>>>>>>
>>>>>> Any help will be deeply appreciated!
>>>>>>
>>>>>> Best,
>>>>>> Jon
>>>>>>
>>>>> -- 
>>>>> ------------------------------------------------------------------------
>>>>> Scott Cain, Ph. D.                                         cain at cshl.edu
>>>>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>>>>> Cold Spring Harbor Laboratory
>>>>>
>>>>
>>> -- 
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                                         cain at cshl.edu
>>> GMOD Coordinator (http://www.gmod.org/)                      216-392-3087
>>> Cold Spring Harbor Laboratory
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14982665.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
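Scott's first guess in the thread above, that the arm may not be named '4' in the loaded data, can be checked against the GFF3 file itself before touching the database. The following is a minimal pure-Perl sketch, not part of BioPerl: the helper name `gff3_seqids` is hypothetical, and the sample lines are shortened copies of the FlyBase lines quoted above, read from an in-memory filehandle (in practice you would open the real file). Pairing it with `my $segment = $db->segment(...) or die "no segment named '4'";` turns the later "Can't call method" error into a clearer message.

```perl
use strict;
use warnings;

# Collect the sequence IDs used in column 1 of a GFF3 stream, plus any names
# declared in ##sequence-region pragmas. Stops at a ##FASTA section, whose
# lines are sequence data rather than features.
sub gff3_seqids {
    my ($fh) = @_;
    my (%seqid, %region);
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^##FASTA/;
        next unless $line =~ /\S/;
        if ($line =~ /^##sequence-region\s+(\S+)/) {
            $region{$1}++;
            next;
        }
        next if $line =~ /^#/;    # other pragmas and comments
        my ($id) = split /\t/, $line, 2;
        $seqid{$id}++;
    }
    return (\%seqid, \%region);
}

# Demo on a few lines in the style of the FlyBase file.
my $gff = join '', map { "$_\n" }
    '##gff-version 3',
    '##sequence-region 4 1 1351857',
    join("\t", '4', 'FlyBase', 'transposable_element', 2, 611, '.', '+', '.', 'ID=FBti0062890'),
    join("\t", '4', 'repeatmasker_dummy', 'match', 2, 347, '.', '+', '.', 'ID=:1395923_repeatmasker_dummy');

open my $fh, '<', \$gff or die "cannot open in-memory GFF: $!";
my ($seqids, $regions) = gff3_seqids($fh);
close $fh;

print "feature seqids:   ", join(', ', sort keys %$seqids), "\n";
print "sequence-regions: ", join(', ', sort keys %$regions), "\n";
print "has '4': ", (exists $seqids->{'4'} ? 'yes' : 'no'), "\n";
```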




From ykumagai at biken.osaka-u.ac.jp  Mon Jan 21 11:56:53 2008
From: ykumagai at biken.osaka-u.ac.jp (Yutaro Kumagai)
Date: Tue, 22 Jan 2008 01:56:53 +0900
Subject: [Bioperl-l] Problem with Bio::ASN1::EntrezGene::Indexer
Message-ID: <4794CED5.3070307@biken.osaka-u.ac.jp>

Hi, everyone,

I'm working with Bio::ASN1::EntrezGene::Indexer as follows:

###
use strict;
use warnings;
use Bio::ASN1::EntrezGene::Indexer;
use Bio::ASN1::EntrezGene;
use Bio::SeqIO;

my $inx = Bio::ASN1::EntrezGene::Indexer->new(-filename =>
                                              'c:/chrm/asn/entrezgene.idx');

# The index file has already been built successfully; I checked it
# by counting the records with $inx->count_records, etc.

my $seq1 = $inx->fetch_hash(15959);

# The ID 15969 surely exists: I got no error message, and by
# dumping $seq1 I confirmed that it contains some data.

my $seq2 = $inx->fetch(15969);
###

However, the last call failed with this error:
"you must pass in a file name or handle through new() or input_file() first
before calling next_seq!
at C:/Perl/site/lib/Bio\SeqIO\entrezgene.pm line 136".

I traced the program in the debugger and found that, somehow, _fh()
in Bio::Index::AbstractSeq failed to pass the filehandle to fetch.

Now I have two questions:

1) What's wrong with the code above? Is this a bug, or my own mistake?
If it is my mistake, what am I doing wrong?

2) If I can't use "fetch", how can I extract the sequence data
(position in the genomic contig, strand, etc.) from the data returned
by "fetch_hash"? I don't understand the structure of the data that
"fetch_hash" returns...

Thank you in advance.

Yutaro Kumagai.

-- 
**********************************
Yutaro Kumagai
Dept. of Host Defense
Res. Inst. for Microbial Diseases
Osaka University
Japan
ykumagai at biken.osaka-u.ac.jp
**********************************
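For the second question, one hedged way to find where positions and strands live in the plain Perl structure returned by `fetch_hash` is to dump it (for example with Data::Dumper, setting `$Data::Dumper::Maxdepth` to limit the depth) and then search it for a key of interest. A minimal recursive sketch follows; `find_key_paths` is a hypothetical helper, and the sample record's key names are illustrative assumptions, not the documented Entrez Gene layout.

```perl
use strict;
use warnings;

# Walk a nested structure of hash and array references and collect
# [path, value] pairs for every hash entry whose key equals $want.
sub find_key_paths {
    my ($data, $want, $path, $hits) = @_;
    $path = '' unless defined $path;
    $hits = [] unless defined $hits;
    if (ref $data eq 'HASH') {
        for my $key (sort keys %$data) {
            my $here = "$path\{$key\}";
            push @$hits, [$here, $data->{$key}] if $key eq $want;
            find_key_paths($data->{$key}, $want, $here, $hits);
        }
    }
    elsif (ref $data eq 'ARRAY') {
        for my $i (0 .. $#$data) {
            find_key_paths($data->[$i], $want, "$path\[$i\]", $hits);
        }
    }
    return $hits;
}

# Hypothetical miniature record; the key names are illustrative only and
# should be checked against a Data::Dumper dump of a real fetch_hash() result.
my $record = {
    locus => [
        { seqs => [ { 'int' => [ { from => 100, to => 200, strand => 'plus' } ] } ] },
    ],
};

my $hits = find_key_paths($record, 'from');
print "$_->[0] = $_->[1]\n" for @$hits;
```

Each printed path can be pasted after `$seq1->` (with arrows added as needed) to reach the value directly.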

From hangsyin at gmail.com  Mon Jan 21 14:22:55 2008
From: hangsyin at gmail.com (Hang)
Date: Mon, 21 Jan 2008 11:22:55 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an
 undefined value at BIO::DB::GFF.pl
In-Reply-To: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
	<14982665.post@talk.nabble.com>
	
	<3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
Message-ID: <15004412.post@talk.nabble.com>


Hi, Chris:

Following your suggestion, I added the -create flag and the GFF3Loader started
to work. Thanks a lot!
When I load dmel-all-5.4.gff into MySQL with -fast, I get the following
error:
   Data too long for column 'attribute_value' at c:/../../../mysql.pm line 510
If I don't use -fast, it works, apart from the annoyingly slow speed. Do you
have any suggestions?

Best,
Hang
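Until the fast-loader issue is sorted out, one way to see which records trigger "Data too long for column 'attribute_value'" is to scan the GFF3 file for unusually long attribute values. This pure-Perl sketch is a diagnostic under stated assumptions rather than a fix: `long_attributes` is a hypothetical helper, lengths are measured on the raw column-9 text before unescaping, and the 255 threshold in the demo is a placeholder to be replaced with the real size of that column in your schema.

```perl
use strict;
use warnings;

# Scan a GFF3 stream and report attribute values (column 9) longer than
# $max_len characters, as [line number, tag, length] triples. Lengths are
# measured on the raw, still URL-escaped text, so they are approximate.
sub long_attributes {
    my ($fh, $max_len) = @_;
    my @long;
    while (my $line = <$fh>) {
        last if $line =~ /^##FASTA/;
        next if $line =~ /^#/ || $line !~ /\S/;
        chomp $line;
        my @col = split /\t/, $line;
        next unless @col >= 9;
        for my $pair (split /;/, $col[8]) {
            my ($tag, $value) = split /=/, $pair, 2;
            next unless defined $value;
            push @long, [$., $tag, length($value)] if length($value) > $max_len;
        }
    }
    return @long;
}

# Demo: one feature whose Note is 300 characters long; 255 is only a
# placeholder threshold, not the documented size of the attribute_value column.
my $gff = "##gff-version 3\n"
        . join("\t", '4', 'FlyBase', 'gene', 1, 100, '.', '+', '.',
               'ID=g1;Note=' . ('x' x 300)) . "\n";
open my $fh, '<', \$gff or die "cannot open in-memory GFF: $!";
my @long = long_attributes($fh, 255);
close $fh;
printf "line %d: %s is %d characters\n", @$_ for @long;
```

Any lines it reports are useful details to include in a bug report.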



-- 
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p15004412.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Jan 21 23:21:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 Jan 2008 22:21:27 -0600
Subject: [Bioperl-l] Problem: Can't call method "features" on an
	undefined value at BIO::DB::GFF.pl
In-Reply-To: <15004412.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
	<14982665.post@talk.nabble.com>
	
	<3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
	<15004412.post@talk.nabble.com>
Message-ID: <8B1956B2-1380-4E73-8F14-F79CA5435697@uiuc.edu>

I'm cc'ing this to the gbrowse list just in case Lincoln or Scott have  
an idea.  My guess is it's a bug in the fast loader.  Could you file  
this in bugzilla?

http://bugzilla.open-bio.org/

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From jason at bioperl.org  Wed Jan 23 03:14:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 23 Jan 2008 00:14:06 -0800
Subject: [Bioperl-l] [Bioperl-guts-l] [14455]
	bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm: fixed up the
	gene glyph so that it works properly with CDS-only genes
In-Reply-To: <200801222048.m0MKmhiI007977@dev.open-bio.org>
References: <200801222048.m0MKmhiI007977@dev.open-bio.org>
Message-ID: <91659EDD-B102-47C8-BF93-92576C2CF324@bioperl.org>

Lincoln -- Thank you, Thank you for this fix!  This takes care of  
inconsistency problems I was having with GFF3 and GFF2 data.  It  
works so much more beautifully now!

-jason
On Jan 22, 2008, at 12:48 PM, Lincoln Stein wrote:

> Revision: 14455
> Author:   lstein
> Date:     2008-01-22 15:48:42 -0500 (Tue, 22 Jan 2008)
>
> Log Message:
> -----------
> fixed up the gene glyph so that it works properly with CDS-only genes
>
> Modified Paths:
> --------------
>     bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
>
> Modified: bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
> ===================================================================
> --- bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm	2008-01-22 00:16:02 UTC (rev 14454)
> +++ bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm	2008-01-22 20:48:42 UTC (rev 14455)
> @@ -44,7 +44,9 @@
>
>  sub bump {
>    my $self = shift;
> -  return 1 if $self->{level} == 0; # top level bumps, other levels don't unless specified in config
> +  return 1
> +    if $self->{level} == 0
> +      && lc $self->feature->primary_tag eq 'gene'; # top level bumps, other levels don't unless specified in config
>    return $self->SUPER::bump;
>  }
>
> @@ -92,12 +94,16 @@
>  sub _subfeat {
>    my $class   = shift;
>    my $feature = shift;
> -  if ($feature->primary_tag eq 'gene') {
> +  if (lc $feature->primary_tag eq 'gene') {
>      my @transcripts;
>      for my $t (qw/mRNA tRNA snRNA snoRNA miRNA ncRNA pseudogene/) {
>        push @transcripts, $feature->get_SeqFeatures($t);
>      }
>      return @transcripts;
> +  } elsif (lc $feature->primary_tag eq 'cds') {
> +    my @parts = $feature->get_SeqFeatures();
> +    return ($feature) if $class->{level} == 0 and !@parts;
> +    return @parts;
>    }
>
>    my @subparts;
>
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l


From ste.ghi at libero.it  Thu Jan 24 08:42:49 2008
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Thu, 24 Jan 2008 14:42:49 +0100
Subject: [Bioperl-l] parsing ACE file
Message-ID: 

Dear All,
    dealing with an assembly .ace file and a list of contigs (from that assembly), how can I extract from the .ace file the read names forming each listed contig? Is there any module doing this job?

Any suggestion about how to start is welcome...
Cheers

Stefano



From pmiguel at purdue.edu  Thu Jan 24 14:06:35 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Thu, 24 Jan 2008 14:06:35 -0500
Subject: [Bioperl-l] parsing ACE file
In-Reply-To: 
References: 
Message-ID: <4798E1BB.2020809@purdue.edu>

Stefano Ghignone wrote:
> Dear All,
>     dealing with an assembly .ace file and a list of contigs (from that assembly), how can I extract from the .ace file the read names forming each listed contig? Is there any module doing this job?
>
> Any suggestion about how to start is welcome...
> Cheers
>
> Stefano
>
>   
 perl -ne 'next unless /^(?:CO|RD) /; print' acefile.ace

will give you a list of the contigs, each followed by the reads in that
contig, if "acefile.ace" is a phrap ace file.
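The one-liner above prints the CO and RD lines in file order. If what is wanted is the read names per contig, restricted to a given list of contigs, a minimal pure-Perl sketch could group each RD name under the preceding CO line. It assumes a phrap-style .ace file in which every contig record starts with `CO <name> ...` and every read record with `RD <name> ...`; `reads_by_contig` and `print_wanted` are hypothetical helper names, not BioPerl functions.

```perl
use strict;
use warnings;

# Group read names under the contig they belong to, by scanning the
# "CO <contig> ..." and "RD <read> ..." record lines of a phrap .ace file.
sub reads_by_contig {
    my ($fh) = @_;
    my (%reads, $contig);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {
            $contig = $1;
            $reads{$contig} = [];
        }
        elsif ($line =~ /^RD\s+(\S+)/ && defined $contig) {
            push @{ $reads{$contig} }, $1;
        }
    }
    return \%reads;
}

# Print "contig<TAB>read" for each requested contig present in the file.
sub print_wanted {
    my ($reads, @wanted) = @_;
    for my $contig (@wanted) {
        next unless exists $reads->{$contig};
        print "$contig\t$_\n" for @{ $reads->{$contig} };
    }
}
```

For example, after `open my $fh, '<', 'acefile.ace' or die "Cannot open acefile.ace: $!";`, calling `print_wanted(reads_by_contig($fh), @contig_list)` prints one line per read for each contig in the (hypothetical) `@contig_list`.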

There is a bioperl module for handling phrap ace file, but I'm not sure 
what its current status is. Last time I looked (probably a couple of 
years ago) it seemed to have been abandoned half-finished.

-- 
Phillip

From golharam at umdnj.edu  Thu Jan 24 14:36:29 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 24 Jan 2008 14:36:29 -0500
Subject: [Bioperl-l] Wiki inconsistency?
Message-ID: <4798E8BD.7030107@umdnj.edu>

Hi,

I haven't used Bioperl in a while but recently started using it again. I was
using 1.4.0 but see on the website that 1.5.2 has been released. If I
click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2),
I see two versions:

bioperl-1.5.2_102

and

bioperl-1.5.2_100

However, if I click on the Downloads link on the left toolbar, then
scroll down, I see 1.5.2 Developer Release. The tar file here points to
current_core_unstable.tar.gz.

Is this supposed to be this way?  It seems a bit confusing.  I think it 
might be appropriate to put all the download links in one 
location...just my two cents...

Ryan


From cjfields at uiuc.edu  Thu Jan 24 15:58:25 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 24 Jan 2008 14:58:25 -0600
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <4798E8BD.7030107@umdnj.edu>
References: <4798E8BD.7030107@umdnj.edu>
Message-ID: 

Maybe Sendu can answer more specifically, but I believe the extra  
designation referred to the release candidate (of which bioperl-core  
was the only one with '102').  You definitely want the core package.   
The other ones with '100' are other bioperl-related distributions  
which require the core package but have additional functionality  
(BioSQL-related functions, wrapper modules, etc.).

chris

On Jan 24, 2008, at 1:36 PM, Ryan Golhar wrote:

> Hi,
>
> I haven't used Bioperl in a while but recently started using it.  I  
> was using 1.4.0 but see on the website that 1.5.2 has been  
> released.   If I click on the link for 1.5.2
> (http://www.bioperl.org/wiki/Release_1.5.2), I see a two versions:
>
> bioperl-1.5.2_102
>
> and
>
> bioperl-1.5.2_100
>
> However, If I click on the Downloads link on the left toolbar, then  
> scroll down, I see 1.5.2 Developer Release.  The tar file here  
> points to  current_core_unstable.tar.gz.
>
> Is this supposed to be this way?  It seems a bit confusing.  I think  
> it might be appropriate to put all the download links in one  
> location...just my two cents...
>
> Ryan
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From florent.angly at gmail.com  Thu Jan 24 17:06:29 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 24 Jan 2008 14:06:29 -0800
Subject: [Bioperl-l] parsing ACE file
In-Reply-To: <4798E1BB.2020809@purdue.edu>
References: 
	<4798E1BB.2020809@purdue.edu>
Message-ID: <47990BE5.2010005@gmail.com>

That would be the module Bio::Assembly::IO::ace.
It works fine as far as I know.
To parse an assembly, use Bio::Assembly::IO:
http://doc.bioperl.org/bioperl-live/Bio/Assembly/IO.html
Regards,
Florent

Phillip San Miguel wrote:
> Stefano Ghignone wrote:
>> Dear All,
>>     dealing with an assembly .ace file and a list of contigs (from 
>> that assembly), how can I extract from the .ace file the read names 
>> forming each listed contig? Is there any module doing this job?
>>
>> Any suggestion about how to start is welcome...
>> Cheers
>>
>> Stefano
>>
>>   
> perl -ne 'next unless (/^(?:CO)|(?:RD)/);print' acefile.ace
>
> will give you a list of each the contigs followed by the reads in each 
> contig, if "acefile.ace" is a phrap ace file.
>
> There is a bioperl module for handling phrap ace file, but I'm not 
> sure what its current status is. Last time I looked (probably a couple 
> of years ago) it seemed to have been abandoned half-finished.
>


From golharam at umdnj.edu  Thu Jan 24 16:17:14 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 24 Jan 2008 16:17:14 -0500
Subject: [Bioperl-l] GenBank updated sequence not being retrieved
Message-ID: <4799005A.5030204@umdnj.edu>

I'm using Bioperl 1.4 (and tried with 1.5.1).

I'm trying to download GenBank sequences for which I have accession numbers.
One of the sequences has been replaced with a newer version. I'm
using get_Seq_by_acc, which returns the warning:

-------------------- WARNING ---------------------
MSG: acc (gb|XM_087386) does not exist
---------------------------------------------------

If I check NCBI's website for the sequence, it has indeed been replaced 
by an NM_ sequence.  How can I get BioPerl to retrieve the latest 
version of a sequence?


From johan.nilsson at sh.se  Thu Jan 24 17:33:42 2008
From: johan.nilsson at sh.se (Johan Nilsson)
Date: Thu, 24 Jan 2008 23:33:42 +0100
Subject: [Bioperl-l] Quickest Codon Based MSA?
Message-ID: <47991246.6010106@sh.se>

Hello,

I have a question which might not necessarily be related to Bioperl, 
although I do believe the expertise is available here. I have a couple 
of thousand FASTA files, each containing 20 CDS sequence orthologues of 
rather high sequence similarity. I would like to create a codon-based 
multiple sequence alignment for each of these FASTA files (i.e. a 
nucleotide sequence alignment inferred from alignment of the translated 
peptide sequences, to assure that no frame shifts will occur). I first 
tried running Dialign2, which can perform the 
translation/back-translation in one go, but this turned out to be far 
too slow. I next tried to build protein alignments using ClustalW and 
subsequently built the coding region alignment using EMBOSS 'tranalign', 
but this also was too slow.

Is there any method available which significantly speeds up the 
codon-preserving alignment??? As I mentioned, the sequences to be 
aligned are in general very conserved, so any heuristic taking advantage 
of the low divergence would be very helpful! Also, is there any 
adjustable parameter in dialign2/dialign-T that might speed up the 
program when looking at highly similar sequences?

Best regards
/Johan Nilsson

From e-just at northwestern.edu  Thu Jan 24 18:07:57 2008
From: e-just at northwestern.edu (Eric Just)
Date: Thu, 24 Jan 2008 17:07:57 -0600
Subject: [Bioperl-l] Bioinformatics Job Opening at dictyBase in Chicago
Message-ID: 

Hello everyone,

We have an opening at dictyBase (Northwestern University in Chicago) for a
Bioinformatics Software Engineer.  This job involves writing and maintaining
software for a genome database using Chado/OO-Perl/BioPerl and many other
state-of-the-art technologies.

For more information please see:
http://dictybase.org/dictybase_jobs.htm

Thanks,
Eric

From bix at sendu.me.uk  Thu Jan 24 18:16:14 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 24 Jan 2008 23:16:14 +0000
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <4798E8BD.7030107@umdnj.edu>
References: <4798E8BD.7030107@umdnj.edu>
Message-ID: <47991C3E.2010908@sendu.me.uk>

Ryan Golhar wrote:
> Hi,
> 
> I haven't used Bioperl in a while but recently started using it.  I was 
> using 1.4.0 but see on the website that 1.5.2 has been released.   If I 
> click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2), 
> I see a two versions:
> 
> bioperl-1.5.2_102
> 
> and
> 
> bioperl-1.5.2_100

Where do you see this older version? I did a search on the page and that 
term isn't found. _100 was the first version of 1.5.2 core to go out. 
There were then 2 minor revisions released, as detailed in the 'Updates' 
section of the page.


> However, If I click on the Downloads link on the left toolbar, then 
> scroll down, I see 1.5.2 Developer Release.  The tar file here points to 
> current_core_unstable.tar.gz.

Yes, that is just an alias to bioperl-1.5.2_102, i.e. whatever the latest
version happens to be, so that people don't need to worry about the
actual version and can keep one static bookmark.


> Is this supposed to be this way?  It seems a bit confusing.  I think it 
> might be appropriate to put all the download links in one 
> location...just my two cents...

Well the primary page where all the links are found is the Downloads 
page. The Release_1.5.2 page is specific to 1.5.2 and will remain for 
historic reasons (so at some point there will be 1.5.3 or something and 
the appropriate links on the main Downloads page will be updated to 
that, but if someone specifically wants 1.5.2 they can still find the 
1.5.2 downloads on its own dedicated page).

From jason at bioperl.org  Thu Jan 24 21:17:02 2008
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 24 Jan 2008 18:17:02 -0800
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
References: <47991246.6010106@sh.se>
Message-ID: 

I don't know if it is faster or slower than what you have tried, but
the aa_to_dna_aln function in Bio::Align::Utilities converts a protein
alignment back to a CDS alignment. You can see example code of it in use
in the pairwise_kaks script in scripts/utilities/pairwise_kaks.PLS

-jason
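The back-translation step that aa_to_dna_aln (and EMBOSS tranalign) performs is simple enough to sketch in plain Perl: each residue of the aligned protein consumes the next codon of its CDS, and each gap becomes three gap characters. This is a minimal sketch, assuming `-` is the only gap character, each CDS is in frame from its first base, and each protein is an exact translation of its CDS; `thread_cds` and `codon_alignment` are hypothetical names, not BioPerl API. It does nothing to speed up the protein alignment itself.

```perl
use strict;
use warnings;

# Thread an unaligned CDS through its aligned protein: every residue
# consumes the next codon, every gap '-' becomes '---'. Any CDS bases left
# over at the end (e.g. a stop codon absent from the protein) are dropped.
sub thread_cds {
    my ($aligned_protein, $cds) = @_;
    my ($out, $pos) = ('', 0);
    for my $aa (split //, $aligned_protein) {
        if ($aa eq '-') {
            $out .= '---';
        }
        else {
            die "CDS too short for protein\n" if $pos + 3 > length $cds;
            $out .= substr($cds, $pos, 3);
            $pos += 3;
        }
    }
    return $out;
}

# Apply thread_cds to every sequence of an aligned protein set.
# Both arguments are hash refs keyed by sequence ID.
sub codon_alignment {
    my ($protein_aln, $cds_by_id) = @_;
    my %codon_aln;
    for my $id (keys %$protein_aln) {
        die "No CDS for $id\n" unless exists $cds_by_id->{$id};
        $codon_aln{$id} = thread_cds($protein_aln->{$id}, $cds_by_id->{$id});
    }
    return \%codon_aln;
}
```

Because the codon columns are derived purely from the protein columns, the result cannot introduce frame shifts, which is the property Johan asked for.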
On Jan 24, 2008, at 2:33 PM, Johan Nilsson wrote:

> Hello,
>
> I have a question which might not necessarily be related to  
> Bioperl, although I do believe the expertise is available here. I  
> have a couple of thousand FASTA files, each containing 20 CDS  
> sequence orthologues of rather high sequence similarity. I would  
> like to create a codon-based multiple sequence alignment for each  
> of these FASTA files (i.e. a nucleotide sequence alignment inferred  
> from alignment of the translated peptide sequences, to assure that  
> no frame shifts will occur). I first tried running Dialign2, which  
> can perform the translation/back-translation in one go, but this  
> turned out to be far too slow. I next tried to build protein  
> alignments using ClustalW and subsequently built the coding region  
> alignment using EMBOSS 'tranalign', but this also was too slow.
>
> Is there any method available which significantly speeds up the  
> codon-preserving alignment??? As I mentioned, the sequences to be  
> aligned are in general very conserved, so any heuristic taking  
> advantage of the low divergence would be very helpful! Also, is  
> there any adjustable parameter in dialign2/dialign-T that might  
> speed up the program when looking at highly similar sequences?
>
> Best regards
> /Johan Nilsson


From tristan.lefebure at gmail.com  Thu Jan 24 22:07:52 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 24 Jan 2008 22:07:52 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree, and how to combine trees
Message-ID: <200801242207.52991.tristan.lefebure@gmail.com>

Hi,

I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would 
like to merge several "one leaf taxonomic trees" into a taxonomic tree with 
several leaves. For example:

#####BEGINNING#####
#! /usr/bin/perl

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;
use Bio::TreeIO;

# The taxonomic database
# You might want to switch to a different flatfile or to Entrez 
my $dbh = new Bio::DB::Taxonomy(-source   => 'flatfile',
                                  -directory=> '/tmp',  
                                  -nodesfile=> '/home/tristan/Documents/db/NCBI/taxonomy/nodes.dmp', 
                                  -namesfile=> '/home/tristan/Documents/db/NCBI/taxonomy/names.dmp');

# Fetch 4 taxa for the example
my $tax_decapoda =  $dbh->get_taxon(-name => 'Decapoda');
my $tax_heteroptera =  $dbh->get_taxon(-name => 'Heteroptera');
my $tax_coleoptera =  $dbh->get_taxon(-name => 'Coleoptera');
my $tax_copepoda =  $dbh->get_taxon(-name => 'Copepoda');

# Transform to tree objects
my $decapoda_tree = new Bio::Tree::Tree(-node => $tax_decapoda);
my $heteroptera_tree = new Bio::Tree::Tree(-node => $tax_heteroptera);
my $coleoptera_tree = new Bio::Tree::Tree(-node => $tax_coleoptera);
my $copepoda_tree = new Bio::Tree::Tree(-node => $tax_copepoda);

# Reduce the number of nodes to the following ranks
my @ranks = qw(kingdom phylum subphylum superclass class subclass superorder 
order family);

$decapoda_tree->splice(-keep_rank => \@ranks);
$heteroptera_tree->splice(-keep_rank => \@ranks);
$coleoptera_tree->splice(-keep_rank => \@ranks);
$copepoda_tree->splice(-keep_rank => \@ranks);

# Print the trees
my $out = new Bio::TreeIO('-format' => 'newick',
                                   '-file'   => ">four.tree");
$out->write_tree($decapoda_tree);
$out->write_tree($heteroptera_tree);
$out->write_tree($coleoptera_tree);
$out->write_tree($copepoda_tree);

#####END#######

This gives the following "trees":
(((((7524)33340)50557)6960)6656)33208;
(((((7041)33340)50557)6960)6656)33208;
((((((6683)6682)72041)6681)6657)6656)33208;
((((6830)72037)6657)6656)33208;

They are really special trees, as they contain only one leaf. I would like to 
combine them and remove the 'unused' nodes to obtain something like that:

((7524,7041)33340,(6683,6830)6657)6656;

or even better:

((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda;

Any suggestions?

Thanks!

-Tristan


From anjan.purkayastha at gmail.com  Thu Jan 24 18:32:20 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Thu, 24 Jan 2008 18:32:20 -0500
Subject: [Bioperl-l] Question from a bioperl newbie
Message-ID: 

Hi,
I recently installed BioPerl on my Mac.
I tried to use it in a simple script with a "use Bio::Perl" command; however,
I get an error message "Can't locate Bio/Perl.pm in @INC".
The BioPerl folder is on my desktop, so I tried: use lib
"/Users/anjan/Desktop/bioperl-1.5.2_102/Bio";
This time it returned another error: Undefined subroutine
&main::get_sequence.

So, when BioPerl is installed, which directory does it reside in? (It's not
present in the .cpan/build directory.)

appreciate your prompt reply.

anjan

-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================

From bosborne11 at verizon.net  Thu Jan 24 23:04:50 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 24 Jan 2008 23:04:50 -0500
Subject: [Bioperl-l] Question from a bioperl newbie
In-Reply-To: 
References: 
Message-ID: <3B13E81A-66E1-418A-8915-9E877C2B751D@verizon.net>

Anjan,

use lib "/Users/anjan/Desktop/bioperl-1.5.2_102/";

Brian O.
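Perl turns a module name such as `Bio::Perl` into the relative path `Bio/Perl.pm` and looks for that path under each directory in `@INC`, which is why `use lib` must name the directory that contains `Bio/`, not `Bio/` itself. A small sketch of that lookup (`module_to_path` and `find_module` are hypothetical helpers; Perl's own `require` does this internally):

```perl
use strict;
use warnings;
use File::Spec;

# Convert a package name to the relative path Perl searches for:
# "Bio::Perl" -> "Bio/Perl.pm".
sub module_to_path {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    return join '/', @parts;
}

# Return the first file found for $module under the given directories
# (pass @INC to mimic Perl's search order), or undef if none matches.
sub find_module {
    my ($module, @dirs) = @_;
    my @parts = split m{/}, module_to_path($module);
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also hold code-ref hooks; skip them
        my $file = File::Spec->catfile($dir, @parts);
        return $file if -f $file;
    }
    return;
}
```

With `use lib "/Users/anjan/Desktop/bioperl-1.5.2_102/Bio"` the search would look for `.../Bio/Bio/Perl.pm`, which does not exist; with the parent directory it finds `.../bioperl-1.5.2_102/Bio/Perl.pm`.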


On Jan 24, 2008, at 6:32 PM, ANJAN PURKAYASTHA wrote:

> hi,
> i recently installed bioperl on my mac-machine.
> tried to use it in a simple script with a "use Bio::Perl" command.  
> however,
> i get an error message "Can't locate Bio/Perl.pm in @INC".
> the BioPerl folder is in my desktop. so i tried use: use lib
> "/Users/anjan/Desktop/bioperl-1.5.2_102/Bio";
> This time it returned me another error: Undefined subroutine
> &main::get_sequence.
>
> so, when BioPerl is installed, which directory does it reside in. 
> ( it's not
> present in the .cpan/build directory.)
>
> appreciate your prompt reply.
>
> anjan
>
> -- 
> ANJAN PURKAYASTHA, PhD.
> Senior Computational Biologist
> ==========================
>
> 1101 King Street, Suite 310,
> Alexandria, VA 22314.
> 703.518.8040 (office)
> 703.740.6939 (mobile)
>
> email:
> anjan at vbi.vt.edu;
> anjan.purkayastha at gmail.com
>
> http://www.vbi.vt.edu
>
> ==========================


From n.haigh at sheffield.ac.uk  Fri Jan 25 02:32:10 2008
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Fri, 25 Jan 2008 07:32:10 +0000
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <47991C3E.2010908@sendu.me.uk>
References: <4798E8BD.7030107@umdnj.edu> <47991C3E.2010908@sendu.me.uk>
Message-ID: <4799907A.9060301@sheffield.ac.uk>

Hi Sendu,

Have you thought about using a template for the latest stable release and the latest developer release? That way, any article/link that always needs
to point to the latest version simply has to include the correct template. So once a new release is made, you simply update the one template, and
changes automatically propagate through the wiki - it might save some wiki admin work each time there's a new release. You could get more intricate, and use a
template to show the latest version of any particular release series so you could do something like:

{{latest release|series=1.5.x|full=y}}
and
{{latest release|series=1.4.x|full=y}}

or even:

{{latest release|series=stable|full=y}}
and
{{latest release|series=dev|full=y}}

these templates could return 1.5.2_102 if the "full" param is set to something or simply 1.5.2 if the "full" param is missing.

Just a thought.
Nath


Sendu Bala wrote:
> Ryan Golhar wrote:
>> Hi,
>>
>> I haven't used Bioperl in a while but recently started using it.  I
>> was using 1.4.0 but see on the website that 1.5.2 has been released.  
>> If I click on the link for 1.5.2
>> (http://www.bioperl.org/wiki/Release_1.5.2), I see a two versions:
>>
>> bioperl-1.5.2_102
>>
>> and
>>
>> bioperl-1.5.2_100
> 
> Where do you see this older version? I did a search on the page and that
> term isn't found. _100 was the first version of 1.5.2 core to go out.
> There were then 2 minor revisions released, as detailed in the 'Updates'
> section of the page.
> 
> 
>> However, If I click on the Downloads link on the left toolbar, then
>> scroll down, I see 1.5.2 Developer Release.  The tar file here points
>> to current_core_unstable.tar.gz.
> 
> Yes, that is just an alias to bioperl-1.5.2_102, ie. whatever the latest
> version happens to be. So that people don't need to worry about the
> actual version, they can just have one static bookmark.
> 
> 
>> Is this supposed to be this way?  It seems a bit confusing.  I think
>> it might be appropriate to put all the download links in one
>> location...just my two cents...
> 
> Well the primary page where all the links are found is the Downloads
> page. The Release_1.5.2 page is specific to 1.5.2 and will remain for
> historic reasons (so at some point there will be 1.5.3 or something and
> the appropriate links on the main Downloads page will be updated to
> that, but if someone specifically wants 1.5.2 they can still find the
> 1.5.2 downloads on its own dedicated page).

From derek.fairley at belfasttrust.hscni.net  Fri Jan 25 03:31:28 2008
From: derek.fairley at belfasttrust.hscni.net (Fairley, Derek)
Date: Fri, 25 Jan 2008 08:31:28 -0000
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
Message-ID: 

Johan,

There is currently no Bioperl-run wrapper for this program, but you
might want to have a look at Codon Align 2.0 as well:
http://homepage.mac.com/barryghall/CodonAlign.html

Derek

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Johan Nilsson
Sent: 24 January 2008 22:34
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Quickest Codon Based MSA?

Hello,

I have a question which might not necessarily be related to Bioperl, 
although I do believe the expertise is available here. I have a couple 
of thousand FASTA files, each containing 20 CDS sequence orthologues of 
rather high sequence similarity. I would like to create a codon-based 
multiple sequence alignment for each of these FASTA files (i.e. a 
nucleotide sequence alignment inferred from alignment of the translated 
peptide sequences, to assure that no frame shifts will occur). I first 
tried running Dialign2, which can perform the 
translation/back-translation in one go, but this turned out to be far 
too slow. I next tried to build protein alignments using ClustalW and 
subsequently built the coding region alignment using EMBOSS 'tranalign',
but this also was too slow.

Is there any method available which significantly speeds up the 
codon-preserving alignment??? As I mentioned, the sequences to be 
aligned are in general very conserved, so any heuristic taking advantage
of the low divergence would be very helpful! Also, is there any 
adjustable parameter in dialign2/dialign-T that might speed up the 
program when looking at highly similar sequences?

Best regards
/Johan Nilsson


From ewijaya at gmail.com  Fri Jan 25 04:26:05 2008
From: ewijaya at gmail.com (Edward Wijaya)
Date: Fri, 25 Jan 2008 17:26:05 +0800
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
Message-ID: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>

Dear Experts,

Suppose I have the following list of gene names and Ensembl IDs.

RBL1	ENSG00000080839
RB1	ENSG00000139687
CDC2	ENSG00000170312
CDC25A	ENSG00000164045
CCNA2	ENSG00000145386
E2F3	ENSG00000112242
E2F2	ENSG00000007968
CDK2	ENSG00000123374
...etc...

Is there a way to extract the gene sequences for that list
and then output them in FASTA format?

- Edward

From bix at sendu.me.uk  Fri Jan 25 05:55:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 25 Jan 2008 10:55:50 +0000
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
References: <47991246.6010106@sh.se>
Message-ID: <4799C036.5060404@sendu.me.uk>

Johan Nilsson wrote:
> Hello,
> 
> I have a question which might not necessarily be related to Bioperl, 
> although I do believe the expertise is available here. I have a couple 
> of thousand FASTA files, each containing 20 CDS sequence orthologues of 
> rather high sequence similarity. I would like to create a codon-based 
> multiple sequence alignment for each of these FASTA files (i.e. a 
> nucleotide sequence alignment inferred from alignment of the translated 
> peptide sequences, to assure that no frame shifts will occur). I first 
> tried running Dialign2, which can perform the 
> translation/back-translation in one go, but this turned out to be far 
> too slow. I next tried to build protein alignments using ClustalW and 
> subsequently built the coding region alignment using EMBOSS 'tranalign', 
> but this also was too slow.
> 
> Is there any method available which significantly speeds up the 
> codon-preserving alignment??? As I mentioned, the sequences to be 
> aligned are in general very conserved, so any heuristic taking advantage 
> of the low divergence would be very helpful! Also, is there any 
> adjustable parameter in dialign2/dialign-T that might speed up the 
> program when looking at highly similar sequences?

Do you know which is the slow part? For example, when using ClustalW, 
are the alignments slower than creating the codon alignment from the 
protein?

If ClustalW is the problem, you can try using other alignment programs 
famous for their speed, such as Muscle. If it's the protein->codon bit 
that's slow, try using other programs to do that, like Pal2Nal or the 
BioPerl method.

From David.Messina at sbc.su.se  Fri Jan 25 06:35:16 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 25 Jan 2008 12:35:16 +0100
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
Message-ID: <628aabb70801250335l2a2754efn3e73e44a9dae6a35@mail.gmail.com>

Hi Edward,

I don't think there's a direct BioPerl interface to Ensembl, but BioMart at
Ensembl itself will get you sequences (and lots of other things if you want)
given a list of Ensembl IDs.

http://www.ensembl.org/biomart/martview

Note that as of this writing, the Ensembl BioMart server appears to be down
temporarily.

If you want to be able to get Ensembl sequences from a program, there's the
Ensembl API:

http://www.ensembl.org/info/using/api/core/core_tutorial.html



Dave

From bix at sendu.me.uk  Fri Jan 25 06:07:42 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 25 Jan 2008 11:07:42 +0000
Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree,
 and how to combine trees
In-Reply-To: <200801242207.52991.tristan.lefebure@gmail.com>
References: <200801242207.52991.tristan.lefebure@gmail.com>
Message-ID: <4799C2FE.8080700@sendu.me.uk>

Tristan Lefebure wrote:
> Hi,
> 
> I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would 
> like to merge several "one leaf taxonomic trees" into a taxonomic tree with 
> several leafs.
[...]
> or even better:
> 
> ((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda;

The BioPerl script taxonomy2tree.pl generates:

(((Decapoda,Copepoda)Crustacea,(Heteroptera,Coleoptera)Neoptera)Pancrustacea)"cellular organisms";

I think you can modify it, along the lines of your own script, to output
only the classes you're interested in.



http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/taxa/taxonomy2tree.PLS
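
The merge Tristan describes amounts to inserting every root-to-leaf lineage into one prefix tree, starting the output at the lowest common ancestor, and skipping any node with a single child. A minimal pure-Perl sketch of that idea, independent of Bio::Tree: it assumes the lineages have already been read off the single-leaf trees as root-first lists of IDs or names, and that no requested taxon is an ancestor of another; `build_trie`, `to_newick` and `merge_lineages` are hypothetical names.

```perl
use strict;
use warnings;

# Insert each root-to-leaf lineage into a prefix tree (trie). Every node
# keeps its children in insertion order plus a name -> child index.
sub build_trie {
    my @lineages = @_;
    my $root = { name => undef, kids => [], index => {} };
    for my $lineage (@lineages) {
        my $node = $root;
        for my $name (@$lineage) {
            my $child = $node->{index}{$name};
            unless ($child) {
                $child = { name => $name, kids => [], index => {} };
                $node->{index}{$name} = $child;
                push @{ $node->{kids} }, $child;
            }
            $node = $child;
        }
    }
    return $root;
}

# Write a node as Newick, skipping unbranched (single-child) nodes so only
# leaves and branching points keep their labels.
sub to_newick {
    my ($node) = @_;
    my @kids  = @{ $node->{kids} };
    my $label = defined $node->{name} ? $node->{name} : '';
    return $label unless @kids;                  # leaf
    return to_newick($kids[0]) if @kids == 1;    # collapse unbranched node
    return '(' . join(',', map { to_newick($_) } @kids) . ')' . $label;
}

# Merge lineages into one Newick string rooted at their lowest common ancestor.
sub merge_lineages {
    my $node = build_trie(@_);
    $node = $node->{kids}[0] while @{ $node->{kids} } == 1;
    return to_newick($node) . ';';
}
```

Fed the four lineages from the example trees (root first, e.g. `[qw(33208 6656 6960 50557 33340 7524)]`), it returns `((7524,7041)33340,(6683,6830)6657)6656;`, the first form asked for; passing scientific names instead of IDs would give the named version.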

From bosborne11 at verizon.net  Fri Jan 25 08:53:36 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 25 Jan 2008 08:53:36 -0500
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
Message-ID: <9CE20DF3-ED5F-4432-A191-4123896E5815@verizon.net>

Edward,

Various approaches are discussed here:

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

Since you have Ensembl IDs, I'd think that would be the way to go.


Brian O.
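Whichever retrieval route is used (BioMart or the Ensembl API), parsing the list and writing FASTA are plain Perl. A minimal sketch, assuming a tab-separated `SYMBOL<TAB>ENSEMBL_ID` list as in Edward's message and that the sequences have already been fetched into a hash keyed by Ensembl ID (the retrieval itself is not shown); `parse_gene_list`, `fasta_record` and `genes_to_fasta` are hypothetical names.

```perl
use strict;
use warnings;

# Parse "SYMBOL<TAB>ENSEMBL_ID" lines into [symbol, id] pairs,
# skipping blank lines and trimming trailing whitespace (e.g. "\r").
sub parse_gene_list {
    my ($text) = @_;
    my @pairs;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;
        $line =~ s/\s+\z//;
        my ($symbol, $id) = split /\t/, $line;
        push @pairs, [ $symbol, $id ];
    }
    return @pairs;
}

# Format one FASTA record, wrapping the sequence at $width columns (default 60).
sub fasta_record {
    my ($header, $seq, $width) = @_;
    $width ||= 60;
    my @lines = $seq =~ /(.{1,$width})/g;
    return ">$header\n" . join("\n", @lines) . "\n";
}

# Build FASTA text for every gene whose sequence is present in %$seq_by_id;
# genes without a fetched sequence are skipped.
sub genes_to_fasta {
    my ($pairs, $seq_by_id) = @_;
    my $fasta = '';
    for my $pair (@$pairs) {
        my ($symbol, $id) = @$pair;
        next unless defined $id && exists $seq_by_id->{$id};
        $fasta .= fasta_record("$id $symbol", $seq_by_id->{$id});
    }
    return $fasta;
}
```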

On Jan 25, 2008, at 4:26 AM, Edward Wijaya wrote:

> Dear Experts,
>
> Suppose I have the following list of gene names and Ensemble Ids.
>
> RBL1	ENSG00000080839
> RB1	ENSG00000139687
> CDC2	ENSG00000170312
> CDC25A	ENSG00000164045
> CCNA2	ENSG00000145386
> E2F3	ENSG00000112242
> E2F2	ENSG00000007968
> CDK2	ENSG00000123374
> ...etc...
>
> Is there a way to extract the gene sequence from those list?
> And then output them in FASTA format.
>
> - Edward


From snoze.pa at gmail.com  Fri Jan 25 18:30:56 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Fri, 25 Jan 2008 17:30:56 -0600
Subject: [Bioperl-l] bioperl DB error
Message-ID: <10f848910801251530j6eacfcb0x81780ae312cf19c5@mail.gmail.com>

Dear Users,
 I am using bioperl/biosql and trying to install the NCBI taxonomy, but I am
getting the following error message.
Any help? Thanks in advance.

perl load_ncbi_taxonomy.pl -download -driver mysql -dbname bioseqdb -dbuser root
Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568.

From snoze.pa at gmail.com  Fri Jan 25 18:49:28 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Fri, 25 Jan 2008 17:49:28 -0600
Subject: [Bioperl-l] bioseqDB error
Message-ID: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>

Hi, does anyone know why I am getting this error message?

Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568

From wkath83 at vbi.vt.edu  Thu Jan 24 13:19:06 2008
From: wkath83 at vbi.vt.edu (Katherine Wendelsdorf)
Date: Thu, 24 Jan 2008 13:19:06 -0500 (EST)
Subject: [Bioperl-l] bioperl on mac
Message-ID: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>

Dear one who knows,

I have a MacBook running Mac OS X Leopard, and I am having trouble running
scripts that use BioPerl modules.

Here is my history: using Fink, I installed bioperl-pm586 version 1.5.2-4
and bioperl-run-pm586 version 1.5.2-1. When I type "which bioperl-pm586" at
the command line, I get nothing. Spotlight says that the path is
/sw/share/bioperl-pm586. The same goes for bioperl-run-pm586.

1. I tried to run the test2.pl script, copied and pasted verbatim from the
HOWTO manual, but it wouldn't run. The two attached files are the script I
tried to run and its output (which is nonexistent). I read something that
said to "go into" the BioPerl directory to execute a command. I could not
enter the bioperl directory while it was in /sw/share, so I copied the
bioperl folder to the Desktop just so I could try executing the script
inside it. Where am I going wrong here?

Should I place these folders (bioperl-pm586 and bioperl-run-pm586)
somewhere else on my computer? Should they be in the same directory as
perl (/usr/bin/perl)?

2. How do I know which modules are included in the bioperl-pm586 package I
downloaded? Specifically, I want to use Bio::SeqIO.

3. What is the best way to download/install new modules as I need them?


Any answers you could give me to any of these questions would be greatly
appreciated!

Thank you so much, kind volunteer!
-Kate
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test2.pl
URL: 

From bosborne11 at verizon.net  Sat Jan 26 11:14:13 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 26 Jan 2008 11:14:13 -0500
Subject: [Bioperl-l] bioperl on mac
In-Reply-To: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
Message-ID: 

Katherine,

Perl keeps the list of directories it searches for modules in its @INC
array. What do you see when you run:

perl -le 'print for @INC'

(this prints one directory per line)?

If '/sw/share/bioperl-pm586' is not in @INC, then you need to put it
there, perhaps by adding something like:

setenv PERL5LIB ${PERL5LIB}:/sw/share/bioperl-pm586

to the .tcshrc file in your home directory (if you use tcsh, that is;
most people use bash these days, in which case the equivalent line in
.bashrc is: export PERL5LIB=${PERL5LIB}:/sw/share/bioperl-pm586).

As for your other questions, the general answer is that all the
modules you'll need are in the two packages you've installed, and you
don't need to move them out of /sw.
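A quick way to see both the search path and whether a given module is reachable is sketched below. It is a minimal sketch using only core Perl; the script name check_inc.pl and the module_location helper are hypothetical, made up for this example. It prints @INC one directory per line, then tries to load each module named on the command line (Bio::SeqIO by default) and reports the file it was loaded from, as recorded in %INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try to load a module by name; return the file Perl loaded it from,
# or undef if it cannot be found (or fails to compile) via @INC.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return unless eval { require $file; 1 };   # not found or failed to load
    return $INC{$file};                        # full path recorded by require
}

# Print the module search path, one directory per line.
print "\@INC:\n";
print "  $_\n" for @INC;

# Check the modules named on the command line, or Bio::SeqIO by default.
for my $module (@ARGV ? @ARGV : 'Bio::SeqIO') {
    my $where = module_location($module);
    print defined $where ? "$module loaded from $where\n"
                         : "$module not found in \@INC\n";
}
```

For example, `perl check_inc.pl Bio::SeqIO Bio::TreeIO`. If a module is reported as not found, the directory to add to PERL5LIB is the one that contains its top-level Bio/ directory.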


Brian O.




From jason at bioperl.org  Sat Jan 26 15:30:11 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 26 Jan 2008 12:30:11 -0800
Subject: [Bioperl-l] bioperl on mac
In-Reply-To: 
References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
	
Message-ID: 

Fink usually does this by adding a line to your .tcshrc (if you are
running that shell), .bash_profile, or .bashrc.

On my machine I have this at the top of my .bash_profile file:
test -r /sw/bin/init.sh && . /sw/bin/init.sh

If that line is not there, you need to add it to ensure that all the
Fink tools are set up properly.



From jason at bioperl.org  Sat Jan 26 19:14:45 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 26 Jan 2008 16:14:45 -0800
Subject: [Bioperl-l] a question on "move_id_to_bootstrap" usage
In-Reply-To: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com>
References: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com>
Message-ID: <8273f6c20801261614p312886d5x562593aa0cde60da@mail.gmail.com>

I'm not sure why you still have the data block after __END__ if you are
reading data in from a file. Or were you trying to send an example of the
code but forgot to specify a different input?

If you are reading from a file that looks like the tree in that block,
you'll notice that the bootstrap info is encoded as the branch_length,
NOT the id; move_id_to_bootstrap only moves the ID to the BOOTSTRAP.
You'll have to write a custom routine, or just run a simple loop over your
tree, to move the data to the bootstrap. It would look just like
move_id_to_bootstrap, except that you'd use branch_length instead of id to
get the data that you want to set as the bootstrap. I leave it as an
exercise for the reader, but if you can't figure it out, let us know.
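The loop described above might look like the following minimal sketch. It assumes, as in the consense output quoted in this message, that every internal node's branch length holds the bootstrap count (the 97.0 and 98.0 values, for instance). The subroutine name move_branch_length_to_bootstrap is hypothetical, not a BioPerl method; it relies only on get_nodes, is_Leaf, branch_length and bootstrap, and it leaves the original branch lengths in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy each internal node's branch length into its bootstrap slot.
# Works on any tree offering get_nodes() whose nodes offer is_Leaf(),
# branch_length() and bootstrap(), as Bio::Tree::Tree/Bio::Tree::Node do.
sub move_branch_length_to_bootstrap {
    my ($tree) = @_;
    my $moved = 0;
    for my $node ($tree->get_nodes) {
        next if $node->is_Leaf;          # leaf lengths are not support values
        my $bl = $node->branch_length;
        next unless defined $bl;         # e.g. the root usually has no length
        $node->bootstrap($bl);
        $moved++;
    }
    return $moved;                       # number of nodes updated
}

# Only load BioPerl when a Newick file is given on the command line.
if (@ARGV) {
    require Bio::TreeIO;
    my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $ARGV[0]);
    while (my $tree = $treeio->next_tree) {
        move_branch_length_to_bootstrap($tree);
        for my $node ($tree->get_nodes) {
            my $id = defined $node->id        ? $node->id        : '';
            my $bs = defined $node->bootstrap ? $node->bootstrap : '';
            print "id: $id bootstrap: $bs\n";
        }
    }
}
```

Leaves are skipped because consense gives them a placeholder length rather than a support value.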


In the future, please ask your questions on the mailing list, as I don't
have much time to answer questions individually when someone else can help.

-jason

On Jan 23, 2008 1:57 PM, Anand  wrote:

> Hi Jason,
>
> Thanks a lot. I followed your suggestion and updated both modules.
>
> I followed the code example on http://www.bioperl.org/wiki/HOWTO:Trees and
> tried to extract bootstrap values for my tree (the output of running
> seqboot, protdist, fitch and consense).
>
> When I run my script, the bootstrap values are not printed, and it
> doesn't throw any error messages either. Am I missing something?
>
> ====START of Code====
> #!/usr/bin/perl -w
> use strict;
> use lib "/home/anand/myperlmodules/lib/perl5/";
> use Bio::TreeIO;
> # $usage: $0 
>
> my $infile = shift;
>
> my $treeio = Bio::TreeIO->new(-format => 'newick',
>                          -file => $infile,
>                          -internal_node_id => 'bootstrap',
>                          );
>
> while( my $tree = $treeio->next_tree ) {
>    for my $node ( $tree->get_nodes ) {
>        printf "id: %s bootstrap: %s\n", $node->id || '', $node->bootstrap || '';
>    }
> }
> __END__
> ((5815_1:100.0,(((5815_5:100.0,5815_7:100.0):100.0,5815_6:100.0):97.0
> ,5815_8:100.0):
> 98.0,5815_4:100.0,5815_2:100.0):100.0,5815_3:100.0);
> ====END of Code====
>
> Thanks in advance for your time and help,
>
> Anand
>
> PS: Just to preserve formatting, I have attached the consense_output_file
>
> On Jan 22, 2008 8:02 AM, Jason Stajich  wrote:
>
> > I suspect you may want to update everything in Bio/TreeIO and Bio/Tree
> > to be safe; I'm not exactly sure what was changed. You can look at the
> > commit logs to see what else changed at the time at
> > http://code.open-bio.org/. You can also use that same server to grab a
> > fresh checkout of the current state of the code base.
> >
> > -jason
> > On Jan 22, 2008, at 12:59 AM, Anand wrote:
> >
> > > Hi Jason
> > >
> > > I have a question on the method "move_id_to_bootstrap". From this
> > > post:
> > > http://portal.open-bio.org/pipermail/bioperl-guts-l/2007-May/025718.html
> > >
> > > it looks like it has been added very recently. As luck would have it,
> > > the TreeFunctionsI.pm in my BioPerl installation is missing that method.
> > >
> > > My question: what is the best way to update TreeFunctionsI.pm so that
> > > it has the "move_id_to_bootstrap" method? Does it depend on any other
> > > updates?
> > >
> > > Thanks in advance for your help and time,
> > >
> > > Anand
> >
> >
>



-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From hlapp at duke.edu  Mon Jan 28 00:27:34 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 28 Jan 2008 00:27:34 -0500
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
References: <4795292E.4030401@sdsc.edu>
Message-ID: 

Some folks may remember that CIPRES (http://www.phylo.org) released  
their portal with access to remote execution of several phylogenetic  
tree reconstruction programs in spring last year.

It took a while, but they have now also built a really nice REST-based
API that makes the service fully programmable, so there is no need to
screen-scrape 5 pages:

http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API)

It should be relatively straightforward to build the equivalent of  
RemoteBlast on top of this. Would anyone be keen to take this on?

	-hilmar

P.S. Sorry for the cross-posting; I thought this was relevant to both
communities. When responding in a project-specific way, please make
sure you remove the list that is no longer pertinent.


Begin forwarded message:

> From: Lucie Chan 
> Date: January 21, 2008 6:22:22 PM EST
> To: Hilmar Lapp 
> Cc: Mark Miller , Rutger Vos ,  
> Terri Liebowitz , Paul Hoover ,  
> mtholder at ku.edu
> Subject: Re: REST APIs for Cipres Web Portal
> Reply-To: lcchan at sdsc.edu
>
> Hilmar, et al.,
>
> I just released the first version of our REST Web Services API for
> job submission, job status queries, and job result file retrieval.
> I'd like to get some feedback (issues, problems, improvements,
> suggestions, etc.) from you. For documentation on how to access the
> services, see:
>
> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST  
> API" below the "CIPRES PORTAL" banner.
>
> Lucie
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================




From cjfields at uiuc.edu  Mon Jan 28 01:04:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 00:04:46 -0600
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: 
References: <4795292E.4030401@sdsc.edu>
	
Message-ID: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>

We can certainly add it to the to-do list; just need to sort out the  
details (how often to allow posts, etc).  I guess we would want this  
in the Bio::Tools::Run namespace, same as RemoteBlast?

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From hlapp at duke.edu  Mon Jan 28 08:42:39 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 28 Jan 2008 08:42:39 -0500
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
References: <4795292E.4030401@sdsc.edu>
	
	<7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
Message-ID: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>

Yep that's what I was thinking.

BTW the API needs multipart/form-data encoding for input (due to file
upload); I'm assuming that's well supported in LWP, but if anyone
knows where to start digging for that, a pointer would be appreciated.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================




From cjfields at uiuc.edu  Mon Jan 28 08:50:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 07:50:08 -0600
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
References: <4795292E.4030401@sdsc.edu>
	
	<7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
	<2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
Message-ID: 

Googled it.

From http://www.issociate.de/board/post/258535/LWP_-_multipart/form-data_file_upload_from_scalar_rather_than_local_file.html:

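use LWP::UserAgent;
use HTTP::Request::Common qw(POST);   # provides the POST() request builder

my $ua = LWP::UserAgent->new;
my $response = $ua->request(POST $URL,
    Content_Type => 'multipart/form-data',
    Content      => [ $PARAM => [undef, $FILENAME, Content => $CONTENTS] ]);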

Where $PARAM is the name of the parameter, $FILENAME is what you want
to call the file, and $CONTENTS is a scalar holding the contents of the
file.

Could probably use HTTP::Request in there, but whatever works.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From shandar at nibio.go.jp  Sun Jan 27 01:50:40 2008
From: shandar at nibio.go.jp (Shandar Ahmad)
Date: Sun, 27 Jan 2008 15:50:40 +0900
Subject: [Bioperl-l] PRIB 2008
Message-ID: <1201416640.31793.7.camel@boe>

******* Our apologies if you received multiple copies ***********
If you do not wish to receive PRIB 2008-related emails, please write to
Madhu Chetty
and CC me at shandar at nibio.go.jp
******************************************************************



PRELIMINARY CALL FOR PAPERS AND INVITED SESSIONS

********************************************************************************************
Third IAPR International Conference on Pattern Recognition in 
Bioinformatics (PRIB 2008)
October 15-17, 2008
Melbourne, Australia

http://www.infotech.monash.edu.au/prib08
********************************************************************************************

PRIB 2008 aims to bring together top researchers, practitioners,
and students from around the world to discuss the applications of
pattern recognition methods in bioinformatics to solve problems in
the life sciences. Pattern recognition techniques of interest
include: statistical, syntactic, and structural approaches; Bayesian,
hidden Markov and graphical models; neural networks; fuzzy and genetic
algorithms; data mining; and their hybrids. Papers in areas including
(but not limited to) bio-sequence analysis, gene and protein expression
analysis, structure prediction, protein folding, docking, metabolic
pathway analysis and regulatory networks, systems biology, drug design,
and bioimaging are solicited for presentation at the conference.

All papers will be peer reviewed and accepted papers will be published 
in the conference proceedings as an edited volume in Lecture Notes in 
Bioinformatics by Springer. Submission of papers will be electronic and 
through the conference website. Proposals for special sessions and 
tutorials at the conference are also invited in all related areas of 
research. Authors of selected papers presented at the conference will 
also be invited for publication in Special Issues of reputed journals.

Location:
Melbourne is a sophisticated city in the south-east corner of mainland
Australia. It is known for its sightseeing attractions, great events,
passion for food and wine, and fabulous scenery. A self-styled
style-setter, Melbourne is home to a continuous program of festivals, art
exhibitions and musical extravaganzas. Warning: you might never want to
go home.

For latest information on PRIB 2008, visit the conference web site:
http://www.infotech.monash.edu.au/prib08

or email the secretariat at prib2008.melb at infotech.monash.edu.au

Important Deadlines
Paper submission: 15 April 2008
Proposals for Special Sessions/Tutorials: 15 March 2008
Author notification: 15 May 2008
Camera-ready papers: 15 June 2008


Organising Committee, PRIB 2008


From snoze.pa at gmail.com  Mon Jan 28 16:07:37 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Mon, 28 Jan 2008 15:07:37 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
Message-ID: <10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>

I am still getting the same error message.

My question is:

Do I need to install bioperl-db for BioSQL?

When I use BioSQL and load the NCBI taxonomy, it works fine, but
after installing bioperl-db I get the following error message when
loading the NCBI taxonomy.

Any help?



Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry
'10090' for key 2 at load_ncbi_taxonomy.pl line 568

From susantoroy at gmail.com  Mon Jan 28 16:05:49 2008
From: susantoroy at gmail.com (Susanta Roy)
Date: Tue, 29 Jan 2008 02:35:49 +0530
Subject: [Bioperl-l] Please remove my letter from your site
Message-ID: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>

Dear Sir,
Please remove my letter, which appears at the following URLs:
http://bioperl.org/pipermail/bioperl-l/2007-December/027004.html
http://bioperl.org/pipermail/bioperl-l/2007-December.txt
http://www.nabble.com/Enquiry-about-bioperl-project-td14522622.html


It is not supposed to appear online.
Thanks in advance.

Regards
Susanta

From cjfields at uiuc.edu  Mon Jan 28 16:53:33 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 15:53:33 -0600
Subject: [Bioperl-l] Please remove my letter from your site
In-Reply-To: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
References: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
Message-ID: 

Um, you posted to a public mailing list (hence the list is open to the  
public, for searching, indexing via Google, etc).  Terms of usage are  
here:

http://lists.open-bio.org/mailman/listinfo/bioperl-l

with more info here:

http://www.bioperl.org/wiki/Mailing_lists

BTW, this post will also appear.  C'est la vie!

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From snoze.pa at gmail.com  Tue Jan 29 12:15:41 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 29 Jan 2008 11:15:41 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
Message-ID: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>

Dear Users,
I tried refreshing the installation, and it seems to be working. But when
I load sequences, it gives me the following warning messages. Am I doing
this right, or am I missing a huge chunk of sequences? Thanks in advance,
s

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values
were ("","1") FKs (27,3,4)
Duplicate entry '27-3-4-1' for key 2
---------------------------------------------------
...
...
and so on

From tristan.lefebure at gmail.com  Tue Jan 29 12:19:23 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 29 Jan 2008 12:19:23 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
Message-ID: <200801291219.23172.tristan.lefebure@gmail.com>

Hello,

I would like to download a large number of sequences from GenBank (122,146 to be exact) using a list of accession numbers.
I first looked into Bio::DB::EUtilities but got lost, and finally used Bio::DB::GenBank.
My script works well for short requests, but it gives the following error with the long one:

 ------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
500 short write
Content-Type: text/plain
Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
Client-Warning: Internal response

500 short write

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:58
---------------------------------------------------------

Does that mean that we can only fetch 500 sequences at a time?
Should I split my list into 500-id fragments and submit them one after the other?

Any suggestions are very welcome...
Thanks,
-Tristan


Here is the script:

##################################
use strict;
use warnings;
use Bio::DB::GenBank;
# use Bio::DB::EUtilities;
use Bio::SeqIO;
use Getopt::Long;

# 2008-01-22 T Lefebure
# I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
# The following procedure is not ideal, as the stream is first copied to a temporary file
# and then re-used by BioPerl to generate the final file.

my $db = 'nucleotide';
my $format = 'genbank';
my $help= '';
my $dformat = 'gb';

GetOptions(
	'help|?' => \$help,
	'format=s'  => \$format,
	'database=s'	=> \$db,
);


my $printhelp = "\nUsage: $0 [options] <accession_list> <output_file>

Will download the corresponding data from GenBank. BioPerl is required.

Options:
	-h
		print this help
	-format: genbank|fasta|...
		give output format (default=genbank)
	-database: nucleotide|genome|protein|...
		define the database to search in (default=nucleotide)

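The full description of the options can be found at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";

# Print the help if asked, or if the list file and output file are missing
if ($help or @ARGV < 2) {
	print $printhelp;
	exit;
}

# Read the accession list (one ID per line) and strip the trailing newlines
open my $list_fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
chomp(my @list = <$list_fh>);
close $list_fh;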

if ($format eq 'fasta') { $dformat = 'fasta' }

my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
				-format => $dformat,
				-db => $db,
			);
my $seqio = $gb->get_Stream_by_acc(\@list);

my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
				-format => $format,
			);
while (my $seqo = $seqio->next_seq ) {
	print $seqo->id, "\n";
	$seqout->write_seq($seqo);
}

From cjfields at uiuc.edu  Tue Jan 29 13:06:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Jan 2008 12:06:08 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>

Yes, you can only retrieve ~500 sequences at a time using either  
Bio::DB::GenBank or Bio::DB::EUtilities.  Both modules interact with  
NCBI's EUtilities (the former module returns Bio::Seq/Bio::SeqIO  
objects, the latter returns raw data from the URL to be processed  
later).

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
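Given a limit of roughly 500 IDs per request, one way to stay under it is to split the accession list into consecutive batches and issue one request per batch. A minimal sketch in plain Perl, assuming the IDs are already in an array; `make_batches` and the batch size of 500 are illustrative choices, not part of BioPerl:

```perl
use strict;
use warnings;

# Split a list of IDs into consecutive batches of at most $size elements.
# Returns a list of array references; the last batch may be shorter.
sub make_batches {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    while (@ids) {
        # splice removes up to $size elements from the front of @ids
        push @batches, [ splice(@ids, 0, $size) ];
    }
    return @batches;
}

# Example: 1,234 placeholder accessions give batches of 500, 500 and 234.
my @accs    = map { "ACC$_" } 1 .. 1234;
my @batches = make_batches(500, @accs);
printf "%d batches, last one has %d IDs\n", scalar @batches, scalar @{ $batches[-1] };

# Each batch (an array reference) could then be handed to BioPerl, e.g.
#   my $seqio = $gb->get_Stream_by_acc($batch);
```

Each batch is an array reference, which is the same kind of argument the script above passes as `\@list`.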

You can usually post more IDs using epost and then fetch the sequences  
by referring to the WebEnv/key combo (batch posting).  I try to make this  
a bit easier with EUtilities, but it is woefully lacking in documentation  
(my fault); there is some code up on the wiki which should work.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From snoze.pa at gmail.com  Tue Jan 29 13:22:56 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 29 Jan 2008 12:22:56 -0600
Subject: [Bioperl-l] loading sequence error bioseq
Message-ID: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>

Dear Users,

After successfully creating the database bioseqdb and loading the NCBI
taxonomy, I am getting the following error message while loading
sequences into the database.

load_seqdatabase.pl -host localhost -dbname bioseqdb .....etc

MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values
were ("","31") FKs
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values
were
MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were

Column 'dbname' cannot be null

STACK: /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl:620
-----------------------------------------------------------

 at /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl line 633

Any ideas?

Thanks in advance
s

From cjfields at uiuc.edu  Tue Jan 29 13:44:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Jan 2008 12:44:16 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <479F7149.1010203@atgc.org>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
	<479F7149.1010203@atgc.org>
Message-ID: 

Forgot about that one; it's definitely a better way to do it if you  
have the GI/accessions.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From akozik at atgc.org  Tue Jan 29 13:32:41 2008
From: akozik at atgc.org (Alexander Kozik)
Date: Tue, 29 Jan 2008 10:32:41 -0800
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
Message-ID: <479F7149.1010203@atgc.org>

You don't need BioPerl to accomplish this task, i.e. downloading
several thousand sequences based on an accession ID list.

NCBI Batch Entrez can do that:
http://www.ncbi.nlm.nih.gov/sites/batchentrez

Just submit a large list of IDs, select the database, and download.

You can usually submit ~50,000 IDs in one file without problems;
it may not return results if the list is larger than ~100,000 IDs.
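Because of these per-file limits, a 122,146-ID list has to be cut into several files before submission. A minimal Perl sketch, roughly equivalent to `split -l`, assuming one accession per line; the `write_chunks` name and the numbered output prefix are hypothetical:

```perl
use strict;
use warnings;

# Write the non-blank lines of $infile into numbered files of at most
# $per_file lines each ("${prefix}001", "${prefix}002", ...).
# Returns the list of file names written.
sub write_chunks {
    my ($infile, $prefix, $per_file) = @_;
    die "lines per file must be positive\n" unless $per_file > 0;
    open my $in, '<', $infile or die "Cannot open $infile: $!";
    my ($out, @files);
    my $n = 0;
    while (my $line = <$in>) {
        next unless $line =~ /\S/;              # skip blank lines
        if ($n % $per_file == 0) {              # start a new chunk file
            close $out if $out;
            my $name = sprintf '%s%03d', $prefix, scalar(@files) + 1;
            open $out, '>', $name or die "Cannot write $name: $!";
            push @files, $name;
        }
        print {$out} $line;
        $n++;
    }
    close $out if $out;
    close $in;
    return @files;
}
```

For example, `write_chunks('full_list.AN', 'list_part_', 50_000)` would produce `list_part_001`, `list_part_002`, ... each small enough for one submission under the limit quoted above.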

--
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 Health Sciences Drive
Genome Center, 4-th floor, room 4302
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/



From hlapp at gmx.net  Tue Jan 29 16:31:47 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 29 Jan 2008 16:31:47 -0500
Subject: [Bioperl-l] loading sequence error bioseq
In-Reply-To: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>
References: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>
Message-ID: 

This looks suspiciously like a data error. Can you please give the  
full command line? That should also show which format your sequences  
are in.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================






From hlapp at gmx.net  Tue Jan 29 16:40:21 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 29 Jan 2008 16:40:21 -0500
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
	<10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
Message-ID: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net>

This would mean that two or more seqfeatures with the same type for  
the same sequence exist in the input data, each with rank 1.

Normally the rank will be incremented for each seqfeature of a  
sequence, so I'm not sure how this is happening here w/o seeing the  
data.
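Such collisions can be looked for before loading by grouping the features of each sequence on the key that the database rejects. A minimal sketch, assuming (as the `FKs (27,3,4)` and `'27-3-4-1'` values in the warning suggest) that the key is made of the bioentry, feature type, feature source and rank; the features here are plain hashes rather than BioPerl objects, and `find_duplicate_keys` is hypothetical:

```perl
use strict;
use warnings;

# Given feature records, return the key strings (entry-type-source-rank)
# that occur more than once, each reported once in first-seen order.
sub find_duplicate_keys {
    my @features = @_;
    my (%seen, @dups);
    for my $f (@features) {
        my $key = join '-', @{$f}{qw(entry type source rank)};
        push @dups, $key if ++$seen{$key} == 2;
    }
    return @dups;
}

my @features = (
    { entry => 27, type => 3, source => 4, rank => 1 },
    { entry => 27, type => 3, source => 4, rank => 2 },
    { entry => 27, type => 3, source => 4, rank => 1 },   # collides with the first
);
print "duplicate key: $_\n" for find_duplicate_keys(@features);
```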

	-hilmar
On Jan 29, 2008, at 12:15 PM, snoze pa wrote:

> Dear Users,
> I tried to refresh the installation and it seems to be working. But
> when I load sequences, it gives me the following warning messages.
> Am I doing this right, or am I missing a huge chunk of sequences?
> Thanks in advance,
> s
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed,  
> values
> were ("","1") FKs (27,3,4)
> Duplicate entry '27-3-4-1' for key 2
> ---------------------------------------------------
> ...
> ...
> and so on

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================






From avilella at gmail.com  Wed Jan 30 04:28:34 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 30 Jan 2008 09:28:34 +0000
Subject: [Bioperl-l] fetch dna seqs from genbank protein ids
Message-ID: <358f4d650801300128q44cf95a0va11799908c4f26a0@mail.gmail.com>

Hi bioperlers,

Got a question here:

>I have a bunch of protein sequences in multi-FastA with their
>accession numbers in the header and I want to retrieve their
>corresponding nucleotide sequences and nucleotide accession numbers.
>I can't seem to find a way to do it. I am looking at eUtils on the
>NCBI site, but they only do really simple stuff.

I had a look at the fetch example scripts and could fetch proteins
from GenBank, but I don't see a clear connection between the protein
sequence and the DNA sequence.
Is this a DBlink? Which type?
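One place where GenPept records usually state this link is the `/coded_by` qualifier on the protein's CDS feature, whose value gives the nucleotide accession and coordinates (for example something like `coded_by="XM_000001.1:10..900"`). A minimal sketch of pulling the nucleotide accession(s) out of such a value with plain regular expressions; the example values are made up, and obtaining the value itself (for instance with `get_tag_values('coded_by')` on the CDS feature of a Bio::Seq) is assumed rather than shown:

```perl
use strict;
use warnings;

# Extract the distinct nucleotide identifiers (accession, optionally with
# .version) from a /coded_by value such as "XM_000001.1:10..900",
# "join(AB000001.1:1..50,AB000002.1:1..400)" or "complement(...)".
sub coded_by_accessions {
    my ($value) = @_;
    my (%seen, @accs);
    # An identifier is followed by ':' and a (possibly partial '<') start position.
    while ($value =~ /([A-Za-z0-9_]+(?:\.\d+)?):<?\d+/g) {
        push @accs, $1 unless $seen{$1}++;
    }
    return @accs;
}

print join(', ', coded_by_accessions('join(AB000001.1:1..50,AB000002.1:1..400)')), "\n";
```

The returned accessions could then be fetched from the nucleotide database, e.g. with Bio::DB::GenBank's `get_Seq_by_acc`.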

Cheers,

    Albert.

From tristan.lefebure at gmail.com  Wed Jan 30 09:56:07 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 30 Jan 2008 09:56:07 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: 
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
Message-ID: <200801300956.07849.tristan.lefebure@gmail.com>

Thank you both!

Just in case it might be useful for someone else, here are my ramblings:

1. I first tried to adapt my script to fetch 500 sequences at a time. It works, except that ~40% of the time NCBI returns the following error and my script crashes:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
[...]
    The proxy server received an invalid
    response from an upstream server.
[...]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:68
-----------------------------------------------------------

I tried to modify the script so that when the retrieval of a 500-sequence block crashes, it continues with the other blocks, but I was unsuccessful. It probably needs a better understanding of BioPerl errors...
Here is the section of the script that was modified:
#########
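my $n_seq = scalar @list;
my @aborted;

for (my $i=1; $i<=$n_seq; $i += 500) {
	my $start = $i - 1;                         # 0-based index of the first ID in this block
	my $end = $start + 499;                     # 0-based index of the last ID (500 IDs per block)
	$end = $n_seq - 1 if $end > $n_seq - 1;     # the last block may be shorter
	print "Fetching sequences $i to ", $end + 1, ": ";
	my @red_list = @list[$start .. $end];
	my $gb = Bio::DB::GenBank->new(	-retrievaltype => 'tempfile',
					-format => $dformat,
					-db => $db,
				);

	# NOTE: get_Stream_by_acc() throws an exception (dies) when the request
	# fails rather than returning false, so this test never catches the error
	my $seqio;
	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
		print "Aborted, resubmit later\n";
		push @aborted, @red_list;
		next;
	}

	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
					-format => $format,
				);
	while (my $seqo = $seqio->next_seq ) {
# 		print $seqo->id, "\n";
		$seqout->write_seq($seqo);
	}
	print "Done\n";
}

# Save the IDs of the failed blocks (assumes the IDs were chomped when read)
if (@aborted) {
	open my $out, '>', 'aborted_fetching.AN' or die "Cannot write aborted_fetching.AN: $!";
	print {$out} "$_\n" foreach @aborted;
	close $out;
}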
##########


2. So I moved to the second solution and tried Batch Entrez. I cut my 120,000-long accession list into 10,000-line pieces using split:
split -l 10000 full_list.AN splitted_list_

and then submitted the 13 lists one by one. I must say that I don't really like using a web interface to fetch data, and here the most annoying part is that you end up on a regular Entrez/GenBank web page: select your format, export to file, choose a file name... and you have to do it many times.
It is too prone to human and web-browser errors for my taste, but it worked.
Nevertheless, there are some caveats:
- some downloaded files were incomplete (~10%) and you have to restart the download
- you can't submit several lists at the same time (otherwise the same cookie will be used and you'll end up with several identical files)
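Incomplete downloads like these can be detected by comparing the submitted list with the accessions actually present in the downloaded file. A minimal sketch, assuming GenBank flat-file output in which each record has an `ACCESSION` line whose first token is the primary accession, and a submitted list of bare accessions without versions; `missing_accessions` is hypothetical:

```perl
use strict;
use warnings;

# Return the requested accessions that do not appear as the primary
# accession of any record in the GenBank flat-file text $gb_text.
sub missing_accessions {
    my ($gb_text, @requested) = @_;
    my %found;
    # First token after "ACCESSION" at the start of a line
    $found{$1} = 1 while $gb_text =~ /^ACCESSION\s+(\S+)/mg;
    return grep { !$found{$_} } @requested;
}

my $text = <<'END';
LOCUS       AB000001   100 bp    DNA     linear   PLN 01-JAN-2008
ACCESSION   AB000001
//
LOCUS       AB000002   200 bp    DNA     linear   PLN 01-JAN-2008
ACCESSION   AB000002
//
END
print "missing: $_\n" for missing_accessions($text, qw(AB000001 AB000002 AB000003));
```

The missing IDs can be written to a new list and resubmitted; if the list holds accession.version identifiers, the `VERSION` line would be the one to match instead.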

-Tristan




From cjfields at uiuc.edu  Wed Jan 30 10:10:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 09:10:14 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: <7143A650-AA84-4331-B55A-A66C3F5BBAB0@uiuc.edu>

You can use an eval {} block to catch the error, then redo the loop  
(so you don't iterate to the next block) or use next and skip the  
current block if an error occurs.  If you use redo then you should use  
a counter to exit the loop after several tries.
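Chris's suggestion can be sketched as follows. This is a minimal, untested sketch, not BioPerl code. `chunks` and `fetch_all` are hypothetical helpers. `$fetch` stands in for whatever call can throw a `Bio::Root::Exception`, for example the `get_Stream_by_acc` call plus the `write_seq` loop from Tristan's script.

- An inner loop with an attempt counter plays the role of `redo`.
- The IDs of chunks that keep failing are collected for resubmission, which is what `next` would allow.
- Each slice's end is computed from its own start and clamped to the last index, so slices never overlap and the final slice is never padded with `undef`.

```perl
use strict;
use warnings;

# Split a list into consecutive slices of at most $size elements.
sub chunks {
    my ($size, @items) = @_;
    my @out;
    for (my $start = 0; $start <= $#items; $start += $size) {
        my $end = $start + $size - 1;
        $end = $#items if $end > $#items;    # last slice may be shorter
        push @out, [ @items[$start .. $end] ];
    }
    return @out;
}

# Call $fetch->($chunk) (an array ref of IDs) for every chunk. An exception
# (die, or a BioPerl throw) counts as a failed attempt; each chunk gets up
# to $max_tries attempts. Returns the IDs of chunks that never succeeded.
sub fetch_all {
    my ($fetch, $ids, $size, $max_tries) = @_;
    my @aborted;
    CHUNK: for my $chunk (chunks($size, @$ids)) {
        for my $try (1 .. $max_tries) {
            my $ok = eval { $fetch->($chunk); 1 };
            next CHUNK if $ok;               # success: move on to the next chunk
            warn "Attempt $try failed: $@";
        }
        push @aborted, @$chunk;              # give up, keep the IDs for later
    }
    return @aborted;
}

# Demo with a hypothetical fake fetcher that fails once per chunk.
my %seen;
my $flaky = sub {
    my ($chunk) = @_;
    die "simulated server error\n" unless $seen{"@$chunk"}++;
    print 'fetched ', scalar(@$chunk), " IDs\n";
};
my @demo_ids = map { "ACC$_" } 1 .. 1201;
my @aborted  = fetch_all($flaky, \@demo_ids, 500, 5);
print scalar(@aborted), " IDs left to resubmit\n";
```

With Bio::DB::GenBank, `$fetch` could wrap `$gb->get_Stream_by_acc($chunk)` and write each chunk to its own output file. That way, a chunk retried after a partial write does not leave duplicate records in a shared file.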

chris

On Jan 30, 2008, at 8:56 AM, Tristan Lefebure wrote:

> Thank you both!
>
> Just in case it might be useful for someone else, here are my  
> ramblings:
>
> 1. I first tried to adapt my script and fetch 500 sequences at a  
> time. It works, except that ~40% of the time NCBI gives the  
> following error and my script crashed:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> [...]
>    The proxy server received an invalid
>    response from an upstream server.
> [...]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/ 
> Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/ 
> DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/ 
> 5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/ 
> 5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:68
> -----------------------------------------------------------
>
> I tried to modify the script so that when the retrieval of a  
> 500-sequence block crashes, it continues with the other blocks, but I  
> was unsuccessful. It probably needs a better understanding of  
> BioPerl errors...
> Here is the section of the script that was modified:
> #########
> my $n_seq = scalar @list;
> my @aborted;
>
> for (my $i=1; $i<=$n_seq; $i += 500) {
> 	print "Fetching sequences $i to ", $i+499, ": ";
> 	my $start = $i -1;
> 	my $end = $i + 500 -1;
> 	my @red_list = @list[$start .. $end];
> 	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 					-format => $dformat,
> 					-db => $db,
> 				);
>
> 	my $seqio;
> 	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
> 		print "Aborted, resubmit later\n";
> 		push @aborted, @red_list;
> 		next;
> 	}
> 	
> 	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
> 					-format => $format,
> 				);
> 	while (my $seqo = $seqio->next_seq ) {
> # 		print $seqo->id, "\n";
> 		$seqout->write_seq($seqo);
> 	}
> 	print "Done\n";
> }
>
> if (@aborted) {
> 	open OUT, ">aborted_fetching.AN";
> 	foreach (@aborted) { print OUT $_ };
> }
> ##########
>
>
> 2. So I moved to the second solution and tried Batch Entrez. I cut my  
> 120,000-long AN list into 10,000-line pieces using split:
> split -l 10000 full_list.AN splitted_list_
>
> and then submitted the 13 lists one by one. I must say that I don't  
> really like using a web interface to fetch data, and here the most  
> annoying part is that you end up with a regular Entrez/GenBank  
> webpage: select your format, export to file, choose a file name... and  
> have to do it many times.
> It is too prone to human and web-browser errors for my taste,  
> but it worked.
> Nevertheless, there are some caveats:
> - some downloaded files were incomplete (~10%) and had to be  
> restarted
> - you can't submit several lists at the same time (otherwise the  
> same cookie will be used and you'll end up with several identical  
> files)
>
> -Tristan
>
> On Tuesday 29 January 2008 13:44:16 you wrote:
>> Forgot about that one; it's definitely a better way to do it if you
>> have the GI/accessions.
>>
>> chris
>>
>> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
> >>> you don't need BioPerl to accomplish this task (downloading
> >>> several thousand sequences from a list of accession IDs).
>>>
>>> NCBI batch Entrez can do that:
>>> http://www.ncbi.nlm.nih.gov/sites/batchentrez
>>>
>>> just submit a large list of IDs, select database, and download.
>>>
>>> you can submit ~50,000 IDs in one file usually without problems.
>>> it may not return results if a list is larger than ~100,000 IDs
>>>
>>> --
>>> Alexander Kozik
>>> Bioinformatics Specialist
>>> Genome and Biomedical Sciences Facility
>>> 451 Health Sciences Drive
>>> Genome Center, 4-th floor, room 4302
>>> University of California
>>> Davis, CA 95616-8816
>>> Phone: (530) 754-9127
>>> email#1: akozik at atgc.org
>>> email#2: akozik at gmail.com
>>> web: http://www.atgc.org/
>>>
>>> Chris Fields wrote:
> >>>> Yes, you can only retrieve ~500 sequences at a time using
> >>>> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
> >>>> interact with NCBI's EUtilities (the former module returns
> >>>> Bio::Seq/Bio::SeqIO objects, the latter module returns raw data
> >>>> from the URL to be processed later).
> >>>> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
> >>>> You can usually post more IDs using epost and fetch sequences by
> >>>> referring to the WebEnv/key combo (batch posting).  I try to make
> >>>> this a bit easier with Bio::DB::EUtilities, but it is woefully lacking in
> >>>> documentation (my fault); there is some code up on the wiki
> >>>> which should work.
>>>> chris
>>>>
>>>> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
>>>>> Hello,
>>>>>
>>>>> I would like to download a large number of sequences from GenBank
>>>>> (122,146 to be exact) following a list of accession numbers.
>>>>> I first investigated around Bio::DB::EUtilities, but got lost and
>>>>> finally used Bio::DB::GenBank.
> >>>>> My script works well for short requests, but it gives the following
>>>>> error with the long request:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: WebDBSeqI Request Error:
>>>>> 500 short write
>>>>> Content-Type: text/plain
>>>>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
>>>>> Client-Warning: Internal response
>>>>>
>>>>> 500 short write
>>>>>
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/ 
>>>>> Root/
>>>>> Root.pm:359
>>>>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/
>>>>> Bio/DB/WebDBSeqI.pm:685
>>>>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/
>>>>> 5.8.8/Bio/DB/WebDBSeqI.pm:472
>>>>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/
>>>>> perl/5.8.8/Bio/DB/NCBIHelper.pm:361
>>>>> STACK: ./fetch_from_genbank.pl:58
>>>>> ---------------------------------------------------------
>>>>>
>>>>> Does that mean that we can only fetch 500 sequences at a time?
> >>>>> Should I split my list into 500-ID fragments and submit them one
> >>>>> after the other?
> >>>>>
> >>>>> Any suggestions are very welcome...
>>>>> Thanks,
>>>>> -Tristan
>>>>>
>>>>>
>>>>> Here is the script:
>>>>>
>>>>> ##################################
>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::DB::GenBank;
>>>>> # use Bio::DB::EUtilities;
>>>>> use Bio::SeqIO;
>>>>> use Getopt::Long;
>>>>>
>>>>> # 2008-01-22 T Lefebure
> >>>>> # I tried to use Bio::DB::EUtilities without much success and went
> >>>>> # back to Bio::DB::GenBank.
> >>>>> # The following procedure is not really good as the stream is
> >>>>> # first copied to a temporary file,
> >>>>> # and then re-used by BioPerl to generate the final file.
>>>>>
>>>>> my $db = 'nucleotide';
>>>>> my $format = 'genbank';
>>>>> my $help= '';
>>>>> my $dformat = 'gb';
>>>>>
>>>>> GetOptions(
>>>>>   'help|?' => \$help,
>>>>>   'format=s'  => \$format,
>>>>>   'database=s'    => \$db,
>>>>> );
>>>>>
>>>>>
>>>>> my $printhelp = "\nUsage: $0 [options]   
>>>>> 
>>>>>
>>>>> Will download the corresponding data from GenBank. BioPerl is
>>>>> required.
>>>>>
>>>>> Options:
>>>>>   -h
>>>>>       print this help
>>>>>   -format: genbank|fasta|...
>>>>>       give output format (default=genbank)
>>>>>   -database: nucleotide|genome|protein|...
>>>>>       define the database to search in (default=nucleotide)
>>>>>
> >>>>> The full description of the options can be found at
> >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
>>>>> \n";
>>>>>
>>>>> if ($#ARGV<1) {
>>>>>   print $printhelp;
>>>>>   exit;
>>>>> }
>>>>>
>>>>> open LIST, $ARGV[0];
> >>>>> my @list = <LIST>;
>>>>>
>>>>> if ($format eq 'fasta') { $dformat = 'fasta' }
>>>>>
>>>>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
>>>>>               -format => $dformat,
>>>>>               -db => $db,
>>>>>           );
>>>>> my $seqio = $gb->get_Stream_by_acc(\@list);
>>>>>
>>>>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>>>>>               -format => $format,
>>>>>           );
>>>>> while (my $seqo = $seqio->next_seq ) {
>>>>>   print $seqo->id, "\n";
>>>>>   $seqout->write_seq($seqo);
>>>>> }
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From snoze.pa at gmail.com  Wed Jan 30 12:34:24 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 11:34:24 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
	<10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
	<31534016-91B3-45C0-995D-CE5A82466303@gmx.net>
Message-ID: <10f848910801300934q57e5d45cpbf0e17b45640e3f9@mail.gmail.com>

Hilmar,

The command I am using is the following:

load_seqdatabase.pl -host localhost -namespace bioperl -dbname bioseqdb -dbuser root -format genbank sequences.txt

I have no idea why I am getting that error.

thanks in advance


On Jan 29, 2008 3:40 PM, Hilmar Lapp  wrote:

> This would mean that two or more seqfeatures with the same type for
> the same sequence exist in the input data, each with rank 1.
>
> Normally the rank will be incremented for each seqfeature of a
> sequence, so I'm not sure how this is happening here w/o seeing the
> data.
>
>        -hilmar
> On Jan 29, 2008, at 12:15 PM, snoze pa wrote:
>
> > Dear Users,
> > I tried to refresh the installation and it seems to be working. But
> > when I load sequences, it gives me the following warning messages.
> > Am I doing all right, or am I missing a huge chunk of sequences? Thanks in
> > advance
> > s
> >
> > -------------------- WARNING ---------------------
> > MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed,
> > values
> > were ("","1") FKs (27,3,4)
> > Duplicate entry '27-3-4-1' for key 2
> > ---------------------------------------------------
> > ...
> > ...
> > and so on
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
>

From snoze.pa at gmail.com  Wed Jan 30 13:01:46 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 12:01:46 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: <10f848910801301001k681e1291we0ce468e96d88f57@mail.gmail.com>

You can use a one-line LWP script to grab the sequences.
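A minimal sketch of that LWP approach follows. It assumes NCBI's current HTTPS E-utilities address and the documented EFetch parameters `db`, `id`, `rettype` and `retmode`. `efetch_url` and `fetch_batch` are hypothetical helper names, and the IDs are assumed to be plain accessions or GIs that need no URL escaping. A long ID list makes a very long GET URL, so this is still best called on batches (or replaced by EPost).

```perl
use strict;
use warnings;

# EFetch endpoint of NCBI's E-utilities (today's HTTPS address).
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build an EFetch URL for one batch of plain accessions/GIs.
# rettype=gb with retmode=text asks for GenBank flat files.
sub efetch_url {
    my (%args) = @_;
    my $ids = join ',', @{ $args{ids} };
    return "$EFETCH?db=$args{db}&id=$ids&rettype=$args{rettype}&retmode=text";
}

# Fetch one batch. LWP::Simple::get returns the body, or undef on failure.
# (HTTPS also needs LWP::Protocol::https to be installed.)
sub fetch_batch {
    my ($url) = @_;
    require LWP::Simple;
    my $text = LWP::Simple::get($url);
    die "EFetch failed for $url\n" unless defined $text;
    return $text;
}
```

For each chunk of IDs, something like `print {$out} fetch_batch(efetch_url(db => 'nucleotide', ids => \@chunk, rettype => 'gb'));` inside an `eval` would write that batch to an open filehandle `$out`. The `die` on failure lets the same retry logic Chris describes catch it.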

On Jan 29, 2008 11:19 AM, Tristan Lefebure 
wrote:

> Hello,
>
> I would like to download a large number of sequences from GenBank (122,146
> to be exact) following a list of accession numbers.
> I first investigated around Bio::DB::EUtilities, but got lost and finally
> used Bio::DB::GenBank.
> My script works well for short requests, but it gives the following error
> with the long request:
>
>  ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> 500 short write
> Content-Type: text/plain
> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> Client-Warning: Internal response
>
> 500 short write
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request
> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:58
> ---------------------------------------------------------
>
> Does that mean that we can only fetch 500 sequences at a time?
> Should I split my list into 500-ID fragments and submit them one after the
> other?
>
> Any suggestions are very welcome...
> Thanks,
> -Tristan
>
>
> Here is the script:
>
> ##################################
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> # use Bio::DB::EUtilities;
> use Bio::SeqIO;
> use Getopt::Long;
>
> # 2008-01-22 T Lefebure
> # I tried to use Bio::DB::EUtilities without much success and went back to
> # Bio::DB::GenBank.
> # The following procedure is not really good as the stream is first copied
> # to a temporary file,
> # and then re-used by BioPerl to generate the final file.
>
> my $db = 'nucleotide';
> my $format = 'genbank';
> my $help= '';
> my $dformat = 'gb';
>
> GetOptions(
>        'help|?' => \$help,
>        'format=s'  => \$format,
>        'database=s'    => \$db,
> );
>
>
> my $printhelp = "\nUsage: $0 [options]  
>
> Will download the corresponding data from GenBank. BioPerl is required.
>
> Options:
>        -h
>                print this help
>        -format: genbank|fasta|...
>                give output format (default=genbank)
>        -database: nucleotide|genome|protein|...
>                define the database to search in (default=nucleotide)
>
> The full description of the options can be found at
> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n
> ";
>
> if ($#ARGV<1) {
>        print $printhelp;
>        exit;
> }
>
> open LIST, $ARGV[0];
> my @list = <LIST>;
>
> if ($format eq 'fasta') { $dformat = 'fasta' }
>
> my $gb = new Bio::DB::GenBank(  -retrievaltype => 'tempfile',
>                                -format => $dformat,
>                                -db => $db,
>                        );
> my $seqio = $gb->get_Stream_by_acc(\@list);
>
> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>                                -format => $format,
>                        );
> while (my $seqo = $seqio->next_seq ) {
>        print $seqo->id, "\n";
>        $seqout->write_seq($seqo);
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From snoze.pa at gmail.com  Wed Jan 30 13:38:12 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 12:38:12 -0600
Subject: [Bioperl-l] load_seqdatabase help
Message-ID: <10f848910801301038t1ae296c2o2453728b68dc81f8@mail.gmail.com>

Dear users,
 Is there any alternative way to load the following sequence into a
BioSQL schema? I am trying to use load_seqdatabase.pl, but it is not working
in my case and shows a number of warning/error messages. I tried everything
but have been unable to load it so far.

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb



Any help loading the above sequence into my bioseqdb database would be appreciated.

Thanks in advance
s

From snoze.pa at gmail.com  Wed Jan 30 14:30:22 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 13:30:22 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
Message-ID: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>

Hi Hilmar,

 After spending lots of time I figured out the error. I am able to load
sequences if the sequences do not have the following entry:

xrefs (non-sequence databases):

If a GenBank sequence has this entry, the load_seqdatabase.pl script
crashes. I tried it on a couple of sequences and found that this is the
culprit line in the GenBank format.  But this line is important as it contains
lots of information... so I am wondering how to solve this problem.

Any help?

Thanks in advance
s

From Russell.Smithies at agresearch.co.nz  Wed Jan 30 14:34:44 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 31 Jan 2008 08:34:44 +1300
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com><479F7149.1010203@atgc.org>
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: 

Take a look at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
Ebot is an interactive tool that generates a Perl script that implements
an E-utility pipeline.
You can probably hack the resulting script to introduce the required
BioPerly bits.
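The pipeline Ebot builds is the EPost-then-EFetch pattern: post the IDs once, then page through the stored set using the returned WebEnv/QueryKey pair. Below is a rough sketch of that pattern in plain Perl. It is not Ebot's actual output.

- The helper names are hypothetical.
- The base URL is NCBI's current HTTPS address.
- The EPost XML is scanned with simple regexes rather than a full XML parser.

```perl
use strict;
use warnings;

my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# POST the ID list to EPost (POST avoids very long GET URLs).
sub epost_ids {
    my ($db, @ids) = @_;
    require LWP::UserAgent;
    my $res = LWP::UserAgent->new->post("$EUTILS/epost.fcgi",
                                        { db => $db, id => join(',', @ids) });
    die 'EPost failed: ', $res->status_line, "\n" unless $res->is_success;
    return parse_epost($res->decoded_content);
}

# Extract WebEnv and QueryKey from an EPost XML reply.
sub parse_epost {
    my ($xml) = @_;
    my ($webenv) = $xml =~ m{<WebEnv>\s*([^<\s]+)\s*</WebEnv>};
    my ($key)    = $xml =~ m{<QueryKey>\s*(\d+)\s*</QueryKey>};
    die "No WebEnv/QueryKey in EPost reply\n"
        unless defined $webenv && defined $key;
    return ($webenv, $key);
}

# EFetch URLs that walk the posted set $retmax records at a time.
sub efetch_pages {
    my (%args) = @_;
    my @urls;
    for (my $retstart = 0; $retstart < $args{count}; $retstart += $args{retmax}) {
        push @urls, "$EUTILS/efetch.fcgi?db=$args{db}&query_key=$args{key}"
                  . "&WebEnv=$args{webenv}&retstart=$retstart"
                  . "&retmax=$args{retmax}&rettype=gb&retmode=text";
    }
    return @urls;
}
```

In use, `my ($webenv, $key) = epost_ids('nucleotide', @ids);` posts the list once. Each URL from `efetch_pages(db => 'nucleotide', key => $key, webenv => $webenv, count => scalar @ids, retmax => 500)` can then be fetched with `LWP::Simple::get` inside an `eval`/retry loop.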

Russell Smithies 

Bioinformatics Software Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Tristan Lefebure
> Sent: Thursday, 31 January 2008 3:56 a.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::GenBank and large number of requests
> 
> Thank you both!
> 
> Just in case it might be useful for someone else, here are my ramblings:
> 
> 1. I first tried to adapt my script and fetch 500 sequences at a time. It works,
> except that ~40% of the time NCBI gives the following error and my script crashed:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> [...]
>     The proxy server received an invalid
>     response from an upstream server.
> [...]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request
> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:68
> -----------------------------------------------------------
> 
> I tried to modify the script so that when the retrieval of a 500-sequence block
> crashes, it continues with the other blocks, but I was unsuccessful. It probably
> needs a better understanding of BioPerl errors...
> Here is the section of the script that was modified:
> #########
> my $n_seq = scalar @list;
> my @aborted;
> 
> for (my $i=1; $i<=$n_seq; $i += 500) {
> 	print "Fetching sequences $i to ", $i+499, ": ";
> 	my $start = $i -1;
> 	my $end = $i + 500 -1;
> 	my @red_list = @list[$start .. $end];
> 	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 					-format => $dformat,
> 					-db => $db,
> 				);
> 
> 	my $seqio;
> 	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
> 		print "Aborted, resubmit later\n";
> 		push @aborted, @red_list;
> 		next;
> 	}
> 
> 	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
> 					-format => $format,
> 				);
> 	while (my $seqo = $seqio->next_seq ) {
> # 		print $seqo->id, "\n";
> 		$seqout->write_seq($seqo);
> 	}
> 	print "Done\n";
> }
> 
> if (@aborted) {
> 	open OUT, ">aborted_fetching.AN";
> 	foreach (@aborted) { print OUT $_ };
> }
> ##########
> 
> 
> 2. So I moved to the second solution and tried Batch Entrez. I cut my 120,000-long
> AN list into 10,000-line pieces using split:
> split -l 10000 full_list.AN splitted_list_
> 
> and then submitted the 13 lists one by one. I must say that I don't really like using
> a web interface to fetch data, and here the most annoying part is that you end up
> with a regular Entrez/GenBank webpage: select your format, export to file, choose a
> file name... and have to do it many times.
> It is too prone to human and web-browser errors for my taste, but it worked.
> Nevertheless, there are some caveats:
> - some downloaded files were incomplete (~10%) and had to be restarted
> - you can't submit several lists at the same time (otherwise the same cookie will be
> used and you'll end up with several identical files)
> 
> -Tristan
> 
> On Tuesday 29 January 2008 13:44:16 you wrote:
> > Forgot about that one; it's definitely a better way to do it if you
> > have the GI/accessions.
> >
> > chris
> >
> > On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
> > > you don't need BioPerl to accomplish this task (downloading
> > > several thousand sequences from a list of accession IDs).
> > >
> > > NCBI batch Entrez can do that:
> > > http://www.ncbi.nlm.nih.gov/sites/batchentrez
> > >
> > > just submit a large list of IDs, select database, and download.
> > >
> > > you can submit ~50,000 IDs in one file usually without problems.
> > > it may not return results if a list is larger than ~100,000 IDs
> > >
> > > --
> > > Alexander Kozik
> > > Bioinformatics Specialist
> > > Genome and Biomedical Sciences Facility
> > > 451 Health Sciences Drive
> > > Genome Center, 4-th floor, room 4302
> > > University of California
> > > Davis, CA 95616-8816
> > > Phone: (530) 754-9127
> > > email#1: akozik at atgc.org
> > > email#2: akozik at gmail.com
> > > web: http://www.atgc.org/
> > >
> > > Chris Fields wrote:
> > >> Yes, you can only retrieve ~500 sequences at a time using
> > >> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
> > >> interact with NCBI's EUtilities (the former module returns
> > >> Bio::Seq/Bio::SeqIO objects, the latter module returns raw data
> > >> from the URL to be processed later).
> > >> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
> > >> You can usually post more IDs using epost and fetch sequences by
> > >> referring to the WebEnv/key combo (batch posting).  I try to make
> > >> this a bit easier with Bio::DB::EUtilities, but it is woefully lacking in
> > >> documentation (my fault); there is some code up on the wiki
> > >> which should work.
> > >> chris
> > >>
> > >> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
> > >>> Hello,
> > >>>
> > >>> I would like to download a large number of sequences from GenBank
> > >>> (122,146 to be exact) following a list of accession numbers.
> > >>> I first investigated around Bio::DB::EUtilities, but got lost and
> > >>> finally used Bio::DB::GenBank.
> > >>> My script works well for short requests, but it gives the following
> > >>> error with the long request:
> > >>>
> > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>> MSG: WebDBSeqI Request Error:
> > >>> 500 short write
> > >>> Content-Type: text/plain
> > >>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> > >>> Client-Warning: Internal response
> > >>>
> > >>> 500 short write
> > >>>
> > >>> STACK: Error::throw
> > >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/
> > >>> Root.pm:359
> > >>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/
> > >>> Bio/DB/WebDBSeqI.pm:685
> > >>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/
> > >>> 5.8.8/Bio/DB/WebDBSeqI.pm:472
> > >>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/
> > >>> perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> > >>> STACK: ./fetch_from_genbank.pl:58
> > >>> ---------------------------------------------------------
> > >>>
> > >>> Does that mean that we can only fetch 500 sequences at a time?
> > >>> Should I split my list into 500-ID fragments and submit them one
> > >>> after the other?
> > >>>
> > >>> Any suggestions are very welcome...
> > >>> Thanks,
> > >>> -Tristan
> > >>>
> > >>>
> > >>> Here is the script:
> > >>>
> > >>> ##################################
> > >>> use strict;
> > >>> use warnings;
> > >>> use Bio::DB::GenBank;
> > >>> # use Bio::DB::EUtilities;
> > >>> use Bio::SeqIO;
> > >>> use Getopt::Long;
> > >>>
> > >>> # 2008-01-22 T Lefebure
> > >>> # I tried to use Bio::DB::EUtilities without much success and went
> > >>> # back to Bio::DB::GenBank.
> > >>> # The following procedure is not really good as the stream is
> > >>> # first copied to a temporary file,
> > >>> # and then re-used by BioPerl to generate the final file.
> > >>>
> > >>> my $db = 'nucleotide';
> > >>> my $format = 'genbank';
> > >>> my $help= '';
> > >>> my $dformat = 'gb';
> > >>>
> > >>> GetOptions(
> > >>>    'help|?' => \$help,
> > >>>    'format=s'  => \$format,
> > >>>    'database=s'    => \$db,
> > >>> );
> > >>>
> > >>>
> > >>> my $printhelp = "\nUsage: $0 [options] 

> > >>>
> > >>> Will download the corresponding data from GenBank. BioPerl is
> > >>> required.
> > >>>
> > >>> Options:
> > >>>    -h
> > >>>        print this help
> > >>>    -format: genbank|fasta|...
> > >>>        give output format (default=genbank)
> > >>>    -database: nucleotide|genome|protein|...
> > >>>        define the database to search in (default=nucleotide)
> > >>>
> > >>> The full description of the options can be found at
> > >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
> > >>> \n";
> > >>>
> > >>> if ($#ARGV<1) {
> > >>>    print $printhelp;
> > >>>    exit;
> > >>> }
> > >>>
> > >>> open LIST, $ARGV[0];
> > >>> my @list = <LIST>;
> > >>>
> > >>> if ($format eq 'fasta') { $dformat = 'fasta' }
> > >>>
> > >>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
> > >>>                -format => $dformat,
> > >>>                -db => $db,
> > >>>            );
> > >>> my $seqio = $gb->get_Stream_by_acc(\@list);
> > >>>
> > >>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> > >>>                -format => $format,
> > >>>            );
> > >>> while (my $seqo = $seqio->next_seq ) {
> > >>>    print $seqo->id, "\n";
> > >>>    $seqout->write_seq($seqo);
> > >>> }
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>
> > >> Christopher Fields
> > >> Postdoctoral Researcher
> > >> Lab of Dr. Robert Switzer
> > >> Dept of Biochemistry
> > >> University of Illinois Urbana-Champaign
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Wed Jan 30 15:04:18 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:04:18 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: <0BA39C27-1871-441B-B2DE-F7FECF8570D7@uiuc.edu>

Sounds like a bug in the GenBank parser.  Could you post a bug report  
with an example sequence record and your script?

http://bugzilla.open-bio.org/

chris

On Jan 30, 2008, at 1:30 PM, snoze pa wrote:

> Hi Hilmar,
>
> After spending lots of time I figured out the error. I am able to load
> sequences if the sequences do not have the following entry:
>
> xrefs (non-sequence databases):
>
> If a GenBank sequence has this entry, the load_seqdatabase.pl script
> crashes. I tried it on a couple of sequences and found that this is the
> culprit line in the GenBank format.  But this line is important as it contains
> lots of information... so I am wondering how to solve this problem.
>
> Any help?
>
> Thanks in advance
> s
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From cjfields at uiuc.edu  Wed Jan 30 15:42:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:42:14 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: <29768205-F511-4EDB-84D2-BCC36DBA92C7@uiuc.edu>

When using Bio::DB::EUtilities (from bioperl-live) this works for me:

use strict;
use warnings;
use Bio::DB::EUtilities;

# get array of IDs somehow, in @ids (left empty here)
my @ids;

my ($start, $chunk, $last) = (0, 100, $#ids);

my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                      -db => 'protein',
                      -rettype => 'genbank');

my $ct = 1; # used to denote separate files
my $tries = 0; # server attempts

# '<=' (not '<') so the last ID is fetched when it starts a new chunk
while ($start <= $last) {
     # want seqs in chunk size of 100 (set above)
     my $end = ($start + $chunk - 1) < $last ? ($start + $chunk - 1) : $last;
     # grab slice of IDs
     my @sub = @ids[$start..$end];

     # pass to agent
     $factory->set_parameters(-id => \@sub );

     eval {
         # check server response, if good send to file
         $factory->get_Response(-file => ">seqs_$ct.gb");
     };

     # ERROR!
     if ($@) {
         $tries++;
         if ($tries <= 10) {
             warn("Server problem on attempt $tries: $@\nTrying again...\n");
             redo;
         } else {
             die("Repeated server issues after $tries attempts.\n");
             # could warn and just skip this batch of accs using 'next'
         }
     }

     $start = $end + 1;
     $ct++;
     $tries = 0;
}
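The loop leaves filling `@ids` open. One way to do it is sketched below, assuming a plain-text file with one accession or GI per line; `read_ids` is a hypothetical helper. It trims each line, because reading lines without `chomp` keeps the trailing newline (and, for files saved on Windows, a carriage return) on every ID. It also skips blank lines.

```perl
use strict;
use warnings;

# Read one accession/GI per line; trim surrounding whitespace
# (including "\r\n" line endings) and skip blank lines.
sub read_ids {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't open '$file': $!\n";
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        push @ids, $line if length $line;
    }
    close $fh;
    return @ids;
}
```

Used as `my @ids = read_ids($ARGV[0]);` before the loop, this gives `$last` its real value.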



chris

On Jan 30, 2008, at 8:56 AM, Tristan Lefebure wrote:

> Thank you both!
>
> Just in case it might be useful for someone else, here are my  
> ramblings:
>
> 1. I first tried to adapt my script and fetch 500 sequences at a  
> time. It works, except that ~40% of the time NCBI gives the  
> following error and my script crashed:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> [...]
>    The proxy server received an invalid
>    response from an upstream server.
> [...]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:68
> -----------------------------------------------------------
>
> I tried to modify the script so that when the retrieval of one
> 500-sequence block crashes, it continues with the other blocks, but I
> was unsuccessful. It probably needs a better understanding of
> BioPerl errors...
> Here is the section of the script that was modified:
> #########
> my $n_seq = scalar @list;
> my @aborted;
>
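> for (my $i=1; $i<=$n_seq; $i += 500) {
> 	print "Fetching sequences $i to ", $i+499, ": ";
> 	my $start = $i - 1;                        # 0-based index of first ID
> 	my $end = $start + 499;                    # 500 IDs per block
> 	$end = $n_seq - 1 if $end > $n_seq - 1;    # last block may be shorter
> 	my @red_list = @list[$start .. $end];
> 	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 					-format => $dformat,
> 					-db => $db,
> 				);
>
> 	my $seqio;
> 	# NOTE: get_Stream_by_acc() throws (dies) on a server error instead of
> 	# returning false, so this 'unless' never catches the failure; the
> 	# call has to be wrapped in eval { } for that.
> 	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
> 		print "Aborted, resubmit later\n";
> 		push @aborted, @red_list;
> 		next;
> 	}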
> 	
> 	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
> 					-format => $format,
> 				);
> 	while (my $seqo = $seqio->next_seq ) {
> # 		print $seqo->id, "\n";
> 		$seqout->write_seq($seqo);
> 	}
> 	print "Done\n";
> }
>
> if (@aborted) {
> 	open OUT, ">aborted_fetching.AN";
> 	foreach (@aborted) { print OUT $_ };
> }
> ##########
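The `unless` test above cannot catch the failure because BioPerl reports a failed request by throwing an exception (`Bio::Root::Root::throw` in the stack trace), which dies rather than returning false. A minimal sketch of the same batching with `eval`, assuming a hypothetical `fetch_in_batches` helper and a `$fetch` callback standing in for the real `get_Stream_by_acc()`/`write_seq()` work. Taking batches with `splice` also avoids the start/end index arithmetic.

```perl
use strict;
use warnings;

# Split @$ids into batches of $size with splice, call $fetch on each batch
# inside eval, and collect the IDs of every batch whose fetch died so they
# can be resubmitted later. $fetch is a hypothetical callback that receives
# (\@batch, $batch_number).
sub fetch_in_batches {
    my ($ids, $size, $fetch) = @_;
    my @queue = @$ids;          # copy, so the caller's list is untouched
    my @aborted;
    my $batch_no = 0;
    while (my @batch = splice(@queue, 0, $size)) {
        $batch_no++;
        my $ok = eval { $fetch->(\@batch, $batch_no); 1 };
        unless ($ok) {
            warn "Batch $batch_no failed, will resubmit later: $@";
            push @aborted, @batch;
        }
    }
    return @aborted;
}

# Demo: 1201 fake accessions, and pretend the second batch hits a proxy error.
my @ids = map { "ACC$_" } 1 .. 1201;
my @sizes;
my @aborted = fetch_in_batches(\@ids, 500, sub {
    my ($batch, $n) = @_;
    push @sizes, scalar @$batch;
    die "simulated proxy error\n" if $n == 2;
});
print "batch sizes: @sizes; aborted: ", scalar(@aborted), "\n";
# prints "batch sizes: 500 500 201; aborted: 500"
```

The returned `@aborted` list can be written to `aborted_fetching.AN` exactly as the script above intends.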
>
>
> 2. So I moved to the second solution and tried Batch Entrez. I cut my
> 120,000-accession list into 10,000-line pieces using split:
> split -l 10000 full_list.AN splitted_list_
>
> and then submitted the 13 lists one by one. I must say that I don't
> really like using a web interface to fetch data, and here the most
> annoying part is that you end up with a regular Entrez/GenBank
> web page: select your format, export to file, choose a file name... and
> you have to do it many times.
> It is too prone to human and web-browser errors for my taste,
> but it worked.
> Nevertheless, there are some caveats:
> - some downloaded files were incomplete (~10%) and had to be
> restarted
> - you can't submit several lists at the same time (otherwise the
> same cookie will be used and you'll end up with several identical  
> files)
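One way to spot the incomplete downloads mentioned above without opening each file is to count the `//` lines that terminate every record and compare the count with the number of IDs submitted. This is a minimal sketch that assumes the files were saved in GenBank flat-file format; `count_genbank_records` is a hypothetical helper.

```perl
use strict;
use warnings;

# Count complete records in a GenBank flat file: every record ends with a
# line consisting of "//". A truncated download usually stops mid-record,
# so the count falls short of the number of IDs requested.
sub count_genbank_records {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ m{^//\s*$};
    }
    return $count;
}

# Demo on an in-memory "file" whose third record was cut off.
my $text = "LOCUS       A1\nORIGIN\n//\nLOCUS       A2\nORIGIN\n//\nLOCUS       A3\nORIGIN\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $expected = 3;   # number of IDs submitted for this file
my $found = count_genbank_records($fh);
print "$found of $expected records complete\n";   # prints "2 of 3 records complete"
```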
>
> -Tristan
>
> On Tuesday 29 January 2008 13:44:16 you wrote:
>> Forgot about that one; it's definitely a better way to do it if you
>> have the GI/accessions.
>>
>> chris
>>
>> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
>>> you don't need to use bioperl to accomplish this task (downloading
>>> several thousand sequences from a list of accession IDs).
>>>
>>> NCBI batch Entrez can do that:
>>> http://www.ncbi.nlm.nih.gov/sites/batchentrez
>>>
>>> just submit a large list of IDs, select database, and download.
>>>
>>> you can submit ~50,000 IDs in one file usually without problems.
>>> it may not return results if a list is larger than ~100,000 IDs
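If an ID list is too large for one Batch Entrez submission, it can be split into files of a fixed size, the same job `split -l` does. This is a minimal sketch using only core Perl modules; `write_id_chunks` and the `batch_NNN.txt` naming are hypothetical choices. The 50,000 figure used in the demo is the rough limit reported above, not a documented NCBI limit.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write @$ids into numbered files of at most $per_file IDs each (one ID per
# line), similar in spirit to `split -l`. Returns the list of file names.
sub write_id_chunks {
    my ($ids, $per_file, $prefix) = @_;
    my @files;
    for (my $i = 0; $i < @$ids; $i += $per_file) {
        my $end = $i + $per_file - 1;
        $end = $#$ids if $end > $#$ids;     # last file may be shorter
        my $file = sprintf '%s_%03d.txt', $prefix, scalar(@files) + 1;
        open my $fh, '>', $file or die "Cannot write $file: $!";
        print {$fh} "$_\n" for @{$ids}[$i .. $end];
        close $fh or die "Cannot close $file: $!";
        push @files, $file;
    }
    return @files;
}

# Demo: 120,000 fake IDs written into a temporary directory.
my $dir   = tempdir(CLEANUP => 1);
my @ids   = map { "ID$_" } 1 .. 120_000;
my @files = write_id_chunks(\@ids, 50_000, File::Spec->catfile($dir, 'batch'));
print scalar(@files), " files\n";   # prints "3 files" (50,000 + 50,000 + 20,000)
```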
>>>
>>> --
>>> Alexander Kozik
>>> Bioinformatics Specialist
>>> Genome and Biomedical Sciences Facility
>>> 451 Health Sciences Drive
>>> Genome Center, 4-th floor, room 4302
>>> University of California
>>> Davis, CA 95616-8816
>>> Phone: (530) 754-9127
>>> email#1: akozik at atgc.org
>>> email#2: akozik at gmail.com
>>> web: http://www.atgc.org/
>>>
>>> Chris Fields wrote:
>>>> Yes, you can only retrieve ~500 sequences at a time using
>>>> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
>>>> interact with NCBI's EUtilities (the former module returns
>>>> Bio::Seq/Bio::SeqIO objects, the latter module returns raw data
>>>> from the URL to be processed later).
>>>> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
>>>> You can usually post more IDs using epost and fetch sequences
>>>> referring to the WebEnv/key combo (batch posting).  I try to make
>>>> this a bit easier with EUtilities, but it is woefully lacking in
>>>> documentation (my fault); there is some code up on the wiki
>>>> which should work.
>>>> chris
>>>>
>>>> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
>>>>> Hello,
>>>>>
>>>>> I would like to download a large number of sequences from GenBank
>>>>> (122,146 to be exact) following a list of accession numbers.
>>>>> I first investigated around Bio::DB::EUtilities, but got lost and
>>>>> finally used Bio::DB::GenBank.
>>>>> My script works well for short requests, but it gives the following
>>>>> error with long requests:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: WebDBSeqI Request Error:
>>>>> 500 short write
>>>>> Content-Type: text/plain
>>>>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
>>>>> Client-Warning: Internal response
>>>>>
>>>>> 500 short write
>>>>>
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>>>>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
>>>>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
>>>>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
>>>>> STACK: ./fetch_from_genbank.pl:58
>>>>> ---------------------------------------------------------
>>>>>
>>>>> Does that mean that we can only fetch 500 sequences at a time?
>>>>> Should I split my list into 500-ID fragments and submit them one
>>>>> after the other?
>>>>>
>>>>> Any suggestions very welcomed...
>>>>> Thanks,
>>>>> -Tristan
>>>>>
>>>>>
>>>>> Here is the script:
>>>>>
>>>>> ##################################
>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::DB::GenBank;
>>>>> # use Bio::DB::EUtilities;
>>>>> use Bio::SeqIO;
>>>>> use Getopt::Long;
>>>>>
>>>>> # 2008-01-22 T Lefebure
>>>>> # I tried to use Bio::DB::EUtilities without much success and went
>>>>> # back to Bio::DB::GenBank.
>>>>> # The following procedure is not really good as the stream is
>>>>> # first copied to a temporary file,
>>>>> # and then re-used by BioPerl to generate the final file.
>>>>>
>>>>> my $db = 'nucleotide';
>>>>> my $format = 'genbank';
>>>>> my $help= '';
>>>>> my $dformat = 'gb';
>>>>>
>>>>> GetOptions(
>>>>>   'help|?' => \$help,
>>>>>   'format=s'  => \$format,
>>>>>   'database=s'    => \$db,
>>>>> );
>>>>>
>>>>>
>>>>> my $printhelp = "\nUsage: $0 [options]   
>>>>> 
>>>>>
>>>>> Will download the corresponding data from GenBank. BioPerl is
>>>>> required.
>>>>>
>>>>> Options:
>>>>>   -h
>>>>>       print this help
>>>>>   -format: genbank|fasta|...
>>>>>       give output format (default=genbank)
>>>>>   -database: nucleotide|genome|protein|...
>>>>>       define the database to search in (default=nucleotide)
>>>>>
>>>>> The full description of the options can be found at
>>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
>>>>> \n";
>>>>> \n";
>>>>>
>>>>> if ($#ARGV<1) {
>>>>>   print $printhelp;
>>>>>   exit;
>>>>> }
>>>>>
>>>>> open LIST, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
>>>>> my @list = <LIST>;
>>>>> chomp @list;
>>>>>
>>>>> if ($format eq 'fasta') { $dformat = 'fasta' }
>>>>>
>>>>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
>>>>>               -format => $dformat,
>>>>>               -db => $db,
>>>>>           );
>>>>> my $seqio = $gb->get_Stream_by_acc(\@list);
>>>>>
>>>>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>>>>>               -format => $format,
>>>>>           );
>>>>> while (my $seqo = $seqio->next_seq ) {
>>>>>   print $seqo->id, "\n";
>>>>>   $seqout->write_seq($seqo);
>>>>> }
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From georg.otto at tuebingen.mpg.de  Thu Jan 31 04:34:31 2008
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Thu, 31 Jan 2008 10:34:31 +0100
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: 

Hi,

I succeeded with a similar task using the SeqHound database. I had a
list of more than 200,000 GI numbers, but I guess it would work in a
similar fashion with accession numbers. Here is the script:

#!/usr/bin/perl

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::Query::GenBank;
use Bio::DB::SeqHound;

my $sh = new Bio::DB::SeqHound();

my($USAGE) = "$0 id_file\n\n";

unless(@ARGV) {
	print $USAGE;
	exit;
}

my $id_file = $ARGV[0];

open ID_FILE, "<$id_file" or die "error: $!";

# a single FASTA writer on STDOUT, created once instead of once per sequence
my $out = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);

while (<ID_FILE>) {
  chomp;
  my $id = $_;
  # write the sequence if SeqHound returns one; otherwise move on to the next GI
  if (defined(my $seq_obj = $sh->get_Seq_by_gi($id))) {
    $out->write_seq($seq_obj);
  }
}
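With a list of 200,000+ IDs, blank lines, duplicates or Windows line endings in the input each cost an extra request (or a failed one). This is a minimal sketch of cleaning the list before the fetch loop; `read_gi_list` is a hypothetical helper, and it assumes GI numbers are purely numeric.

```perl
use strict;
use warnings;

# Read one ID per line from $fh, strip surrounding whitespace (including a
# Windows "\r"), skip blank lines and duplicates, and warn about entries
# that are not purely numeric GI numbers.
sub read_gi_list {
    my ($fh) = @_;
    my (@ids, %seen);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        unless ($line =~ /^\d+$/) {
            warn "Skipping non-numeric GI '$line' on line $.\n";
            next;
        }
        push @ids, $line unless $seen{$line}++;
    }
    return @ids;
}

# Demo on an in-memory list with a blank line, a CRLF duplicate,
# an accession instead of a GI, and stray spaces.
my $list = "130422\n\n130422\r\nP27912\n 42558930 \n";
open my $fh, '<', \$list or die "Cannot open in-memory file: $!";
my @gis = read_gi_list($fh);
print join(',', @gis), "\n";   # prints "130422,42558930"
```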


Best,

Georg


Tristan Lefebure  writes:

> Hello,
>
> I would like to download a large number of sequences from GenBank (122,146 to be exact) following a list of accession numbers.
> I first investigated around Bio::DB::EUtilities, but got lost and finally used Bio::DB::GenBank. 
> My script works well for short requests, but it gives the following error with long requests:
>
>  ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> 500 short write
> Content-Type: text/plain
> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> Client-Warning: Internal response
>
> 500 short write
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:58
> ---------------------------------------------------------
>
> Does that mean that we can only fetch 500 sequences at a time?
> Should I split my list into 500-ID fragments and submit them one after the other?
>
> Any suggestions very welcomed...
> Thanks,
> -Tristan
>
>
> Here is the script:
>
> ##################################
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> # use Bio::DB::EUtilities;
> use Bio::SeqIO;
> use Getopt::Long;
>
> # 2008-01-22 T Lefebure
> # I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
> # The following procedure is not really good as the stream is first copied to a temporary file,
> # and then re-used by BioPerl to generate the final file.
>
> my $db = 'nucleotide';
> my $format = 'genbank';
> my $help= '';
> my $dformat = 'gb';
>
> GetOptions(
> 	'help|?' => \$help,
> 	'format=s'  => \$format,
> 	'database=s'	=> \$db,
> );
>
>
> my $printhelp = "\nUsage: $0 [options]  
>
> Will download the corresponding data from GenBank. BioPerl is required.
>
> Options:
> 	-h
> 		print this help
> 	-format: genbank|fasta|...
> 		give output format (default=genbank)
> 	-database: nucleotide|genome|protein|...
> 		define the database to search in (default=nucleotide)
>
> The full description of the options can be found at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
>
> if ($#ARGV<1) {
> 	print $printhelp;
> 	exit;
> }
>
> open LIST, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
> my @list = <LIST>;
> chomp @list;
>
> if ($format eq 'fasta') { $dformat = 'fasta' }
>
> my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 				-format => $dformat,
> 				-db => $db,
> 			);
> my $seqio = $gb->get_Stream_by_acc(\@list);
>
> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> 				-format => $format,
> 			);
> while (my $seqo = $seqio->next_seq ) {
> 	print $seqo->id, "\n";
> 	$seqout->write_seq($seqo);
> }


From bernd.web at gmail.com  Thu Jan 31 05:48:15 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 31 Jan 2008 11:48:15 +0100
Subject: [Bioperl-l] searchio/blast
Message-ID: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>

Hi,

I noticed that the HTMLWriter output for a BLAST report may not be
correct if more than one sequence was "blasted".

After the BLAST report of the first sequence the report is ended with:
Search Parameters
Parameter	Value

Search Statistics
Statistic	Value

Produced by Bioperl module Bio::SearchIO::Writer::HTMLResultWriter on
Thu Jan 31 11:35:51 2008
Revision: $Id: HTMLResultWriter.pm,v 1.41 2006/10/02 04:45:37 tseemann Exp $

Then the second HTML BLAST report follows.
Although users who want HTML output usually BLAST a single sequence,
this may be worth fixing.
Also, for the HTMLWriter output of FastA reports the statistics section is empty.

An additional issue with HTMLWriter output containing more than one BLAST
report is the following:
when a sequence ID occurs more than once, the link (on the E-value) points
to the first occurrence, since the link is not report-specific.

If the above is considered a bug, I'd be happy to put together a
concise example with code.


Best regards,
Bernd
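The duplicate-link problem arises when an anchor name depends only on the hit ID, so the same ID in two reports yields the same target. This is a minimal sketch of one possible fix: make each anchor report-specific by including the report's position in the output. `hit_anchor` and the `reportN_` prefix are hypothetical and are not taken from `Bio::SearchIO::Writer::HTMLResultWriter`.

```perl
use strict;
use warnings;

# Build an HTML anchor name that is unique per (report, hit) pair by
# prefixing the hit ID with the report's position in the output and
# replacing characters that are awkward in anchor names with '_'.
sub hit_anchor {
    my ($report_index, $hit_id) = @_;
    (my $safe = $hit_id) =~ s/[^A-Za-z0-9_.-]/_/g;
    return "report${report_index}_$safe";
}

# The same hit ID in two different reports now gets two different anchors.
print hit_anchor(1, 'gi|130422|sp|P27912'), "\n";   # prints "report1_gi_130422_sp_P27912"
print hit_anchor(2, 'gi|130422|sp|P27912'), "\n";   # prints "report2_gi_130422_sp_P27912"
```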

From cjfields at uiuc.edu  Thu Jan 31 07:39:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 31 Jan 2008 06:39:46 -0600
Subject: [Bioperl-l] searchio/blast
In-Reply-To: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
References: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
Message-ID: 

The easiest way to take care of these (so we don't forget about them  
and can track changes) is to add them as BioPerl bugs/enhancement  
requests to bugzilla, along with example reports and code.

chris

On Jan 31, 2008, at 4:48 AM, Bernd Web wrote:

> Hi,
>
> I noticed that the HTMLWriter output for a BLAST report may not be
> correct if more than one sequence was "blasted".
>
> After the BLAST report of the first sequence the report is ended with:
> Search Parameters
> Parameter	Value
>
> Search Statistics
> Statistic	Value
>
> Produced by Bioperl module Bio::SearchIO::Writer::HTMLResultWriter on
> Thu Jan 31 11:35:51 2008
> Revision: $Id: HTMLResultWriter.pm,v 1.41 2006/10/02 04:45:37 tseemann Exp $
>
> Then the second HTML BLAST report follows.
> Although users who want HTML output usually BLAST a single sequence,
> this may be worth fixing.
> Also, for the HTMLWriter output of FastA reports the statistics section
> is empty.
>
> An additional issue with HTMLWriter output containing more than one BLAST
> report is the following:
> when a sequence ID occurs more than once, the link (on the E-value) points
> to the first occurrence, since the link is not report-specific.
>
> If the above is considered a bug, I'd be happy to put together a
> concise example with code.
>
>
> Best regards,
> Bernd

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From hlapp at gmx.net  Thu Jan 31 08:12:25 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 31 Jan 2008 08:12:25 -0500
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: 


On Jan 30, 2008, at 2:30 PM, snoze pa wrote:

> Hi Hilmar,
>
> After spending a lot of time, I figured out the error. I am able to load
> sequences if they do not have the following entry:
>
> xrefs (non-sequence databases):

Is this the literal value? I am asking because I can't find this in  
the file at

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb

which you said was giving you grief. So does the GenBank file above
now load, or how can I identify the critical line in it?

	-hilmar
>
> If a GenBank sequence has this entry, the load_seqdatabase.pl script
> crashes. I tried it on a couple of sequences and found that this is the
> culprit line in the GenBank format. But this line is important, as it
> contains lots of information... so I am wondering how to solve this problem.
>
> Any help?
>
> Thanks in advance
> s

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================






From snoze.pa at gmail.com  Thu Jan 31 13:46:24 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Thu, 31 Jan 2008 12:46:24 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: 
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
Message-ID: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>

The link I sent was from the tutorial I was following.
A typical example is the following record, which has an "xrefs (non-sequence
databases):" line.
thanks
s
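To check which records carry this line, or to keep its contents while the loader problem is sorted out, the entries can be pulled out with a plain regular expression. This is a minimal sketch: `non_sequence_xrefs` is a hypothetical helper that assumes the DBSOURCE layout shown in the record below, with indented continuation lines and the next field starting in column 1.

```perl
use strict;
use warnings;

# Collect the "DB:ID" entries that follow "xrefs (non-sequence databases):"
# in a GenBank DBSOURCE block, including its indented continuation lines.
# Collection stops at the first line that starts a new field (text in column 1).
sub non_sequence_xrefs {
    my ($record) = @_;
    my ($block) = $record =~ /xrefs \(non-sequence databases\):(.*?)(?=^\S|\z)/ms;
    return () unless defined $block;
    return grep { length }
           map  { my $x = $_; $x =~ s/^\s+|\s+$//g; $x }
           split /,/, $block;
}

my $dbsource = <<'END';
DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
            xrefs: D00502.1, BAA00394.1, B32401
            xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
            GO:0005789, Pfam:PF01003
KEYWORDS    Capsid protein.
END
my @xrefs = non_sequence_xrefs($dbsource);
print scalar(@xrefs), " xrefs: @xrefs\n";
# prints "4 xrefs: HSSP:Q88653 SMR:P27912 GO:0005789 Pfam:PF01003"
```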
LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
            protein); prM; Peptide pr; Small envelope protein M (Matrix
            protein); Envelope protein E; Non-structural protein 1 (NS1)].
ACCESSION   P27912
VERSION     P27912.1  GI:130422
DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
            class: standard.
            created: Aug 1, 1992.
            sequence updated: Aug 1, 1992.
            annotation updated: Jan 15, 2008.
            xrefs: D00502.1, BAA00394.1, B32401
            xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
            GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
            InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
            InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
            Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
            Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
            reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
            Transmembrane; Viral nucleoprotein; Virion.
SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
  ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
            Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
            Flavivirus; Dengue virus group.
REFERENCE   1  (residues 1 to 792)
  AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
  TITLE     Genetic relatedness among structural protein genes of dengue 1
            virus strains
  JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
   PUBMED   2738579
  REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
            [FUNCTION] Protein C packages viral RNA to form a viral
            nucleocapsid, and promotes virion budding (By similarity).
            [FUNCTION] prM acts as a chaperone for envelope protein E during
            intracellular virion assembly by masking and inactivating envelope
            protein E fusion peptide. prM is matured in the last step of virion
            assembly, presumably to avoid catastrophic activation of the viral
            fusion peptide induced by the acidic pH of the trans-Golgi network.
            After cleavage by host furin, the pr peptide is released in the
            extracellular medium and small envelope protein M and envelope
            protein E homodimers are dissociated (By similarity).
            [FUNCTION] Envelope protein E binds cell surface receptor and is
            involved in membrane fusion between virion and target cell.
            Synthesized as an homodimer with prM which acts as a chaperone for
            envelope protein E. After cleavage of prM, envelope protein E
            dissociate from small envelope protein M and homodimerizes (By
            similarity).
            [FUNCTION] Non-structural protein 1 is slowly secreted from
            mammalian cells, but not from mosquito cells. Secreted form elicits
            protective immune response and plays an essential role in RNA
            replication. Soluble and membrane-associated NS1 may activate human
            complement and induce host vascular leakage. This effect might
            explain the clinical manifestations of dengue hemorrhagic fever and
            dengue shock syndrome (By similarity).
            [SUBUNIT] prM and envelope protein E form heterodimers in the
            endoplasmic reticulum and Golgi. Envelope protein E forms
            homodimers. NS1 forms homodimers as well as homohexamers when
            secreted. NS1 may interact with NS4A (By similarity).
            [SUBCELLULAR LOCATION] Note=The virion is assembled in the
            endoplasmic reticulum lumen, transported by vesicles to the Golgi,
            then transported again to the cell membrane where it is released
            outside the cell.
            [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
            [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
            [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
            Endoplasmic reticulum membrane; Peripheral membrane protein;
            Lumenal side (By similarity).
            [DOMAIN] Transmembrane domains of the small envelope protein M and
            envelope protein E contains an endoplasmic reticulum retention
            signals (By similarity).
            [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
            The nascent protein C contains a C-terminal hydrophobic domain that
            act as a signal sequence for translocation of prM into the lumen of
            the ER. Mature protein C is cleaved at a site upstream of this
            hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
            a host furin, releasing the mature small envelope protein M, and
            peptide pr (By similarity).
            [PTM] Envelope protein E and non-structural protein 1 are
            N-glycosylated (By similarity).
FEATURES             Location/Qualifiers
     source          1..792
                     /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
                     /specific_host="Aedes aegypti (Yellowfever mosquito)"
                     /specific_host="Homo sapiens (Human)"
                     /db_xref="taxon:11057"
     Protein         1..>792
                     /product="Genome polyprotein [Contains: Protein C"
     Region          1..101
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          1..100
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Protein C. /FTId=PRO_0000037884."
     Region          5..114
                     /region_name="Flavi_capsid"
                     /note="Flavivirus capsid protein C. Flaviviruses are small
                     enveloped viruses with virions comprised of 3 proteins
                     called C, M and E. Multiple copies of the C protein form
                     the nucleocapsid, which contains the ssRNA molecule;
                     pfam01003"
                     /db_xref="CDD:85176"
     Site            100..101
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by serine protease NS3 (By similarity)."
     Region          101..114
                     /region_name="Propeptide"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="ER anchor for the protein C, removed in mature form
                     by serine protease NS3. /FTId=PRO_0000037885."
     Region          102..122
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            114..115
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          115..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="prM. /FTId=PRO_0000264649."
     Region          115..205
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Peptide pr. /FTId=PRO_0000264650."
     Region          119..204
                     /region_name="Flavi_propep"
                     /note="Flavivirus polyprotein propeptide. The flaviviruses
                     are small enveloped animal viruses containing a single
                     positive strand genomic RNA. The genome encodes one large
                     ORF a polyprotein which undergos proteolytic processing
                     into mature viral peptide chains; pfam01570"
                     /db_xref="CDD:65376"
     Region          123..238
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            183
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Site            205..206
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host furin (By similarity)."
     Region          206..280
                     /region_name="Flavi_M"
                     /note="Flavivirus envelope glycoprotein M. Flaviviruses
                     are small enveloped viruses with virions comprised of 3
                     proteins called C, M and E. The envelope glycoprotein M is
                     made as a precursor, called prM; pfam01004"
                     /db_xref="CDD:85177"
     Region          206..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Small envelope protein M. /FTId=PRO_0000037886."
     Region          239..259
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          260..265
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          266..286
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            280..281
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          281..775
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Envelope protein E. /FTId=PRO_0000037887."
     Region          281..576
                     /region_name="Flavi_glycoprot"
                     /note="Flavivirus glycoprotein, central and dimerisation
                     domains; pfam00869"
                     /db_xref="CDD:85082"
     Bond            bond(283,310)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          287..725
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Bond            bond(340,401)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            347
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(354,385)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Bond            bond(372,396)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            433
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(465,565)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          578..673
                     /region_name="Flavi_glycop_C"
                     /note="Flavivirus glycoprotein, immunoglobulin-like
                     domain; pfam02832"
                     /db_xref="CDD:66513"
     Bond            bond(582,613)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          726..746
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          747..752
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          753..773
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          774..>792
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            775..776
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          776..>792
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Non-structural protein 1. /FTId=PRO_0000037888."
ORIGIN
        1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
       61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
      121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
      181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
      241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
      301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
      361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
      421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
      481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
      541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
      601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
      661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
      721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
      781 inwkgkelkc gs
//


On Jan 31, 2008 7:12 AM, Hilmar Lapp  wrote:

>
> On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
>
> > Hi Hilmar,
> >
> > After spending lots of time I figured out the error. I am able to load
> > sequences if they do not have the following entry:
> >
> > xrefs (non-sequence databases):
>
> Is this the literal value? I am asking because I can't find this in
> the file at
>
> http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
>
> which you said was giving you grief. So does the genbank file above
> now load, or how can I identify the critical line in there?
>
>        -hilmar
> >
> > If the GenBank sequence has this entry, the load_seqdatabase.pl
> > script crashes. I tried it on a couple of sequences and found that
> > this is the culprit line in the GenBank format. But this line is
> > important because it contains lots of information, so I am wondering
> > how to solve this problem.
> >
> > Any help?
> >
> > Thanks in advance
> > s
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

From hlapp at gmx.net  Thu Jan 31 15:10:35 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 31 Jan 2008 15:10:35 -0500
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
	<10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
Message-ID: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>

I see. Note that the sequence below is really a UniProt sequence that
has been reformatted into GenBank format, and hence it isn't in your
typical GenBank sequence format (which usually lacks DBSOURCE, for
example). (The joys of data integration.)

If you load the same sequence from UniProt, does it still fail to  
parse or to load?

Also, does this mean that the sequences at the link you sent load
without error? I.e., can I close that issue report, or is there a
bug in bioperl-db?

	-hilmar

On Jan 31, 2008, at 1:46 PM, snoze pa wrote:

> The link I sent was related to my tutorial; I was following that
> website. A typical example is the following record, which has an
> "xrefs (non-sequence databases):" line.
> Thanks,
> s
>
> LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
> DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
>             protein); prM; Peptide pr; Small envelope protein M (Matrix
>             protein); Envelope protein E; Non-structural protein 1 (NS1)].
> ACCESSION   P27912
> VERSION     P27912.1  GI:130422
> DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
>             class: standard.
>             created: Aug 1, 1992.
>             sequence updated: Aug 1, 1992.
>             annotation updated: Jan 15, 2008.
>             xrefs: D00502.1, BAA00394.1, B32401
>             xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
>             GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
>             InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
>             InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
>             Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
>             Pfam:PF01004, Pfam:PF00948, Pfam:PF01570

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================






From snoze.pa at gmail.com  Thu Jan 31 15:21:18 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Thu, 31 Jan 2008 14:21:18 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
	<10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
	<3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>
Message-ID: <10f848910801311221q2a9f0d02x6c4600048f05adab@mail.gmail.com>

Thanks Hilmar,

I also thought that they had been converted into GenBank format. My problem
is that I have downloaded tons of sequences from NCBI in gb format, and many
of the records in my flat file are in this format, so I am unable to load
them into a local database using the load_seqdatabase.pl script. So far I get
nothing but warnings and errors. Is there any solution to this problem?
Otherwise I will try to write some code to load all the sequences into the
local database, although it seems like it should be easy to modify the
parsing code so that these sequences can be loaded.
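A possible stopgap, sketched here as a hypothetical script that is not part of BioPerl or bioperl-db, is to strip the "xrefs (non-sequence databases):" entry out of each DBSOURCE block before running load_seqdatabase.pl. The sketch makes two assumptions. First, as in the P27912 record, this entry is the last item in DBSOURCE, so it ends at the next line that starts in column 1. Second, this entry really is what triggers the failure, as reported above. The sketch discards those cross-references rather than storing them.

```perl
#!/usr/bin/perl
# Hypothetical pre-filter (not part of BioPerl or bioperl-db): removes the
# "xrefs (non-sequence databases):" entry from DBSOURCE blocks of GenBank
# flat files before they are passed to load_seqdatabase.pl.
# Assumption: as in the P27912 record, this entry is the last item in
# DBSOURCE, so it ends at the next line that starts in column 1.
use strict;
use warnings;

sub strip_nonseq_xrefs {
    my @lines = @_;
    my @kept;
    my $in_dbsource = 0;    # currently inside a DBSOURCE block?
    my $skipping    = 0;    # currently inside the non-sequence xrefs entry?
    for my $line (@lines) {
        if ($line =~ /^\S/) {
            # A line starting in column 1 opens a new top-level keyword
            # (DBSOURCE, KEYWORDS, FEATURES, ...) or is the "//" terminator.
            $in_dbsource = ($line =~ /^DBSOURCE\b/) ? 1 : 0;
            $skipping    = 0;
        }
        elsif ($in_dbsource
               && $line =~ /^\s+xrefs \(non-sequence databases\):/) {
            # Start dropping this entry and its indented continuation lines.
            $skipping = 1;
        }
        push @kept, $line unless $skipping;
    }
    return @kept;
}

# Filter the files named on the command line and write the result to STDOUT.
if (@ARGV) {
    my @lines = <>;
    print strip_nonseq_xrefs(@lines);
}
```

Run it as `perl strip_xrefs.pl records.gb > filtered.gb`, where both file names are arbitrary, and load filtered.gb instead. This only works around the problem. The underlying parsing or loading issue remains.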


> format (which usually lacks DBSOURCE, for example)

I think DBSOURCE is common in NCBI gb format when the three-dimensional
structure of the protein is known. I agree with you: the joys of data
integration.

The link was related to the tutorial I was using, so you can close that
issue.

Thanks for looking into the matter.
 s

On Jan 31, 2008 2:10 PM, Hilmar Lapp  wrote:

> I see. Note that the sequence below is really a UniProt sequence that has
> been reformatted into GenBank format, and hence it isn't in your typical
> GenBank sequence format (which usually lacks DBSOURCE, for example). (The
> joys of data integration.)
> If you load the same sequence from UniProt, does it still fail to parse or
> to load?
>
> Also, does this mean that the sequences at the link you sent load without
> error? I.e., can I close that issue report, or is there a bug in
> bioperl-db?
>
> -hilmar
>
> On Jan 31, 2008, at 1:46 PM, snoze pa wrote:
>
> The link I sent was related to my tutorial; I was following that website.
> A typical example is the following record, which has an "xrefs
> (non-sequence databases):" line.
> Thanks,
> s
>
> LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
> DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
>             protein); prM; Peptide pr; Small envelope protein M (Matrix
>             protein); Envelope protein E; Non-structural protein 1 (NS1)].
> ACCESSION   P27912
> VERSION     P27912.1  GI:130422
> DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
>             class: standard.
>             created: Aug 1, 1992.
>             sequence updated: Aug 1, 1992.
>             annotation updated: Jan 15, 2008.
>             xrefs: D00502.1, BAA00394.1, B32401
>             xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
>             GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
>             InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
>             InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
>             Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
>             Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
> KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
>             reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
>             Transmembrane; Viral nucleoprotein; Virion.
> SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
>   ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
>             Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
>             Flavivirus; Dengue virus group.
> REFERENCE   1  (residues 1 to 792)
>   AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
>   TITLE     Genetic relatedness among structural protein genes of dengue 1
>             virus strains
>   JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
>    PUBMED   2738579
>   REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
> COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
>             [FUNCTION] Protein C packages viral RNA to form a viral
>             nucleocapsid, and promotes virion budding (By similarity).
>             [FUNCTION] prM acts as a chaperone for envelope protein E during
>             intracellular virion assembly by masking and inactivating envelope
>             protein E fusion peptide. prM is matured in the last step of virion
>             assembly, presumably to avoid catastrophic activation of the viral
>             fusion peptide induced by the acidic pH of the trans-Golgi network.
>             After cleavage by host furin, the pr peptide is released in the
>             extracellular medium and small envelope protein M and envelope
>             protein E homodimers are dissociated (By similarity).
>             [FUNCTION] Envelope protein E binds cell surface receptor and is
>             involved in membrane fusion between virion and target cell.
>             Synthesized as an homodimer with prM which acts as a chaperone for
>             envelope protein E. After cleavage of prM, envelope protein E
>             dissociate from small envelope protein M and homodimerizes (By
>             similarity).
>             [FUNCTION] Non-structural protein 1 is slowly secreted from
>             mammalian cells, but not from mosquito cells. Secreted form elicits
>             protective immune response and plays an essential role in RNA
>             replication. Soluble and membrane-associated NS1 may activate human
>             complement and induce host vascular leakage. This effect might
>             explain the clinical manifestations of dengue hemorrhagic fever and
>             dengue shock syndrome (By similarity).
>             dengue shock syndrome (By similarity).
>             [SUBUNIT] prM and envelope protein E form heterodimers in the
>             endoplasmic reticulum and Golgi. Envelope protein E forms
>             homodimers. NS1 forms homodimers as well as homohexamers when
>             secreted. NS1 may interact with NS4A (By similarity).
>             [SUBCELLULAR LOCATION] Note=The virion is assembled in the
>             endoplasmic reticulum lumen, transported by vesicles to the Golgi,
>             then transported again to the cell membrane where it is released
>             outside the cell.
>             [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
>             [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
>             [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
>             Endoplasmic reticulum membrane; Peripheral membrane protein;
>             Lumenal side (By similarity).
>             [DOMAIN] Transmembrane domains of the small envelope protein M and
>             envelope protein E contains an endoplasmic reticulum retention
>             signals (By similarity).
>             [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
>             The nascent protein C contains a C-terminal hydrophobic domain that
>             act as a signal sequence for translocation of prM into the lumen of
>             the ER. Mature protein C is cleaved at a site upstream of this
>             hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
>             a host furin, releasing the mature small envelope protein M, and
>             peptide pr (By similarity).
>             [PTM] Envelope protein E and non-structural protein 1 are
>             N-glycosylated (By similarity).
> FEATURES             Location/Qualifiers
>      source          1..792
>                      /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
>                      /specific_host="Aedes aegypti (Yellowfever mosquito)"
>                      /specific_host="Homo sapiens (Human)"
>                      /db_xref="taxon:11057"
>      Protein         1..>792
>                      /product="Genome polyprotein [Contains: Protein C"
>      Region          1..101
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          1..100
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Protein C. /FTId=PRO_0000037884."
>      Region          5..114
>                      /region_name="Flavi_capsid"
>                      /note="Flavivirus capsid protein C. Flaviviruses are small
>                      enveloped viruses with virions comprised of 3 proteins
>                      called C, M and E. Multiple copies of the C protein form
>                      the nucleocapsid, which contains the ssRNA molecule;
>                      pfam01003"
>                      /db_xref="CDD:85176"
>      Site            100..101
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by serine protease NS3 (By similarity)."
>      Region          101..114
>                      /region_name="Propeptide"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="ER anchor for the protein C, removed in mature form
>                      by serine protease NS3. /FTId=PRO_0000037885."
>      Region          102..122
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            114..115
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          115..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="prM. /FTId=PRO_0000264649."
>      Region          115..205
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Peptide pr. /FTId=PRO_0000264650."
>      Region          119..204
>                      /region_name="Flavi_propep"
>                      /note="Flavivirus polyprotein propeptide. The flaviviruses
>                      are small enveloped animal viruses containing a single
>                      positive strand genomic RNA. The genome encodes one large
>                      ORF a polyprotein which undergos proteolytic processing
>                      into mature viral peptide chains; pfam01570"
>                      /db_xref="CDD:65376"
>      Region          123..238
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            183
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Site            205..206
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host furin (By similarity)."
>      Region          206..280
>                      /region_name="Flavi_M"
>                      /note="Flavivirus envelope glycoprotein M. Flaviviruses
>                      are small enveloped viruses with virions comprised of 3
>                      proteins called C, M and E. The envelope glycoprotein M is
>                      made as a precursor, called prM; pfam01004"
>                      /db_xref="CDD:85177"
>      Region          206..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Small envelope protein M. /FTId=PRO_0000037886."
>      Region          239..259
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          260..265
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          266..286
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            280..281
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          281..775
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Envelope protein E. /FTId=PRO_0000037887."
>      Region          281..576
>                      /region_name="Flavi_glycoprot"
>                      /note="Flavivirus glycoprotein, central and dimerisation
>                      domains; pfam00869"
>                      /db_xref="CDD:85082"
>      Bond            bond(283,310)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          287..725
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Bond            bond(340,401)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            347
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(354,385)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Bond            bond(372,396)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            433
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(465,565)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          578..673
>                      /region_name="Flavi_glycop_C"
>                      /note="Flavivirus glycoprotein, immunoglobulin-like
>                      domain; pfam02832"
>                      /db_xref="CDD:66513"
>      Bond            bond(582,613)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          726..746
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          747..752
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          753..773
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          774..>792
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            775..776
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          776..>792
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Non-structural protein 1. /FTId=PRO_0000037888."
> ORIGIN
>         1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
>        61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
>       121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
>       181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
>       241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
>       301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
>       361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
>       421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
>       481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
>       541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
>       601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
>       661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
>       721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
>       781 inwkgkelkc gs
> //
>
>
> On Jan 31, 2008 7:12 AM, Hilmar Lapp  wrote:
>
> >
> > On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
> >
> > > Hi Hilmar,
> > >
> > >  After spending lots of time i figure out the error. I am able to load
> > > sequences if the sequences do not have following entry
> > >
> > > xrefs (non-sequence databases):
> >
> > Is this the literal value? I am asking because I can't find this in
> > the file at
> >
> > http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
> >
> > which you said was giving you grief. So does the genbank file above
> > now load, or how can I identify the critical line in there?
> >
> >        -hilmar
> > >
> > > If the Genbank sequence have this entry then script load_seqdatabase.pl is
> > > crashing. I try it in couple of sequences and found it is the culprit line
> > > genbank format.  But this line is important as it contain lots of
> > > information... so I am wondering how to solve this problem
> > >
> > > Any help?
> > >
> > > Thanks in advance
> > > s
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
> >
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>

From Laurence.Amilhat at toulouse.inra.fr  Thu Jan  3 09:29:09 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Thu, 03 Jan 2008 15:29:09 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
Message-ID: <477CF135.9060104@toulouse.inra.fr>

Dear all,

I am trying to convert a newick tree into an NHX tree, so I can add the 
taxid tag for each leaf.

I am using the modules: Bio::TreeIO  & Bio::Tree::NodeNHX
The idea is
1) to read the newick tree
2) get the leaf, and get the corresponding taxid for it
3) add the nhx species tag
4) write the nhx tree

I was able to do the first 2 steps, and I could create a NodeNHX object 
(node_nhx) and add the tag T,
but I don't know how to write an NHX tree that uses the node_nhx objects 
I created...

Does anyone have an idea? Any help is welcome.

Thanks,

laurence.


Here are my code and the sample files, for better understanding:
newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt

_newick2nhx.pl:_
use strict;
use Bio::TreeIO;
use Bio::Tree::NodeNHX;
use Getopt::Long;


my $tree_file;
my $outfile;
my $codefile;
my %corresp;

GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile, 'c|code:s' 
=>\$codefile);

open (CODE, "< $codefile") or die "Cannot open $codefile: $!";
while (<CODE>)
{
    chomp;
    my($a, $b)=split (/\t/);
    $corresp{$a}=$b;
}


my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");

while (my $tree= $treeio->next_tree)
{
    my @nodes=$tree->get_nodes();
    foreach my $nd(@nodes)
    {
        if ($nd->is_Leaf())
        {
            my $id=$nd->id();
            print "$id TAXID ",$corresp{$id},"\n";
           
            my $nodenhx=new Bio::Tree::NodeNHX();
            $nodenhx->nhx_tag({T=>$corresp{$id}});
        }
    }
    $treeout->write_tree($tree);
}


_test_tree.nwk_:
(((((42558930:100.0,42558943:100.0):100.0,(42558969:100.0,(42558981:100.0,
42558942:100.0):100.0):72.0):81.0,(((((90185247:100.0,56405380:100.0):100.0,
(42558987:100.0,148887393:100.0):100.0):90.0,66774197:100.0):100.0,AAEL015662:100.0):100.0,
42558970:100.0):82.0):100.0,(42558929:100.0,42558958:100.0):79.0):100.0,
42558941:100.0);

_seq_taxid.txt:_
AAEL015662      7159
42558969        9606
42558981        10090
42558942        9606
42558970        6239
42558929        10116
42558987        9606
42558930        10116
42558943        9606
148887393       10090
42558958        10090
42558941        9606
56405380        10090
90185247        9606
66774197        6239


_And the tata resulting file:_
(((((42558930:100.0,42558943:100.0):100.0[&&NHX],(42558969:100.0,(42558981:100.0,42558942:100.0):100.0[&&NHX]):72.0[&&NHX]):81.0[&&NHX],
(((((90185247:100.0,56405380:100.0):100.0[&&NHX],(42558987:100.0,148887393:100.0):100.0[&&NHX]):90.0[&&NHX],66774197:100.0):100.0[&&NHX],
AAEL015662:100.0):100.0[&&NHX],42558970:100.0):82.0[&&NHX]):100.0[&&NHX],(42558929:100.0,42558958:100.0):79.0[&&NHX]):100.0[&&NHX],42558941:100.0);




-- 
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan     	   = 
= Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================





From aaron.j.mackey at gsk.com  Thu Jan  3 10:12:22 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Thu, 3 Jan 2008 10:12:22 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <477CF135.9060104@toulouse.inra.fr>
Message-ID: 

Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that 
way, your tree's nodes are already NodeNHX objects.  Instead of creating a new 
$nodenhx, you can call nhx_tag directly on the node from the tree ($nd in your loop) ...

-Aaron
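A minimal sketch of Aaron's suggestion, written as an illustration rather than code posted to the list. It assumes a BioPerl release in which Bio::TreeIO's 'nhx' format reads plain Newick input into Bio::Tree::NodeNHX nodes, and it uses only calls that appear in this thread (Bio::TreeIO->new, next_tree, write_tree, id, nhx_tag) plus get_leaf_nodes from Bio::Tree::TreeFunctionsI. The helper names read_taxid_map and annotate_leaves are hypothetical, and the mapping file is assumed to hold one "id taxid" pair per line, separated by whitespace. The key change is that the T tag is set on the leaf node that already lives in the tree, so write_tree emits it:

```perl
#!/usr/bin/perl
# Sketch (hypothetical, not from the thread): tag each leaf of a Newick
# tree with its taxid as an NHX "T" tag. Reading with -format => 'nhx' is
# assumed to produce Bio::Tree::NodeNHX nodes, so nhx_tag() can be called.
use strict;
use warnings;
use Bio::TreeIO;
use Getopt::Long;

# Hypothetical helper: read "leaf_id <whitespace> taxid" lines into a hash ref.
sub read_taxid_map {
    my ($file) = @_;
    my %map;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;             # skip blank lines
        my ($id, $taxid) = split ' ', $line;   # awk-style whitespace split
        $map{$id} = $taxid;
    }
    close $fh;
    return \%map;
}

# Hypothetical helper: set the NHX T tag on every leaf present in the map.
sub annotate_leaves {
    my ($tree, $map) = @_;
    for my $leaf ($tree->get_leaf_nodes) {
        my $taxid = $map->{ $leaf->id };
        if (defined $taxid) {
            # The leaf is already a NodeNHX, so no new node object is needed.
            $leaf->nhx_tag({ T => $taxid });
        } else {
            warn "No taxid for leaf ", $leaf->id, "\n";
        }
    }
    return $tree;
}

sub main {
    my ($tree_file, $outfile, $codefile);
    GetOptions('f|file=s' => \$tree_file,
               'o|out=s'  => \$outfile,
               'c|code=s' => \$codefile) or die "Bad options\n";
    die "usage: $0 -f tree.nwk -o out.nhx -c seq_taxid.txt\n"
        unless defined $tree_file && defined $outfile && defined $codefile;

    my $map     = read_taxid_map($codefile);
    my $treein  = Bio::TreeIO->new(-format => 'nhx', -file => $tree_file);
    my $treeout = Bio::TreeIO->new(-format => 'nhx', -file => ">$outfile");
    while (my $tree = $treein->next_tree) {
        annotate_leaves($tree, $map);
        $treeout->write_tree($tree);
    }
}

# Run only when executed directly with options, so the subs stay reusable.
main() if !caller && @ARGV;
```

Invocation would mirror the original script: newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt. Tagging species names instead (as Laurence later did for the ATV viewer) would presumably just use the NHX S key in place of T.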





From Laurence.Amilhat at toulouse.inra.fr  Fri Jan  4 03:33:22 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Fri, 04 Jan 2008 09:33:22 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: 
References: 
Message-ID: <477DEF52.20802@toulouse.inra.fr>

Thank you Aaron,

it's working now. I've changed to species instead of taxid, so I can 
color the species on my tree using the ATV viewer.
thanks again,

Regards,

Laurence.



aaron.j.mackey at gsk.com wrote:
> Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that 
> way, your tree's nodes are already NodeNHX's.  Instead of creating a new 
> $nodenhx, you can use the $node variable directly from the tree ...
>
> -Aaron


-- 
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan     	   = 
= Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================





From hlapp at gmx.net  Sun Jan  6 22:02:32 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 6 Jan 2008 22:02:32 -0500
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To: 
References: 
Message-ID: <640890C9-2D34-4C70-9179-26A9EAB397D2@gmx.net>

Hi Zhihua, you didn't ever respond to Marc's link to the Persistent  
Bioperl slides - did that help?

	-hilmar

On Dec 6, 2007, at 11:25 PM, zhihuali wrote:

>
> Hi netters,
>
> I've installed BioSQL and bioperl-db, and successfully created and  
> stored a persistent object:
>
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::DB::BioDB;
>
> my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
>                                 -user     => 'annoymous',
>                                 -dbname   => 'bioseqdb');
>
> my $seqobj = Bio::Seq->new(-accession_number => "test",
>                            -id               => "test1",
>                            -seq              => "AGCTAGCT",
>                            -version          => 1);
> my $dbobj = $dbadp->create_persistent($seqobj);
> $dbobj->create;
> $dbobj->commit;
>
> It's successful because I found corresponding rows in the bioseqdb  
> tables.
>
> Now I want to retrieve the object back from the database. There's  
> not much documents available and I've tried find_by_unique_key/ 
> primary_key but all failed. Maybe I didn't use them correctly.  
> Could anyone give me an example as how to retrieve the stored  
> Bio::Seq object?
>
> Thanks a lot!
>
> Zhihua Li

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================








From cain.cshl at gmail.com  Mon Jan  7 12:24:02 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:24:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
Message-ID: <1199726642.6374.10.camel@frissell>

Hello,

I was trying to get bioperl-live this morning from either cvs or svn and
failed.  I was wondering if something was going on with the server.

Here are the things I tried:

  cvs -d:ext:scain at dev.open-bio.org:/home/repository/bioperl co bioperl-live

which resulted in this:

cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
cvs checkout: Updating bioperl-live
cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
cvs [checkout aborted]: read lock failed - giving up

Then I thought I'd try the suggested svn checkout method from the
bioperl wiki:

  svn co svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live

which resulted in

svn: No repository found in 'svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live'

Finally, after looking at the openbio server, I thought I'd try this:

   svn co svn+ssh://scain at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live

which resulted in repeated requests for my password (which I supplied
correctly at least once out of the several requests).

So, what's up?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From hlapp at gmx.net  Mon Jan  7 12:36:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 7 Jan 2008 12:36:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: 

I think we are still migrating to svn. It's probably better to wait  
for the announcement that everything is ready to go. (And then cvs  
won't work anymore except for anonymous checkout - which should  
actually continue to work while this is in progress. Have you tried  
that?)

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From jason at bioperl.org  Mon Jan  7 12:43:18 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Jan 2008 09:43:18 -0800
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>

CVS r/w is locked because we are transitioning to SVN - you can still  
checkout via anonymous CVS on code.open-bio.org.

The SVN is going to be in /home/svn-repositories/bioperl not George's  
directory, but we are still monkeying around with the directory  
structure.  You can try a checkout but be warned it may change a few  
more times if we add another directory layer in there.

You will get requests for your password at least three times. I
strongly suggest you use SSH keys to avoid being prompted each time.
I don't know why you get asked three times; since it is an SVN thing, I
assume it has to make three separate requests to do a checkout.

That's what is up for now.  We'll report when the final SVN migration  
is done.

-jason



From cain.cshl at gmail.com  Mon Jan  7 12:57:38 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:57:38 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
References: <1199726642.6374.10.camel@frissell>
	<5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
Message-ID: <1199728658.6374.12.camel@frissell>

Hi Hilmar and Jason,

Thanks--for some reason, I thought svn was done.  I'll remain anonymous
for right now (Kind of difficult to do when you announce it publicly :-)

Thanks,
Scott

On Mon, 2008-01-07 at 09:43 -0800, Jason Stajich wrote:
> CVS r/w is locked because we are transitioning to SVN - you can still  
> checkout via anonymous CVS on code.open-bio.org.
> 
> The SVN is going to be in /home/svn-repositories/bioperl not George's  
> directory, but we are still monkeying around with the directory  
> structure.  You can try a checkout but be warned it may change a few  
> more times if we add another directory layer in there.
> 
> You will get requests for your password at least three times - I  
> strongly suggest you use SSH keys to avoid getting prompted each time  
> - I don't know why you get asked 3 times as it is a SVN thing I  
> assume it is having to make 3 separate requests to do a checkout.
> 
> That's what is up for now.  We'll report when the final SVN migration  
> is done.
> 
> -jason
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From cain.cshl at gmail.com  Mon Jan  7 13:34:25 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 13:34:25 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
Message-ID: <1199730865.6374.18.camel@frissell>

Hello,

I was going to implement this myself (and probably still will,
assuming it's not already there...) but I am not a Module::Build guru.
Here's what I'd like to do: add a parameter that I can pass when invoking
perl Build.PL so that the default answers are used wherever it would
normally ask me a question, something like
this:

  perl Build.PL --yes

Is this sort of thing already built into Module::Build and I can't see
it?  Or can somebody suggest the best way of going about this?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
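A minimal plain-Perl sketch of the behaviour described above, for illustration only. The `--yes` switch and the helpers `ask_with_default` and `parse_yes_flag` are hypothetical; nothing like them is claimed to exist in BioPerl's build code. The helper also honours the `PERL_MM_USE_DEFAULT` environment variable, which, as far as I know, Module::Build's own `prompt()` already checks; that is worth verifying against the installed Module::Build version before relying on it.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not part of BioPerl or Module::Build): return the
# default answer without prompting when defaults were requested via
# --yes or the PERL_MM_USE_DEFAULT environment variable; otherwise ask.
sub ask_with_default {
    my ($question, $default, $accept_defaults) = @_;
    return $default if $accept_defaults || $ENV{PERL_MM_USE_DEFAULT};
    print "$question [$default] ";
    my $answer = <STDIN>;
    return $default unless defined $answer;    # EOF: use the default
    chomp $answer;
    return length $answer ? $answer : $default;
}

# Remove a hypothetical --yes switch from the argument list, leaving all
# other arguments in place so they could still be handed to Module::Build.
# Note: pass_through is a global Getopt::Long setting.
sub parse_yes_flag {
    my ($args) = @_;
    my $yes = 0;
    Getopt::Long::Configure('pass_through');
    GetOptionsFromArray($args, 'yes' => \$yes);
    return $yes;
}

my @args   = ('--yes', 'install_base=/tmp/bioperl');
my $yes    = parse_yes_flag(\@args);
my $answer = ask_with_default('Install scripts?', 'n', $yes);
print "answer: $answer; remaining arguments: @args\n";
```

Keeping the switch out of the remaining arguments means the rest of the command line can be passed on unchanged.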




From cjfields at uiuc.edu  Mon Jan  7 17:22:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Jan 2008 16:22:35 -0600
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <31AD254B-DABA-488D-BDA8-D690F949CC39@uiuc.edu>

I agree it would be nice.  Not sure how hard it would be to implement;
maybe it would be best to have installation modes, say if one
wanted 'minimal' (no optional module installation, no scripts),
'full', 'dev' (assume a minimal install but don't run tests), and so on,
falling back to the query-based approach if nothing is indicated.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
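The install-mode idea could be sketched as a lookup table that supplies canned answers and signals "ask the user" when no mode (or an unknown mode) is given. The mode names come from the message above; the question keys, the `answer_for` helper, and the test setting for 'minimal' are illustrative assumptions, not BioPerl's real Build.PL questions.

```perl
use strict;
use warnings;

# Hypothetical table: the canned answer each install mode gives to each
# Build.PL question. Question keys are illustrative only.
my %MODE_ANSWERS = (
    minimal => { optional_modules => 'n', install_scripts => 'n', run_tests => 'y' },
    full    => { optional_modules => 'y', install_scripts => 'y', run_tests => 'y' },
    dev     => { optional_modules => 'n', install_scripts => 'n', run_tests => 'n' },
);

# Return the canned answer for $question under $mode, or undef when no
# known mode was given, meaning "fall back to prompting the user".
sub answer_for {
    my ($mode, $question) = @_;
    return undef unless defined $mode && exists $MODE_ANSWERS{$mode};
    return $MODE_ANSWERS{$mode}{$question};
}

for my $q (qw(optional_modules install_scripts run_tests)) {
    my $answer = answer_for('dev', $q);
    print "$q: ", (defined $answer ? $answer : '(ask the user)'), "\n";
}
```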





From bix at sendu.me.uk  Mon Jan  7 17:37:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 07 Jan 2008 22:37:36 +0000
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <4782A9B0.60203@sendu.me.uk>

Scott Cain wrote:
> Hello,
> 
> I was wanting to implement this myself (and probably still will,
> assuming it's not already there...) but I am not a Module::Build guru.
> Here's what I'd like to do: add a parameter that I can add when evoking
> perl Build.PL so that the default answers will be used when it would
> normally ask me a question while running perl Build.PL, something like
> this:
> 
>   perl Build.PL --yes
> 
> Is this sort of thing already built into Module::Build and I can't see
> it?  Or can somebody suggest the best way of going about this?

You should ask on the Module::Build mailing list. If it already exists I 
don't think it is obvious, however.

If your question is BioPerl related, and you're looking for a fast way 
of installing BioPerl without the annoying questions, I'm sure I could 
hack something into ModuleBuildBioperl.pm.


From cain.cshl at gmail.com  Mon Jan  7 22:04:19 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 22:04:19 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <4782A9B0.60203@sendu.me.uk>
References: <1199730865.6374.18.camel@frissell> <4782A9B0.60203@sendu.me.uk>
Message-ID: <1199761459.6017.1.camel@frissell>

Hi Sendu,

I just hacked something up (I only needed to change a few lines--once I
figured out where everything was).  I like Chris' idea though; before I
commit it back (Ha, no rush there), I'll flesh it out a little more to
give more options.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From granjeau at tagc.univ-mrs.fr  Wed Jan  9 03:30:17 2008
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 09 Jan 2008 09:30:17 +0100
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
Message-ID: <47848619.40109@tagc.univ-mrs.fr>

Hello,

I would like to retrieve the human-reviewed annotation of SwissProt
entries; this information is in the comment section of the sequence
file. Here is an example:

CC   -!- FUNCTION: Actins are highly conserved proteins that are involved
CC       in various types of cell motility and are ubiquitously expressed
CC       in all eukaryotic cells.
CC   -!- SUBUNIT: Polymerization of globular actin (G-actin) leads to a
CC       structural filament (F-actin) in the form of a two-stranded helix.
CC       Each actin can bind to 4 others. Found in a complex with XPO6,
CC       Ran, ACTB and PFN1. Component of a complex composed at least of
CC       ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with XPO6.
CC   -!- INTERACTION:
CC       Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668;
CC       P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161;
CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.

Is there a specific method to do such a job?

Thanks much,
Samuel

-- 

Samuel GRANJEAUD                   granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC               Tel: +33  (0)491 82 87 24
http://tagc.univ-mrs.fr            Fax: +33  (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique



From robfsouza at gmail.com  Wed Jan  9 08:20:08 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 11:20:08 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs
Message-ID: 

Hello All!

Greetings to everybody, and happy new year to those following a
Western calendar!

I'm starting a new project to store and analyze distinct sets of
sequence annotation data which are related in a way suitable for
representation in a directed (e.g. transcript splicing) or undirected
(e.g. gene product interaction) graph. Analysis will require frequent
queries based on interval overlaps, feature neighbourhood, annotation
and, most importantly, feature relationships and stored paths.

At first, I thought of building an entirely new database structure to store
project-specific data (e.g. alternative splicing or protein interaction),
but as I have some experience with Lincoln's
Bio::DB::SeqFeature::Store, I'm now considering extending it for the
purpose of storing graphs describing relationships among features.

I'm aware that some other bioperl-related databases, specifically
BioSQL and Chado, do have components which might be suitable for
storing all or some of these data but, since Lincoln's feature storage
and interval binning implementations in
Bio::DB::SeqFeature::Store::mysql are clean, simple and very fast,
extending it in a modular way seems desirable. A good
extension to Lincoln's database could include tables like
feature_relationship and feature_path, for edges and transitive
closures (just like in BioSQL), and feature_stored_path, for excluding
biologically irrelevant paths in DAGs, like certain splicing
isoforms. These tables could be used to store sequence assemblies or
EST alignments efficiently, including scaffolds inferred by connecting
contigs.

Before starting, I would like to know whether the BioSQL and Chado schemata
have accelerators for querying intervals among billions of features
and feature relationships (some examples using these databases would
also help, if they show that these databases are efficient for such tasks).
If these or other databases are not as suitable as Bio::DB::SeqFeature
for feature retrieval based on interval overlap and attributes, then
again I might consider extending Bio::DB::SeqFeature
and contributing such extensions back to bioperl...

Any thoughts?

Best regards,
Robson

PS: sorry if anyone gets two copies of this post, but it took me some
time to realize my new e-mail wasn't subscribed to bioperl-l...


From bix at sendu.me.uk  Wed Jan  9 08:59:08 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 09 Jan 2008 13:59:08 +0000
Subject: [Bioperl-l] bioperl based database infrastucture for directed
 graphs
In-Reply-To: 
References: 
Message-ID: <4784D32C.9070807@sendu.me.uk>

Robson Francisco de Souza wrote:
> Before starting, I would like to know if the BioSQL and Chado schemata
> do have accelerators for quering intervals among billions of features
> and feature relatioships (some examples using these databases would
> also help, if they that these databases are efficient for such tasks).
> If these or other databases are not as suitable as Bio::DB::SeqFeature
> for feature retrieval based on interval overlap and attributes,

I'm using Bio::DB::SeqFeature for that purpose, but just a warning: I 
found that with millions of features it made a db that was too large in 
terms of disc space and too slow in terms of query time. I had to hack 
out its storage of feature objects in the db, instead generating feature 
objects on request from the stored attributes. Doing this turned out to 
be faster than simply unfreezing certain kinds of feature objects!

(I also had to hack in support for retrieval by source, a patch that 
Lincoln hasn't gotten back to me about yet.)

While I can't answer your main questions, I wish you good luck with your 
project and request that you keep us posted with what you achieve.
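For readers unfamiliar with interval binning, here is a sketch of the UCSC Genome Browser's hierarchical scheme, offered only as an illustration. Bio::DB::SeqFeature::Store::mysql has its own binning code, which may differ in bin sizes and numbering. Each feature is stored with the number of the smallest bin that wholly contains it; an overlap query then only needs to look in the few bins returned by `bins_overlapping` (a hypothetical helper name) instead of scanning every row.

```perl
use strict;
use warnings;

# UCSC-style hierarchical binning: the smallest bins span 2**17 bp
# (128 kb) and each level up is 8 times larger, up to a single top bin.
# The offsets give each level its own range of bin numbers.
my @BIN_OFFSETS     = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0);
my $BIN_FIRST_SHIFT = 17;
my $BIN_NEXT_SHIFT  = 3;

# Smallest bin that wholly contains [$start, $end) (0-based, half-open).
sub bin_from_range {
    my ($start, $end) = @_;
    my $start_bin = $start >> $BIN_FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $BIN_FIRST_SHIFT;
    for my $offset (@BIN_OFFSETS) {
        return $offset + $start_bin if $start_bin == $end_bin;
        $start_bin >>= $BIN_NEXT_SHIFT;
        $end_bin   >>= $BIN_NEXT_SHIFT;
    }
    die "range $start-$end is too large to bin\n";
}

# Every bin, at every level, that could hold a feature overlapping
# [$start, $end); an overlap query only has to look in these bins.
sub bins_overlapping {
    my ($start, $end) = @_;
    my $start_bin = $start >> $BIN_FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $BIN_FIRST_SHIFT;
    my @bins;
    for my $offset (@BIN_OFFSETS) {
        push @bins, $offset + $_ for $start_bin .. $end_bin;
        $start_bin >>= $BIN_NEXT_SHIFT;
        $end_bin   >>= $BIN_NEXT_SHIFT;
    }
    return @bins;
}

print "feature bin: ", bin_from_range(1_000_000, 1_050_000), "\n";
print "bins to search: @{[ bins_overlapping(1_000_000, 1_050_000) ]}\n";
```

In SQL terms the overlap query then becomes something like `bin IN (...) AND start < ? AND end > ?`, ideally backed by an index on the sequence id and bin columns.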


From bosborne11 at verizon.net  Wed Jan  9 09:46:42 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 09 Jan 2008 09:46:42 -0500
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
In-Reply-To: <47848619.40109@tagc.univ-mrs.fr>
References: <47848619.40109@tagc.univ-mrs.fr>
Message-ID: <3DAEDA67-B9A5-47A4-8108-0915659F1052@verizon.net>

Samuel,

The Feature-Annotation HOWTO addresses this specifically:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation


Brian O.
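The HOWTO describes the BioPerl route through the sequence's annotation objects. As a complement, here is a core-Perl sketch (the function name `parse_cc_topics` is hypothetical) that splits raw CC lines like those in Samuel's example into topic/text pairs. It assumes the usual `CC   -!- TOPIC:` layout, stops at the `-----` copyright separator, and would let a repeated topic overwrite an earlier one.

```perl
use strict;
use warnings;

# Split SwissProt/UniProt "CC" lines into a hash of topic => text.
# A topic starts at "-!- TOPIC:"; indented continuation lines are joined
# onto it. Parsing stops at the "-----" copyright separator, if present.
sub parse_cc_topics {
    my @lines = @_;
    my (%topics, $current);
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ s/^CC {3}//;      # drop the "CC   " prefix
        last if $line =~ /^-{5,}/;             # copyright block follows
        if ($line =~ /^-!- ([^:]+):\s*(.*)$/) {
            $current = $1;
            $topics{$current} = $2;
        }
        elsif (defined $current) {
            $line =~ s/^\s+//;
            $topics{$current} .= (length $topics{$current} ? ' ' : '') . $line;
        }
    }
    return \%topics;
}

my @cc = (
    'CC   -!- FUNCTION: Actins are highly conserved proteins that are involved',
    'CC       in various types of cell motility and are ubiquitously expressed',
    'CC       in all eukaryotic cells.',
    'CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.',
);
my $topics = parse_cc_topics(@cc);
print "$_: $topics->{$_}\n" for sort keys %$topics;
```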





From alexanderptok at web.de  Wed Jan  9 10:34:56 2008
From: alexanderptok at web.de (Alexander Ptok)
Date: Wed, 09 Jan 2008 16:34:56 +0100
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN]
Message-ID: <2011210591@web.de>

Hi,

I am a beginner to BioPerl and am working through the Beginners HOWTO.

My BioPerl version is 1.4-1, running on Debian etch.

Everything in the HOWTO worked fine until the section

Retrieving multiple sequences from a database

from which I copied the following script:

use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
my $query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                             -query => $query);

my $gb_obj = Bio::DB::GenBank->new;

my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

while (my $seq_obj = $stream_obj->next_seq) {
    # do something with the sequence object
    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}


If I remove the 0:3000[SLEN] term, the query works and returns a lot of
sequences; when I change it to e.g. 1830[SLEN] it finds the one sequence
that has length 1830, but I have not been able to query a range of lengths.

Does anyone know what I am doing wrong?
Greetings
A. Ptok



From cjm at fruitfly.org  Wed Jan  9 11:52:21 2008
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed, 9 Jan 2008 08:52:21 -0800
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: 
References: 
Message-ID: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>

[cc-d to gmod-schema]

Chado does have some views and pg functions for interval-based  
retrieval. AFAIK there are no accelerators for deep feature graphs,  
as most chado users have relatively shallow gene-model/SO feature  
graphs. It may not be so hard to extend cvterm code for doing this,  
depending on the characteristics of your graphs (the closure of  
feature neighbourhood graphs may be particularly large)
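To get a feel for how large such a closure can become, here is a sketch that computes a feature_path-style closure from feature_relationship-style edges by breadth-first search from every node, keeping the shortest distance for each reachable pair. The function name and data layout are illustrative assumptions, not Chado's or BioSQL's actual code; reflexive (distance 0) paths are not recorded.

```perl
use strict;
use warnings;

# Transitive closure of a directed graph given as [subject, object]
# edges. Returns [subject, object, distance] triples, where distance is
# the shortest path length. BFS from every node with outgoing edges;
# the visited check keeps cycles from looping forever.
sub transitive_closure {
    my @edges = @_;
    my %children;
    push @{ $children{ $_->[0] } }, $_->[1] for @edges;

    my @paths;
    for my $source (sort keys %children) {
        my %dist  = ($source => 0);
        my @queue = ($source);
        while (@queue) {
            my $node = shift @queue;
            for my $child (@{ $children{$node} || [] }) {
                next if exists $dist{$child};
                $dist{$child} = $dist{$node} + 1;
                push @paths, [ $source, $child, $dist{$child} ];
                push @queue, $child;
            }
        }
    }
    return @paths;
}

my @paths = transitive_closure(
    [ 'gene1', 'mRNA1' ], [ 'mRNA1', 'exon1' ], [ 'mRNA1', 'exon2' ],
);
printf "%s -> %s (distance %d)\n", @$_ for @paths;
```

Three edges already give five closure rows here; for densely connected neighbourhood graphs the number of rows can approach the square of the number of nodes.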




From cjfields at uiuc.edu  Wed Jan  9 10:00:38 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 Jan 2008 09:00:38 -0600
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: <4784D32C.9070807@sendu.me.uk>
References: 
	<4784D32C.9070807@sendu.me.uk>
Message-ID: 


On Jan 9, 2008, at 7:59 AM, Sendu Bala wrote:

> Robson Francisco de Souza wrote:
>> Before starting, I would like to know if the BioSQL and Chado schemata
>> do have accelerators for quering intervals among billions of features
>> and feature relatioships (some examples using these databases would
>> also help, if they that these databases are efficient for such tasks).
>> If these or other databases are not as suitable as Bio::DB::SeqFeature
>> for feature retrieval based on interval overlap and attributes,
>
> I'm using Bio::DB::SeqFeature for that purpose, but just a warning:  
> I found that with millions of features it made a db that was too  
> large in terms of disc space and too slow in terms of query time. I  
> had to hack out its storage of feature objects in the db, instead  
> generating feature objects on request from the stored attributes.  
> Doing this turned out to be faster than simply unfreezing certain  
> kinds of feature objects!

Would these be Bio::SF::Annotated objects? If so, I bet Storable is
storing the OntologyStore object information along with the SF (which
argues for refactoring the FeatureIO/Bio::SF::Annotated stuff in 1.7).

Not sure what can be done about that beyond your hack, though it might  
be worth exploring whether one can optionally set the DB::Store to  
store the object instance.

> (I also had to hack in support for retrieval by source, a patch that  
> Lincoln hasn't gotten back to me about yet.)
>
> While I can't answer your main questions, I wish you good luck with  
> your project and request that you keep us posted with what you  
> achieve.

You can always try Lincoln on the GBrowse list as well.  I would say  
go ahead and commit the patch if it isn't a big deal.

chris


From cjfields at uiuc.edu  Wed Jan  9 13:12:55 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 Jan 2008 12:12:55 -0600
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: 
References: 
Message-ID: <128517E8-3A2A-45DD-83A0-0014863A25BC@uiuc.edu>

cc'ing the gbrowse list in case Lincoln hasn't seen this.

I believe the primary intent for Bio::DB::SeqFeature::Store was as a  
more GFF3-compatible replacement for Bio::DB::GFF (unlimited feature  
nesting, uses any SeqFeatureI, etc) and was streamlined for faster  
lookups by GBrowse.  I don't think adding tables would affect  
performance dramatically, though maybe Lincoln would have a better idea.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From bosborne11 at verizon.net  Wed Jan  9 13:29:15 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 09 Jan 2008 13:29:15 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths
	0:3000[SLEN]
In-Reply-To: <2011210591@web.de>
References: <2011210591@web.de>
Message-ID: <0EB96131-7931-4FC3-802F-A8152B474A99@verizon.net>

Alexander,

I don't understand. By using the clause "0:3000[SLEN] " you are  
querying for sequences in the length range of 0 to 3000.


Brian O.


On Jan 9, 2008, at 10:34 AM, Alexander Ptok wrote:

> If i cut the 0:3000[SLEN] query it works and returns a lot of  
> sequences, when i alter the query to e.g. 1830[SLEN] it
> finds the one sequence that has the length 1830, but i was not able  
> to query a range of lengths.
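An untested guess, not confirmed anywhere in this thread: NCBI's Entrez help asks for Boolean operators (AND, OR, NOT) to be typed in upper case, and the script in Alexander's message joins its last term with a lower-case `and`, which Entrez may then treat as a search word rather than an operator. A minimal core-Perl sketch that builds the same query with upper-case `AND` throughout; the helper `build_entrez_query` is hypothetical, not part of BioPerl, and its result would be passed as `-query` to `Bio::DB::Query::GenBank->new` as in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: build an Entrez query from field => value pairs,
# joining terms with upper-case AND and adding an optional sequence
# length range in the min:max[SLEN] form used in the HOWTO.
sub build_entrez_query {
    my (%args) = @_;
    my @terms;
    push @terms, "$args{organism}\[ORGN]" if defined $args{organism};
    push @terms, "$args{title}\[TITL]"    if defined $args{title};
    if (defined $args{min_len} && defined $args{max_len}) {
        push @terms, "$args{min_len}:$args{max_len}\[SLEN]";
    }
    return join ' AND ', @terms;
}

my $query = build_entrez_query(
    organism => 'Arabidopsis',
    title    => 'topoisomerase',
    min_len  => 0,
    max_len  => 3000,
);
print "$query\n";
# prints: Arabidopsis[ORGN] AND topoisomerase[TITL] AND 0:3000[SLEN]
```

The backslash before each `[` stops Perl from reading `$args{...}[...]` as an array subscript inside the double-quoted string.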



From stefan.kirov at bms.com  Wed Jan  9 14:54:07 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 09 Jan 2008 14:54:07 -0500
Subject: [Bioperl-l] pairwise_kaks.PLS: verbose rquired by PAML
Message-ID: <4785265F.6020500@bms.com>

Jason,
Even this last fix I still had problems with bp_pairwise_kaks.pl. It
turns out, verbose needs to be set on by default for codeml in order for
the sequences to appear in mlc file.\
That being said, we need instead of:
    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        (-verbose => $verbose,
         -params => { 'runmode' => -2,
                      'seqtype' => 1,
                  }
         );
this:

    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        (-verbose => $verbose,
         -params => { 'runmode' => -2,
                      'seqtype' => 1,
                      'verbose' => 1,
                  }
         );

verbose can be 2 as well; I just got this clarification from Ziheng. He
has also offered to change the output to make things easier for us. I plan
to ask him to put the sequences in the mlc header by default.
Stefan
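Stefan's fix amounts to giving codeml's verbose option a default of 1 while still letting a caller raise it to 2. The core-Perl sketch below shows that merge. The helper names `codeml_params` and `ctl_lines` are hypothetical. The `key = value` rendering is an assumption about how such a hash maps onto codeml.ctl entries, not the Bio::Tools::Run::Phylo::PAML::Codeml implementation.

```perl
use strict;
use warnings;

# Merge caller-supplied codeml options over defaults. When a hash is built
# from a list, later pairs win, so a caller's verbose => 2 overrides the default.
sub codeml_params {
    my (%user) = @_;
    my %defaults = (verbose => 1);
    return (%defaults, %user);
}

# Render options as "key = value" lines in the style of a codeml.ctl file.
sub ctl_lines {
    my (%p) = @_;
    return map { my $k = $_; "$k = $p{$k}" } sort keys %p;
}

my %params = codeml_params(runmode => -2, seqtype => 1);
print "$_\n" for ctl_lines(%params);
```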



From robfsouza at gmail.com  Wed Jan  9 19:28:25 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 22:28:25 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
References: 
	<199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
Message-ID: 

Hi,

2008/1/9, Chris Mungall :
> [cc-d to gmod-schema]
>
> Chado does have some views and pg functions for interval-based
> retrieval. AFAIK there are no accelerators for deep feature graphs,
> as most chado users have relatively shallow gene-model/SO feature
> graphs. It may not be so hard to extend cvterm code for doing this,
> depending on the characteristics of your graphs (the closure of
> feature neighbourhood graphs may be particularly large)

Great! I'm studying Chado and I will have a look at the interval optimizations.
Have any of you compared BioSQL and Chado for storage/retrieval efficiency
with huge numbers of features and feature graphs? As Sendu pointed out
limitations in Bio::DB::SeqFeature's schema, I'm wondering which of these
platforms (or maybe another one?) would be best suited for these tasks. For
the moment, I will either extend Sendu's hack of Lincoln's modules or
adapt the binning algorithm of Bio::DB::SeqFeature::DBI::mysql to
Chado, if it turns out to be more efficient than the pg functions.

Best,
Robson

PS: I could not find the most recent version of gmod by following the
Download link to gmod(Chado) from GMOD's site to the Sourceforge
download page. Did I miss the right link on the download site or is
this unexpected? Is the version available at IUBio's mirror (0.003-10)
the most recent one?
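Binning assigns each feature to the smallest bin in a fixed hierarchy that fully contains it. An overlap query then only needs to examine a few bins per level. The sketch below implements the standard UCSC-style hierarchical scheme in core Perl: 128 kb smallest bins, each level 8 times larger, five levels, 0-based half-open coordinates. It is a generic illustration and is not claimed to be the exact scheme used by Bio::DB::SeqFeature.

```perl
use strict;
use warnings;

# UCSC "standard" binning: offsets of each level, finest level first.
my @OFFSETS     = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0);
my $FIRST_SHIFT = 17;    # smallest bins span 2**17 = 128 kb
my $NEXT_SHIFT  = 3;     # each level is 2**3 = 8 times coarser

# Smallest bin that fully contains the half-open interval [$start, $end).
sub bin_from_range {
    my ($start, $end) = @_;
    my $sb = $start >> $FIRST_SHIFT;
    my $eb = ($end - 1) >> $FIRST_SHIFT;
    for my $off (@OFFSETS) {
        return $off + $sb if $sb == $eb;
        $sb >>= $NEXT_SHIFT;
        $eb >>= $NEXT_SHIFT;
    }
    die "range $start-$end is too large for standard binning\n";
}

# Every bin that may hold a feature overlapping [$start, $end).
sub overlapping_bins {
    my ($start, $end) = @_;
    my @bins;
    my $shift = $FIRST_SHIFT;
    for my $off (@OFFSETS) {
        push @bins, ($off + ($start >> $shift)) .. ($off + (($end - 1) >> $shift));
        $shift += $NEXT_SHIFT;
    }
    return @bins;
}

printf "bin for [0, 1000): %d\n", bin_from_range(0, 1000);
```

A database lookup would then restrict on `bin IN (...)` using the overlapping bins. It must still check the actual start and end coordinates, because a bin only bounds where a feature can lie.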


From cain.cshl at gmail.com  Wed Jan  9 22:15:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 09 Jan 2008 22:15:29 -0500
Subject: [Bioperl-l] bioperl based database infrastucture for
	directed	graphs
In-Reply-To: 
References: 
	<199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
	
Message-ID: <1199934929.6229.44.camel@frissell>

Hi Robson,

I seem to be perennially working on the 1.0 release of Chado.  The
schema itself is quite stable but I'm always working on the tools to
make them handle more cases and be as stable as possible.  For the time
being, you need to get Chado from cvs; see 

  http://www.gmod.org/wiki/index.php/Chado_-_Getting_Started#Chado_From_CVS

I removed the 0.003 release from the SourceForge site because the schema
in it is out of date relative to what we've been working on for the last
year.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From bosborne11 at verizon.net  Thu Jan 10 09:16:16 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 10 Jan 2008 09:16:16 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths
	0:3000[SLEN]
In-Reply-To: <2013325230@web.de>
References: <2013325230@web.de>
Message-ID: <932550FF-8414-4B3E-92BB-1895FD9658AE@verizon.net>

Alexander,

OK, that is odd (meaning, this did work a while back but it's not  
clear to me what could have changed).

First thing to do: upgrade to BioPerl version 1.5.2. Can you do this?
Version 1.4 is very old and you could run into other problems using it.
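Until an upgrade is possible, one workaround is to drop the [SLEN] clause, which Alexander reports works, and filter by length on the client. This is an assumption and has not been tested against BioPerl 1.4. Inside his BioPerl loop that would be a `next unless $seq_obj->length <= 3000;` line. The core-Perl sketch below instead filters the tab-separated `display_id<TAB>length` lines his script already prints. The helper name `in_length_range` is hypothetical.

```perl
use strict;
use warnings;

# Keep "ID<TAB>LENGTH" lines whose length lies in [$min, $max] (inclusive).
sub in_length_range {
    my ($min, $max, @lines) = @_;
    my @keep;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($id, $len) = split /\t/, $copy;
        next unless defined $len && $len =~ /^\d+$/;
        push @keep, $copy if $len >= $min && $len <= $max;
    }
    return @keep;
}

# Read the files named on the command line ("-" means STDIN) and print
# only the records between 0 and 3000 bp long.
if (@ARGV) {
    print "$_\n" for in_length_range(0, 3000, <>);
}
```

For example, `perl script2.pl | perl filter.pl -` keeps NM_128760 (2775) and drops NM_124913 (3068).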


Brian O.



On Jan 10, 2008, at 8:54 AM, Alexander Ptok wrote:

> Hello Brian,
>
> thanks for your answer. The principle is clear, but it doesn't work
> as it should on my computer, so let me repeat what I did
> step by step.
>
> 1. I took the following script:
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
> $query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
>                                           -query => $query );
>
> $gb_obj = Bio::DB::GenBank->new;
>
> $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
>
> while ($seq_obj = $stream_obj->next_seq) {
>    # do something with the sequence object
>    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
> }
>
> and then on the terminal
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script1.pl
> sv1494 at r04102:~/Desktop/bioperl$
>
> 2. I took out the 0:3000[SLEN] clause:
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL]";
>
> and then on the terminal
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script2.pl
> NM_128760       2775
> NM_125788       2874
> NM_124913       3068
> NM_124912       3117
> NM_124775       871
> NM_120360       1655
> NM_111862       2199
> NM_001036386    2734
> NM_119270       3996
> NM_105072       1656
> NM_113294       4824
> NM_180431       1673
> NM_120495       2515
> NM_120493       2050
> NM_112156       1089
> .
> .
> and many more hits; one can clearly see that some of them have
> a length between 0 and 3000.
>
> 3. To look at [SLEN] on its own, I tried another script with e.g.
> 2199[SLEN]:
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 2199[SLEN]";
>
> on the terminal:
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script3.pl
> NM_111862       2199
> sv1494 at r04102:~/Desktop/bioperl$
>
>
>
> I think everything works fine, except that BioPerl, or maybe GenBank,
> doesn't understand the range clause 0:3000, even though all the
> documentation says I have to do it that way. Did I misunderstand
> something, or is it just a problem with my computer/BioPerl
> installation? Maybe you can tell me whether the script does what it
> is supposed to do on your computer?
>
> Thanks and greetings
>
> Alexander Ptok




From pmiguel at purdue.edu  Fri Jan 11 11:22:38 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 11:22:38 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
Message-ID: <478797CE.9050202@purdue.edu>

Getting sequence from GenBank is no problem; there are myriad ways to do
it. But as the volume of unfinished sequence in GenBank grows, obtaining
the quality values for a given sequence becomes increasingly important.
Some records include quality values.

I typically use bp_fetch.pl to grab a sequence from genbank:

bp_fetch.pl -fmt fasta net::genbank:AC207960

sends the FASTA sequence to STDOUT. But bp_fetch.pl evidently wasn't
designed to pull down quals:

bp_fetch.pl -fmt qual net::genbank:AC207960

gives:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual object to write_seq() as a parameter named "source"
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
STACK: /usr/local/perl/bin/bp_fetch.pl:313
-----------------------------------------------------------

(running under bioperl 1.5.2)

The quality values for this accession are in genbank as these URLs 
demonstrate:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual

What is the best way to pull down these qual values? They aren't present 
in "GenBank(Full)" format. They are present in an ASN.1 format.

Advice would be appreciated.

-- 
Phillip
Purdue Genomics Core Facility






From cjfields at uiuc.edu  Fri Jan 11 12:09:40 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 Jan 2008 11:09:40 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <478797CE.9050202@purdue.edu>
References: <478797CE.9050202@purdue.edu>
Message-ID: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>

I don't think this is possible with the current setup for  
Bio::DB::GenBank (which the script uses).  We'll have to investigate  
whether it is possible to retrieve this data via NCBI's eutils; if so  
we can try adding it in.  If you want you can submit this as an  
enhancement request via bugzilla for tracking:

http://bugzilla.open-bio.org/

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From MEC at stowers-institute.org  Fri Jan 11 14:14:10 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 11 Jan 2008 13:14:10 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID: 

Indeed, eutils is capable of this.

The following use of my ncbi_eutil script (attached) yields what you
want:

ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

It depends on the version of NCBI_PowerScripting.pm, such as is
included in 

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  




From pmiguel at purdue.edu  Fri Jan 11 14:33:13 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 14:33:13 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: 
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
Message-ID: <4787C479.8070600@purdue.edu>

Hi Malcolm,
    Looks like your email was (inadvertently?) redacted in some way. (No 
attachment and last sentence truncated.) Would it be possible to get a 
complete version so I can be sure I'm following you?
Thanks,
Phillip




From pmiguel at purdue.edu  Fri Jan 11 14:37:24 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 14:37:24 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID: <4787C574.8020003@purdue.edu>

Hi Chris,
Thanks. I have submitted this as an enhancement request to bugzilla.
Phillip




From pmiguel at purdue.edu  Fri Jan 11 15:46:59 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 15:46:59 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: 
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
	<4787C479.8070600@purdue.edu>
	
Message-ID: <4787D5C3.1030308@purdue.edu>

Hi Malcolm,
Yes, that works great!
Well, one caveat:
    If you download both the FASTA and the qual files:
ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=fasta > AC207960.fasta
ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.fasta.qual

The "primary IDs" don't match. The FASTA comes out as:
 >gi|154937460|gb|AC207960.1|

and the qual comes out as:
 >AC207960.1

which seems to choke most programs that use sequence and quality files
together (e.g. cross_match), because they require the primary IDs in the
seq and qual files to match.

Otherwise fine, though.
Thanks,
Phillip
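One way around the ID mismatch is to rewrite the FASTA headers down to the bare accession.version, so that both files share the same primary ID. This sketch assumes the headers have the `gi|NUMBER|db|ACCESSION.VERSION|` shape shown above. Any description after the last pipe is kept, and any other header is left untouched. The helper names `accession_from_header` and `normalize_fasta` are hypothetical.

```perl
use strict;
use warnings;

# Turn ">gi|154937460|gb|AC207960.1| optional description" into
# ">AC207960.1 optional description"; other headers are returned unchanged.
sub accession_from_header {
    my ($header) = @_;
    if ($header =~ /^>gi\|\d+\|[a-z]+\|([^|]+)\|(.*)$/) {
        return ">$1$2";
    }
    return $header;
}

# Rewrite only the header lines of a FASTA stream; sequence lines pass through.
sub normalize_fasta {
    my (@lines) = @_;
    return map { $_ =~ /^>/ ? accession_from_header($_) : $_ } @lines;
}

# Usage: perl fix_ids.pl AC207960.fasta > AC207960.fixed.fasta
if (@ARGV) {
    for my $line (normalize_fasta(<>)) {
        chomp $line;
        print "$line\n";
    }
}
```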




From MEC at stowers-institute.org  Fri Jan 11 14:40:14 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 11 Jan 2008 13:40:14 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <4787C479.8070600@purdue.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
	<4787C479.8070600@purdue.edu>
Message-ID: 

Phillip:

Of course - mea culpa - here's the full monty....

Indeed NCBI's eutils can do this:

ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.
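For readers without the attached script, the same retrieval can be sketched directly against NCBI's efetch endpoint in core Perl. This is not Malcolm's script. It assumes efetch accepts `rettype=qual` for the nucleotide database, as the command above suggests, and it fetches by the gi number (154937460) quoted earlier in the thread. HTTP::Tiny has shipped with Perl since 5.14; the helper names are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build an efetch URL. The values used here are plain alphanumerics,
# so no percent-encoding is done.
sub efetch_url {
    my (%arg) = @_;
    my %q = (db => 'nucleotide', retmode => 'text', %arg);
    return $EFETCH . '?' . join('&', map { "$_=$q{$_}" } sort keys %q);
}

# Fetch one record's quality values as text; dies on HTTP failure.
sub fetch_qual {
    my ($id) = @_;
    my $res = HTTP::Tiny->new->get(efetch_url(id => $id, rettype => 'qual'));
    die "efetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Only touch the network when an ID is given, e.g. perl qual.pl 154937460
print fetch_qual($ARGV[0]) if @ARGV;
```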

It depends upon NCBI_PowerScripting.pm, distributed in PowerFiles_0707.zip
by NCBI in their "Jul 24-27, 2007" course found at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html

I made a single edit to NCBI_PowerScripting.pm, adding 'select STDERR' at the
very beginning, so that trace messages are not printed on STDOUT; for example,
this echoed header:
	 Retrieving 1 records from nucleotide...
... and footer:
	Received records 1 - 1.
	Wrote data to -.

(otherwise they are interspersed with downloaded qual files)
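The one-line `select STDERR` edit works because `select` changes the default handle used by a bare `print`. Prints that name a handle explicitly are unaffected. The sketch below shows this in isolation, using in-memory filehandles in place of STDOUT and STDERR. It is only an illustration; the actual edit to NCBI_PowerScripting.pm is not shown in this thread.

```perl
use strict;
use warnings;

# In-memory handles stand in for STDOUT (data) and STDERR (trace messages).
my ($data, $trace) = ('', '');
open my $data_fh,  '>', \$data  or die "cannot open data handle: $!";
open my $trace_fh, '>', \$trace or die "cannot open trace handle: $!";

# Make the trace handle the default for bare print, remembering the old one.
my $old = select $trace_fh;

print "Retrieving 1 records from nucleotide...\n";   # bare print: goes to $trace_fh
print {$data_fh} ">AC207960.1\n";                     # explicit handle: goes to $data_fh

select $old;    # restore the previous default handle
close $data_fh;
close $trace_fh;

print "data:  $data";
print "trace: $trace";
```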

It also depends on a recent version of Getopt::Long.

Hope it helps.
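For comparison, here is a minimal sketch that calls NCBI's EFetch service directly over HTTP, without NCBI_PowerScripting.pm. It assumes EFetch still honors rettype=qual (the option passed to ncbi_eutil above) and accepts an accession such as AC207960 in place of a GI. The names efetch_qual_url and fetch_qual are hypothetical. HTTP::Tiny ships with Perl 5.14 and later, and https additionally needs IO::Socket::SSL.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# NCBI EFetch endpoint (E-utilities).
use constant EFETCH => 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build the EFetch URL for the quality scores of one record. rettype=qual
# mirrors ncbi_eutil's "-fetch rettype=qual"; server support for it is an
# assumption that this sketch does not check.
sub efetch_qual_url {
    my ($db, $id) = @_;
    return EFETCH . "?db=$db&id=$id&rettype=qual&retmode=text";
}

# Download the qual text for one ID; dies with the HTTP status on failure.
sub fetch_qual {
    my ($db, $id) = @_;
    my $res = HTTP::Tiny->new->get(efetch_qual_url($db, $id));
    die "EFetch failed for $id: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# e.g.  perl fetch_qual.pl AC207960 > AC207960.qual
print fetch_qual('nucleotide', $ARGV[0]) if @ARGV;
```

Because this script prints only the response body, there are no progress messages that need to be redirected to STDERR.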

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: Phillip San Miguel [mailto:pmiguel at purdue.edu] 
> Sent: Friday, January 11, 2008 1:33 PM
> To: Cook, Malcolm
> Cc: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual 
> files from Genbank?
> 
> Hi Malcolm,
>     Looks like your email was (inadvertently?) redacted in 
> some way. (No attachment and last sentence truncated.) Would 
> it be possible to get a complete version so I can be sure I'm 
> following you?
> Thanks,
> Phillip
> 
> Cook, Malcolm wrote:
> > Indeed eutil is capable of this
> >
> > The following use of my ncbi_eutil (attached) script yields what you
> > want:
> >
> > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
> >
> > It depends on the version of NCBI_PowerScripting.pm, such as is 
> > included in
> >
> > Malcolm Cook
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >   
> >
> >   
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >> Sent: Friday, January 11, 2008 11:10 AM
> >> To: Phillip San Miguel
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?
> >>
> >> I don't think this is possible with the current setup for
> >> Bio::DB::GenBank (which the script uses).  We'll have to investigate
> >> whether it is possible to retrieve this data via NCBI's eutils; if so
> >> we can try adding it in.  If you want you can submit this as an
> >> enhancement request via bugzilla for tracking:
> >>
> >> http://bugzilla.open-bio.org/
> >>
> >> chris
> >>
> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> >>
> >>
> >>> No problem getting sequence from genbank via a myriad of methods.
> >>> But as the volume of non-finished sequence in genbank increases, the
> >>> importance of also obtaining quality values for a given sequence
> >>> increases. Some records include quality values.
> >>>
> >>> I typically use bp_fetch.pl to grab a sequence from genbank:
> >>>
> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960
> >>>
> >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't 
> >>> designed to pull down quals evidently:
> >>>
> >>> bp_fetch.pl -fmt qual net::genbank:AC207960
> >>>
> >>> gives:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual
> >>> object to write_seq() as a parameter named "source"
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> >>> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
> >>> -----------------------------------------------------------
> >>>
> >>> (running under bioperl 1.5.2)
> >>>
> >>> The quality values for this accession are in genbank as these URLs
> >>> demonstrate:
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
> >>>
> >>> What is the best way to pull down these qual values? They aren't 
> >>> present in "GenBank(Full)" format. They are present in an ASN.1 
> >>> format.
> >>>
> >>> Advice would be appreciated.
> >>>
> >>> --
> >>> Phillip
> >>> Purdue Genomics Core Facility
> >>>
> >>>
> >>>
> >>>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ncbi_eutil
Type: application/octet-stream
Size: 1854 bytes
Desc: ncbi_eutil
URL: 

From cain.cshl at gmail.com  Mon Jan 14 13:46:39 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 14 Jan 2008 13:46:39 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
Message-ID: <1200336399.6056.12.camel@frissell>

Hi all,

Last month, I got a bug report on the GBrowse bug tracker:

  http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291

about a problem with dumping invalid GenBank files.  GBrowse uses
Bio::SeqIO::genbank to create these dumps.  

In his bug report, he claims that feature names over 15 characters long
are invalid, and provided an example GenBank file where a feature is
named 'BAC_cloned_genomic_insert', which is over 15 characters.  What I
want to know is this: is this truly a restriction on the GenBank format,
or is it a software problem with some other package?  Do we need to fix
genbank.pm?  I'm perfectly willing to do it; I'm just hesitant to
believe this is really a bug.

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From lstein at cshl.edu  Mon Jan 14 13:53:15 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 14 Jan 2008 13:53:15 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <1200336399.6056.12.camel@frissell>
References: <1200336399.6056.12.camel@frissell>
Message-ID: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>

Hi Scott,

He is correct about the limitation, but we deliberately relaxed it because
we were running into situations where we lost information during
roundtripping from other formats into genbank.

Lincoln




-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Mon Jan 14 14:35:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 14 Jan 2008 13:35:46 -0600
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
Message-ID: 

It looks like the keys in the feature table run into the location  
string w/o intervening space, which would probably cause havoc with  
roundtripping from this output.  A few examples:

      BAC_cloned_genomic_insert<1..>1000
      combined_genscanjoin(<1..347,400..498,794..>1000)
      splign_na_dbEST_ncbi<1..>1000

I would think at least a space in between the location and the key  
would be required for round-tripping out of genbank format.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From lstein at cshl.edu  Mon Jan 14 14:46:20 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 14 Jan 2008 14:46:20 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: 
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
	
Message-ID: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>

That's a new bug. The version I worked on inserted a space after the name.

Lincoln



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From diogoat at gmail.com  Tue Jan 15 08:40:10 2008
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 Jan 2008 11:40:10 -0200
Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS
Message-ID: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>

Hello,

I want to extract the protein_id and translation from each CDS feature of a
genome in GenBank format, but I have one problem: when a CDS in the file
doesn't have the protein_id or the translation, the script gives me this error:

------------- EXCEPTION  -------------
MSG: asking for tag value that does not exist protein_id
STACK Bio::SeqFeature::Generic::get_tag_values
/usr/share/perl5/Bio/SeqFeature/Generic.pm:504
STACK toplevel parser_cds.pl:25
--------------------------------------

Below is the script:

##############################################
use strict;
use warnings;
use Bio::SeqIO;

my $infile  = $ARGV[0];
my $outfile = "$infile.out";
open(OUT, ">>$outfile") or die "Cannot open $outfile: $!";

my $seq_in = Bio::SeqIO->new('-file'   => "<$infile",
                             '-format' => 'Genbank');

while (my $inseq = $seq_in->next_seq) {
    for my $feat_object ($inseq->get_SeqFeatures) {
        if ($feat_object->primary_tag eq "CDS") {
            print OUT $feat_object->get_tag_values('protein_id'), " ";
            print OUT $feat_object->get_tag_values('translation'), "\n";
        }
    }
}
###############################################

Can somebody help me?

Thanks,

Diogo Tschoeke


From Marc.Logghe at ablynx.com  Tue Jan 15 09:44:54 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Tue, 15 Jan 2008 15:44:54 +0100
Subject: [Bioperl-l] Problem to extract protein_id and transcript from
	CDS
In-Reply-To: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>
Message-ID: <03C512635899144083CADB0EE2220189013E2BEC@alpaca.lan.ablynx.com>

Hi,
Try testing for existence first using the has_tag() method,
which is provided by Bio::AnnotatableI:

print OUT $feat_object->get_tag_values('protein_id'), " "
    if $feat_object->has_tag('protein_id');


HTH,
Marc
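A fuller version of Diogo's loop with that check folded in is sketched below. The names tag_values_or_empty and report_cds are hypothetical. The sketch writes one tab-separated line per CDS and substitutes '-' for a missing protein_id or translation instead of skipping the feature; that is a design choice, not something the thread specifies. Bio::SeqIO is loaded at run time so the helper can be used on its own.

```perl
use strict;
use warnings;

# Return all values of $tag, or an empty list if the feature lacks the tag.
# Any object with has_tag() and get_tag_values() works (e.g. a
# Bio::SeqFeature::Generic); checking first avoids the
# "asking for tag value that does not exist" exception.
sub tag_values_or_empty {
    my ($feat, $tag) = @_;
    return $feat->has_tag($tag) ? $feat->get_tag_values($tag) : ();
}

# Append "protein_id<TAB>translation" for each CDS to "$infile.out",
# using '-' for a missing tag.
sub report_cds {
    my ($infile) = @_;
    require Bio::SeqIO;    # loaded at run time so the helper above stands alone
    my $seq_in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
    open my $out, '>>', "$infile.out" or die "Cannot open $infile.out: $!";
    while (my $inseq = $seq_in->next_seq) {
        for my $feat ($inseq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            my ($pid)   = tag_values_or_empty($feat, 'protein_id');
            my ($trans) = tag_values_or_empty($feat, 'translation');
            print {$out} join("\t", map { defined $_ ? $_ : '-' } $pid, $trans), "\n";
        }
    }
    close $out;
}

report_cds($ARGV[0]) if @ARGV;
```

Run it as "perl parser_cds.pl genome.gbk". As in the original script, output is appended to genome.gbk.out.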




From cuiw at ncbi.nlm.nih.gov  Tue Jan 15 11:50:53 2008
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Tue, 15 Jan 2008 11:50:53 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
References: <478797CE.9050202@purdue.edu><14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu><4787C479.8070600@purdue.edu>
	
Message-ID: <18C407FD4FFB424292D769FBD68C1987048E95CC@NIHCESMLBX8.nih.gov>

There is an alternative if you can download and compile the NCBI C++ Toolkit (ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/2007/Aug_27_2007/). Simply call the binary like this:
 
id1_fetch -fmt quality -gi 13508865
 
Wenwu Cui






From singhal at berkeley.edu  Tue Jan 15 17:50:12 2008
From: singhal at berkeley.edu (Sonal Singhal)
Date: Tue, 15 Jan 2008 14:50:12 -0800
Subject: [Bioperl-l] redundant sequences
Message-ID: 

Hi all,

I am mining a few genomes to find all the genes in a gene family, and
of course multiple BLAST searches of different paralogs are returning
a lot of redundant hits.   I have searched the BioPerl documentation,
and I cannot find an easy way to cluster and then purge redundant
sequences.  Any ideas?

Cheers,
sonal


From MEC at stowers-institute.org  Tue Jan 15 18:21:00 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 15 Jan 2008 17:21:00 -0600
Subject: [Bioperl-l] redundant sequences
In-Reply-To: 
References: 
Message-ID: 

Cd-hit: http://bioinformatics.burnham.org/cd-hi/
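cd-hit clusters sequences by similarity. If the redundancy is only identical sequences returned by several BLAST searches, a hash keyed on the sequence string is enough. The minimal sketch below assumes records arrive as [id, sequence] pairs; purge_exact_duplicates is a hypothetical name.

```perl
use strict;
use warnings;

# Keep the first record for each distinct sequence. Records are
# [id, sequence] pairs; the comparison ignores case, whitespace, gap
# characters ('-') and '.'. Only identical sequences are purged.
sub purge_exact_duplicates {
    my @records = @_;
    my (%seen, @unique);
    for my $rec (@records) {
        (my $key = uc $rec->[1]) =~ s/[\s.-]//g;
        push @unique, $rec unless $seen{$key}++;
    }
    return @unique;
}
```

With Bio::SeqIO, the pairs can be built as [$seq->display_id, $seq->seq] for each object returned by next_seq. Near-identical or overlapping hits still need a clustering tool such as cd-hit.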

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  




From cain.cshl at gmail.com  Tue Jan 15 21:24:50 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 15 Jan 2008 21:24:50 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
	
	<6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
Message-ID: <1200450290.7276.3.camel@frissell>

Hi Chris and Lincoln,

I've attached my suggested patch.  So, can I use svn to check it in?  It
only adds a space after the feature type name; I suspect that will be
enough to fix the file format for most uses.
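The rule under discussion is to pad the key so the location starts at column 22 when the key fits, and otherwise to emit at least one space. It can be sketched as follows. feature_key_line is a hypothetical helper for illustration, not the contents of Scott's patch or of Bio::SeqIO::genbank. The column positions (key in columns 6-20, location from column 22) follow the standard GenBank feature-table layout.

```perl
use strict;
use warnings;

# Format the first line of a GenBank feature-table entry. The key normally
# occupies columns 6-20 and the location starts in column 22; a longer key
# is followed by a single space so it never runs into the location.
sub feature_key_line {
    my ($key, $location) = @_;
    my $prefix = (' ' x 5) . $key;
    my $pad = length($prefix) < 21 ? ' ' x (21 - length $prefix) : ' ';
    return $prefix . $pad . $location;
}
```

With this rule, the first example Chris quoted would come out as "     BAC_cloned_genomic_insert <1..>1000", which a parser splitting on whitespace can still separate.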

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: genbank.pm.patch
Type: text/x-patch
Size: 1110 bytes
Desc: not available
URL: 

From cjfields at uiuc.edu  Tue Jan 15 22:15:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 Jan 2008 21:15:51 -0600
Subject: [Bioperl-l] Subversion migration complete
Message-ID: 

On behalf of the BioPerl core developers, I am proud to announce that  
the BioPerl SVN migration has been completed.  We would like to thank  
everyone who helped, in particular George Hartzell and Chris  
Dagdigian, both of whom played instrumental roles in the CVS->SVN  
conversion and anonymous SVN setup for BioPerl.

Anonymous SVN checkouts for bioperl-live are now possible using:
svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

Developers can obtain a checkout from:
svn co svn+ssh://USER@dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live/trunk bioperl-live

Browsable repository:
http://code.open-bio.org/svnweb/index.cgi/bioperl/

Basic instructions:
http://www.bioperl.org/wiki/Using_Subversion

We are still in the midst of implementing a few extra details related  
to SVN migration; the status on these can be viewed here:
http://www.bioperl.org/wiki/CVS_to_SVN_Migration

Enjoy!

chris



From bug-bioperl at rt.cpan.org  Wed Jan 16 22:35:30 2008
From: bug-bioperl at rt.cpan.org (Chris Fields via RT)
Date: Wed, 16 Jan 2008 22:35:30 -0500
Subject: [Bioperl-l] [rt.cpan.org #29533] Bio::SeqIO::interpro depends on
	XML::DOM::XPath
In-Reply-To: 
References:   
	
Message-ID: 


       Queue: bioperl
 Ticket 

On Fri Sep 21 10:28:52 2007, support at helpdesk.open-bio.org wrote:
> Hi Mike,
> 
> The proper place to submit this fix is the bioperl-l at lists.open-bio.org
> mailing list or the OBF Bugzilla queue at:
> http://bugzilla.open-bio.org/, this RT system is mainly for sysadmin
> activities rather than for tracking code changes. Would you be so kind
> to re-send your request to one of the places above? Thanks for the heads
> up! :)
> 
> Regards,
> Mauricio.

This has been fixed.  I'll get the CPAN maintainer to close this out.


From vipingjo at gmail.com  Thu Jan 17 03:48:36 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 16:48:36 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via package
	"Bio::Tree::Tree"
Message-ID: <200801171648332965577@gmail.com>

Hi everyone,

I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + Windows XP SP2.
When running the example code (attached below as t.pl) from Bio\Tree\Compatible.pm, I got this error:

Can't locate object method "is_compatible" via package "Bio::Tree::Tree"

I replaced "$t1->is_compatible($t2)" with "is_compatible Bio::Tree::Compatible ($t1,$t2)", and the error changed to:
Can't locate object method "get_nodes" via package "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252,  line 1.

I then modified Compatible.pm, changing the call to "get_nodes" to "get_nodes Bio::Tree::Tree($self);", and a new error arose:
Can't use string ("Bio::Tree::Tree") as a HASH ref while "strict refs" in use at i:/Perl/site/lib/Bio\Tree\Tree.pm line 198,  line 1.

I gave up. Any help will be deeply appreciated.




# this is the example script from Bio::Tree::Compatible: t.pl
  use Bio::Tree::Compatible;
  use Bio::TreeIO;
  my $input = new Bio::TreeIO('-format' => 'newick',
                              '-file'   => 'input.tre');
  my $t1 = $input->next_tree;
  my $t2 = $input->next_tree;

  my ($incompat, $ilabels, $inodes) = $t1->is_compatible($t2);
  if ($incompat) {
    my %cluster1 = %{ $t1->cluster_representation };
    my %cluster2 = %{ $t2->cluster_representation };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
      foreach my $label (@$ilabels) {
        my $node1 = $t1->find_node(-id => $label);
        my $node2 = $t2->find_node(-id => $label);
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "label $label";
        print " cluster"; map { print " ",$_ } @c1;
        print " cluster"; map { print " ",$_ } @c2; print "\n";
      }
    }
    if (scalar(@$inodes)) {
      while (@$inodes) {
        my $node1 = shift @$inodes;
        my $node2 = shift @$inodes;
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "cluster"; map { print " ",$_ } @c1;
        print " properly intersects cluster";
        map { print " ",$_ } @c2; print "\n";
      }
    }
  } else {
    print "compatible trees\n";
  }

__END__;

# this is the file 'input.tre':
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);

# this is the full messages I got running like this: "perl.exe -w t.pl"
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
Can't locate object method "is_compatible" via package "Bio::Tree::Tree" at Z:\bp\t.pl line 8,  line 2.




From bix at sendu.me.uk  Thu Jan 17 06:18:56 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:18:56 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
 package "Bio::Tree::Tree"
In-Reply-To: <200801171648332965577@gmail.com>
References: <200801171648332965577@gmail.com>
Message-ID: <478F39A0.2030508@sendu.me.uk>

viping wrote:
> Hi Everyone,
> 
> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + 
> Windows XP SP2. When running example codes(attched below as t.pl) 
> within Bio\Tree\Compatible.pm , I got this error:
> 
> Can't locate object method "is_compatible" via package 
> "Bio::Tree::Tree"
> 
> I replaced "$t1->is_compatible($t2)" with "is_compatible 
> Bio::Tree::Compatible ($t1,$t2)",

Yup, you had the right idea; unfortunately the synopsis code for
Bio::Tree::Compatible is wrong.
I've now fixed it in svn.
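The three errors reported above all come from how Perl dispatches calls. The following self-contained sketch shows this. It uses two hypothetical packages, My::Tree and My::Compatible, as stand-ins for Bio::Tree::Tree and Bio::Tree::Compatible. Its is_compatible body is a placeholder that only counts nodes; it is not BioPerl's compatibility test.

The sketch makes three points:

- `$t1->is_compatible($t2)` looks the name up in the tree's own class.
- The indirect-object form `is_compatible Bio::Tree::Compatible ($t1,$t2)` is also a method call, equivalent to `Bio::Tree::Compatible->is_compatible($t1,$t2)`. The first argument therefore becomes the package-name string.
- Only the fully qualified function call passes both trees through as intended.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Tree::Tree: a hash-based object.
package My::Tree;
sub new {
    my ($class, @nodes) = @_;
    return bless { nodes => [@nodes] }, $class;
}
sub get_nodes {
    my $self = shift;
    return @{ $self->{nodes} };    # dereferences $self as a hash
}

# Hypothetical stand-in for Bio::Tree::Compatible: plain functions that
# take tree objects as arguments (they are not methods of the tree class).
package My::Compatible;
sub is_compatible {
    my ($t1, $t2) = @_;
    # Placeholder body: report node counts instead of a real test.
    my @n1 = $t1->get_nodes;
    my @n2 = $t2->get_nodes;
    return scalar(@n1) . '/' . scalar(@n2);
}

package main;

my $t1 = My::Tree->new(qw(A B C D));
my $t2 = My::Tree->new(qw(A B E));

# 1) Method syntax searches My::Tree for is_compatible, so it dies.
my $err1 = eval { $t1->is_compatible($t2); 1 } ? '' : $@;

# 2) "is_compatible My::Compatible ($t1, $t2)" means this class-method
#    call: the first argument becomes the string 'My::Compatible', and
#    calling get_nodes on that string dies.
my $err2 = eval { My::Compatible->is_compatible($t1, $t2); 1 } ? '' : $@;

# 3) Likewise My::Tree->get_nodes($t1) makes $self the string 'My::Tree',
#    which "strict refs" refuses to use as a hash reference.
my $err3 = eval { my @n = My::Tree->get_nodes($t1); 1 } ? '' : $@;

# 4) A fully qualified function call passes both trees as intended.
my $result = My::Compatible::is_compatible($t1, $t2);

print "1: $err1", "2: $err2", "3: $err3", "4: $result\n";
```

Case 4 has the same shape as the corrected synopsis call, Bio::Tree::Compatible::is_compatible($t1,$t2).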


> the error changed: Can't locate object method "get_nodes" via package
>  "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm 
> line 252,  line 1.

I didn't get quite that error; instead I had an issue with TreeIO: for
whatever reason it is only returning one tree from your input file (i.e.
$t2 is undefined).

I therefore got "Can't call method "get_nodes" on an undefined value [...]"

Can someone look into/confirm that?



From bix at sendu.me.uk  Thu Jan 17 06:35:57 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:35:57 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
 package "Bio::Tree::Tree"
In-Reply-To: <478F39A0.2030508@sendu.me.uk>
References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk>
Message-ID: <478F3D9D.6050306@sendu.me.uk>

Sendu Bala wrote:
>> the error changed: Can't locate object method "get_nodes" via
>> package "Bio::Tree::Compatible" at
>> i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252,  line 1.
> 
> I didn't get quite that error; instead I had an issue with TreeIO:
> for whatever reason it is only returning one tree from your input
> file (ie. $t2 is undefined).
> 
> I therefore got "Can't call method "get_nodes" on an undefined value
> [...]"
> 
> Can someone look into/confirm that?

... Yeah, I think I'm losing my mind. The code below is 'ok' using the
commented out -fh input for TreeIO, but is 'not ok' using the -file
input, where the specified file contains the exact same data as
__DATA__. Huh?


#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::Tree::Compatible;
use Bio::TreeIO;
my $input = new Bio::TreeIO('-format' => 'newick',
                             #-fh      => \*DATA,
                             -file    => 'input.tre'
                             );
my $t1 = $input->next_tree;
my $t2 = $input->next_tree;

if ($t2) {
    print "ok\n";
}
else {
    print "not ok\n";
}

__DATA__
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);




From vipingjo at gmail.com  Thu Jan 17 08:23:14 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 21:23:14 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
	package"Bio::Tree::Tree"
References: <200801171648332965577@gmail.com>, <478F39A0.2030508@sendu.me.uk>
Message-ID: <200801172123112184046@gmail.com>

I got the latest code modified by Sendu Bala via SVN. It works well when "input.tre" and "t.pl" are in the same directory. Thank you, Sendu Bala.

This is output:
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
incompatible trees
label G cluster G cluster G K
cluster A B C properly intersects cluster A B H
cluster A B C properly intersects cluster A B E G H I J K
cluster A B C D properly intersects cluster A B H
cluster A B C D properly intersects cluster A B E G H I J K
cluster E F G properly intersects cluster G K
cluster E F G properly intersects cluster G I J K
cluster E F G properly intersects cluster A B E G H I J K
cluster A B C D E F G properly intersects cluster A B H
cluster A B C D E F G properly intersects cluster G K
cluster A B C D E F G properly intersects cluster G I J K
cluster A B C D E F G properly intersects cluster A B E G H I J K

# this is the latest code:
  use Bio::Tree::Compatible;
  use Bio::TreeIO;
  my $input = Bio::TreeIO->new('-format' => 'newick',
                               '-file'   => 'input.tre');
  my $t1 = $input->next_tree;
  my $t2 = $input->next_tree;

  my ($incompat, $ilabels, $inodes) = Bio::Tree::Compatible::is_compatible($t1,$t2);
  if ($incompat) {
    my %cluster1 = %{ Bio::Tree::Compatible::cluster_representation($t1) };
    my %cluster2 = %{ Bio::Tree::Compatible::cluster_representation($t2) };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
      foreach my $label (@$ilabels) {
        my $node1 = $t1->find_node(-id => $label);
        my $node2 = $t2->find_node(-id => $label);
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "label $label";
        print " cluster"; map { print " ",$_ } @c1;
        print " cluster"; map { print " ",$_ } @c2; print "\n";
      }
    }
    if (scalar(@$inodes)) {
      while (@$inodes) {
        my $node1 = shift @$inodes;
        my $node2 = shift @$inodes;
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "cluster"; map { print " ",$_ } @c1;
        print " properly intersects cluster";
        map { print " ",$_ } @c2; print "\n";
      }
    }
  } else {
    print "compatible trees\n";
  }


------------------				 
viping
2008-01-17




From cjfields at uiuc.edu  Thu Jan 17 08:25:41 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Jan 2008 07:25:41 -0600
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
	package "Bio::Tree::Tree"
In-Reply-To: <478F39A0.2030508@sendu.me.uk>
References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk>
Message-ID: <7BF3650B-F1D4-4F21-9C59-3AC13CA35945@uiuc.edu>

Probably need to file this as a bug.  There is a similar issue with  
Bio::TreeIO::nexus, but it probably isn't related unless it is using  
the same parsing logic:

http://bugzilla.open-bio.org/show_bug.cgi?id=2356

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






From N.Haigh at sheffield.ac.uk  Fri Jan 18 07:47:48 2008
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 18 Jan 2008 12:47:48 +0000
Subject: [Bioperl-l] Parsing Primer3 output
Message-ID: <1200660468.47909ff498dd0@webmail.shef.ac.uk>

I might be overlooking something, but is it possible to parse primer3 output?

Cheers
Nath



From cjfields at uiuc.edu  Fri Jan 18 08:27:47 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 Jan 2008 07:27:47 -0600
Subject: [Bioperl-l] Parsing Primer3 output
In-Reply-To: <1200660468.47909ff498dd0@webmail.shef.ac.uk>
References: <1200660468.47909ff498dd0@webmail.shef.ac.uk>
Message-ID: <8C8BF818-FC04-42E3-9210-3FE23F92EA8F@uiuc.edu>

Bio::Tools::Primer3.

chris

On Jan 18, 2008, at 6:47 AM, Nathan S. Haigh wrote:

> I might be overlooking something, but is it possible to parse  
> primer3 output?
>
> Cheers
> Nath
>



From hangsyin at gmail.com  Sat Jan 19 13:25:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sat, 19 Jan 2008 10:25:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined
 value at BIO::DB::GFF.pl
Message-ID: <14971922.post@talk.nabble.com>


Hi, everyone,

I ran into this problem when running this script to extract features
overlapping 4:20,000..25,000. It always fails with "Can't call method
"features" on an undefined value at BIO::DB::GFF.pl line XX".
==============================================================
use Bio::DB::GFF;
use Bio::Tools::GFF;
my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
                           -dsn     => 'dbi:mysql:dmel_gff:localhost',
                           -user    => 'XXXX',
                           -pass    => 'XXXX') || die "database open failed";

my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
my @features = $segment->features(-types => ['gene', 'exon', 'intron',
                                             'five_prime_UTR', 'three_prime_UTR', 'CDS'])
    or die "no features";
print(scalar(@features)."\n");

================================================================
I am using ActivePerl 5.8.8 and BioPerl 1.5.2 under Win32. I loaded
dmel_5.4.gff into a MySQL database with bulk_load_gff.pl without any errors.
Other methods fail as well.

Any help will be deeply appreciated!

Best,
Jon

-- 
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14971922.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.



From cain.cshl at gmail.com  Sat Jan 19 22:36:44 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Sat, 19 Jan 2008 22:36:44 -0500
Subject: [Bioperl-l] Problem: Can't call method "features" on
	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <14971922.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com>
Message-ID: <1200800204.6069.5.camel@frissell>

Hi Jon,

I think it's funny that you have "or die" on the database opening line,
"or die" on the @features line, but you didn't put one on the $segment
line.  Try adding "or die $!" to the $segment line to see what it says,
and also add a 'print $segment' after you create it and before you try to
get the features from it.  

Clearly, the problem is that $segment is not defined (that is, nothing
is in it, not that the wrong thing is in it).  The next trick is to find
out why.  My first guess, without looking at the data set, is that the
arm is not really named '4'.
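The diagnosis above can be illustrated without a database. The following sketch is hypothetical: My::DB and My::Segment are stand-ins that mimic only the behaviour assumed here, a segment() lookup that returns undef when nothing matches. They are not the Bio::DB::GFF API. The sketch contrasts the unchecked call from Jon's script with a check placed on the lookup itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a database handle: segment() returns undef
# when no reference sequence with the requested name was loaded.
package My::DB;
sub new {
    my ($class, %refs) = @_;
    return bless { refs => {%refs} }, $class;
}
sub segment {
    my ($self, %args) = @_;
    return unless exists $self->{refs}{ $args{-name} };   # undef in scalar context
    return bless {%args}, 'My::Segment';
}

package My::Segment;
sub features { return ('gene', 'exon') }    # placeholder feature list

package main;

my $db = My::DB->new();    # assume reference '4' was never loaded

# Unchecked: the undef only shows up one line later, as the confusing
# "Can't call method ... on an undefined value" error.
my $unchecked = eval {
    my $segment  = $db->segment(-name => '4', -start => 20000, -end => 25000);
    my @features = $segment->features;
    1;
} ? '' : $@;

# Checked: die at the lookup itself with a message naming the segment.
# $! is deliberately not used: it reports the last failed system call,
# not why a lookup came back empty.
my $checked = eval {
    my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000)
        or die "no reference sequence named '4' found\n";
    1;
} ? '' : $@;

print "unchecked: $unchecked", "checked:   $checked";
```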

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hangsyin at gmail.com  Sat Jan 19 22:49:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sat, 19 Jan 2008 19:49:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on
	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <1200800204.6069.5.camel@frissell>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
Message-ID: <14978241.post@talk.nabble.com>


Hi, Scott,

After adding die $!, I can see that the failure is at this line:
"my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"

My GFF file looks like this:
##gff-version 3
##sequence-region 4 1 1351857
4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
...
...
I'm really confused. Any further suggestions? Thank you!

Jon






-- 
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14978241.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.



From cain.cshl at gmail.com  Sat Jan 19 23:08:04 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Sat, 19 Jan 2008 23:08:04 -0500
Subject: [Bioperl-l] Problem: Can't call method "features"
	on	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <14978241.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com>
	<1200800204.6069.5.camel@frissell>  <14978241.post@talk.nabble.com>
Message-ID: <1200802084.6069.11.camel@frissell>

Hi Jon,

Well, seeing the error message would be helpful, but without it, my first
guess is that there are a few things you can try:

  * removing the "sequence-region" line from the GFF file and adding a line
like this:

  4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4

and then reloading the database.

  * Or, you may want to consider using Bio::DB::SeqFeature::Store, since
Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that is,
GFF3 with three levels of features, such as gene, mRNA and CDS).

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hangsyin at gmail.com  Sun Jan 20 10:08:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sun, 20 Jan 2008 07:08:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features"
	on	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <1200802084.6069.11.camel@frissell>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
Message-ID: <14982665.post@talk.nabble.com>


Hi, Scott,
I tried replacing the sequence-region line with
"4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4", but it
doesn't work. "$!" didn't say anything except "died at line 12".

So I went ahead with Bio::DB::SeqFeature::Store. Here is my code to
load dmel-all-r5.4.gff (from FlyBase) into a test database:
=============================================================
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   =>  1 );
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store    => $db,
                                                         -verbose  => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================
I got a bunch of errors like this:
"DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
Line 1316 in mysql.pm reads: $sth->execute($name) or die $sth->errstr;
After the failed load I checked the test database: only one table had
been created, called 'meta'. I also tried 'grant all on test to
XXX at localhost' and loaded the GFF with that -user and -pass, but it
didn't work either.

Jon


Scott Cain-3 wrote:
> 
> Hi Jon,
> 
> Well, seeing the error message would be helpful, but without it, here
> are a few things you can try:
> 
>   * removing the "sequence-region" line from the GFF file and adding a line
> like this:
> 
>   4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4
> 
> and then reloading the database.
> 
>   * Or, you may want to consider using Bio::DB::SeqFeature::Store, since
> Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that
> is, GFF3 with three levels of features, such as gene, mRNA and CDS).
> 
> Scott
> 
> On Sat, 2008-01-19 at 19:49 -0800, Hang wrote:
>> Hi, Scott,
>> 
>> After adding die $!, I found that something goes wrong at this line:
>> "my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"
>> 
>> My GFF file looks like this:
>> ##gff-version 3
>> ##sequence-region 4 1 1351857
>> 4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
>> 4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
>> 4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
>> ...
>> ...
>> I'm really confused. Any further suggestions? Thank you!
>> 
>> Jon
>> 
>> 
>> 
>> 
>> 
>> Scott Cain-3 wrote:
>> > 
>> > Hi Jon,
>> > 
>> > I think it's funny that you have "or die" on the database opening line,
>> > "or die" on the @features line, but you didn't put one on the $segment
>> > line.  Try adding "or die $!" to the $segment line to see what it says,
>> > and also add a 'print $segment' after you create it and before you try to
>> > get the features from it.
>> > 
>> > Clearly, the problem is that $segment is not defined (that is, nothing
>> > is in it, not that the wrong thing is in it).  The next trick is to find
>> > out why.  My first guess, without looking at the data set, is that the
>> > arm is not really named '4'.
>> > 
>> > Scott
>> > 
>> > On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
>> >> Hi, everyone,
>> >> 
>> >> I ran into this problem when running this script to extract features
>> >> overlapping 4:20,000..25,000. It always dies with "Can't call method
>> >> "features" on an undefined value at BIO::DB::GFF.pl line XX".
>> >> ==============================================================
>> >> use Bio::DB::GFF;
>> >> use Bio::Tools::GFF;
>> >> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>> >>                            -dsn     => 'dbi:mysql:dmel_gff:localhost',
>> >>                            -user    => 'XXXX',
>> >>                            -pass    => 'XXXX') || die "database open failed";
>> >> 
>> >> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
>> >> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
>> >> 'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
>> >> print(scalar(@features)."\n");
>> >> 
>> >> ================================================================
>> >> I am using ActivePerl 5.8.8 and BioPerl 1.5.2 under Win32. I loaded
>> >> dmel_5.4.gff into the MySQL database with bulk_load_gff.pl without any
>> >> errors. Other methods fail as well.
>> >> 
>> >> Any help will be deeply appreciated!
>> >> 
>> >> Best,
>> >> Jon
>> >> 
>> 
> 

-- 
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14982665.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
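Scott's first guess in this thread is that the arm may not really be named '4' in the loaded data, in which case segment() has nothing to return. A quick, database-free check is to list the distinct sequence IDs (column 1) that the GFF3 file actually uses, next to any ##sequence-region directives. The sketch below is plain Perl with no BioPerl dependency; `gff3_seqids` is a hypothetical helper, and it assumes a tab-delimited GFF3 file with one feature per line.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the distinct seqids (column 1) of feature
# lines and the names declared in ##sequence-region directives.
sub gff3_seqids {
    my ($fh) = @_;
    my (%seqid, %region);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                  # blank lines
        if ($line =~ /^##sequence-region\s+(\S+)/) {
            $region{$1}++;
            next;
        }
        last if $line =~ /^##FASTA/;               # sequence section follows
        next if $line =~ /^#/;                     # other directives/comments
        my @col = split /\t/, $line;
        next unless @col == 9;                     # skip malformed lines
        $seqid{ $col[0] }++;
    }
    return ([ sort keys %seqid ], [ sort keys %region ]);
}

# Example on a trimmed, in-memory copy of the sample lines;
# for the real file use: open(my $fh, '<', 'dmel-all-r5.4.gff') or die $!;
my $gff = "##gff-version 3\n"
        . "##sequence-region 4 1 1351857\n"
        . "4\tFlyBase\ttransposable_element\t2\t611\t.\t+\t.\tID=FBti0062890\n"
        . "4\trepeatmasker_dummy\tmatch\t2\t347\t.\t+\t.\tID=:1395923_repeatmasker_dummy\n";
open my $fh, '<', \$gff or die "cannot open in-memory GFF: $!";
my ($seqids, $regions) = gff3_seqids($fh);
print "seqids used by features: @$seqids\n";
print "sequence-region names:   @$regions\n";
```

If the seqid printed for the features differs from the -name passed to segment(), an undefined $segment (and hence the "Can't call method" error) would be the expected outcome.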



From cain at cshl.edu  Sun Jan 20 10:25:16 2008
From: cain at cshl.edu (Scott Cain)
Date: Sun, 20 Jan 2008 10:25:16 -0500 (EST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an
 undefined value at BIO::DB::GFF.pl
In-Reply-To: <14982665.post@talk.nabble.com>
Message-ID: 

Jon,

There is a script for loading a SeqFeature database, just like the one for
the GFF database, though I don't know what it's called offhand (I'm not at
my normal computer right now).  Be sure to read the documentation; you
will probably want to use the 'fast' option (I don't remember what it is
called either).

Scott


----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain at cshl.edu
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------





From cjfields at uiuc.edu  Sun Jan 20 12:10:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 20 Jan 2008 11:10:27 -0600
Subject: [Bioperl-l] Problem: Can't call method "features" on an
	undefined value at BIO::DB::GFF.pl
In-Reply-To: 
References: 
Message-ID: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>

It's bp_seqfeature_load.pl (if you have the full BioPerl core
distribution, it's in script/Bio-SeqFeature/Store). I had some
problems with the fast-loading option, but that was likely just my GFF
formatting; the example data loaded just fine.

As for the error, you need to use the '-create' flag when initializing  
a database (or wiping data from a current one):

=============================================================
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   => 1,
                                         -create  => 1);  # initialize (or wipe) the tables
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
                                                         -verbose => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
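The `-create => 1` option can only take effect if it reaches new() as its own key. In one-option-per-line argument lists like the ones in this thread, a dropped comma between two pairs is easy to overlook, and Perl accepts it (at most emitting an "isn't numeric" warning): the `-` is parsed as binary subtraction and `=>` auto-quotes `create`, so the flag silently vanishes. The plain-Perl sketch below needs no database; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Intended argument list: two named options, comma-separated.
my @good = (-write  => 1,
            -create => 1);

# The same list with the comma after "-write => 1" dropped. Perl parses it
# as (-write => (1 - 'create'), 1); 'create' numifies to 0, so -write gets 1
# and the -create key never appears.
my @bad = do {
    no warnings 'numeric';    # silences: Argument "create" isn't numeric
    (-write  => 1
     -create => 1);
};

my %good = @good;
my %bad;
{
    no warnings 'misc';       # silences: Odd number of elements in hash assignment
    %bad = @bad;
}

print "good: @good\n";        # prints "good: -write 1 -create 1"
print "bad:  @bad\n";         # prints "bad:  -write 1 1"
print "-create seen in bad list: ", (exists $bad{-create} ? 'yes' : 'no'), "\n";
```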





From ykumagai at biken.osaka-u.ac.jp  Mon Jan 21 11:56:53 2008
From: ykumagai at biken.osaka-u.ac.jp (Yutaro Kumagai)
Date: Tue, 22 Jan 2008 01:56:53 +0900
Subject: [Bioperl-l] Problem with Bio::ASN1::EntrezGene::Indexer
Message-ID: <4794CED5.3070307@biken.osaka-u.ac.jp>

Hi, everyone,

I'm working with Bio::ASN1::EntrezGene::Indexer as follows:

###
use Bio::ASN1::EntrezGene::Indexer;
use Bio::ASN1::EntrezGene;
use Bio::SeqIO;

my $inx = Bio::ASN1::EntrezGene::Indexer->new(
              -filename => 'c:/chrm/asn/entrezgene.idx');

# The index file has already been built successfully: I checked it
# by counting the records with $inx->count_records, etc.

my $seq1 = $inx->fetch_hash(15969);

# ID 15969 definitely exists: there was no error message, and
# dumping $seq1 confirmed that it contains data.

my $seq2 = $inx->fetch(15969);
###

However, the last call failed with this error:
"you must pass in a file name or handle through new() or input_file() first
before calling next_seq!
at C:/Perl/site/lib/Bio\SeqIO\entrezgene.pm line 136".

I traced the program with the debugger and found that, somehow, _fh()
in Bio::Index::AbstractSeq fails to pass the filehandle to fetch.

Now, I have two questions:

1) What's wrong with the code above? Is this a bug, or my own
mistake? If it's my mistake, what am I doing wrong?

2) If I can't use "fetch", how can I extract the sequence data
(position in the genomic contig, strand, etc.) from the data
returned by "fetch_hash"? I can't work out the structure of
the data that "fetch_hash" returns...

Thank you in advance.

Yutaro Kumagai.

-- 
**********************************
Yutaro Kumagai
Dept. of Host Defense
Res. Inst. for Microbial Diseases
Osaka University
Japan
ykumagai at biken.osaka-u.ac.jp
**********************************
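For question 2, one assumption-free way to learn the layout of whatever `fetch_hash` returns is to walk the nested Perl structure and print the path to every leaf value. The sketch below is plain Perl; `leaf_paths` is a hypothetical helper, and `$sample` is an invented record used only for the demonstration, not the real Entrez Gene layout. In practice you would call it on `$seq1`, or use the core `Data::Dumper` module with `$Data::Dumper::Sortkeys = 1`.

```perl
use strict;
use warnings;

# Hypothetical helper: return one "path => value" string per leaf of a
# nested structure. Only plain HASH and ARRAY references are descended;
# anything else (scalars, blessed objects, other refs) is shown as a leaf.
sub leaf_paths {
    my ($node, $path) = @_;
    $path = '' unless defined $path;
    if (ref $node eq 'HASH') {
        return map { leaf_paths($node->{$_}, "$path\{$_}") } sort keys %$node;
    }
    if (ref $node eq 'ARRAY') {
        return map { leaf_paths($node->[$_], "$path\[$_]") } 0 .. $#$node;
    }
    return "$path => " . (defined $node ? $node : 'undef');
}

# Invented sample record for the demonstration; NOT the fetch_hash layout.
my $sample = {
    name   => 'example',
    coords => [ { from => 100, to => 200 } ],
};

print "$_\n" for leaf_paths($sample);
```

Each printed path, such as `{coords}[0]{from}`, can be appended to `$seq1->` to read that value directly.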


From hangsyin at gmail.com  Mon Jan 21 14:22:55 2008
From: hangsyin at gmail.com (Hang)
Date: Mon, 21 Jan 2008 11:22:55 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an
 undefined value at BIO::DB::GFF.pl
In-Reply-To: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
	<14982665.post@talk.nabble.com>
	
	<3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
Message-ID: <15004412.post@talk.nabble.com>


Hi, Chris:

Following your suggestion, I added the -create flag and the GFF3Loader started
to work. Thanks a lot!
When I load dmel-all-5.4.gff into MySQL with -fast, I get the following
error:
   Data too long for column 'attribute_value' at c:/../../../mysql.pm line 510
Without -fast it works, apart from the annoyingly slow speed. Do you
have any suggestions on this?

Best,
Hang
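One way to narrow down which records trigger "Data too long for column 'attribute_value'" is to scan the GFF3 file for long attribute values and compare them with the declared width of that column in your database (for example with MySQL's `SHOW CREATE TABLE` on the table that holds it). The sketch below is plain Perl; `long_attributes` is a hypothetical helper, and the 255-character threshold is only a placeholder assumption, since the actual column type is not given in this thread. Note that `length` counts characters, whereas MySQL limits may be in bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: report attribute values (GFF3 column 9) longer than
# $max_len characters, as [line number, tag, length]. $max_len is a
# placeholder; use the real width of the column in your schema.
sub long_attributes {
    my ($fh, $max_len) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^##FASTA/;              # sequence section follows
        next if $line =~ /^#/ || $line =~ /^\s*$/;
        my @col = split /\t/, $line;
        next unless @col == 9;
        for my $pair (split /;/, $col[8]) {
            my ($tag, $value) = split /=/, $pair, 2;
            next unless defined $value;
            # GFF3 percent-encodes reserved characters, e.g. %2C for ','
            $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
            push @hits, [ $., $tag, length $value ] if length $value > $max_len;
        }
    }
    return @hits;
}

# Example on in-memory GFF3 text; for a real file use open(my $fh, '<', $path).
my $gff = "##gff-version 3\n"
        . "4\tFlyBase\tgene\t1\t10\t.\t+\t.\tID=g1;Note=" . ('x' x 300) . "\n"
        . "4\tFlyBase\tgene\t20\t30\t.\t+\t.\tID=g2;Name=1%2C347\n";
open my $fh, '<', \$gff or die "cannot open in-memory GFF: $!";
my @hits = long_attributes($fh, 255);
printf "line %d: %s value is %d characters\n", @$_ for @hits;
```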




Chris Fields wrote:
> 
> It's bp_seqfeature_load.pl (if you have the full bioperl core  
> distribution, it's in script/Bio-SeqFeature/Store).  I had some  
> problems with the fast-loading option but it was likely just my gff  
> formatting; example data loaded just fine.
> 
> As for the error, you need to use the '-create' flag when initializing  
> a database (or wiping data from a current one):
> 
> =============================================================
> use Bio::DB::SeqFeature::Store;
> use Bio::DB::SeqFeature::Store::GFF3Loader;
> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>                                          -dsn     => 'dbi:mysql:test',
>                                          -user    => 'root',
>                                          -pass    => 'XXXXX',
>                                          -write   =>  1
>                                          -create  => 1);
> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store    =>  
> $db,
>                                                          -verbose  =>  
> 1);
> $loader->load(./'dmel-all-r5.4.gff');
> =============================================================
> 
> chris
> 
> On Jan 20, 2008, at 9:25 AM, Scott Cain wrote:
> 
>> Jon,
>>
>> There is a script for loading a SeqFeature database just like the GFF
>> database, though I don't know what it's called off hand (I'm not at my
>> normal computer right now).  Be sure to read the documentation and you
>> will probably want to use the 'fast' option (I don't remember what  
>> it is
>> called either).
>>
>> Scott
>>
>>
>> ----------------------------------------------------------------------
>> Scott Cain, Ph. D.				 	 cain at cshl.edu
>> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
>> ----------------------------------------------------------------------
>>
>>
>> On Sun, 20 Jan 2008, Hang wrote:
>>
>>>
>>> Hi, Scott,
>>> I tried to change sequence-region line to "4   FlyBase   
>>> chromosome_arm  1
>>> 1351857 .  .  .  ID=4;Name=4", it doesn't work. "$!" didn't say  
>>> anything but
>>> "died at line 12".
>>>
>>> So, I went ahead with the Bio::DB::SeqFeature::Store. Here is my  
>>> code to
>>> load the dmel-all-r5.4.gff(from Flybase) to a test database:
>>> =============================================================
>>> use Bio::DB::SeqFeature::Store;
>>> use Bio::DB::SeqFeature::Store::GFF3Loader;
>>> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>>>                                         -dsn     => 'dbi:mysql:test',
>>>                                         -user    => 'root',
>>>                                         -pass    => 'XXXXX',
>>>                                         -write   =>  1 );
>>> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store     
>>> => $db,
>>>                                                         -verbose   
>>> => 1);
>>> $loader->load(./'dmel-all-r5.4.gff');
>>> =============================================================
>>> I got bunch of errors like this:
>>> "DBD::mysql::execute failed: Table 'test.locationlist' doesn't  
>>> exist at
>>> C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line  
>>> 1316".
>>> The line 1316 in mysql.pm looks like this: $sth->execute($name) or  
>>> die
>>> $sth->errstr;
>>> I checked the database test after failed loading. There is only one  
>>> table
>>> created, which call 'meta'. I also tried 'grant all on test to
>>> XXX at localhost' and used that -user and -pass to load gff, it didn't  
>>> work
>>> either.
>>>
>>> Jon
>>>
>>>
>>> Scott Cain-3 wrote:
>>>>
>>>> Hi Jon,
>>>>
>>>> Well, seeing the error message would be helpful, but my first guess
>>>> without is that there are a few things you can try:
>>>>
>>>>  * removing the "sequence-region" line from the GFF file, adding a  
>>>> line
>>>> like this:
>>>>
>>>>  4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4
>>>>
>>>> and then reloading the database.
>>>>
>>>>  * Or, you may want to consider using Bio::DB::SeqFeature::Store,  
>>>> since
>>>> Bio::DB::GFF3 doesn't always behave correctly with complex GFF3  
>>>> (that
>>>> is, with three levels of features (like gene, mRNA and CDS)).
>>>>
>>>> Scott
>>>>
>>>> On Sat, 2008-01-19 at 19:49 -0800, Hang wrote:
>>>>> Hi, Scott,
>>>>>
>>>>> After adding die $!, I know something is wrong at line:
>>>>> "my $segment = $db->segment(-name => '4', -start => 20000, -end =>
>>>>> 25000);"
>>>>>
>>>>> my gff file is like this:
>>>>> ##gff-version 3
>>>>> ##sequence-region 4 1 1351857
>>>>> 4	FlyBase	transposable_element	2	611	.	+	.
>>>>> ID=FBti0062890;Name=ninja-Dsim- 
>>>>> like 
>>>>> {}4829 
>>>>> ;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
>>>>> 4	repeatmasker_dummy	match	2	347	.	+	.
>>>>> ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy- 
>>>>> RepeatMasker;
>>>>> 4	repeatmasker_dummy	match_part	2	347	2367	+	.
>>>>> ID=:5142029_dummy;Name=:5142029;Parent=: 
>>>>> 1395923_repeatmasker_dummy;target_type=so;Target=D83207
>>>>> 5860 6210 +;
>>>>> ...
>>>>> ...
>>>>> I really got confused. Any further suggestion? Thank you!
>>>>>
>>>>> Jon
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Scott Cain-3 wrote:
>>>>>>
>>>>>> Hi Jon,
>>>>>>
>>>>>> I think it's funny that you have "or die" on the database  
>>>>>> opening line,
>>>>>> "or die" on the @features line, but you didn't put one on the  
>>>>>> $segment
>>>>>> line.  Try adding "or die: $!" to the $segment line to see what it
>>>>> says,
>>>>>> also add a 'print $segment' after you create it and before you  
>>>>>> try to
>>>>>> get the features from it.
>>>>>>
>>>>>> Clearly, the problem is that $segment is not defined (that is,  
>>>>>> nothing
>>>>>> is in it, not that the wrong thing is in it).  The next trick is  
>>>>>> to
>>>>> find
>>>>>> out why.  My first guess, without looking at the data set, is  
>>>>>> that the
>>>>>> arm is not really named '4'.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>> On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
>>>>>>> Hi, everyone,
>>>>>>>
>>>>>>> I met this problem when I was running this script to extract
>>>>>>> features overlapping 4:20,000..25,000. It always fails with
>>>>>>> "Can't call method "features" on an undefined value at
>>>>>>> BIO::DB::GFF.pl line XX".
>>>>>>> ==============================================================
>>>>>>> use Bio::DB::GFF;
>>>>>>> use Bio::Tools::GFF;
>>>>>>> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>>>>>>>                            -dsn     => 'dbi:mysql:dmel_gff:localhost',
>>>>>>>                            -user    => 'XXXX',
>>>>>>>                            -pass    => 'XXXX') || die "database open failed";
>>>>>>>
>>>>>>> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
>>>>>>> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
>>>>>>>                                              'five_prime_UTR', 'three_prime_UTR', 'CDS'])
>>>>>>>     or die "no features";
>>>>>>> print(scalar(@features)."\n");
>>>>>>>
>>>>>>> ================================================================
>>>>>>> I am using ActivePerl 5.8.8 and BioPerl 1.5.2 under Win32. I loaded
>>>>>>> dmel_5.4.gff into the MySQL database with bulk_load_gff.pl without
>>>>>>> any error. Other methods failed also.
>>>>>>>
>>>>>>> Any help will be deeply appreciated!
>>>>>>>
>>>>>>> Best,
>>>>>>> Jon
>>>>>>>
>>>>>> -- 
>>>>>> ------------------------------------------------------------------------
>>>>>> Scott Cain, Ph. D.                                        cain at cshl.edu
>>>>>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>>>>>> Cold Spring Harbor Laboratory
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>
>>>> -- 
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D.                                        cain at cshl.edu
>>>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>>>> Cold Spring Harbor Laboratory
>>>
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14982665.html
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 

-- 
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p15004412.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.



From cjfields at uiuc.edu  Mon Jan 21 23:21:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 Jan 2008 22:21:27 -0600
Subject: [Bioperl-l] Problem: Can't call method "features" on an
	undefined value at BIO::DB::GFF.pl
In-Reply-To: <15004412.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
	<14982665.post@talk.nabble.com>
	
	<3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
	<15004412.post@talk.nabble.com>
Message-ID: <8B1956B2-1380-4E73-8F14-F79CA5435697@uiuc.edu>

I'm cc'ing this to the gbrowse list just in case Lincoln or Scott have  
an idea.  My guess is it's a bug in the fast loader.  Could you file  
this in bugzilla?

http://bugzilla.open-bio.org/

chris

On Jan 21, 2008, at 1:22 PM, Hang wrote:

>
> Hi, Chris:
>
> Following your suggestion, I added the -create flag and the GFF3Loader
> started to work. Thanks a lot!
> When I load dmel-all-5.4.gff into MySQL with -fast, I get the following
> error:
>   Data too long for column 'attribute_value' at c:/../../../mysql.pm line 510
> If I don't use -fast, it is OK, except for the annoyingly slow speed. Do
> you have any suggestions on this?
>
> Best,
> Hang
>
>
>
>
> Chris Fields wrote:
>>
>> It's bp_seqfeature_load.pl (if you have the full bioperl core
>> distribution, it's in script/Bio-SeqFeature/Store).  I had some
>> problems with the fast-loading option but it was likely just my gff
>> formatting; example data loaded just fine.
>>
>> As for the error, you need to use the '-create' flag when initializing
>> a database (or wiping data from a current one):
>>
>> =============================================================
>> use Bio::DB::SeqFeature::Store;
>> use Bio::DB::SeqFeature::Store::GFF3Loader;
>> # -create => 1 initializes the tables (wiping any existing data)
>> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>>                                          -dsn     => 'dbi:mysql:test',
>>                                          -user    => 'root',
>>                                          -pass    => 'XXXXX',
>>                                          -write   => 1,
>>                                          -create  => 1);
>> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
>>                                                          -verbose => 1);
>> $loader->load('./dmel-all-r5.4.gff');
>> =============================================================
>>
>> chris
>>
>> On Jan 20, 2008, at 9:25 AM, Scott Cain wrote:
>>
>>> Jon,
>>>
>>> There is a script for loading a SeqFeature database just like the GFF
>>> database, though I don't know what it's called offhand (I'm not at my
>>> normal computer right now).  Be sure to read the documentation; you
>>> will probably want to use the 'fast' option (I don't remember what it
>>> is called either).
>>>
>>> Scott
>>>
>>>
>>> ----------------------------------------------------------------------
>>> Scott Cain, Ph. D.				 	 cain at cshl.edu
>>> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
>>> ----------------------------------------------------------------------
>>>
>>>
>>> On Sun, 20 Jan 2008, Hang wrote:
>>>
>>>>
>>>> Hi, Scott,
>>>> I tried changing the sequence-region line to "4   FlyBase
>>>> chromosome_arm  1  1351857 .  .  .  ID=4;Name=4", but it doesn't work.
>>>> "$!" didn't say anything but "died at line 12".
>>>>
>>>> So, I went ahead with Bio::DB::SeqFeature::Store. Here is my code to
>>>> load dmel-all-r5.4.gff (from FlyBase) into a test database:
>>>> =============================================================
>>>> use Bio::DB::SeqFeature::Store;
>>>> use Bio::DB::SeqFeature::Store::GFF3Loader;
>>>> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>>>>                                          -dsn     => 'dbi:mysql:test',
>>>>                                          -user    => 'root',
>>>>                                          -pass    => 'XXXXX',
>>>>                                          -write   => 1);
>>>> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
>>>>                                                          -verbose => 1);
>>>> $loader->load('./dmel-all-r5.4.gff');
>>>> =============================================================
>>>> I got a bunch of errors like this:
>>>> "DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
>>>> C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
>>>> Line 1316 in mysql.pm looks like this: $sth->execute($name) or die
>>>> $sth->errstr;
>>>> I checked the test database after the failed load. Only one table was
>>>> created, called 'meta'. I also tried 'grant all on test to
>>>> XXX at localhost' and used that -user and -pass to load the GFF; it
>>>> didn't work either.
>>>>
>>>> Jon
>>>>
>>>>
>>>> Scott Cain-3 wrote:
>>>>>
>>>>> Hi Jon,
>>>>>
>>>>> Well, seeing the error message would be helpful, but without it my
>>>>> first guess is that there are a few things you can try:
>>>>>
>>>>> * removing the "sequence-region" line from the GFF file, adding a
>>>>> line like this:
>>>>>
>>>>> 4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4
>>>>>
>>>>> and then reloading the database.
>>>>>
>>>>> * Or, you may want to consider using Bio::DB::SeqFeature::Store,
>>>>> since Bio::DB::GFF doesn't always behave correctly with complex GFF3
>>>>> (that is, with three levels of features, like gene, mRNA and CDS).
>>>>>
>>>>> Scott
>>>>>
>>>>> On Sat, 2008-01-19 at 19:49 -0800, Hang wrote:
>>>>>> Hi, Scott,
>>>>>>
>>>>>> After adding die $!, I know something is wrong at line:
>>>>>> "my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"
>>>>>>
>>>>>> my gff file is like this:
>>>>>> ##gff-version 3
>>>>>> ##sequence-region 4 1 1351857
>>>>>> 4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
>>>>>> 4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
>>>>>> 4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
>>>>>> ...
>>>>>> ...
>>>>>> I really got confused. Any further suggestion? Thank you!
>>>>>>
>>>>>> Jon
>>>>> -- 
>>>>> ------------------------------------------------------------------------
>>>>> Scott Cain, Ph. D.                                        cain at cshl.edu
>>>>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>>>>> Cold Spring Harbor Laboratory
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From jason at bioperl.org  Wed Jan 23 03:14:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 23 Jan 2008 00:14:06 -0800
Subject: [Bioperl-l] [Bioperl-guts-l] [14455]
	bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm: fixed up the
	gene glyph so that it works properly with CDS-only genes
In-Reply-To: <200801222048.m0MKmhiI007977@dev.open-bio.org>
References: <200801222048.m0MKmhiI007977@dev.open-bio.org>
Message-ID: <91659EDD-B102-47C8-BF93-92576C2CF324@bioperl.org>

Lincoln -- Thank you, Thank you for this fix!  This takes care of  
inconsistency problems I was having with GFF3 and GFF2 data.  It  
works so much more beautifully now!

-jason
On Jan 22, 2008, at 12:48 PM, Lincoln Stein wrote:

> Revision: 14455
> Author:   lstein
> Date:     2008-01-22 15:48:42 -0500 (Tue, 22 Jan 2008)
>
> Log Message:
> -----------
> fixed up the gene glyph so that it works properly with CDS-only genes
>
> Modified Paths:
> --------------
>     bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
>
> Modified: bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
> ===================================================================
> --- bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm	2008-01-22 00:16:02 UTC (rev 14454)
> +++ bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm	2008-01-22 20:48:42 UTC (rev 14455)
> @@ -44,7 +44,9 @@
>
>  sub bump {
>    my $self = shift;
> -  return 1 if $self->{level} == 0; # top level bumps, other levels don't unless specified in config
> +  return 1
> +    if $self->{level} == 0
> +      && lc $self->feature->primary_tag eq 'gene'; # top level bumps, other levels don't unless specified in config
>    return $self->SUPER::bump;
>  }
>
> @@ -92,12 +94,16 @@
>  sub _subfeat {
>    my $class   = shift;
>    my $feature = shift;
> -  if ($feature->primary_tag eq 'gene') {
> +  if (lc $feature->primary_tag eq 'gene') {
>      my @transcripts;
>      for my $t (qw/mRNA tRNA snRNA snoRNA miRNA ncRNA pseudogene/) {
>        push @transcripts, $feature->get_SeqFeatures($t);
>      }
>      return @transcripts;
> +  } elsif (lc $feature->primary_tag eq 'cds') {
> +    my @parts = $feature->get_SeqFeatures();
> +    return ($feature) if $class->{level} == 0 and !@parts;
> +    return @parts;
>    }
>
>    my @subparts;
>
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l



From ste.ghi at libero.it  Thu Jan 24 08:42:49 2008
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Thu, 24 Jan 2008 14:42:49 +0100
Subject: [Bioperl-l] parsing ACE file
Message-ID: 

Dear All,
    dealing with an assembly .ace file and a list of contigs (from that assembly), how can I extract from the .ace file the read names forming each listed contig? Is there any module doing this job?

Any suggestion about how to start is welcome...
Cheers

Stefano




From pmiguel at purdue.edu  Thu Jan 24 14:06:35 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Thu, 24 Jan 2008 14:06:35 -0500
Subject: [Bioperl-l] parsing ACE file
In-Reply-To: 
References: 
Message-ID: <4798E1BB.2020809@purdue.edu>

Stefano Ghignone wrote:
> Dear All,
>     dealing with an assembly .ace file and a list of contigs (from that assembly), how can I extract from the .ace file the read names forming each listed contig? Is there any module doing this job?
>
> Any suggestion about how to start is welcome...
> Cheers
>
> Stefano
>
>   
perl -ne 'next unless /^(?:CO|RD)\s/; print' acefile.ace

will list each contig (CO line) followed by the reads (RD lines) in that
contig, if "acefile.ace" is a phrap ace file.
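To get the read names grouped per contig, so that only the contigs on your list are reported, the same CO and RD records can be collected into a hash. A minimal pure-Perl sketch, not using BioPerl; it assumes the usual phrap ACE layout in which each RD record follows the CO record of the contig it belongs to, and the script name ace_reads.pl and the sub reads_by_contig are hypothetical:

```perl
#!/usr/bin/perl
# Sketch: map each contig in a phrap-style ACE file to its read names.
# Assumes every "RD <read> ..." record follows the "CO <contig> ..." record
# of the contig it belongs to. reads_by_contig is a hypothetical helper.
use strict;
use warnings;

# Returns (\%reads_by_contig, \@contigs_in_file_order).
sub reads_by_contig {
    my ($fh) = @_;
    my (%reads, @order, $contig);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {          # start of a new contig
            $contig = $1;
            push @order, $contig;
            $reads{$contig} = [];
        }
        elsif ($line =~ /^RD\s+(\S+)/ && defined $contig) {
            push @{ $reads{$contig} }, $1;     # read of the current contig
        }
    }
    return (\%reads, \@order);
}

# Usage: perl ace_reads.pl assembly.ace [Contig1 Contig7 ...]
if (@ARGV) {
    my ($ace_file, @wanted) = @ARGV;
    open my $fh, '<', $ace_file or die "Cannot open $ace_file: $!\n";
    my ($reads, $order) = reads_by_contig($fh);
    close $fh;
    # Report only the requested contigs, or every contig if none were named
    for my $contig (@wanted ? @wanted : @$order) {
        my $list = $reads->{$contig};
        if (!$list) { warn "Contig $contig not found\n"; next; }
        print join("\t", $contig, @$list), "\n";
    }
}
```

Each output line is tab-separated: the contig name followed by its read names.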

There is a bioperl module for handling phrap ace file, but I'm not sure 
what its current status is. Last time I looked (probably a couple of 
years ago) it seemed to have been abandoned half-finished.

-- 
Phillip


From golharam at umdnj.edu  Thu Jan 24 14:36:29 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 24 Jan 2008 14:36:29 -0500
Subject: [Bioperl-l] Wiki inconsistency?
Message-ID: <4798E8BD.7030107@umdnj.edu>

Hi,

I haven't used Bioperl in a while but recently started using it.  I was 
using 1.4.0 but see on the website that 1.5.2 has been released.   If I 
click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2), 
I see two versions:

bioperl-1.5.2_102

and

bioperl-1.5.2_100

However, if I click on the Downloads link on the left toolbar, then
scroll down, I see 1.5.2 Developer Release.  The tar file here points to
current_core_unstable.tar.gz.

Is this supposed to be this way?  It seems a bit confusing.  I think it 
might be appropriate to put all the download links in one 
location...just my two cents...

Ryan



From cjfields at uiuc.edu  Thu Jan 24 15:58:25 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 24 Jan 2008 14:58:25 -0600
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <4798E8BD.7030107@umdnj.edu>
References: <4798E8BD.7030107@umdnj.edu>
Message-ID: 

Maybe Sendu can answer more specifically, but I believe the extra  
designation referred to the release candidate (of which bioperl-core  
was the only one with '102').  You definitely want the core package.   
The other ones with '100' are other bioperl-related distributions  
which require the core package but have additional functionality  
(BioSQL-related functions, wrapper modules, etc.).

chris

On Jan 24, 2008, at 1:36 PM, Ryan Golhar wrote:

> Hi,
>
> I haven't used Bioperl in a while but recently started using it.  I
> was using 1.4.0 but see on the website that 1.5.2 has been
> released.  If I click on the link for 1.5.2
> (http://www.bioperl.org/wiki/Release_1.5.2), I see two versions:
>
> bioperl-1.5.2_102
>
> and
>
> bioperl-1.5.2_100
>
> However, if I click on the Downloads link on the left toolbar, then
> scroll down, I see 1.5.2 Developer Release.  The tar file here
> points to current_core_unstable.tar.gz.
>
> Is this supposed to be this way?  It seems a bit confusing.  I think  
> it might be appropriate to put all the download links in one  
> location...just my two cents...
>
> Ryan

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From florent.angly at gmail.com  Thu Jan 24 17:06:29 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 24 Jan 2008 14:06:29 -0800
Subject: [Bioperl-l] parsing ACE file
In-Reply-To: <4798E1BB.2020809@purdue.edu>
References: 
	<4798E1BB.2020809@purdue.edu>
Message-ID: <47990BE5.2010005@gmail.com>

That would be the module Bio::Assembly::IO::ace.
It works fine as far as I know.
To parse an assembly, use Bio::Assembly::IO:
http://doc.bioperl.org/bioperl-live/Bio/Assembly/IO.html
Regards,
Florent

Phillip San Miguel wrote:
> Stefano Ghignone wrote:
>> Dear All,
>>     dealing with an assembly .ace file and a list of contigs (from 
>> that assembly), how can I extract from the .ace file the read names 
>> forming each listed contig? Is there any module doing this job?
>>
>> Any suggestion about how to start is welcome...
>> Cheers
>>
>> Stefano
>>
>>   
> perl -ne 'next unless /^(?:CO|RD)\s/; print' acefile.ace
>
> will list each contig (CO line) followed by the reads (RD lines) in that
> contig, if "acefile.ace" is a phrap ace file.
>
> There is a bioperl module for handling phrap ace file, but I'm not 
> sure what its current status is. Last time I looked (probably a couple 
> of years ago) it seemed to have been abandoned half-finished.
>



From golharam at umdnj.edu  Thu Jan 24 16:17:14 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 24 Jan 2008 16:17:14 -0500
Subject: [Bioperl-l] GenBank updated sequence not being retrieved
Message-ID: <4799005A.5030204@umdnj.edu>

I'm using Bioperl 1.4 (and tried with 1.5.1).

I'm trying to download GenBank sequences for which I have accession numbers.
One of the sequences has been replaced with a newer version.  I'm
using get_Seq_by_acc, which returns the warning:

-------------------- WARNING ---------------------
MSG: acc (gb|XM_087386) does not exist
---------------------------------------------------

If I check NCBI's website for the sequence, it has indeed been replaced 
by an NM_ sequence.  How can I get BioPerl to retrieve the latest 
version of a sequence?



From johan.nilsson at sh.se  Thu Jan 24 17:33:42 2008
From: johan.nilsson at sh.se (Johan Nilsson)
Date: Thu, 24 Jan 2008 23:33:42 +0100
Subject: [Bioperl-l] Quickest Codon Based MSA?
Message-ID: <47991246.6010106@sh.se>

Hello,

I have a question which might not necessarily be related to Bioperl, 
although I do believe the expertise is available here. I have a couple 
of thousand FASTA files, each containing 20 CDS sequence orthologues of 
rather high sequence similarity. I would like to create a codon-based 
multiple sequence alignment for each of these FASTA files (i.e. a 
nucleotide sequence alignment inferred from alignment of the translated 
peptide sequences, to ensure that no frame shifts will occur). I first
tried running Dialign2, which can perform the 
translation/back-translation in one go, but this turned out to be far 
too slow. I next tried to build protein alignments using ClustalW and 
subsequently built the coding region alignment using EMBOSS 'tranalign', 
but this also was too slow.

Is there any method available which significantly speeds up the 
codon-preserving alignment??? As I mentioned, the sequences to be 
aligned are in general very conserved, so any heuristic taking advantage 
of the low divergence would be very helpful! Also, is there any 
adjustable parameter in dialign2/dialign-T that might speed up the 
program when looking at highly similar sequences?

Best regards
/Johan Nilsson


From e-just at northwestern.edu  Thu Jan 24 18:07:57 2008
From: e-just at northwestern.edu (Eric Just)
Date: Thu, 24 Jan 2008 17:07:57 -0600
Subject: [Bioperl-l] Bioinformatics Job Opening at dictyBase in Chicago
Message-ID: 

Hello everyone,

We have an opening at dictyBase (Northwestern University in Chicago) for a
Bioinformatics Software Engineer.  This job involves writing and maintaining
software for a genome database using Chado/OO-Perl/Bioperl and many other
state-of-the-art technologies.

For more information please see:
http://dictybase.org/dictybase_jobs.htm

Thanks,
Eric


From bix at sendu.me.uk  Thu Jan 24 18:16:14 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 24 Jan 2008 23:16:14 +0000
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <4798E8BD.7030107@umdnj.edu>
References: <4798E8BD.7030107@umdnj.edu>
Message-ID: <47991C3E.2010908@sendu.me.uk>

Ryan Golhar wrote:
> Hi,
> 
> I haven't used Bioperl in a while but recently started using it.  I was 
> using 1.4.0 but see on the website that 1.5.2 has been released.   If I 
> click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2), 
> I see two versions:
> 
> bioperl-1.5.2_102
> 
> and
> 
> bioperl-1.5.2_100

Where do you see this older version? I did a search on the page and that 
term isn't found. _100 was the first version of 1.5.2 core to go out. 
There were then 2 minor revisions released, as detailed in the 'Updates' 
section of the page.


> However, if I click on the Downloads link on the left toolbar, then 
> scroll down, I see 1.5.2 Developer Release.  The tar file here points to 
> current_core_unstable.tar.gz.

Yes, that is just an alias to bioperl-1.5.2_102, i.e. whatever the latest
version happens to be, so that people don't need to worry about the
actual version; they can just keep one static bookmark.


> Is this supposed to be this way?  It seems a bit confusing.  I think it 
> might be appropriate to put all the download links in one 
> location...just my two cents...

Well the primary page where all the links are found is the Downloads 
page. The Release_1.5.2 page is specific to 1.5.2 and will remain for 
historic reasons (so at some point there will be 1.5.3 or something and 
the appropriate links on the main Downloads page will be updated to 
that, but if someone specifically wants 1.5.2 they can still find the 
1.5.2 downloads on its own dedicated page).


From jason at bioperl.org  Thu Jan 24 21:17:02 2008
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 24 Jan 2008 18:17:02 -0800
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
References: <47991246.6010106@sh.se>
Message-ID: 

I don't know if it is faster or slower than what you have tried, but
the aa_to_dna_aln function in Bio::Align::Utilities translates a protein
alignment back to a CDS alignment.  You can see example code of it in use
in the pairwise_kaks script in scripts/utilities/pairwise_kaks.PLS
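Whatever aligner produces the protein alignment, the back-translation step itself is simple: walk each aligned protein, emit the next three CDS bases for every residue and "---" for every gap, so gap columns can never shift the reading frame. A minimal pure-Perl sketch of that idea, not the BioPerl aa_to_dna_aln code; it assumes each CDS is in frame from its first base and that the aligned proteins use "-" or "." for gaps, and aligned_protein_to_cds is a hypothetical name:

```perl
#!/usr/bin/perl
# Sketch of codon-preserving back-translation: thread an unaligned CDS
# through its aligned protein, one codon per residue and '---' per gap.
# aligned_protein_to_cds is a hypothetical helper, not a BioPerl function.
use strict;
use warnings;

sub aligned_protein_to_cds {
    my ($aligned_aa, $cds) = @_;
    my $out = '';
    my $pos = 0;                           # next unused base in the CDS
    for my $aa (split //, $aligned_aa) {
        if ($aa eq '-' || $aa eq '.') {    # alignment gap: one gapped codon
            $out .= '---';
        }
        else {                             # residue: consume the next codon
            die "CDS too short for protein\n" if $pos + 3 > length($cds);
            $out .= substr($cds, $pos, 3);
            $pos += 3;
        }
    }
    # Any trailing bases (e.g. a stop codon absent from the protein) are dropped
    return $out;
}

# Usage with two toy sequences: the gap in seq2 becomes one gapped codon.
my %cds = (seq1 => 'ATGAAACCC', seq2 => 'ATGCCC');
my %aln = (seq1 => 'MKP',       seq2 => 'M-P');
for my $id (sort keys %aln) {
    print "$id\t", aligned_protein_to_cds($aln{$id}, $cds{$id}), "\n";
}
```

Because this step is linear in sequence length, the overall speed is dominated by whichever protein aligner is used.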

-jason
On Jan 24, 2008, at 2:33 PM, Johan Nilsson wrote:

> Hello,
>
> I have a question which might not necessarily be related to  
> Bioperl, although I do believe the expertise is available here. I  
> have a couple of thousand FASTA files, each containing 20 CDS  
> sequence orthologues of rather high sequence similarity. I would  
> like to create a codon-based multiple sequence alignment for each  
> of these FASTA files (i.e. a nucleotide sequence alignment inferred  
> from alignment of the translated peptide sequences, to assure that  
> no frame shifts will occur). I first tried running Dialign2, which  
> can perform the translation/back-translation in one go, but this  
> turned out to be far too slow. I next tried to build protein  
> alignments using ClustalW and subsequently built the coding region  
> alignment using EMBOSS 'tranalign', but this also was too slow.
>
> Is there any method available which significantly speeds up the  
> codon-preserving alignment??? As I mentioned, the sequences to be  
> aligned are in general very conserved, so any heuristic taking  
> advantage of the low divergence would be very helpful! Also, is  
> there any adjustable parameter in dialign2/dialign-T that might  
> speed up the program when looking at highly similar sequences?
>
> Best regards
> /Johan Nilsson



From tristan.lefebure at gmail.com  Thu Jan 24 22:07:52 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 24 Jan 2008 22:07:52 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree, and how to combine trees
Message-ID: <200801242207.52991.tristan.lefebure@gmail.com>

Hi,

I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would 
like to merge several "one leaf taxonomic trees" into a taxonomic tree with 
several leaves. For example:

#####BEGINNING#####
#! /usr/bin/perl

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Bio::TreeIO;
use Bio::Tree::Tree;   # for the Bio::Tree::Tree constructor used below

# The taxonomic database
# You might want to switch to a different flatfile or to Entrez 
my $dbh = new Bio::DB::Taxonomy(-source   => 'flatfile',
                                  -directory=> '/tmp',  
                                  -nodesfile=> '/home/tristan/Documents/db/NCBI/taxonomy/nodes.dmp', 
                                  -namesfile=> '/home/tristan/Documents/db/NCBI/taxonomy/names.dmp');

# Fetch 4 taxa for the example
my $tax_decapoda =  $dbh->get_taxon(-name => 'Decapoda');
my $tax_heteroptera =  $dbh->get_taxon(-name => 'Heteroptera');
my $tax_coleoptera =  $dbh->get_taxon(-name => 'Coleoptera');
my $tax_copepoda =  $dbh->get_taxon(-name => 'Copepoda');

# Transform to tree objects
my $decapoda_tree = new Bio::Tree::Tree(-node => $tax_decapoda);
my $heteroptera_tree = new Bio::Tree::Tree(-node => $tax_heteroptera);
my $coleoptera_tree = new Bio::Tree::Tree(-node => $tax_coleoptera);
my $copepoda_tree = new Bio::Tree::Tree(-node => $tax_copepoda);

# Reduce the number of nodes to the following ranks
my @ranks = qw(kingdom phylum subphylum superclass class subclass superorder 
order family);

$decapoda_tree->splice(-keep_rank => \@ranks);
$heteroptera_tree->splice(-keep_rank => \@ranks);
$coleoptera_tree->splice(-keep_rank => \@ranks);
$copepoda_tree->splice(-keep_rank => \@ranks);

# Print the trees
my $out = new Bio::TreeIO('-format' => 'newick',
                                   '-file'   => ">four.tree");
$out->write_tree($decapoda_tree);
$out->write_tree($heteroptera_tree);
$out->write_tree($coleoptera_tree);
$out->write_tree($copepoda_tree);

#####END#######

This gives the following "trees":
(((((7524)33340)50557)6960)6656)33208;
(((((7041)33340)50557)6960)6656)33208;
((((((6683)6682)72041)6681)6657)6656)33208;
((((6830)72037)6657)6656)33208;

They are really special trees, as they contain only one leaf. I would like to 
combine them and remove the 'unused' nodes to obtain something like this:

((7524,7041)33340,(6683,6830)6657)6656;

or even better:

((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda;

Any suggestions?

Thanks!

-Tristan
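One way to get the combined tree is to merge the root-to-leaf lineages into a single parent-to-children map, then write Newick while skipping every node that has exactly one child. A minimal pure-Perl sketch, not a BioPerl API: the lineages are copied by hand from the four one-leaf trees above, and merge_lineages and to_newick are hypothetical names:

```perl
#!/usr/bin/perl
# Sketch: merge one-leaf lineages into one tree, then write Newick while
# contracting unbranched paths. merge_lineages and to_newick are hypothetical
# helpers; the lineages below are copied from the four one-leaf trees above.
use strict;
use warnings;

# Build a parent -> [children] map from root-to-leaf lineages.
# Assumes every lineage starts at the same root taxon.
sub merge_lineages {
    my @lineages = @_;
    my (%children, %seen_edge);
    for my $lineage (@lineages) {
        for my $i (1 .. $#$lineage) {
            my ($parent, $child) = @{$lineage}[$i - 1, $i];
            # record each parent->child edge once, in first-seen order
            push @{ $children{$parent} }, $child
                unless $seen_edge{"$parent\t$child"}++;
        }
    }
    return ($lineages[0][0], \%children);
}

# Newick writer that skips any node with exactly one child, so only
# leaves and branching nodes (labelled with their IDs) remain.
sub to_newick {
    my ($node, $children) = @_;
    my @kids = @{ $children->{$node} || [] };
    return $node unless @kids;                             # leaf
    return to_newick($kids[0], $children) if @kids == 1;   # unbranched node
    return '(' . join(',', map { to_newick($_, $children) } @kids) . ")$node";
}

my ($root, $children) = merge_lineages(
    [qw(33208 6656 6960 50557 33340 7524)],
    [qw(33208 6656 6960 50557 33340 7041)],
    [qw(33208 6656 6657 6681 72041 6682 6683)],
    [qw(33208 6656 6657 72037 6830)],
);
print to_newick($root, $children), ";\n";
```

For these four lineages the script prints ((7524,7041)33340,(6683,6830)6657)6656; which is the first target form. Getting the second form would only need a lookup from each taxon ID to its name before writing the labels.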



From anjan.purkayastha at gmail.com  Thu Jan 24 18:32:20 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Thu, 24 Jan 2008 18:32:20 -0500
Subject: [Bioperl-l] Question from a bioperl newbie
Message-ID: 

hi,
i recently installed bioperl on my mac.
i tried to use it in a simple script with a "use Bio::Perl" command. however,
i get an error message "Can't locate Bio/Perl.pm in @INC".
the BioPerl folder is on my desktop, so i tried: use lib
"/Users/anjan/Desktop/bioperl-1.5.2_102/Bio";
This time it returned another error: Undefined subroutine
&main::get_sequence.

so, when BioPerl is installed, which directory does it reside in? (it's not
present in the .cpan/build directory.)

appreciate your prompt reply.

anjan

-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================


From bosborne11 at verizon.net  Thu Jan 24 23:04:50 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 24 Jan 2008 23:04:50 -0500
Subject: [Bioperl-l] Question from a bioperl newbie
In-Reply-To: 
References: 
Message-ID: <3B13E81A-66E1-418A-8915-9E877C2B751D@verizon.net>

Anjan,

use lib "/Users/anjan/Desktop/bioperl-1.5.2_102/";

Brian O.
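Why the path has to stop above Bio/: use lib only prepends directories to @INC, and Perl turns Bio::Perl into the relative file name Bio/Perl.pm and looks for it under each @INC entry. Pointing use lib at .../bioperl-1.5.2_102/Bio therefore makes Perl look for .../Bio/Bio/Perl.pm. The sketch below mimics that lookup in plain Perl; module_to_file and locate_module are illustrative helpers, not part of Perl or BioPerl, and the real require also handles @INC hooks and %INC caching.

```perl
use strict;
use warnings;
use File::Spec;

# Perl maps a package name to a relative file name: Bio::Perl -> Bio/Perl.pm
sub module_to_file {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;
    return "$file.pm";
}

# Return the full path of the first @INC directory holding the module, or undef.
sub locate_module {
    my ($module) = @_;
    my $file = module_to_file($module);
    for my $dir (grep { !ref } @INC) {    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, $file);
        return $path if -f $path;
    }
    return;
}

for my $module (qw(strict Bio::Perl)) {
    my $path = locate_module($module);
    printf "%-10s %s\n", $module, defined $path ? $path : 'not found in @INC';
}
```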


On Jan 24, 2008, at 6:32 PM, ANJAN PURKAYASTHA wrote:

> hi,
> i recently installed bioperl on my mac-machine.
> tried to use it in a simple script with a "use Bio::Perl" command.  
> however,
> i get an error message "Can't locate Bio/Perl.pm in @INC".
> the BioPerl folder is in my desktop. so i tried use: use lib
> "/Users/anjan/Desktop/bioperl-1.5.2_102/Bio";
> This time it returned me another error: Undefined subroutine
> &main::get_sequence.
>
> so, when BioPerl is installed, which directory does it reside in. 
> ( it's not
> present in the .cpan/build directory.)
>
> appreciate your prompt reply.
>
> anjan
>
> -- 
> ANJAN PURKAYASTHA, PhD.
> Senior Computational Biologist
> ==========================
>
> 1101 King Street, Suite 310,
> Alexandria, VA 22314.
> 703.518.8040 (office)
> 703.740.6939 (mobile)
>
> email:
> anjan at vbi.vt.edu;
> anjan.purkayastha at gmail.com
>
> http://www.vbi.vt.edu
>
> ==========================
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From n.haigh at sheffield.ac.uk  Fri Jan 25 02:32:10 2008
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Fri, 25 Jan 2008 07:32:10 +0000
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <47991C3E.2010908@sendu.me.uk>
References: <4798E8BD.7030107@umdnj.edu> <47991C3E.2010908@sendu.me.uk>
Message-ID: <4799907A.9060301@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Sendu,

Have you thought about using a template for the latest stable release and the latest developer release? That way, any article/link that always needs
to point to the latest version simply has to include the correct template. So once a new release is made, you simply update the one template, and the
changes automatically propagate through the wiki, which might save some wiki admin each time there's a new release. You could get more intricate and use a
template to show the latest version of any particular release series so you could do something like:

{{latest release|series=1.5.x|full=y}}
and
{{latest release|series=1.4.x|full=y}}

or even:

{{latest release|series=stable|full=y}}
and
{{latest release|series=dev|full=y}}

These templates could return 1.5.2_102 if the "full" param is set, or simply 1.5.2 if the "full" param is missing.

Just a thought.
Nath


Sendu Bala wrote:
> Ryan Golhar wrote:
>> Hi,
>>
>> I haven't used Bioperl in a while but recently started using it.  I
>> was using 1.4.0 but see on the website that 1.5.2 has been released.  
>> If I click on the link for 1.5.2
>> (http://www.bioperl.org/wiki/Release_1.5.2), I see two versions:
>>
>> bioperl-1.5.2_102
>>
>> and
>>
>> bioperl-1.5.2_100
> 
> Where do you see this older version? I did a search on the page and that
> term isn't found. _100 was the first version of 1.5.2 core to go out.
> There were then 2 minor revisions released, as detailed in the 'Updates'
> section of the page.
> 
> 
>> However, If I click on the Downloads link on the left toolbar, then
>> scroll down, I see 1.5.2 Developer Release.  The tar file here points
>> to current_core_unstable.tar.gz.
> 
> Yes, that is just an alias to bioperl-1.5.2_102, ie. whatever the latest
> version happens to be. So that people don't need to worry about the
> actual version, they can just have one static bookmark.
> 
> 
>> Is this supposed to be this way?  It seems a bit confusing.  I think
>> it might be appropriate to put all the download links in one
>> location...just my two cents...
> 
> Well the primary page where all the links are found is the Downloads
> page. The Release_1.5.2 page is specific to 1.5.2 and will remain for
> historic reasons (so at some point there will be 1.5.3 or something and
> the appropriate links on the main Downloads page will be updated to
> that, but if someone specifically wants 1.5.2 they can still find the
> 1.5.2 downloads on its own dedicated page).

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHmZB69gTv6QYzVL4RAnRpAJwOyWjZXzD0UJBNFNP8H1Hrn4c66ACfRyzA
NsJEZydsG+aMzNltrBw+Nx4=
=kHt0
-----END PGP SIGNATURE-----


From derek.fairley at belfasttrust.hscni.net  Fri Jan 25 03:31:28 2008
From: derek.fairley at belfasttrust.hscni.net (Fairley, Derek)
Date: Fri, 25 Jan 2008 08:31:28 -0000
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
Message-ID: 

Johan,

There is currently no Bioperl-run wrapper for this program, but you
might want to have a look at Codon Align 2.0 as well:
http://homepage.mac.com/barryghall/CodonAlign.html

Derek

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Johan Nilsson
Sent: 24 January 2008 22:34
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Quickest Codon Based MSA?

Hello,

I have a question which might not necessarily be related to Bioperl, 
although I do believe the expertise is available here. I have a couple 
of thousand FASTA files, each containing 20 CDS sequence orthologues of 
rather high sequence similarity. I would like to create a codon-based 
multiple sequence alignment for each of these FASTA files (i.e. a 
nucleotide sequence alignment inferred from alignment of the translated 
peptide sequences, to ensure that no frame shifts will occur). I first
tried running Dialign2, which can perform the
translation/back-translation in one go, but this turned out to be far
too slow. I next tried to build protein alignments using ClustalW and
subsequently built the coding region alignment using EMBOSS 'tranalign',
but this also was too slow.

Is there any method available which significantly speeds up the
codon-preserving alignment??? As I mentioned, the sequences to be
aligned are in general very conserved, so any heuristic taking advantage
of the low divergence would be very helpful! Also, is there any
adjustable parameter in dialign2/dialign-T that might speed up the 
program when looking at highly similar sequences?

Best regards
/Johan Nilsson



From ewijaya at gmail.com  Fri Jan 25 04:26:05 2008
From: ewijaya at gmail.com (Edward Wijaya)
Date: Fri, 25 Jan 2008 17:26:05 +0800
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
Message-ID: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>

Dear Experts,

Suppose I have the following list of gene names and Ensembl IDs.

RBL1	ENSG00000080839
RB1	ENSG00000139687
CDC2	ENSG00000170312
CDC25A	ENSG00000164045
CCNA2	ENSG00000145386
E2F3	ENSG00000112242
E2F2	ENSG00000007968
CDK2	ENSG00000123374
...etc...

Is there a way to extract the gene sequences for that list
and then output them in FASTA format?

- Edward


From bix at sendu.me.uk  Fri Jan 25 05:55:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 25 Jan 2008 10:55:50 +0000
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
References: <47991246.6010106@sh.se>
Message-ID: <4799C036.5060404@sendu.me.uk>

Johan Nilsson wrote:
> Hello,
> 
> I have a question which might not necessarily be related to Bioperl, 
> although I do believe the expertise is available here. I have a couple 
> of thousand FASTA files, each containing 20 CDS sequence orthologues of 
> rather high sequence similarity. I would like to create a codon-based 
> multiple sequence alignment for each of these FASTA files (i.e. a 
> nucleotide sequence alignment inferred from alignment of the translated 
> peptide sequences, to assure that no frame shifts will occur). I first 
> tried running Dialign2, which can perform the 
> translation/back-translation in one go, but this turned out to be far 
> too slow. I next tried to build protein alignments using ClustalW and 
> subsequently built the coding region alignment using EMBOSS 'tranalign', 
> but this also was too slow.
> 
> Is there any method available which significantly speeds up the 
> codon-preserving alignment??? As I mentioned, the sequences to be 
> aligned are in general very conserved, so any heuristic taking advantage 
> of the low divergence would be very helpful! Also, is there any 
> adjustable parameter in dialign2/dialign-T that might speed up the 
> program when looking at highly similar sequences?

Do you know which is the slow part? For example, when using ClustalW,
are the alignments slower than creating the codon alignment from the
protein?

If ClustalW is the problem, you can try using other alignment programs 
famous for their speed, such as Muscle. If it's the protein->codon bit 
that's slow, try using other programs to do that, like Pal2Nal or the 
BioPerl method.
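The protein->codon step itself is simple enough to sketch: each residue of an aligned protein consumes the next three nucleotides of its CDS, and each gap becomes '---'. This is a minimal pure-Perl illustration assuming every CDS starts in frame and translates exactly to its ungapped protein; it does not check the translation, it silently ignores any trailing stop codon, and it does not read alignment files, which tools such as tranalign or Pal2Nal (or the BioPerl method, presumably aa_to_dna_aln in Bio::Align::Utilities) take care of. protein_gaps_to_codons is a hypothetical name and the sequences are toy data.

```perl
use strict;
use warnings;

# Thread the gaps of one aligned protein onto its unaligned CDS:
# a residue copies the next codon, a '-' becomes '---'.
sub protein_gaps_to_codons {
    my ($aligned_protein, $cds) = @_;
    my ($out, $pos) = ('', 0);
    for my $aa (split //, $aligned_protein) {
        if ($aa eq '-') {
            $out .= '---';
            next;
        }
        die "CDS too short for protein\n" if $pos + 3 > length $cds;
        $out .= substr($cds, $pos, 3);
        $pos += 3;
    }
    return $out;
}

# Toy example: two sequences, one with a gap in the protein alignment.
my %protein_aln = (seq1 => 'MK-L',      seq2 => 'MKAL');
my %cds         = (seq1 => 'ATGAAACTG', seq2 => 'ATGAAAGCTCTG');
for my $id (sort keys %protein_aln) {
    print ">$id\n", protein_gaps_to_codons($protein_aln{$id}, $cds{$id}), "\n";
}
```

Because the work is a single linear pass per sequence, this step should be negligible next to the alignment itself, which supports Sendu's suggestion to find out which stage is actually slow.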


From David.Messina at sbc.su.se  Fri Jan 25 06:35:16 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 25 Jan 2008 12:35:16 +0100
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
Message-ID: <628aabb70801250335l2a2754efn3e73e44a9dae6a35@mail.gmail.com>

Hi Edward,

I don't think there's a direct BioPerl interface to Ensembl, but BioMart at
Ensembl itself will get you sequences (and lots of other things if you want)
given a list of Ensembl IDs.

http://www.ensembl.org/biomart/martview

Note that as of this writing, the Ensembl BioMart server appears to be down
temporarily.

If you want to be able to get Ensembl sequences from a program, there's the
Ensembl API:

http://www.ensembl.org/info/using/api/core/core_tutorial.html



Dave
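Whichever route retrieves the sequences (a BioMart export or the Ensembl Perl API), turning Edward's two-column list into FASTA is a short loop. In this sketch the retrieval step is a code reference you supply; with the Ensembl API it would presumably get a gene adaptor from Bio::EnsEMBL::Registry and call fetch_by_stable_id($id)->seq, but that part is an assumption and is not shown. genes_to_fasta is a hypothetical helper, and the demo sequence is a placeholder, not the real RB1 gene.

```perl
use strict;
use warnings;

# Turn "SYMBOL<whitespace>ENSEMBL_ID" lines into FASTA records.
# $fetch: code ref taking an Ensembl gene ID and returning a sequence string
# (hypothetical hook: plug in the Ensembl API or a BioMart download here).
sub genes_to_fasta {
    my ($lines, $fetch, $width) = @_;
    $width ||= 60;
    my $fasta = '';
    for my $line (@$lines) {
        my ($symbol, $ensg) = split ' ', $line;   # awk-style whitespace split
        next unless defined $ensg;                # skip blank/malformed lines
        my $seq = $fetch->($ensg);
        unless (defined $seq && length $seq) {
            warn "no sequence returned for $ensg ($symbol)\n";
            next;
        }
        (my $wrapped = $seq) =~ s/(.{1,$width})/$1\n/g;   # fixed-width lines
        $fasta .= ">$ensg $symbol\n$wrapped";
    }
    return $fasta;
}

# Demo with a stub fetcher and a placeholder sequence.
my %demo_seqs = (ENSG00000139687 => 'ACGT' x 20);
print genes_to_fasta(["RB1\tENSG00000139687"], sub { $demo_seqs{ $_[0] } });
```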


From bix at sendu.me.uk  Fri Jan 25 06:07:42 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 25 Jan 2008 11:07:42 +0000
Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree,
 and how to combine trees
In-Reply-To: <200801242207.52991.tristan.lefebure@gmail.com>
References: <200801242207.52991.tristan.lefebure@gmail.com>
Message-ID: <4799C2FE.8080700@sendu.me.uk>

Tristan Lefebure wrote:
> Hi,
> 
> I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would 
> like to merge several "one leaf taxonomic trees" into a taxonomic tree with 
> several leaves.
[...]
> or even better:
> 
> ((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda;

The BioPerl script taxonomy2tree.pl generates:

(((Decapoda,Copepoda)Crustacea,(Heteroptera,Coleoptera)Neoptera)Pancrustacea)"cellular organisms";

I think you can modify it, similarly to your own script, to output only the
classes you're interested in.



http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/taxa/taxonomy2tree.PLS


From bosborne11 at verizon.net  Fri Jan 25 08:53:36 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 25 Jan 2008 08:53:36 -0500
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
Message-ID: <9CE20DF3-ED5F-4432-A191-4123896E5815@verizon.net>

Edward,

Various approaches are discussed here:

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

Since you have ENSEMBL ids I'd think that would be the way to go.


Brian O.

On Jan 25, 2008, at 4:26 AM, Edward Wijaya wrote:

> Dear Experts,
>
> Suppose I have the following list of gene names and Ensemble Ids.
>
> RBL1	ENSG00000080839
> RB1	ENSG00000139687
> CDC2	ENSG00000170312
> CDC25A	ENSG00000164045
> CCNA2	ENSG00000145386
> E2F3	ENSG00000112242
> E2F2	ENSG00000007968
> CDK2	ENSG00000123374
> ...etc...
>
> Is there a way to extract the gene sequence from those list?
> And then output them in FASTA format.
>
> - Edward



From snoze.pa at gmail.com  Fri Jan 25 18:30:56 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Fri, 25 Jan 2008 17:30:56 -0600
Subject: [Bioperl-l] bioperl DB error
Message-ID: <10f848910801251530j6eacfcb0x81780ae312cf19c5@mail.gmail.com>

Dear Users,
I am using BioPerl/BioSQL and trying to load the NCBI taxonomy, but I am
getting the following error message.
Any help? Thanks in advance.

perl load_ncbi_taxonomy.pl -download -driver mysql -dbname bioseqdb -dbuser root
Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry
'10090' for key 2 at load_ncbi_taxonomy.pl line 568.


From snoze.pa at gmail.com  Fri Jan 25 18:49:28 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Fri, 25 Jan 2008 17:49:28 -0600
Subject: [Bioperl-l] bioseqDB error
Message-ID: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>

Hi, does anyone know why I am getting this error message?

Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry
'10090' for key 2 at load_ncbi_taxonomy.pl line 568


From wkath83 at vbi.vt.edu  Thu Jan 24 13:19:06 2008
From: wkath83 at vbi.vt.edu (Katherine Wendelsdorf)
Date: Thu, 24 Jan 2008 13:19:06 -0500 (EST)
Subject: [Bioperl-l] bioperl on mac
Message-ID: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>

Dear one who knows,

I have a MacBook with OS X Leopard and I am having trouble running scripts
that call BioPerl modules.

Here is my history: using Fink, I installed bioperl-pm586 version 1.5.2-4
and bioperl-run-pm586 version 1.5.2-1. When I type 'which bioperl-pm586' into
the command line I get nothing. Spotlight says that the path is
/sw/share/bioperl-pm586. The same goes for bioperl-run-pm586.

1. I tried to run the test2.pl script, literally copied and pasted
from the HOWTO manual, but it wouldn't run. The two attached docs are the
script I tried to run and the output (which is nonexistent). I read
something that said to "go into" BioPerl to execute a command. I could
not enter the bioperl directory when it was in the /sw/share directory, so
I copied the bioperl folder to the Desktop just so I could try executing
the script inside bioperl. Where am I going wrong here?

Should I place these folders (bioperl-pm586 and bioperl-run-pm586)
somewhere else on my computer? Should they be in the same directory as
perl (/usr/bin/perl)?

2. How do I know which modules are included in the bioperl-pm586 I
downloaded? Specifically, I want to use Bio::SeqIO.

3. What is the best way to download/install new modules as I need them?


Any answers you could give me to any of these questions would be greatly
appreciated!

Thank you so much, kind volunteer!
-Kate
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test2.pl
URL: 

From bosborne11 at verizon.net  Sat Jan 26 11:14:13 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 26 Jan 2008 11:14:13 -0500
Subject: [Bioperl-l] bioperl on mac
In-Reply-To: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
Message-ID: 

Katherine,

Perl keeps the addresses of all the module directories in its @INC  
array. What do you see when you do:

perl -e 'print @INC'

?

If '/sw/share/bioperl-pm586' is not in @INC then you need to put it  
there, perhaps by adding something like:

setenv PERL5LIB ${PERL5LIB}:/sw/share/bioperl-pm586

to the .tcshrc file in your home directory (if you use tcsh, that is;
most people use bash these days, in which case the equivalent line in
.bashrc is: export PERL5LIB=$PERL5LIB:/sw/share/bioperl-pm586).

You asked some other questions, the general answer is that all the  
modules you'll need are in the 2 packages you've installed, and you  
don't need to move them from /sw.


Brian O.
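To check this from inside Perl rather than reading the run-together output of print @INC, a small sketch like the following lists one @INC entry per line and reports whether the directory named above is among them. Directories in PERL5LIB are prepended to @INC when perl starts, so after the shell change the directory should appear near the top. The path is taken from the messages above; in_inc is an illustrative helper and compares strings exactly, so a trailing slash counts as a different entry.

```perl
use strict;
use warnings;

# True if $dir is one of the (non-hook) entries of @INC. Exact string match,
# so "/sw/share/bioperl-pm586/" and "/sw/share/bioperl-pm586" differ.
sub in_inc {
    my ($dir) = @_;
    return scalar grep { !ref($_) && $_ eq $dir } @INC;
}

# One entry per line; plain `print @INC` prints them with no separator.
print "$_\n" for grep { !ref } @INC;

my $wanted = '/sw/share/bioperl-pm586';   # directory named in the reply above
print in_inc($wanted)
    ? "$wanted is in \@INC\n"
    : "$wanted is not in \@INC; add it via PERL5LIB or use lib\n";
```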


On Jan 24, 2008, at 1:19 PM, Katherine Wendelsdorf wrote:

> Dear one who knows,
>
> I have a macbook with Leopard OSX and I am having trouble running  
> scripts
> that call for bioperl modules.
>
> Here is my history: Using Fink I installed bioperl-pm586 version  
> 1.5.2-4
> and bioperl-run-pm586 version 1.5.2-1. when I type >which bioperl- 
> pm586 in
> to the command line I get nothing. Spotlight says that the path is
> /sw/share/bioperl-pm586. The same goes for bioperl-run-pm586.
>
> 1. I tried to run test2.pl script that was literally copied and pasted
> from the HOWTO manual, but it wouldnt run. The two attached docs are  
> the
> script I tried to run and the output (which is nonexistant). I read
> something that said to "go in to" Bioperl to execute a command. I  
> could
> not enter the bioperl directory when it was in the sw/shared  
> directory so
> I copied the bioperl folder to the Desktop just so I could try  
> executing
> the script inside bioperl. Where am I going wrong here?
>
> Should I place these folders (bioperl-pm586 and bioperl-run-pm586)
> somewhere else on my computer? Shoudl they be in the same directory as
> perl (usr/bin/perl)?
>
> 2. How do I know what modules are included in the bioperl-pm586 I
> downloaded? Specifically I want to use Bio::SeqIO.
>
> 3. What is the best way to download/install new modules as I need  
> them?
>
>
> Any answers you coudl give me for any of these questions would be  
> greatly
> appreciated!
>
> Thank you so much, kind volunteer!
> - 
> Kate 



From jason at bioperl.org  Sat Jan 26 15:30:11 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 26 Jan 2008 12:30:11 -0800
Subject: [Bioperl-l] bioperl on mac
In-Reply-To: 
References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
	
Message-ID: 

Usually this is done by fink by adding a line to your .tcshrc (if you  
are running that shell) or .bash_profile or .bashrc.

On my machine I have this at the top of my .bash_profile file:
test -r /sw/bin/init.sh && . /sw/bin/init.sh

If that is not there, you need to add it to ensure that all the Fink
tools are set up properly.

On Jan 26, 2008, at 8:14 AM, Brian Osborne wrote:

> Katherine,
>
> Perl keeps the addresses of all the module directories in its @INC  
> array. What do you see when you do:
>
> perl -e 'print @INC'
>
> ?
>
> If '/sw/share/bioperl-pm586' is not in @INC then you need to put it  
> there, perhaps by adding something like:
>
> setenv PERL5LIB ${PERL5LIB}:/sw/share/bioperl-pm586
>
> to the .tcshrc file in your home directory (if you use tcsh that  
> is, most use bash, .bashrc, and 'set' these days).
>
> You asked some other questions, the general answer is that all the  
> modules you'll need are in the 2 packages you've installed, and you  
> don't need to move them from /sw.
>
>
> Brian O.
>
>
> On Jan 24, 2008, at 1:19 PM, Katherine Wendelsdorf wrote:
>
>> Dear one who knows,
>>
>> I have a macbook with Leopard OSX and I am having trouble running  
>> scripts
>> that call for bioperl modules.
>>
>> Here is my history: Using Fink I installed bioperl-pm586 version  
>> 1.5.2-4
>> and bioperl-run-pm586 version 1.5.2-1. when I type >which bioperl- 
>> pm586 in
>> to the command line I get nothing. Spotlight says that the path is
>> /sw/share/bioperl-pm586. The same goes for bioperl-run-pm586.
>>
>> 1. I tried to run test2.pl script that was literally copied and  
>> pasted
>> from the HOWTO manual, but it wouldnt run. The two attached docs  
>> are the
>> script I tried to run and the output (which is nonexistant). I read
>> something that said to "go in to" Bioperl to execute a command. I  
>> could
>> not enter the bioperl directory when it was in the sw/shared  
>> directory so
>> I copied the bioperl folder to the Desktop just so I could try  
>> executing
>> the script inside bioperl. Where am I going wrong here?
>>
>> Should I place these folders (bioperl-pm586 and bioperl-run-pm586)
>> somewhere else on my computer? Shoudl they be in the same  
>> directory as
>> perl (usr/bin/perl)?
>>
>> 2. How do I know what modules are included in the bioperl-pm586 I
>> downloaded? Specifically I want to use Bio::SeqIO.
>>
>> 3. What is the best way to download/install new modules as I need  
>> them?
>>
>>
>> Any answers you coudl give me for any of these questions would be  
>> greatly
>> appreciated!
>>
>> Thank you so much, kind volunteer!
>> - 
>> Kate



From jason at bioperl.org  Sat Jan 26 19:14:45 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 26 Jan 2008 16:14:45 -0800
Subject: [Bioperl-l] a question on "move_id_to_bootstrap" usage
In-Reply-To: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com>
References: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com>
Message-ID: <8273f6c20801261614p312886d5x562593aa0cde60da@mail.gmail.com>

I'm not sure why you still have the __DATA__ block if you are reading data
in from a file. Or are you trying to send an example of the code but forgot
to specify a different input point?

If you are reading from a file that looks like the tree in the __DATA__
block, you'll notice that the bootstrap info is encoded as the branch_length,
NOT the id; move_id_to_bootstrap only moves the ID to the bootstrap.
You'll have to write a custom routine, or just run a simple loop over your tree,
to move the data to the bootstrap. It would look just like
move_id_to_bootstrap except you'd use branch_length instead of id to get the
data that you want to set as the bootstrap. I leave it as an exercise for
the reader, but if you can't figure it out let us know.


In the future please ask your questions on the mailing list, as I don't have
much time to answer questions individually when someone else can help.

-jason
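One possible version of the loop described above, sketched under the assumption (as in Anand's consense output) that each internal node's support value was parsed into its branch length. It uses only Bio::Tree::Node methods already named in this thread (get_nodes, is_Leaf, branch_length, bootstrap). Unlike move_id_to_bootstrap it leaves the original value in place; move_branch_length_to_bootstrap is a hypothetical name, not a BioPerl method.

```perl
use strict;
use warnings;

# Copy each internal node's branch length into its bootstrap value.
# Expects a tree object with get_nodes(), e.g. one returned by
# Bio::TreeIO->next_tree; leaves and nodes without a branch length
# (such as the root) are skipped, and branch lengths are left untouched.
sub move_branch_length_to_bootstrap {
    my ($tree) = @_;
    for my $node ($tree->get_nodes) {
        next if $node->is_Leaf;
        my $support = $node->branch_length;
        next unless defined $support;
        $node->bootstrap($support);
    }
    return $tree;
}
```

In Anand's script this would be called as move_branch_length_to_bootstrap($tree) right after next_tree and before the printf loop; the -internal_node_id => 'bootstrap' option would then be unnecessary, since his tree has no internal node IDs.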

On Jan 23, 2008 1:57 PM, Anand  wrote:

> HI Jason,
>
> Thanks a lot. I followed your suggestion and updated both the modules.
>
> I followed the code example on http://www.bioperl.org/wiki/HOWTO:Trees and
> tried to extract bootstrap values for my tree (which is output after
> seqboot, protdist, fitch and consense)
>
> When I try running my script, I am not able to print the bootstrap
> values...and it doesn't throw any error messages. Am I missing something?
>
> ====START of Code====
> #!/usr/bin/perl -w
> use strict;
> use lib "/home/anand/myperlmodules/lib/perl5/";
> use Bio::TreeIO;
> # $usage: $0 
>
> my $infile = shift;
>
> my $treeio = Bio::TreeIO->new(-format => 'newick',
>                          -file => $infile,
>                          -internal_node_id => 'bootstrap',
>                          );
>
> while( my $tree = $treeio->next_tree ) {
>    for my $node ( $tree->get_nodes ) {
>        printf "id: %s bootstrap: %s\n", $node->id || '', $node->bootstrap || '';
>    }
> }
> __END__
> ((5815_1:100.0,(((5815_5:100.0,5815_7:100.0):100.0,5815_6:100.0):97.0,5815_8:100.0):98.0,5815_4:100.0,5815_2:100.0):100.0,5815_3:100.0);
> ====END of Code====
>
> Thanks in advance for your time and help,
>
> Anand
>
> PS: Just to preserve formatting, I have attached the consense_output_file
>
> On Jan 22, 2008 8:02 AM, Jason Stajich  wrote:
>
> > I suspect you may want to update everything in Bio/TreeIO and Bio/
> > Tree to be safe, I'm not exactly sure what was changed - you can look
> > at the commit logs to see what else changed at the time - http://
> > code.open-bio.org/.   You can also use that same server to grab a
> > fresh checkout of what is the current state of the code base.
> >
> > -jason
> > On Jan 22, 2008, at 12:59 AM, Anand wrote:
> >
> > > Hi Jason
> > >
> > > I have a question on the method "move_id_to_bootstrap". From this
> > > post:
> > > http://portal.open-bio.org/pipermail/bioperl-guts-l/2007-May/
> > > 025718.html
> > >
> > > it looks like it has been added very recently. As luck would have
> > > it, the
> > > TreeFunctionsI.pm in my bioperl installation is missing that method.
> > >
> > > My question: What is the best method to update TreeFunctionsI.pm so
> > > that it
> > > can have the "move_id_to_bootstrap" method? Does it have other update
> > > dependencies.
> > >
> > > Thanks in advance for your help and time,
> > >
> > > Anand
> >
> >
>



-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From hlapp at duke.edu  Mon Jan 28 00:27:34 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 28 Jan 2008 00:27:34 -0500
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
References: <4795292E.4030401@sdsc.edu>
Message-ID: 

Some folks may remember that CIPRES (http://www.phylo.org) released  
their portal with access to remote execution of several phylogenetic  
tree reconstruction programs in spring last year.

It took a while, but they have now also built a really nice REST-based
API that makes the service fully programmable instead of screen-scraping
5 pages:

http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API)

It should be relatively straightforward to build the equivalent of  
RemoteBlast on top of this. Would anyone be keen to take this on?

	-hilmar

P.S. Sorry for the cross-posting - I thought this is relevant to both  
communities. When responding in a project-specific way please make  
sure you remove the list that is no longer pertinent.


Begin forwarded message:

> From: Lucie Chan 
> Date: January 21, 2008 6:22:22 PM EST
> To: Hilmar Lapp 
> Cc: Mark Miller , Rutger Vos ,  
> Terri Liebowitz , Paul Hoover ,  
> mtholder at ku.edu
> Subject: Re: REST APIs for Cipres Web Portal
> Reply-To: lcchan at sdsc.edu
>
> Hilmar, et al.,
>
> I just released the first version of our REST Web Services API for
> job submission, job status query, and
> job result file retrieval. I'd like to get some feedback (issues,
> problems, improvements, suggestions, etc.) from you. For
> documentation on how to access the services, check it out at:
>
> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST  
> API" below the "CIPRES PORTAL" banner.
>
> Lucie
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================





From cjfields at uiuc.edu  Mon Jan 28 01:04:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 00:04:46 -0600
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: 
References: <4795292E.4030401@sdsc.edu>
	
Message-ID: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>

We can certainly add it to the to-do list; just need to sort out the  
details (how often to allow posts, etc).  I guess we would want this  
in the Bio::Tools::Run namespace, same as RemoteBlast?

chris

On Jan 27, 2008, at 11:27 PM, Hilmar Lapp wrote:

> Some folks may remember that CIPRES (http://www.phylo.org) released  
> their portal with access to remote execution of several phylogenetic  
> tree reconstruction programs in spring last year.
>
> It took a while but they have now also built a really nice REST- 
> based API that makes the service fully programmable instead of  
> screen-scraping 5 pages:
>
> http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API)
>
> It should be relatively straightforward to build the equivalent of  
> RemoteBlast on top of this. Would anyone be keen to take this on?
>
> 	-hilmar
>
> P.S. Sorry for the cross-posting - I thought this is relevant to  
> both communities. When responding in a project-specific way please  
> make sure you remove the list that is no longer pertinent.
>
>
> Begin forwarded message:
>
>> From: Lucie Chan 
>> Date: January 21, 2008 6:22:22 PM EST
>> To: Hilmar Lapp 
>> Cc: Mark Miller , Rutger Vos ,  
>> Terri Liebowitz , Paul Hoover , mtholder at ku.edu
>> Subject: Re: REST APIs for Cipres Web Portal
>> Reply-To: lcchan at sdsc.edu
>>
>> Hilmar, et al.,
>>
>> I just released the first version of our REST Web Services API for  
>> job submission, and job status query, and
>> job result file retrieval. I'd like to get some feedbacks (issues,  
>> problems, improvements, suggestions, etc) from you. For  
>> documentation on how to access the services, check it out at:
>>
>> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST  
>> API" below the "CIPRES PORTAL" banner.
>>
>> Lucie
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
> ===========================================================
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From hlapp at duke.edu  Mon Jan 28 08:42:39 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 28 Jan 2008 08:42:39 -0500
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
References: <4795292E.4030401@sdsc.edu>
	
	<7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
Message-ID: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>

Yep that's what I was thinking.

BTW the API needs multipart/form-data encoding for input (due to file  
upload); I'm assuming that that's supported well in LWP but if anyone  
knows where to start digging for that the pointer would be appreciated.

	-hilmar

On Jan 28, 2008, at 1:04 AM, Chris Fields wrote:

> We can certainly add it to the to-do list; just need to sort out  
> the details (how often to allow posts, etc).  I guess we would want  
> this in the Bio::Tools::Run namespace, same as RemoteBlast?
>
> chris
>
> On Jan 27, 2008, at 11:27 PM, Hilmar Lapp wrote:
>
>> Some folks may remember that CIPRES (http://www.phylo.org)  
>> released their portal with access to remote execution of several  
>> phylogenetic tree reconstruction programs in spring last year.
>>
>> It took a while but they have now also built a really nice REST- 
>> based API that makes the service fully programmable instead of  
>> screen-scraping 5 pages:
>>
>> http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API)
>>
>> It should be relatively straightforward to build the equivalent of  
>> RemoteBlast on top of this. Would anyone be keen to take this on?
>>
>> 	-hilmar
>>
>> P.S. Sorry for the cross-posting - I thought this is relevant to  
>> both communities. When responding in a project-specific way please  
>> make sure you remove the list that is no longer pertinent.
>>
>>
>> Begin forwarded message:
>>
>>> From: Lucie Chan 
>>> Date: January 21, 2008 6:22:22 PM EST
>>> To: Hilmar Lapp 
>>> Cc: Mark Miller , Rutger Vos ,  
>>> Terri Liebowitz , Paul Hoover ,  
>>> mtholder at ku.edu
>>> Subject: Re: REST APIs for Cipres Web Portal
>>> Reply-To: lcchan at sdsc.edu
>>>
>>> Hilmar, et al.,
>>>
>>> I just released the first version of our REST Web Services API
>>> for job submission, job status queries, and job result file
>>> retrieval. I'd like to get some feedback (issues, problems,
>>> improvements, suggestions, etc.) from you. For documentation on
>>> how to access the services, check it out at:
>>>
>>> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST  
>>> API" below the "CIPRES PORTAL" banner.
>>>
>>> Lucie
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
>> ===========================================================
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================





From cjfields at uiuc.edu  Mon Jan 28 08:50:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 07:50:08 -0600
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
References: <4795292E.4030401@sdsc.edu>
	
	<7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
	<2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
Message-ID: 

Googled it.

From http://www.issociate.de/board/post/258535/LWP_-_multipart/form-data_file_upload_from_scalar_rather_than_local_file.html :

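use LWP::UserAgent;
use HTTP::Request::Common qw(POST);   # provides the POST request builder

my $ua = LWP::UserAgent->new;
my $response = $ua->request(POST $URL,
    Content_Type => 'multipart/form-data',
    Content      => [ $PARAM => [ undef, $FILENAME, Content => $CONTENTS ] ]);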

Where $PARAM is the name of the parameter, $FILENAME is what you want
to call the file, and $CONTENTS is a scalar holding the contents of the
file.

Could probably use HTTP::Request in there, but whatever works.

chris

On Jan 28, 2008, at 7:42 AM, Hilmar Lapp wrote:

> Yep that's what I was thinking.
>
> BTW the API needs multipart/form-data encoding for input (due to  
> file upload); I'm assuming that that's supported well in LWP but if  
> anyone knows where to start digging for that the pointer would be  
> appreciated.
>
> 	-hilmar

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From shandar at nibio.go.jp  Sun Jan 27 01:50:40 2008
From: shandar at nibio.go.jp (Shandar Ahmad)
Date: Sun, 27 Jan 2008 15:50:40 +0900
Subject: [Bioperl-l] PRIB 2008
Message-ID: <1201416640.31793.7.camel@boe>

******* Our apologies if you received multiple copies ***********
If you do not wish to receive PRIB 2008-related emails, please write to
Madhu Chetty
and CC me at shandar at nibio.go.jp
******************************************************************



PRELIMINARY CALL FOR PAPERS AND INVITED SESSIONS

********************************************************************************************
Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB 2008)
October 15-17, 2008
Melbourne, Australia

http://www.infotech.monash.edu.au/prib08
********************************************************************************************

PRIB 2008 aims to bring together top researchers, practitioners, and
students from around the world to discuss the application of pattern
recognition methods in bioinformatics to solve problems in the life
sciences. Pattern recognition techniques of interest include
statistical, syntactic, and structural approaches; Bayesian, hidden
Markov, and graphical models; neural networks; fuzzy and genetic
algorithms; data mining; and their hybrids. Papers in areas including
(but not limited to) biosequence analysis, gene and protein expression
analysis, structure prediction, protein folding, docking, metabolic
pathway analysis and regulatory networks, systems biology, drug design,
and bioimaging are solicited for presentation at the conference.

All papers will be peer reviewed, and accepted papers will be published
in the conference proceedings as an edited volume in Springer's Lecture
Notes in Bioinformatics series. Papers will be submitted electronically
through the conference website. Proposals for special sessions and
tutorials in all related areas of research are also invited. Authors of
selected papers presented at the conference will also be invited to
publish in special issues of reputable journals.

Location:
Melbourne is a sophisticated city in the south-east corner of mainland
Australia. It is known for its attractive sightseeing spots, great
events, passion for food and wine, and fabulous scenery. A style-setter,
Melbourne is home to a continuous program of festivals, art exhibitions,
and musical extravaganzas. Warning: you might never want to go home.

For latest information on PRIB 2008, visit the conference web site:
http://www.infotech.monash.edu.au/prib08

or email the secretariat at prib2008.melb at infotech.monash.edu.au

Important Deadlines
Paper submission: 15 April 2008
Proposals for Special Sessions/Tutorials: 15 March 2008
Author notification: 15 May 2008
Camera-ready papers: 15 June 2008


Organising Committee, PRIB 2008



From snoze.pa at gmail.com  Mon Jan 28 16:07:37 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Mon, 28 Jan 2008 15:07:37 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
Message-ID: <10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>

I am still getting the same error message.

My question is:

Do I need to install bioperl-db for BioSQL?

When I use BioSQL and load the NCBI taxonomy, it works fine. But after I
install bioperl-db, I get the following error message when loading the
NCBI taxonomy.

Any help?



Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568


From susantoroy at gmail.com  Mon Jan 28 16:05:49 2008
From: susantoroy at gmail.com (Susanta Roy)
Date: Tue, 29 Jan 2008 02:35:49 +0530
Subject: [Bioperl-l] Please remove my letter from your site
Message-ID: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>

Dear Sir,
Please remove my letter from the following URLs:
http://bioperl.org/pipermail/bioperl-l/2007-December/027004.html
http://bioperl.org/pipermail/bioperl-l/2007-December.txt
http://www.nabble.com/Enquiry-about-bioperl-project-td14522622.html


It was not meant to appear online.
Thanks in advance.

Regards
Susanta


From cjfields at uiuc.edu  Mon Jan 28 16:53:33 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 15:53:33 -0600
Subject: [Bioperl-l] Please remove my letter from your site
In-Reply-To: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
References: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
Message-ID: 

Um, you posted to a public mailing list (hence the list is open to the  
public, for searching, indexing via Google, etc).  Terms of usage are  
here:

http://lists.open-bio.org/mailman/listinfo/bioperl-l

with more info here:

http://www.bioperl.org/wiki/Mailing_lists

BTW, this post will also appear.  C'est la vie!

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From snoze.pa at gmail.com  Tue Jan 29 12:15:41 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 29 Jan 2008 11:15:41 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
Message-ID: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>

Dear Users,
I tried refreshing the installation and it seems to be working. But when I
load sequences, I get the following warning messages. Am I doing this
right, or am I missing a huge chunk of sequences? Thanks in advance.
s

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","1") FKs (27,3,4)
Duplicate entry '27-3-4-1' for key 2
---------------------------------------------------
...
...
and so on


From tristan.lefebure at gmail.com  Tue Jan 29 12:19:23 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 29 Jan 2008 12:19:23 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
Message-ID: <200801291219.23172.tristan.lefebure@gmail.com>

Hello,

I would like to download a large number of sequences from GenBank (122,146 to be exact) using a list of accession numbers.
I first looked into Bio::DB::EUtilities, but got lost and finally used Bio::DB::GenBank.
My script works well for short requests, but it gives the following error with the long one:

 ------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
500 short write
Content-Type: text/plain
Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
Client-Warning: Internal response

500 short write

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:58
---------------------------------------------------------

Does that mean that we can only fetch 500 sequences at a time?
Should I split my list into 500-ID fragments and submit them one after the other?

Any suggestions are very welcome.
Thanks,
-Tristan


Here is the script:

##################################
use strict;
use warnings;
use Bio::DB::GenBank;
# use Bio::DB::EUtilities;
use Bio::SeqIO;
use Getopt::Long;

# 2008-01-22 T Lefebure
# I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
# The following procedure is not ideal: the stream is first copied to a temporary file,
# and then re-parsed by BioPerl to generate the final file.

my $db = 'nucleotide';
my $format = 'genbank';
my $help = '';
my $dformat = 'gb';

GetOptions(
	'help|?' => \$help,
	'format=s'  => \$format,
	'database=s'	=> \$db,
);


my $printhelp = "\nUsage: $0 [options]

Will download the corresponding data from GenBank. BioPerl is required.

Options:
	-h
		print this help
	-format: genbank|fasta|...
		give output format (default=genbank)
	-database: nucleotide|genome|protein|...
		define the database to search in (default=nucleotide)

The full description of the options can be found at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";

# Two positional arguments are required: the ID list file and the output file.
if ($help or @ARGV < 2) {
	print $printhelp;
	exit;
}

# Read one accession per line, stripping newlines and skipping blank lines.
open my $list_fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
chomp(my @list = <$list_fh>);
close $list_fh;
@list = grep { /\S/ } @list;

if ($format eq 'fasta') { $dformat = 'fasta' }

my $gb = Bio::DB::GenBank->new(	-retrievaltype => 'tempfile',
				-format => $dformat,
				-db => $db,
			);
my $seqio = $gb->get_Stream_by_acc(\@list);

my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
				-format => $format,
			);
while (my $seqo = $seqio->next_seq ) {
	print $seqo->id, "\n";
	$seqout->write_seq($seqo);
}


From cjfields at uiuc.edu  Tue Jan 29 13:06:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Jan 2008 12:06:08 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>

Yes, you can only retrieve ~500 sequences at a time using either
Bio::DB::GenBank or Bio::DB::EUtilities. Both modules interact with
NCBI's E-utilities (Bio::DB::GenBank returns Bio::Seq/Bio::SeqIO
objects, while Bio::DB::EUtilities returns the raw data from the URL
to be processed later).

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
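A minimal sketch of the "split the list" approach Tristan asked about, assuming a batch size of 500 (the figure mentioned above). Note that the "500" in "500 short write" is the HTTP status code of LWP's internal error response, not a sequence count. The helper `id_batches` is hypothetical and only does the splitting; the accession strings in the demo are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split @ids into consecutive batches of at most $size IDs.
# Returns a list of array references; the last batch may be shorter.
sub id_batches {
    my ($size, @ids) = @_;
    die "batch size must be a positive integer\n"
        unless defined $size && $size =~ /^\d+$/ && $size > 0;
    my @batches;
    # splice removes (and returns) up to $size IDs from the front each pass
    push @batches, [ splice(@ids, 0, $size) ] while @ids;
    return @batches;
}

# Demo with made-up accession strings (not real GenBank IDs).
my @ids     = map { sprintf 'ACC%06d', $_ } 1 .. 1203;
my @batches = id_batches(500, @ids);
printf "%d batches, last one has %d IDs\n", scalar(@batches), scalar(@{ $batches[-1] });
# prints: 3 batches, last one has 203 IDs

# In the script above, each batch would take the place of \@list, e.g.
#   for my $batch (@batches) {
#       my $seqio = $gb->get_Stream_by_acc($batch);
#       while (my $seq = $seqio->next_seq) { $seqout->write_seq($seq) }
#       sleep 1;    # pause between requests to go easy on NCBI's servers
#   }
```

Writing every batch to the same Bio::SeqIO output keeps a single result file, and pausing between requests keeps the load on NCBI reasonable.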

You can usually submit more IDs by posting them with epost and then
fetching the sequences by referring to the returned WebEnv/query_key
combination (batch posting). I've tried to make this a bit easier with
Bio::DB::EUtilities, but it is woefully lacking in documentation (my
fault); there is some code on the wiki that should work.
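A rough sketch of the epost/WebEnv pattern at the level of the raw E-utilities protocol, in plain Perl. This is not the Bio::DB::EUtilities interface. The IDs are posted once to `epost.fcgi`, the `WebEnv` and `QueryKey` are read from its XML reply, and the stored set is then paged through with `efetch.fcgi` using `retstart`/`retmax`. The base URL is assumed to be today's HTTPS host, and the helper names are hypothetical. The POST itself (for example LWP::UserAgent's `post` method with `db` and a comma-separated `id` list) and the fetching of each URL are left out so the sketch runs offline.

```perl
use strict;
use warnings;

# Assumed base URL of NCBI's E-utilities (the 2008-era host used plain http).
use constant EUTILS => 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode everything except unreserved URL characters.
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Extract WebEnv and QueryKey from the XML returned by epost.fcgi.
sub parse_epost {
    my ($xml) = @_;
    my ($key)    = $xml =~ m{<QueryKey>\s*(\d+)\s*</QueryKey>};
    my ($webenv) = $xml =~ m{<WebEnv>\s*([^<\s]+)\s*</WebEnv>};
    die "no WebEnv/QueryKey in epost response\n"
        unless defined $key && defined $webenv;
    return ($webenv, $key);
}

# Build one efetch.fcgi URL per page of at most $arg{retmax} records.
sub efetch_urls {
    my %arg = @_;
    die "retmax must be positive\n" unless $arg{retmax} && $arg{retmax} > 0;
    my @urls;
    for (my $start = 0; $start < $arg{count}; $start += $arg{retmax}) {
        my @pairs = (
            db       => $arg{db},      query_key => $arg{query_key},
            WebEnv   => $arg{webenv},  rettype   => $arg{rettype},
            retmode  => 'text',        retstart  => $start,
            retmax   => $arg{retmax},
        );
        my @parts;
        while (my ($k, $v) = splice(@pairs, 0, 2)) {
            push @parts, "$k=" . url_escape($v);
        }
        push @urls, EUTILS . '/efetch.fcgi?' . join('&', @parts);
    }
    return @urls;
}

# Demo on a made-up epost reply (the WebEnv value is a placeholder).
my ($webenv, $query_key) = parse_epost(
    '<ePostResult><QueryKey>1</QueryKey><WebEnv>NCID_example</WebEnv></ePostResult>');
my @urls = efetch_urls(db => 'nucleotide', webenv => $webenv, query_key => $query_key,
                       rettype => 'gb', count => 122_146, retmax => 500);
printf "%d efetch requests; first: %s\n", scalar(@urls), $urls[0];
# prints 245 as the request count for Tristan's 122,146 IDs
```

Paging with `retstart`/`retmax` means the long ID list crosses the wire only once, in the epost body, instead of being repeated in every fetch URL.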

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From snoze.pa at gmail.com  Tue Jan 29 13:22:56 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 29 Jan 2008 12:22:56 -0600
Subject: [Bioperl-l] loading sequence error bioseq
Message-ID: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>

Dear User,

After successfully creating the database bioseqdb and loading the NCBI
taxonomy, I am getting the following error message while loading sequences
into the database.

load_seqdatabase.pl -host localhost -dbname bioseqdb .....etc

MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","31") FKs
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were
MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were

Column 'dbname' cannot be null

STACK: /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl:620
-----------------------------------------------------------

 at /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl line 633

Any idea?

Thanks in advance
s


From cjfields at uiuc.edu  Tue Jan 29 13:44:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Jan 2008 12:44:16 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <479F7149.1010203@atgc.org>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
	<479F7149.1010203@atgc.org>
Message-ID: 

Forgot about that one; it's definitely a better way to do it if you  
have the GI/accessions.

chris

On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:

> you don't need to use bioperl to accomplish this task, to download  
> several thousand sequences based on accession ID list.
>
> NCBI batch Entrez can do that:
> http://www.ncbi.nlm.nih.gov/sites/batchentrez
>
> just submit a large list of IDs, select database, and download.
>
> you can submit ~50,000 IDs in one file usually without problems.
> it may not return results if a list is larger than ~100,000 IDs
>
> --
> Alexander Kozik
> Bioinformatics Specialist
> Genome and Biomedical Sciences Facility
> 451 Health Sciences Drive
> Genome Center, 4-th floor, room 4302
> University of California
> Davis, CA 95616-8816
> Phone: (530) 754-9127
> email#1: akozik at atgc.org
> email#2: akozik at gmail.com
> web: http://www.atgc.org/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From akozik at atgc.org  Tue Jan 29 13:32:41 2008
From: akozik at atgc.org (Alexander Kozik)
Date: Tue, 29 Jan 2008 10:32:41 -0800
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
Message-ID: <479F7149.1010203@atgc.org>

You don't need BioPerl for this task (downloading several thousand
sequences from a list of accession IDs).

NCBI Batch Entrez can do that:
http://www.ncbi.nlm.nih.gov/sites/batchentrez

Just submit a large list of IDs, select the database, and download.

You can usually submit ~50,000 IDs in one file without problems;
it may not return results if the list is larger than ~100,000 IDs.

--
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 Health Sciences Drive
Genome Center, 4-th floor, room 4302
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/



Chris Fields wrote:
> Yes, you can only retrieve ~500 sequences at a time using either 
> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities 
> interact with NCBI's EUtilities (the former module returns raw data from 
> the URL to be processed later, the latter module returns 
> Bio::Seq/Bio::SeqIO objects).
> 
> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets 
> 
> 
> You can usually post more IDs using epost and fetch sequence referring 
> to the WebEnv/key combo (batch posting).  I try to make this a bit 
> easier with EUtilities but it is woefully lacking in documentation (my 
> fault), but there is some code up on the wiki which should work.
> 
> chris
> 
> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
> 
>> Hello,
>>
>> I would like to download a large number of sequences from GenBank 
>> (122,146 to be exact) following a list of accession numbers.
>> I first investigated around Bio::DB::EUtilities, but got lost and 
>> finally used Bio::DB::GenBank.
>> My script works well for short request, but it gives the following 
>> error with the long request:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: WebDBSeqI Request Error:
>> 500 short write
>> Content-Type: text/plain
>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
>> Client-Warning: Internal response
>>
>> 500 short write
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw 
>> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>> STACK: Bio::DB::WebDBSeqI::_request 
>> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
>> STACK: Bio::DB::WebDBSeqI::get_seq_stream 
>> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc 
>> /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
>> STACK: ./fetch_from_genbank.pl:58
>> ---------------------------------------------------------
>>
>> Does that mean that we can only fetch 500 sequences at a time?
>> Should I split my list in 500 ids framents and submit them one after 
>> the other?
>>
>> Any suggestions very welcomed...
>> Thanks,
>> -Tristan
>>
>>
>> Here is the script:
>>
>> ##################################
>> use strict;
>> use warnings;
>> use Bio::DB::GenBank;
>> # use Bio::DB::EUtilities;
>> use Bio::SeqIO;
>> use Getopt::Long;
>>
>> # 2008-01-22 T Lefebure
>> # I tried to use Bio::DB::EUtilities without much succes and get back 
>> to Bio::DB::GenBank.
>> # The following procedure is not really good as the stream is first 
>> copied to a temporary file,
>> # and than re-used by BioPerl to generate the final file.
>>
>> my $db = 'nucleotide';
>> my $format = 'genbank';
>> my $help= '';
>> my $dformat = 'gb';
>>
>> GetOptions(
>>     'help|?' => \$help,
>>     'format=s'  => \$format,
>>     'database=s'    => \$db,
>> );
>>
>>
>> my $printhelp = "\nUsage: $0 [options]  
>>
>> Will download the corresponding data from GenBank. BioPerl is required.
>>
>> Options:
>>     -h
>>         print this help
>>     -format: genbank|fasta|...
>>         give output format (default=genbank)
>>     -database: nucleotide|genome|protein|...
>>         define the database to search in (default=nucleotide)
>>
>> The full description of the options can be find at 
>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
>>
>> if ($#ARGV<1) {
>>     print $printhelp;
>>     exit;
>> }
>>
>> open LIST, $ARGV[0];
>> my @list = ;
>>
>> if ($format eq 'fasta') { $dformat = 'fasta' }
>>
>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
>>                 -format => $dformat,
>>                 -db => $db,
>>             );
>> my $seqio = $gb->get_Stream_by_acc(\@list);
>>
>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>>                 -format => $format,
>>             );
>> while (my $seqo = $seqio->next_seq ) {
>>     print $seqo->id, "\n";
>>     $seqout->write_seq($seqo);
>> }
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 


From hlapp at gmx.net  Tue Jan 29 16:31:47 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 29 Jan 2008 16:31:47 -0500
Subject: [Bioperl-l] loading sequence error bioseq
In-Reply-To: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>
References: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>
Message-ID: 

This looks suspiciously like a data error. Can you please give the  
full command line. This should also show which format your sequences  
are in.

	-hilmar

On Jan 29, 2008, at 1:22 PM, snoze pa wrote:

> Dear User,
>
> After successfully creating the database bioseqdb and loading
> ncbi_taxonomy, I am getting the following error message while loading
> sequences into the database.
>
> load_seqdatabase.pl -host localhost -dbname bioseqdb .....etc
>
> MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","31") FKs
> MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were
>
> Column 'dbname' cannot be null
>
> STACK: /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl:620
> -----------------------------------------------------------
>
>  at /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl line 633
>
> Any idea?
>
> Thanks in advance
> s

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From hlapp at gmx.net  Tue Jan 29 16:40:21 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 29 Jan 2008 16:40:21 -0500
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
	<10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
Message-ID: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net>

This would mean that two or more seqfeatures with the same type for  
the same sequence exist in the input data, each with rank 1.

Normally the rank will be incremented for each seqfeature of a  
sequence, so I'm not sure how this is happening here w/o seeing the  
data.

	-hilmar
On Jan 29, 2008, at 12:15 PM, snoze pa wrote:

> Dear Users,
> I tried to refresh the installation and it seems to be working. But when I
> load sequences, it gives me the following warning messages. Am I
> doing it right, or am I missing a huge chunk of sequences? Thanks in
> advance
> s
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","1") FKs (27,3,4)
> Duplicate entry '27-3-4-1' for key 2
> ---------------------------------------------------
> ...
> ...
> and so on

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
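Hilmar's explanation can be made concrete. Assuming the standard BioSQL MySQL schema, the seqfeature table carries a unique key on (bioentry_id, type_term_id, source_term_id, rank), so the warning's '27-3-4-1' would read as bioentry 27, type term 3, source term 4, rank 1. The pure-Perl sketch below is illustrative only: the helper names and the hashref layout are hypothetical and not part of bioperl-db. It reports which features would collide on that key and shows the per-(type, source) rank numbering that avoids the collision.

```perl
use strict;
use warnings;

# Hypothetical feature layout: { type => ..., source => ..., rank => ... },
# where type/source stand in for the term IDs the loader would look up.

# Return the composite keys (bioentry-type-source-rank) that occur more
# than once, i.e. rows that would violate the unique key.
sub duplicate_feature_keys {
    my ($bioentry_id, @features) = @_;
    my (%seen, @dups);
    for my $f (@features) {
        my $key = join '-', $bioentry_id, @{$f}{qw(type source rank)};
        push @dups, $key if $seen{$key}++ == 1;    # report each key once
    }
    return @dups;
}

# Number features 1, 2, 3, ... separately for each (type, source) pair,
# in input order, so that no two features share the same key.
sub assign_ranks {
    my (@features) = @_;
    my %next;
    $_->{rank} = ++$next{"$_->{type}\t$_->{source}"} for @features;
    return @features;
}
```

If a check like this finds duplicates in records parsed from the input file, the problem lies in the data (or in how ranks were set on it) rather than in the database setup, which matches Hilmar's "data error" suspicion.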







From avilella at gmail.com  Wed Jan 30 04:28:34 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 30 Jan 2008 09:28:34 +0000
Subject: [Bioperl-l] fetch dna seqs from genbank protein ids
Message-ID: <358f4d650801300128q44cf95a0va11799908c4f26a0@mail.gmail.com>

Hi bioperlers,

Got a question here:

>I have a bunch of protein sequences in multi-FastA with their
>accession numbers in the header and I want to retrieve their
>corresponding nucleotide sequences and nucleotide accession numbers.
>I can't seem to find a way to do it. I am looking at eUtils on the
>NCBI site, but they only do really simple stuff.

I had a look at the fetch example scripts, and I could fetch proteins
from GenBank, but I don't see a clear connection between the protein
sequence and the DNA sequence. Is this a DBlink? Which type?

Cheers,

    Albert.
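One possible route, offered as a sketch rather than a definitive answer: in GenPept (protein) records, the CDS feature usually carries a /coded_by qualifier such as coded_by="NM_000518.5:51..494", which names the nucleotide accession and the coding coordinates. The pure-Perl helper below (parse_coded_by is a hypothetical name) splits such a value into accession/start/end/strand pieces. It handles the plain, complement() and join() forms seen in typical records, but it is not a full location parser.

```perl
use strict;
use warnings;

# Hypothetical helper: split a GenPept /coded_by value such as
# "NM_000518.5:51..494", "complement(AC000001.1:100..400)" or
# "join(AC000001.1:1..50,AC000001.1:80..200)" into segments.
# Fuzzy ends such as "<1" are reduced to their numbers.
sub parse_coded_by {
    my ($coded_by) = @_;
    (my $loc = $coded_by) =~ s/\s+//g;
    my $strand = 1;
    if ($loc =~ /^complement\((.*)\)$/) {
        $strand = -1;
        $loc    = $1;
    }
    $loc =~ s/^join\((.*)\)$/$1/;

    my @segments;
    for my $part (split /,/, $loc) {
        my ($piece, $seg_strand) = ($part, $strand);
        if ($piece =~ /^complement\((.*)\)$/) {
            $piece      = $1;
            $seg_strand = -$strand;
        }
        if ($piece =~ /^([A-Za-z0-9_.]+):[<>]?(\d+)\.\.[<>]?(\d+)$/) {
            push @segments, { accession => $1, start => $2, end => $3,
                              strand    => $seg_strand };
        }
        else {
            warn "Could not parse segment '$piece' of '$coded_by'\n";
        }
    }
    return @segments;
}
```

In BioPerl the value would come from the protein record's CDS feature, roughly by taking the feature whose primary_tag is 'CDS' from $seq->get_SeqFeatures and, if $cds->has_tag('coded_by') is true, reading it with $cds->get_tag_values('coded_by'). The resulting accessions could then be fetched with Bio::DB::GenBank using -db => 'nucleotide'.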


From tristan.lefebure at gmail.com  Wed Jan 30 09:56:07 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 30 Jan 2008 09:56:07 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: 
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
Message-ID: <200801300956.07849.tristan.lefebure@gmail.com>

Thank you both!

Just in case it might be useful for someone else, here are my ramblings:

1. I first tried to adapt my script and fetch 500 sequences at a time. It works, except that ~40% of the time NCBI gives the following error and my script crashed:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
[...]
    The proxy server received an invalid
    response from an upstream server.
[...]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:68
-----------------------------------------------------------

I tried to modify the script so that when the retrieval of a 500-sequence block crashes, it continues with the other blocks, but I was unsuccessful. It probably needs a better understanding of BioPerl errors...
Here is the section of the script that was modified:
#########
my $n_seq = scalar @list;
my @aborted;

for (my $i=1; $i<=$n_seq; $i += 500) {
	print "Fetching sequences $i to ", $i+499, ": ";
	my $start = $i -1;
	my $end = $i + 500 -1;
	my @red_list = @list[$start .. $end]; 
	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
					-format => $dformat,
					-db => $db,
				);

	my $seqio;
	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
		print "Aborted, resubmit later\n";
		push @aborted, @red_list;
		next;
	}
	
	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
					-format => $format,
				);
	while (my $seqo = $seqio->next_seq ) {
# 		print $seqo->id, "\n";
		$seqout->write_seq($seqo);
	}
	print "Done\n";
}

if (@aborted) {
	open OUT, ">aborted_fetching.AN";
	foreach (@aborted) { print OUT $_ };
}
##########


2. So I moved to the second solution and tried Batch Entrez. I cut my 120,000-entry AN list into 10,000-line pieces using split:
split -l 10000 full_list.AN splitted_list_

and then submitted the 13 lists one by one. I must say that I don't really like using a web interface to fetch data, and here the most annoying part is that you end up with a regular Entrez/GenBank webpage: select your format, export to file, choose a file name... and you have to do it many times.
It is too prone to human and web-browser errors for my taste, but it worked.
Nevertheless, there are some caveats:
- some downloaded files were incomplete (~10%) and you have to restart them
- you can't submit several lists at the same time (otherwise the same cookie will be used and you'll end up with several identical files)

-Tristan

On Tuesday 29 January 2008 13:44:16 you wrote:
> Forgot about that one; it's definitely a better way to do it if you
> have the GI/accessions.
>
> chris
>
> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
> > you don't need to use bioperl to accomplish this task, to download
> > several thousand sequences based on accession ID list.
> >
> > NCBI batch Entrez can do that:
> > http://www.ncbi.nlm.nih.gov/sites/batchentrez
> >
> > just submit a large list of IDs, select database, and download.
> >
> > you can submit ~50,000 IDs in one file usually without problems.
> > it may not return results if a list is larger than ~100,000 IDs
> >
> > --
> > Alexander Kozik
> > Bioinformatics Specialist
> > Genome and Biomedical Sciences Facility
> > 451 Health Sciences Drive
> > Genome Center, 4-th floor, room 4302
> > University of California
> > Davis, CA 95616-8816
> > Phone: (530) 754-9127
> > email#1: akozik at atgc.org
> > email#2: akozik at gmail.com
> > web: http://www.atgc.org/
> >
> > Chris Fields wrote:
> >> Yes, you can only retrieve ~500 sequences at a time using
> >> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
> >> interact with NCBI's EUtilities (the former module returns
> >> Bio::Seq/Bio::SeqIO objects, the latter returns raw data from the
> >> URL to be processed later).
> >> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
> >> You can usually post more IDs using epost and fetch sequences
> >> referring to the WebEnv/key combo (batch posting).  I try to make
> >> this a bit easier with EUtilities but it is woefully lacking in
> >> documentation (my fault), but there is some code up on the wiki
> >> which should work.
> >> chris
> >>
> >> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
> >>> Hello,
> >>>
> >>> I would like to download a large number of sequences from GenBank
> >>> (122,146 to be exact) following a list of accession numbers.
> >>> I first investigated around Bio::DB::EUtilities, but got lost and
> >>> finally used Bio::DB::GenBank.
> >>> My script works well for short requests, but it gives the following
> >>> error with the long request:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: WebDBSeqI Request Error:
> >>> 500 short write
> >>> Content-Type: text/plain
> >>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> >>> Client-Warning: Internal response
> >>>
> >>> 500 short write
> >>>
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> >>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> >>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> >>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> >>> STACK: ./fetch_from_genbank.pl:58
> >>> ---------------------------------------------------------
> >>>
> >>> Does that mean that we can only fetch 500 sequences at a time?
> >>> Should I split my list into 500-ID fragments and submit them one
> >>> after the other?
> >>>
> >>> Any suggestions very welcomed...
> >>> Thanks,
> >>> -Tristan
> >>>
> >>>
> >>> Here is the script:
> >>>
> >>> ##################################
> >>> use strict;
> >>> use warnings;
> >>> use Bio::DB::GenBank;
> >>> # use Bio::DB::EUtilities;
> >>> use Bio::SeqIO;
> >>> use Getopt::Long;
> >>>
> >>> # 2008-01-22 T Lefebure
> >>> # I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
> >>> # The following procedure is not really good as the stream is first copied to a temporary file,
> >>> # and then re-used by BioPerl to generate the final file.
> >>>
> >>> my $db = 'nucleotide';
> >>> my $format = 'genbank';
> >>> my $help= '';
> >>> my $dformat = 'gb';
> >>>
> >>> GetOptions(
> >>>    'help|?' => \$help,
> >>>    'format=s'  => \$format,
> >>>    'database=s'    => \$db,
> >>> );
> >>>
> >>>
> >>> my $printhelp = "\nUsage: $0 [options]  
> >>>
> >>> Will download the corresponding data from GenBank. BioPerl is
> >>> required.
> >>>
> >>> Options:
> >>>    -h
> >>>        print this help
> >>>    -format: genbank|fasta|...
> >>>        give output format (default=genbank)
> >>>    -database: nucleotide|genome|protein|...
> >>>        define the database to search in (default=nucleotide)
> >>>
> >>> The full description of the options can be found at
> >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
> >>>
> >>> if ($#ARGV<1) {
> >>>    print $printhelp;
> >>>    exit;
> >>> }
> >>>
> >>> open LIST, $ARGV[0];
> >>> my @list = <LIST>;
> >>>
> >>> if ($format eq 'fasta') { $dformat = 'fasta' }
> >>>
> >>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
> >>>                -format => $dformat,
> >>>                -db => $db,
> >>>            );
> >>> my $seqio = $gb->get_Stream_by_acc(\@list);
> >>>
> >>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> >>>                -format => $format,
> >>>            );
> >>> while (my $seqo = $seqio->next_seq ) {
> >>>    print $seqo->id, "\n";
> >>>    $seqout->write_seq($seqo);
> >>> }
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign




From cjfields at uiuc.edu  Wed Jan 30 10:10:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 09:10:14 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: <7143A650-AA84-4331-B55A-A66C3F5BBAB0@uiuc.edu>

You can use an eval {} block to catch the error, then redo the loop  
(so you don't iterate to the next block) or use next and skip the  
current block if an error occurs.  If you use redo then you should use  
a counter to exit the loop after several tries.
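Chris's suggestion can be sketched as follows, under a few assumptions: the fetch step is passed in as a code reference that dies on failure (with Bio::DB::GenBank, the Bio::Root::Exception thrown from get_Stream_by_acc plays that role), and an inner retry loop with a counter stands in for redo. The sketch also slices each batch as $start .. min($start + $size - 1, $#ids). In the loop posted above, @list[$i-1 .. $i+499] spans 501 elements, so consecutive blocks overlap by one accession. fetch_in_batches and its arguments are hypothetical names.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: fetch @{ $args{ids} } in batches of $args{size},
# retrying a failed batch up to $args{max_tries} times.  $args{fetch}
# is a code ref that receives an array ref of IDs and dies on failure.
# Returns the IDs from batches that never succeeded.
sub fetch_in_batches {
    my (%args) = @_;
    my @ids       = @{ $args{ids} };
    my $size      = $args{size}      || 500;
    my $max_tries = $args{max_tries} || 3;
    my $pause     = defined $args{pause} ? $args{pause} : 5;    # seconds between retries
    my $fetch     = $args{fetch};

    my @aborted;
    for (my $start = 0; $start <= $#ids; $start += $size) {
        my $end   = min($start + $size - 1, $#ids);
        my @batch = @ids[$start .. $end];
        my $ok    = 0;
        for my $try (1 .. $max_tries) {
            # eval returns 1 only if $fetch did not die
            if (eval { $fetch->(\@batch); 1 }) {
                $ok = 1;
                last;
            }
            my $err = $@ || 'unknown error';
            warn "Batch ", $start + 1, "-", $end + 1, ", attempt $try failed: $err\n";
            sleep $pause if $try < $max_tries;
        }
        push @aborted, @batch unless $ok;
    }
    return @aborted;
}
```

With BioPerl, the callback might call $gb->get_Stream_by_acc($batch) and write each sequence with Bio::SeqIO. Opening a per-batch output file inside the callback means a retried batch overwrites a partially written file instead of duplicating records; the returned IDs can be written to a file for a later resubmission, as in the posted script.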

chris

On Jan 30, 2008, at 8:56 AM, Tristan Lefebure wrote:

> Thank you both!
>
> Just in case it might be useful for someone else, here are my  
> ramblings:
>
> 1. I first tried to adapt my script and fetch 500 sequences at a  
> time. It works, except that ~40% of the time NCBI gives the  
> following error and my script crashed:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> [...]
>    The proxy server received an invalid
>    response from an upstream server.
> [...]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:68
> -----------------------------------------------------------
>
> I tried to modify the script so that when the retrieval of a
> 500-sequence block crashes, it continues with the other blocks, but I
> was unsuccessful. It probably needs a better understanding of
> BioPerl errors...
> Here is the section of the script that was modified:
> #########
> my $n_seq = scalar @list;
> my @aborted;
>
> for (my $i=1; $i<=$n_seq; $i += 500) {
> 	print "Fetching sequences $i to ", $i+499, ": ";
> 	my $start = $i -1;
> 	my $end = $i + 500 -1;
> 	my @red_list = @list[$start .. $end];
> 	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 					-format => $dformat,
> 					-db => $db,
> 				);
>
> 	my $seqio;
> 	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
> 		print "Aborted, resubmit later\n";
> 		push @aborted, @red_list;
> 		next;
> 	}
> 	
> 	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
> 					-format => $format,
> 				);
> 	while (my $seqo = $seqio->next_seq ) {
> # 		print $seqo->id, "\n";
> 		$seqout->write_seq($seqo);
> 	}
> 	print "Done\n";
> }
>
> if (@aborted) {
> 	open OUT, ">aborted_fetching.AN";
> 	foreach (@aborted) { print OUT $_ };
> }
> ##########
>
>
> 2. So I moved to the second solution and tried Batch Entrez. I cut my
> 120,000-entry AN list into 10,000-line pieces using split:
> split -l 10000 full_list.AN splitted_list_
>
> and then submitted the 13 lists one by one. I must say that I don't
> really like using a web interface to fetch data, and here the most
> annoying part is that you end up with a regular Entrez/GenBank
> webpage: select your format, export to file, choose a file name... and
> you have to do it many times.
> It is too prone to human and web-browser errors for my taste,
> but it worked.
> Nevertheless, there are some caveats:
> - some downloaded files were incomplete (~10%) and you have to
> restart them
> - you can't submit several lists at the same time (otherwise the
> same cookie will be used and you'll end up with several identical
> files)
>
> -Tristan
>
> On Tuesday 29 January 2008 13:44:16 you wrote:
>> Forgot about that one; it's definitely a better way to do it if you
>> have the GI/accessions.
>>
>> chris
>>
>> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
>>> you don't need to use bioperl to accomplish this task, to download
>>> several thousand sequences based on accession ID list.
>>>
>>> NCBI batch Entrez can do that:
>>> http://www.ncbi.nlm.nih.gov/sites/batchentrez
>>>
>>> just submit a large list of IDs, select database, and download.
>>>
>>> you can submit ~50,000 IDs in one file usually without problems.
>>> it may not return results if a list is larger than ~100,000 IDs
>>>
>>> --
>>> Alexander Kozik
>>> Bioinformatics Specialist
>>> Genome and Biomedical Sciences Facility
>>> 451 Health Sciences Drive
>>> Genome Center, 4-th floor, room 4302
>>> University of California
>>> Davis, CA 95616-8816
>>> Phone: (530) 754-9127
>>> email#1: akozik at atgc.org
>>> email#2: akozik at gmail.com
>>> web: http://www.atgc.org/
>>>
>>> Chris Fields wrote:
>>>> Yes, you can only retrieve ~500 sequences at a time using
>>>> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
>>>> interact with NCBI's EUtilities (the former module returns
>>>> Bio::Seq/Bio::SeqIO objects, the latter returns raw data from the
>>>> URL to be processed later).
>>>> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
>>>> You can usually post more IDs using epost and fetch sequences
>>>> referring to the WebEnv/key combo (batch posting).  I try to make
>>>> this a bit easier with EUtilities but it is woefully lacking in
>>>> documentation (my fault), but there is some code up on the wiki
>>>> which should work.
>>>> chris
>>>>
>>>> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
>>>>> Hello,
>>>>>
>>>>> I would like to download a large number of sequences from GenBank
>>>>> (122,146 to be exact) following a list of accession numbers.
>>>>> I first investigated around Bio::DB::EUtilities, but got lost and
>>>>> finally used Bio::DB::GenBank.
>>>>> My script works well for short requests, but it gives the following
>>>>> error with the long request:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: WebDBSeqI Request Error:
>>>>> 500 short write
>>>>> Content-Type: text/plain
>>>>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
>>>>> Client-Warning: Internal response
>>>>>
>>>>> 500 short write
>>>>>
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>>>>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
>>>>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
>>>>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
>>>>> STACK: ./fetch_from_genbank.pl:58
>>>>> ---------------------------------------------------------
>>>>>
>>>>> Does that mean that we can only fetch 500 sequences at a time?
>>>>> Should I split my list into 500-ID fragments and submit them one
>>>>> after the other?
>>>>>
>>>>> Any suggestions very welcomed...
>>>>> Thanks,
>>>>> -Tristan
>>>>>
>>>>>
>>>>> Here is the script:
>>>>>
>>>>> ##################################
>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::DB::GenBank;
>>>>> # use Bio::DB::EUtilities;
>>>>> use Bio::SeqIO;
>>>>> use Getopt::Long;
>>>>>
>>>>> # 2008-01-22 T Lefebure
>>>>> # I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
>>>>> # The following procedure is not really good as the stream is first copied to a temporary file,
>>>>> # and then re-used by BioPerl to generate the final file.
>>>>>
>>>>> my $db = 'nucleotide';
>>>>> my $format = 'genbank';
>>>>> my $help= '';
>>>>> my $dformat = 'gb';
>>>>>
>>>>> GetOptions(
>>>>>   'help|?' => \$help,
>>>>>   'format=s'  => \$format,
>>>>>   'database=s'    => \$db,
>>>>> );
>>>>>
>>>>>
>>>>> my $printhelp = "\nUsage: $0 [options]   
>>>>> 
>>>>>
>>>>> Will download the corresponding data from GenBank. BioPerl is
>>>>> required.
>>>>>
>>>>> Options:
>>>>>   -h
>>>>>       print this help
>>>>>   -format: genbank|fasta|...
>>>>>       give output format (default=genbank)
>>>>>   -database: nucleotide|genome|protein|...
>>>>>       define the database to search in (default=nucleotide)
>>>>>
>>>>> The full description of the options can be found at
>>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
>>>>>
>>>>> if ($#ARGV<1) {
>>>>>   print $printhelp;
>>>>>   exit;
>>>>> }
>>>>>
>>>>> open LIST, $ARGV[0];
>>>>> my @list = <LIST>;
>>>>>
>>>>> if ($format eq 'fasta') { $dformat = 'fasta' }
>>>>>
>>>>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
>>>>>               -format => $dformat,
>>>>>               -db => $db,
>>>>>           );
>>>>> my $seqio = $gb->get_Stream_by_acc(\@list);
>>>>>
>>>>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>>>>>               -format => $format,
>>>>>           );
>>>>> while (my $seqo = $seqio->next_seq ) {
>>>>>   print $seqo->id, "\n";
>>>>>   $seqout->write_seq($seqo);
>>>>> }
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From snoze.pa at gmail.com  Wed Jan 30 12:34:24 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 11:34:24 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
	<10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
	<31534016-91B3-45C0-995D-CE5A82466303@gmx.net>
Message-ID: <10f848910801300934q57e5d45cpbf0e17b45640e3f9@mail.gmail.com>

Hilmar,

The command I am using is the following:

load_seqdatabase.pl -host localhost -namespace bioperl -dbname bioseqdb -dbuser root -format genbank sequences.txt

I have no idea why I am getting that error.

thanks in advance


On Jan 29, 2008 3:40 PM, Hilmar Lapp  wrote:

> This would mean that two or more seqfeatures with the same type for
> the same sequence exist in the input data, each with rank 1.
>
> Normally the rank will be incremented for each seqfeature of a
> sequence, so I'm not sure how this is happening here w/o seeing the
> data.
>
>        -hilmar
> On Jan 29, 2008, at 12:15 PM, snoze pa wrote:
>
> > Dear Users,
> > I tried to refresh the installation and it seems to be working. But when I
> > load sequences, it gives me the following warning messages. Am I
> > doing it right, or am I missing a huge chunk of sequences? Thanks in
> > advance
> > s
> >
> > -------------------- WARNING ---------------------
> > MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","1") FKs (27,3,4)
> > Duplicate entry '27-3-4-1' for key 2
> > ---------------------------------------------------
> > ...
> > ...
> > and so on
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
>


From snoze.pa at gmail.com  Wed Jan 30 13:01:46 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 12:01:46 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: <10f848910801301001k681e1291we0ce468e96d88f57@mail.gmail.com>

You can use a one-line LWP script to grab the sequences.
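For reference, a plain HTTP fetch along those lines might look like the sketch below. It swaps in the core HTTP::Tiny module for LWP and assumes NCBI's efetch endpoint with the db, id, rettype and retmode parameters; check the current E-utilities documentation, and note that https URLs also need IO::Socket::SSL installed. efetch_url and efetch_batch are hypothetical helper names. Long ID lists would still need to be split into batches of a few hundred, as discussed elsewhere in this thread.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Base URL of NCBI's E-utilities (assumed; check NCBI's documentation).
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Build an efetch URL for a batch of accessions or GIs.
sub efetch_url {
    my (%p) = @_;
    my $query = HTTP::Tiny->new->www_form_urlencode({
        db      => $p{db}      || 'nucleotide',
        id      => join(',', @{ $p{ids} }),
        rettype => $p{rettype} || 'gb',
        retmode => 'text',
    });
    return "$EUTILS/efetch.fcgi?$query";
}

# Download one batch as flat text; dies on any HTTP failure so the
# caller can catch it with eval and retry.
sub efetch_batch {
    my (%p) = @_;
    my $res = HTTP::Tiny->new(timeout => 120)->get(efetch_url(%p));
    die "efetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}
```

A call such as efetch_batch(ids => \@batch) returns the GenBank flat-file text for that batch, which can be written to disk directly or read back through Bio::SeqIO.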

On Jan 29, 2008 11:19 AM, Tristan Lefebure 
wrote:

> Hello,
>
> I would like to download a large number of sequences from GenBank (122,146
> to be exact) following a list of accession numbers.
> I first investigated around Bio::DB::EUtilities, but got lost and finally
> used Bio::DB::GenBank.
> My script works well for short requests, but it gives the following error
> with the long request:
>
>  ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> 500 short write
> Content-Type: text/plain
> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> Client-Warning: Internal response
>
> 500 short write
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:58
> ---------------------------------------------------------
>
> Does that mean that we can only fetch 500 sequences at a time?
> Should I split my list into 500-ID fragments and submit them one after the
> other?
>
> Any suggestions very welcomed...
> Thanks,
> -Tristan
>
>
> Here is the script:
>
> ##################################
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> # use Bio::DB::EUtilities;
> use Bio::SeqIO;
> use Getopt::Long;
>
> # 2008-01-22 T Lefebure
> # I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
> # The following procedure is not really good as the stream is first copied to a temporary file,
> # and then re-used by BioPerl to generate the final file.
>
> my $db = 'nucleotide';
> my $format = 'genbank';
> my $help= '';
> my $dformat = 'gb';
>
> GetOptions(
>        'help|?' => \$help,
>        'format=s'  => \$format,
>        'database=s'    => \$db,
> );
>
>
> my $printhelp = "\nUsage: $0 [options]  
>
> Will download the corresponding data from GenBank. BioPerl is required.
>
> Options:
>        -h
>                print this help
>        -format: genbank|fasta|...
>                give output format (default=genbank)
>        -database: nucleotide|genome|protein|...
>                define the database to search in (default=nucleotide)
>
> The full description of the options can be found at
> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
>
> if ($#ARGV<1) {
>        print $printhelp;
>        exit;
> }
>
> open LIST, $ARGV[0];
> my @list = <LIST>;
>
> if ($format eq 'fasta') { $dformat = 'fasta' }
>
> my $gb = new Bio::DB::GenBank(  -retrievaltype => 'tempfile',
>                                -format => $dformat,
>                                -db => $db,
>                        );
> my $seqio = $gb->get_Stream_by_acc(\@list);
>
> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>                                -format => $format,
>                        );
> while (my $seqo = $seqio->next_seq ) {
>        print $seqo->id, "\n";
>        $seqout->write_seq($seqo);
> }
>


From snoze.pa at gmail.com  Wed Jan 30 13:38:12 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 12:38:12 -0600
Subject: [Bioperl-l] load_seqdatabase help
Message-ID: <10f848910801301038t1ae296c2o2453728b68dc81f8@mail.gmail.com>

Dear User,
Is there any alternative way to load the following sequence into a
BioSQL schema? I am trying to use load_seqdatabase.pl, but it is not working
in my case and shows a number of warning/error messages. I have tried everything
but have not been able to load it yet.

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb

Any help loading the above sequence into my bioseqdb database would be appreciated.

Thanks in advance
s


From snoze.pa at gmail.com  Wed Jan 30 14:30:22 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 13:30:22 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
Message-ID: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>

Hi Hilmar,

After spending a lot of time, I figured out the error. I am able to load
sequences if they do not have the following entry:

xrefs (non-sequence databases):

If a GenBank sequence has this entry, load_seqdatabase.pl crashes.
I tried it on a couple of sequences and found that this line in the
GenBank format is the culprit. But this line is important as it contains
a lot of information... so I am wondering how to solve this problem.

Any help?

Thanks in advance
s


From Russell.Smithies at agresearch.co.nz  Wed Jan 30 14:34:44 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 31 Jan 2008 08:34:44 +1300
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com><479F7149.1010203@atgc.org>
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: 

Take a look at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
Ebot is an interactive tool that generates a Perl script that implements
an E-utility pipeline.
You can probably hack the resulting script to introduce the required
BioPerly bits.
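The epost-then-efetch pipeline Chris mentioned earlier in the thread (post the IDs once, then page through them with the returned WebEnv and query_key) could be sketched roughly as below. Assumptions: the core HTTP::Tiny module rather than LWP or Bio::DB::EUtilities, NCBI's epost.fcgi/efetch.fcgi endpoints and their documented parameters, and simple regexes instead of a real XML parser for the epost reply. parse_epost and epost_and_fetch are hypothetical names, and very large ID lists may still need to be posted in several pieces.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';   # assumed base URL
my $http   = HTTP::Tiny->new(timeout => 300);

# Extract WebEnv and QueryKey from an epost reply (simple regexes,
# not a real XML parser).
sub parse_epost {
    my ($xml) = @_;
    my ($key) = $xml =~ m{<QueryKey>\s*(\d+)\s*</QueryKey>};
    my ($env) = $xml =~ m{<WebEnv>\s*(\S+?)\s*</WebEnv>};
    die "No WebEnv/QueryKey in epost reply\n"
        unless defined $key && defined $env;
    return ($env, $key);
}

# Post all IDs once, then download them $retmax records at a time,
# writing the GenBank flat files to $outfile.
sub epost_and_fetch {
    my ($db, $ids, $outfile, $retmax) = @_;
    $retmax ||= 500;

    my $res = $http->post_form("$EUTILS/epost.fcgi",
                               { db => $db, id => join(',', @$ids) });
    die "epost failed: $res->{status} $res->{reason}\n" unless $res->{success};
    my ($webenv, $query_key) = parse_epost($res->{content});

    open my $out, '>', $outfile or die "Cannot write $outfile: $!\n";
    for (my $retstart = 0; $retstart < @$ids; $retstart += $retmax) {
        my $query = $http->www_form_urlencode({
            db        => $db,
            WebEnv    => $webenv,
            query_key => $query_key,
            retstart  => $retstart,
            retmax    => $retmax,
            rettype   => 'gb',
            retmode   => 'text',
        });
        my $r = $http->get("$EUTILS/efetch.fcgi?$query");
        die "efetch failed at record $retstart: $r->{status} $r->{reason}\n"
            unless $r->{success};
        print {$out} $r->{content};
        sleep 1;    # space out requests to NCBI
    }
    close $out or die "Cannot close $outfile: $!\n";
    return;
}
```

A call such as epost_and_fetch('nucleotide', \@list, 'all_sequences.gb') would do the whole download, assuming @list has been chomped. Each efetch call could also be wrapped in an eval/retry loop so that a single failed page does not abort the run.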

Russell Smithies 

Bioinformatics Software Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================



From cjfields at uiuc.edu  Wed Jan 30 15:04:18 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:04:18 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: <0BA39C27-1871-441B-B2DE-F7FECF8570D7@uiuc.edu>

Sounds like a bug in the GenBank parser.  Could you post a bug report  
with an example sequence record and your script?

http://bugzilla.open-bio.org/

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From cjfields at uiuc.edu  Wed Jan 30 15:42:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:42:14 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: <29768205-F511-4EDB-84D2-BCC36DBA92C7@uiuc.edu>

When using Bio::DB::EUtilities (from bioperl-live) this works for me:

use strict;
use warnings;
use Bio::DB::EUtilities;

# get array of IDs somehow, in @ids
my @ids;

my ($start, $chunk, $last) = (0, 100, $#ids);

my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'protein',
                                       -rettype => 'genbank');

my $ct    = 1; # used to denote separate files
my $tries = 0; # server attempts

# '<=' rather than '<', so a final batch holding a single ID is not skipped
while ($start <= $last) {
     # want seqs in chunk size of 100 (set above)
     my $end = ($start + $chunk - 1) < $last ? ($start + $chunk - 1) : $last;
     # grab slice of IDs
     my @sub = @ids[$start..$end];

     # pass to agent
     $factory->set_parameters(-id => \@sub);

     eval {
         # check server response, if good send to file
         $factory->get_Response(-file => ">seqs_$ct.gb");
     };

     # ERROR!
     if ($@) {
         $tries++;
         if ($tries <= 10) {
             warn("Server problem on attempt $tries: $@\nTrying again...\n");
             redo;
         } else {
             die("Repeated server issues after $tries attempts.");
             # could warn and skip this batch of accs instead, but $start
             # must be advanced before 'next' or the same batch repeats
         }
     }

     $start = $end + 1;
     $ct++;
     $tries = 0;
}
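The loop above retries a failing batch up to ten times and then dies; its comment notes that the batch could be skipped instead. A minimal plain-Perl sketch of that skip-and-record variant follows. It assumes the fetch step is wrapped in a callback that dies on failure, which matches how BioPerl reports request errors (by throwing, as in the Bio::Root::Root::throw stack traces earlier in this thread). fetch_in_batches is a hypothetical helper, not part of BioPerl; the callback is where the set_parameters/get_Response calls would go.

```perl
use strict;
use warnings;

# Split @$ids into batches of $size and pass each batch (plus its start
# offset) to $fetch, which must die on failure. Each batch is tried up to
# $max_tries times; batches that still fail are skipped, and their IDs are
# returned so they can be resubmitted later.
sub fetch_in_batches {
    my ($ids, $size, $fetch, $max_tries) = @_;
    my @failed;
    for (my $start = 0; $start < @$ids; $start += $size) {
        my $end = $start + $size - 1;
        $end = $#$ids if $end > $#$ids;      # last batch may be shorter
        my @batch = @{$ids}[$start .. $end];

        my $ok = 0;
        for my $try (1 .. $max_tries) {
            # eval { ...; 1 } is true only if the block did not die
            if (eval { $fetch->(\@batch, $start); 1 }) {
                $ok = 1;
                last;
            }
            warn "Batch starting at $start failed (attempt $try): $@";
        }
        push @failed, @batch unless $ok;     # record and move on
    }
    return @failed;
}
```

Returning the failed IDs instead of dying lets the remaining batches finish; the list can then be written to a file and resubmitted in a later run.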



chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From georg.otto at tuebingen.mpg.de  Thu Jan 31 04:34:31 2008
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Thu, 31 Jan 2008 10:34:31 +0100
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: 

Hi,

I succeeded with a similar task using the SeqHound database. I had a
list of more than 200,000 GI numbers, but I guess it would work in a similar
fashion with accession numbers. Here is the script:

#!/usr/bin/perl

use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::SeqHound;

my $sh = Bio::DB::SeqHound->new();

my($USAGE) = "$0 id_file\n\n";

unless(@ARGV) {
	print $USAGE;
	exit;
}

my $id_file = $ARGV[0];

open ID_FILE, "<$id_file" or die "error: $!";

# one FASTA writer (to STDOUT) shared by all sequences
my $out = Bio::SeqIO->new(-format => 'fasta');

while (<ID_FILE>) {
  chomp;
  my $id = $_;
  next unless length $id;    # skip blank lines
  # IDs for which no sequence comes back are silently skipped
  if (defined(my $seq_obj = $sh->get_Seq_by_gi($id))) {
    $out->write_seq($seq_obj);
  }
}
close ID_FILE;


Best,

Georg





From bernd.web at gmail.com  Thu Jan 31 05:48:15 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 31 Jan 2008 11:48:15 +0100
Subject: [Bioperl-l] searchio/blast
Message-ID: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>

Hi,

I noticed that the HTMLWriter output for a BLAST report may not be
correct if more than one sequence was "blasted".

After the BLAST report of the first sequence, the report ends with:
Search Parameters
Parameter	Value

Search Statistics
Statistic	Value

Produced by Bioperl module Bio::SearchIO::Writer::HTMLResultWriter on
Thu Jan 31 11:35:51 2008
Revision: $Id: HTMLResultWriter.pm,v 1.41 2006/10/02 04:45:37 tseemann Exp $

Then the second HTML blast report follows.
Although users who need HTML output generally BLAST only one sequence,
this may be nice to fix.
Also, in the HTML Writer output for FASTA reports the statistics section is empty.

An additional issue with HTMLWriter output containing more than one BLAST
report is the following:
when a sequence ID occurs more than once, the link (on the E-value) points
to the first occurrence, since the link target is not report-specific.
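The link collision described above disappears if every anchor name includes the report it belongs to. A minimal sketch of such a naming scheme, assuming anchors are derived from the hit ID; anchor_for is a hypothetical helper, not part of Bio::SearchIO::Writer::HTMLResultWriter, whose actual anchor code is not shown here.

```perl
use strict;
use warnings;

# Build an HTML anchor name for a hit that is unique per report: prefixing
# the report (query) index stops identical hit IDs in different reports
# from resolving to the first occurrence on the page.
sub anchor_for {
    my ($report_index, $hit_id) = @_;
    (my $safe = $hit_id) =~ s/[^A-Za-z0-9_.-]/_/g;   # keep the name URL-safe
    return "r${report_index}_$safe";
}

# Two reports that both hit gi|123 get distinct anchors:
#   anchor_for(1, 'gi|123') -> r1_gi_123
#   anchor_for(2, 'gi|123') -> r2_gi_123
```

Both the anchor and the link pointing to it would need to use the same report index for this to work.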

If the above is regarded as unwanted behaviour, I'd be happy to put together a
concise example with code.


Best regards,
Bernd


From cjfields at uiuc.edu  Thu Jan 31 07:39:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 31 Jan 2008 06:39:46 -0600
Subject: [Bioperl-l] searchio/blast
In-Reply-To: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
References: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
Message-ID: 

The easiest way to take care of these (so we don't forget about them  
and can track changes) is to add them as BioPerl bugs/enhancement  
requests to bugzilla, along with example reports and code.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From hlapp at gmx.net  Thu Jan 31 08:12:25 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 31 Jan 2008 08:12:25 -0500
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: 


On Jan 30, 2008, at 2:30 PM, snoze pa wrote:

> Hi Hilmar,
>
>  After spending lots of time i figure out the error. I am able to load
> sequences if the sequences do not have following entry
>
> xrefs (non-sequence databases):

Is this the literal value? I am asking because I can't find this in  
the file at

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb

which you said was giving you grief. So does the genbank file above  
now load, or how can I identify the critical line in there?

	-hilmar
>
> If the Genbank sequence have this entry then script  
> load_seqdatabase.pl is
> crashing. I try it in couple of sequences and found it is the  
> culprit line
> genbank format.  But this line is important as it contain lots of
> information... so I am wondering how to solve this problem
>
> Any help?
>
> Thanks in advance
> s
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From snoze.pa at gmail.com  Thu Jan 31 13:46:24 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Thu, 31 Jan 2008 12:46:24 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: 
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
Message-ID: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>

The link I sent was from the tutorial I was following.
A typical example is the following record, which has an "xrefs (non-sequence
databases):" line.
Thanks,
s
LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
            protein); prM; Peptide pr; Small envelope protein M (Matrix
            protein); Envelope protein E; Non-structural protein 1 (NS1)].
ACCESSION   P27912
VERSION     P27912.1  GI:130422
DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
            class: standard.
            created: Aug 1, 1992.
            sequence updated: Aug 1, 1992.
            annotation updated: Jan 15, 2008.
            xrefs: D00502.1, BAA00394.1, B32401
            xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
            GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
            InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
            InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
            Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832,
            Pfam:PF00869, Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
            reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
            Transmembrane; Viral nucleoprotein; Virion.
SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
  ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
            Viruses; ssRNA positive-strand viruses, no DNA stage;
            Flaviviridae; Flavivirus; Dengue virus group.
REFERENCE   1  (residues 1 to 792)
  AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
  TITLE     Genetic relatedness among structural protein genes of dengue 1
            virus strains
  JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
   PUBMED   2738579
  REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
            [FUNCTION] Protein C packages viral RNA to form a viral
            nucleocapsid, and promotes virion budding (By similarity).
            [FUNCTION] prM acts as a chaperone for envelope protein E during
            intracellular virion assembly by masking and inactivating envelope
            protein E fusion peptide. prM is matured in the last step of virion
            assembly, presumably to avoid catastrophic activation of the viral
            fusion peptide induced by the acidic pH of the trans-Golgi network.
            After cleavage by host furin, the pr peptide is released in the
            extracellular medium and small envelope protein M and envelope
            protein E homodimers are dissociated (By similarity).
            [FUNCTION] Envelope protein E binds cell surface receptor and is
            involved in membrane fusion between virion and target cell.
            Synthesized as an homodimer with prM which acts as a chaperone for
            envelope protein E. After cleavage of prM, envelope protein E
            dissociate from small envelope protein M and homodimerizes (By
            similarity).
            [FUNCTION] Non-structural protein 1 is slowly secreted from
            mammalian cells, but not from mosquito cells. Secreted form elicits
            protective immune response and plays an essential role in RNA
            replication. Soluble and membrane-associated NS1 may activate human
            complement and induce host vascular leakage. This effect might
            explain the clinical manifestations of dengue hemorrhagic fever and
            dengue shock syndrome (By similarity).
            [SUBUNIT] prM and envelope protein E form heterodimers in the
            endoplasmic reticulum and Golgi. Envelope protein E forms
            homodimers. NS1 forms homodimers as well as homohexamers when
            secreted. NS1 may interact with NS4A (By similarity).
            [SUBCELLULAR LOCATION] Note=The virion is assembled in the
            endoplasmic reticulum lumen, transported by vesicles to the Golgi,
            then transported again to the cell membrane where it is released
            outside the cell.
            [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
            [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
            [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
            Endoplasmic reticulum membrane; Peripheral membrane protein;
            Lumenal side (By similarity).
            [DOMAIN] Transmembrane domains of the small envelope protein M and
            envelope protein E contains an endoplasmic reticulum retention
            signals (By similarity).
            [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
            The nascent protein C contains a C-terminal hydrophobic domain that
            act as a signal sequence for translocation of prM into the lumen of
            the ER. Mature protein C is cleaved at a site upstream of this
            hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
            a host furin, releasing the mature small envelope protein M, and
            peptide pr (By similarity).
            [PTM] Envelope protein E and non-structural protein 1 are
            N-glycosylated (By similarity).
FEATURES             Location/Qualifiers
     source          1..792
                     /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
                     /specific_host="Aedes aegypti (Yellowfever mosquito)"
                     /specific_host="Homo sapiens (Human)"
                     /db_xref="taxon:11057"
     Protein         1..>792
                     /product="Genome polyprotein [Contains: Protein C"
     Region          1..101
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          1..100
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Protein C. /FTId=PRO_0000037884."
     Region          5..114
                     /region_name="Flavi_capsid"
                     /note="Flavivirus capsid protein C. Flaviviruses are small
                     enveloped viruses with virions comprised of 3 proteins
                     called C, M and E. Multiple copies of the C protein form
                     the nucleocapsid, which contains the ssRNA molecule;
                     pfam01003"
                     /db_xref="CDD:85176"
     Site            100..101
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by serine protease NS3 (By similarity)."
     Region          101..114
                     /region_name="Propeptide"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="ER anchor for the protein C, removed in mature form
                     by serine protease NS3. /FTId=PRO_0000037885."
     Region          102..122
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            114..115
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          115..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="prM. /FTId=PRO_0000264649."
     Region          115..205
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Peptide pr. /FTId=PRO_0000264650."
     Region          119..204
                     /region_name="Flavi_propep"
                     /note="Flavivirus polyprotein propeptide. The flaviviruses
                     are small enveloped animal viruses containing a single
                     positive strand genomic RNA. The genome encodes one large
                     ORF a polyprotein which undergos proteolytic processing
                     into mature viral peptide chains; pfam01570"
                     /db_xref="CDD:65376"
     Region          123..238
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            183
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Site            205..206
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host furin (By similarity)."
     Region          206..280
                     /region_name="Flavi_M"
                     /note="Flavivirus envelope glycoprotein M. Flaviviruses
                     are small enveloped viruses with virions comprised of 3
                     proteins called C, M and E. The envelope glycoprotein M is
                     made as a precursor, called prM; pfam01004"
                     /db_xref="CDD:85177"
     Region          206..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Small envelope protein M. /FTId=PRO_0000037886."
     Region          239..259
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          260..265
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          266..286
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            280..281
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          281..775
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Envelope protein E. /FTId=PRO_0000037887."
     Region          281..576
                     /region_name="Flavi_glycoprot"
                     /note="Flavivirus glycoprotein, central and dimerisation
                     domains; pfam00869"
                     /db_xref="CDD:85082"
     Bond            bond(283,310)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          287..725
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Bond            bond(340,401)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            347
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(354,385)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Bond            bond(372,396)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            433
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(465,565)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          578..673
                     /region_name="Flavi_glycop_C"
                     /note="Flavivirus glycoprotein, immunoglobulin-like
                     domain; pfam02832"
                     /db_xref="CDD:66513"
     Bond            bond(582,613)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          726..746
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          747..752
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          753..773
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          774..>792
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            775..776
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          776..>792
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Non-structural protein 1. /FTId=PRO_0000037888."
ORIGIN
        1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
       61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
      121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
      181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
      241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
      301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
      361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
      421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
      481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
      541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
      601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
      661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
      721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
      781 inwkgkelkc gs
//




From hlapp at gmx.net  Thu Jan 31 15:10:35 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 31 Jan 2008 15:10:35 -0500
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
	<10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
Message-ID: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>

I see. Note that the sequence you sent is really a UniProt sequence
that has been reformatted into GenBank format, and hence isn't in the
typical GenBank sequence format (which usually lacks DBSOURCE, for
example). (The joys of data integration.)

If you load the same sequence from UniProt, does it still fail to
parse or to load?

Also, does this mean that the sequences at the link you sent load
without error? I.e., can I close that issue report, or is there a bug
in bioperl-db?

	-hilmar

On Jan 31, 2008, at 1:46 PM, snoze pa wrote:

> The link i sent was related to my tutorial. I was following that  
> website. The typical example is one of the following which have  
> xrefs (non-sequence databases): line.
> thanks
> s
>
> LOCUS       P27912                   792 aa            linear   VRL  
> 15-JAN-2008
> DEFINITION  Genome polyprotein [Contains: Protein C (Core protein)  
> (Capsid
>             protein); prM; Peptide pr; Small envelope protein M  
> (Matrix
>             protein); Envelope protein E; Non-structural protein 1  
> (NS1)].
> ACCESSION   P27912
> VERSION     P27912.1  GI:130422
> DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
>             class: standard.
>             created: Aug 1, 1992.
>             sequence updated: Aug 1, 1992.
>             annotation updated: Jan 15, 2008.
>             xrefs: D00502.1, BAA00394.1, B32401
>             xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
>             GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
>             InterPro:IPR001122, InterPro:IPR000069,  
> InterPro:IPR001157,
>             InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA: 
> 2.60.98.10,
>             Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832,  
> Pfam:PF00869,
>             Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
> KEYWORDS    Capsid protein; Cleavage on pair of basic residues;  
> Endoplasmic
>             reticulum; Envelope protein; Glycoprotein; Membrane;  
> Secreted;
>             Transmembrane; Viral nucleoprotein; Virion.
> SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
>   ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
>             Viruses; ssRNA positive-strand viruses, no DNA stage;  
> Flaviviridae;
>             Flavivirus; Dengue virus group.
> REFERENCE   1  (residues 1 to 792)
>   AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
>   TITLE     Genetic relatedness among structural protein genes of  
> dengue 1
>             virus strains
>   JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
>    PUBMED   2738579
>   REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
> COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
>             [FUNCTION] Protein C packages viral RNA to form a viral
>             nucleocapsid, and promotes virion budding (By similarity).
>             [FUNCTION] prM acts as a chaperone for envelope protein  
> E during
>             intracellular virion assembly by masking and  
> inactivating envelope
>             protein E fusion peptide. prM is matured in the last  
> step of virion
>             assembly, presumably to avoid catastrophic activation  
> of the viral
>             fusion peptide induced by the acidic pH of the trans- 
> Golgi network.
>             After cleavage by host furin, the pr peptide is  
> released in the
>             extracellular medium and small envelope protein M and  
> envelope
>             protein E homodimers are dissociated (By similarity).
>             [FUNCTION] Envelope protein E binds cell surface  
> receptor and is
>             involved in membrane fusion between virion and target  
> cell.
>             Synthesized as an homodimer with prM which acts as a  
> chaperone for
>             envelope protein E. After cleavage of prM, envelope  
> protein E
>             dissociate from small envelope protein M and  
> homodimerizes (By
>             similarity).
>             [FUNCTION] Non-structural protein 1 is slowly secreted  
> from
>             mammalian cells, but not from mosquito cells. Secreted  
> form elicits
>             protective immune response and plays an essential role  
> in RNA
>             replication. Soluble and membrane-associated NS1 may  
> activate human
>             complement and induce host vascular leakage. This  
> effect might
>             explain the clinical manifestations of dengue  
> hemorrhagic fever and
>             dengue shock syndrome (By similarity).
>             [SUBUNIT] prM and envelope protein E form heterodimers  
> in the
>             endoplasmic reticulum and Golgi. Envelope protein E forms
>             homodimers. NS1 forms homodimers as well as  
> homohexamers when
>             secreted. NS1 may interact with NS4A (By similarity).
>             [SUBCELLULAR LOCATION] Note=The virion is assembled in the
>             endoplasmic reticulum lumen, transported by vesicles to  
> the Golgi,
>             then transported again to the cell membrane where it is  
> released
>             outside the cell.
>             [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
>             [SUBCELLULAR LOCATION] Peptide pr: Secreted (By  
> similarity).
>             [SUBCELLULAR LOCATION] Small envelope protein M: Virion  
> membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Envelope protein E: Virion  
> membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
>             Endoplasmic reticulum membrane; Peripheral membrane  
> protein;
>             Lumenal side (By similarity).
>             [DOMAIN] Transmembrane domains of the small envelope  
> protein M and
>             envelope protein E contains an endoplasmic reticulum  
> retention
>             signals (By similarity).
>             [PTM] Specific enzymatic cleavages in vivo yield mature  
> proteins.
>             The nascent protein C contains a C-terminal hydrophobic  
> domain that
>             act as a signal sequence for translocation of prM into  
> the lumen of
>             the ER. Mature protein C is cleaved at a site upstream  
> of this
>             hydrophobic domain by NS3. prM is cleaved in post-Golgi  
> vesicles by
>             a host furin, releasing the mature small envelope  
> protein M, and
>             peptide pr (By similarity).
>             [PTM] Envelope protein E and non-structural protein 1 are
>             N-glycosylated (By similarity).
> FEATURES             Location/Qualifiers
>      source          1..792
>                      /organism="Dengue virus 1 Thailand/AHF  
> 82-80/1980"
>                      /specific_host="Aedes aegypti (Yellowfever  
> mosquito)"
>                      /specific_host="Homo sapiens (Human)"
>                      /db_xref="taxon:11057"
>      Protein         1..>792
>                      /product="Genome polyprotein [Contains:  
> Protein C"
>      Region          1..101
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          1..100
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="Protein C. /FTId=PRO_0000037884."
>      Region          5..114
>                      /region_name="Flavi_capsid"
>                      /note="Flavivirus capsid protein C.  
> Flaviviruses are small
>                      enveloped viruses with virions comprised of 3  
> proteins
>                      called C, M and E. Multiple copies of the C  
> protein form
>                      the nucleocapsid, which contains the ssRNA  
> molecule;
>                      pfam01003"
>                      /db_xref="CDD:85176"
>      Site            100..101
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cleavage; by serine protease NS3 (By  
> similarity)."
>      Region          101..114
>                      /region_name="Propeptide"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="ER anchor for the protein C, removed in  
> mature form
>                      by serine protease NS3. /FTId=PRO_0000037885."
>      Region          102..122
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Potential."
>      Site            114..115
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          115..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="prM. /FTId=PRO_0000264649."
>      Region          115..205
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="Peptide pr. /FTId=PRO_0000264650."
>      Region          119..204
>                      /region_name="Flavi_propep"
>                      /note="Flavivirus polyprotein propeptide. The  
> flaviviruses
>                      are small enveloped animal viruses containing  
> a single
>                      positive strand genomic RNA. The genome  
> encodes one large
>                      ORF a polyprotein which undergos proteolytic  
> processing
>                      into mature viral peptide chains; pfam01570"
>                      /db_xref="CDD:65376"
>      Region          123..238
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            183
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Site            205..206
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cleavage; by host furin (By similarity)."
>      Region          206..280
>                      /region_name="Flavi_M"
>                      /note="Flavivirus envelope glycoprotein M.  
> Flaviviruses
>                      are small enveloped viruses with virions  
> comprised of 3
>                      proteins called C, M and E. The envelope  
> glycoprotein M is
>                      made as a precursor, called prM; pfam01004"
>                      /db_xref="CDD:85177"
>      Region          206..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="Small envelope protein M. / 
> FTId=PRO_0000037886."
>      Region          239..259
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Potential."
>      Region          260..265
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          266..286
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Potential."
>      Site            280..281
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          281..775
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="Envelope protein E. /FTId=PRO_0000037887."
>      Region          281..576
>                      /region_name="Flavi_glycoprot"
>                      /note="Flavivirus glycoprotein, central and  
> dimerisation
>                      domains; pfam00869"
>                      /db_xref="CDD:85082"
>      Bond            bond(283,310)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          287..725
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Bond            bond(340,401)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            347
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(354,385)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Bond            bond(372,396)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            433
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(465,565)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          578..673
>                      /region_name="Flavi_glycop_C"
>                      /note="Flavivirus glycoprotein, immunoglobulin-like
>                      domain; pfam02832"
>                      /db_xref="CDD:66513"
>      Bond            bond(582,613)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          726..746
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          747..752
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          753..773
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          774..>792
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            775..776
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          776..>792
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Non-structural protein 1. /FTId=PRO_0000037888."
> ORIGIN
>         1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
>        61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
>       121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
>       181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
>       241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
>       301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
>       361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
>       421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
>       481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
>       541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
>       601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
>       661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
>       721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
>       781 inwkgkelkc gs
> //
>
>
> On Jan 31, 2008 7:12 AM, Hilmar Lapp  wrote:
>
> On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
>
> > Hi Hilmar,
> >
> >  After spending a lot of time I figured out the error. I am able to load
> > sequences if they do not have the following entry:
> >
> > xrefs (non-sequence databases):
>
> Is this the literal value? I am asking because I can't find this in
> the file at
>
> http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
>
> which you said was giving you grief. So does the genbank file above
> now load, or how can I identify the critical line in there?
>
>        -hilmar
> >
> > If a GenBank sequence has this entry, the load_seqdatabase.pl script
> > crashes. I tried it on a couple of sequences and found that this is the
> > culprit line in the GenBank format. But this line is important, as it
> > contains lots of information... so I am wondering how to solve this problem.
> >
> > Any help?
> >
> > Thanks in advance
> > s
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From snoze.pa at gmail.com  Thu Jan 31 15:21:18 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Thu, 31 Jan 2008 14:21:18 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
	<10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
	<3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>
Message-ID: <10f848910801311221q2a9f0d02x6c4600048f05adab@mail.gmail.com>

Thanks Hilmar,

I also thought that they had been translated into GenBank format. My problem
is that I have downloaded tons of sequences from NCBI in GenBank format, and
my flat file contains many sequences in this format, so I am unable to load
them into a local database using the load_seqdatabase.pl script. So far I get
nothing but warnings and errors. Is there any solution to this problem?
Otherwise I will try to write some code to load all the sequences into the
local database, though it seems it should be easy to modify the parsing code
so that these sequences can be loaded.


>format (which usually lacks DBSOURCE, for example

I think that when the three-dimensional structure of the protein is known,
DBSOURCE is common in the NCBI GenBank format. I agree with you: the joys of
integration.

The link was related to a tutorial I was using, so you can close it.

Thanks for looking into the matter.
 s

On Jan 31, 2008 2:10 PM, Hilmar Lapp  wrote:

> I see. Note that the sequence below is really a UniProt sequence that has
> been reformatted into GenBank format, and hence isn't in your typical
> genbank sequence format (which usually lacks DBSOURCE, for example). (The
> joys of data integration.)
> If you load the same sequence from UniProt, does it still fail to parse or
> to load?
>
> Also, does this or does this not mean that the sequences at the link you
> sent load w/o error? I.e., can I close that issue report, or is there a bug
> in bioperl-db?
>
> -hilmar
>
> On Jan 31, 2008, at 1:46 PM, snoze pa wrote:
>
> The link I sent was related to my tutorial; I was following that website.
> A typical example is the following record, which has an *xrefs
> (non-sequence databases):* line.
> thanks
> s
> LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
> DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
>             protein); prM; Peptide pr; Small envelope protein M (Matrix
>             protein); Envelope protein E; Non-structural protein 1 (NS1)].
> ACCESSION   P27912
> VERSION     P27912.1  GI:130422
> DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
>             class: standard.
>             created: Aug 1, 1992.
>             sequence updated: Aug 1, 1992.
>             annotation updated: Jan 15, 2008.
>             xrefs: D00502.1, BAA00394.1, B32401
>             *xrefs (non-sequence databases):* HSSP:Q88653, SMR:P27912,
>             GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
>             InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
>             InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
>             Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
>             Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
> KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
>             reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
>             Transmembrane; Viral nucleoprotein; Virion.
> SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
>   ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
>             Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
>             Flavivirus; Dengue virus group.
> REFERENCE   1  (residues 1 to 792)
>   AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
>   TITLE     Genetic relatedness among structural protein genes of dengue 1
>             virus strains
>   JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
>    PUBMED   2738579
>   REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
> COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
>             [FUNCTION] Protein C packages viral RNA to form a viral
>             nucleocapsid, and promotes virion budding (By similarity).
>             [FUNCTION] prM acts as a chaperone for envelope protein E during
>             intracellular virion assembly by masking and inactivating envelope
>             protein E fusion peptide. prM is matured in the last step of virion
>             assembly, presumably to avoid catastrophic activation of the viral
>             fusion peptide induced by the acidic pH of the trans-Golgi network.
>             After cleavage by host furin, the pr peptide is released in the
>             extracellular medium and small envelope protein M and envelope
>             protein E homodimers are dissociated (By similarity).
>             [FUNCTION] Envelope protein E binds cell surface receptor and is
>             involved in membrane fusion between virion and target cell.
>             Synthesized as an homodimer with prM which acts as a chaperone for
>             envelope protein E. After cleavage of prM, envelope protein E
>             dissociate from small envelope protein M and homodimerizes (By
>             similarity).
>             [FUNCTION] Non-structural protein 1 is slowly secreted from
>             mammalian cells, but not from mosquito cells. Secreted form elicits
>             protective immune response and plays an essential role in RNA
>             replication. Soluble and membrane-associated NS1 may activate human
>             complement and induce host vascular leakage. This effect might
>             explain the clinical manifestations of dengue hemorrhagic fever and
>             dengue shock syndrome (By similarity).
>             [SUBUNIT] prM and envelope protein E form heterodimers in the
>             endoplasmic reticulum and Golgi. Envelope protein E forms
>             homodimers. NS1 forms homodimers as well as homohexamers when
>             secreted. NS1 may interact with NS4A (By similarity).
>             [SUBCELLULAR LOCATION] Note=The virion is assembled in the
>             endoplasmic reticulum lumen, transported by vesicles to the Golgi,
>             then transported again to the cell membrane where it is released
>             outside the cell.
>             [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
>             [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
>             [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
>             Endoplasmic reticulum membrane; Peripheral membrane protein;
>             Lumenal side (By similarity).
>             [DOMAIN] Transmembrane domains of the small envelope protein M and
>             envelope protein E contains an endoplasmic reticulum retention
>             signals (By similarity).
>             [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
>             The nascent protein C contains a C-terminal hydrophobic domain that
>             act as a signal sequence for translocation of prM into the lumen of
>             the ER. Mature protein C is cleaved at a site upstream of this
>             hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
>             a host furin, releasing the mature small envelope protein M, and
>             peptide pr (By similarity).
>             [PTM] Envelope protein E and non-structural protein 1 are
>             N-glycosylated (By similarity).
> FEATURES             Location/Qualifiers
>      source          1..792
>                      /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
>                      /specific_host="Aedes aegypti (Yellowfever mosquito)"
>                      /specific_host="Homo sapiens (Human)"
>                      /db_xref="taxon:11057"
>      Protein         1..>792
>                      /product="Genome polyprotein [Contains: Protein C"
>      Region          1..101
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          1..100
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Protein C. /FTId=PRO_0000037884."
>      Region          5..114
>                      /region_name="Flavi_capsid"
>                      /note="Flavivirus capsid protein C. Flaviviruses are small
>                      enveloped viruses with virions comprised of 3 proteins
>                      called C, M and E. Multiple copies of the C protein form
>                      the nucleocapsid, which contains the ssRNA molecule;
>                      pfam01003"
>                      /db_xref="CDD:85176"
>      Site            100..101
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by serine protease NS3 (By similarity)."
>      Region          101..114
>                      /region_name="Propeptide"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="ER anchor for the protein C, removed in mature form
>                      by serine protease NS3. /FTId=PRO_0000037885."
>      Region          102..122
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            114..115
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          115..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="prM. /FTId=PRO_0000264649."
>      Region          115..205
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Peptide pr. /FTId=PRO_0000264650."
>      Region          119..204
>                      /region_name="Flavi_propep"
>                      /note="Flavivirus polyprotein propeptide. The flaviviruses
>                      are small enveloped animal viruses containing a single
>                      positive strand genomic RNA. The genome encodes one large
>                      ORF a polyprotein which undergos proteolytic processing
>                      into mature viral peptide chains; pfam01570"
>                      /db_xref="CDD:65376"
>      Region          123..238
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            183
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Site            205..206
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host furin (By similarity)."
>      Region          206..280
>                      /region_name="Flavi_M"
>                      /note="Flavivirus envelope glycoprotein M. Flaviviruses
>                      are small enveloped viruses with virions comprised of 3
>                      proteins called C, M and E. The envelope glycoprotein M is
>                      made as a precursor, called prM; pfam01004"
>                      /db_xref="CDD:85177"
>      Region          206..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Small envelope protein M. /FTId=PRO_0000037886."
>      Region          239..259
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          260..265
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          266..286
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            280..281
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          281..775
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Envelope protein E. /FTId=PRO_0000037887."
>      Region          281..576
>                      /region_name="Flavi_glycoprot"
>                      /note="Flavivirus glycoprotein, central and dimerisation
>                      domains; pfam00869"
>                      /db_xref="CDD:85082"
>      Bond            bond(283,310)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          287..725
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Bond            bond(340,401)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            347
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(354,385)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Bond            bond(372,396)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            433
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(465,565)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          578..673
>                      /region_name="Flavi_glycop_C"
>                      /note="Flavivirus glycoprotein, immunoglobulin-like
>                      domain; pfam02832"
>                      /db_xref="CDD:66513"
>      Bond            bond(582,613)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          726..746
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          747..752
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          753..773
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          774..>792
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            775..776
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          776..>792
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Non-structural protein 1. /FTId=PRO_0000037888."
> ORIGIN
>         1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
>        61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
>       121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
>       181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
>       241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
>       301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
>       361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
>       421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
>       481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
>       541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
>       601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
>       661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
>       721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
>       781 inwkgkelkc gs
> //


From Laurence.Amilhat at toulouse.inra.fr  Thu Jan  3 09:29:09 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Thu, 03 Jan 2008 15:29:09 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
Message-ID: <477CF135.9060104@toulouse.inra.fr>

Dear all,

I am trying to convert a newick tree into an NHX tree, so I can add the 
taxid tag for each leaf.

I am using the modules Bio::TreeIO and Bio::Tree::NodeNHX.
The idea is to:
1) read the newick tree
2) get each leaf and its corresponding taxid
3) add the NHX taxid tag (T)
4) write the NHX tree

I was able to do the first two steps, and I could create a NodeNHX object
and add the T tag, but I don't know how to write an NHX tree containing the
NodeNHX objects I created...

Does anyone have an idea? Any help is welcome.

Thanks,

laurence.


Here are my code and sample files, for better understanding:
newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt

_newick2nhx.pl:_
use strict;
use Bio::TreeIO;
use Bio::Tree::NodeNHX;
use Getopt::Long;


my $tree_file;
my $outfile;
my $codefile;
my %corresp;

GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile, 'c|code:s' 
=>\$codefile);

open (CODE, "< $codefile");
while (<CODE>)
{
    chomp;
    my($a, $b)=split (/\t/);
    $corresp{$a}=$b;
}


my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");

while (my $tree= $treeio->next_tree)
{
    my @nodes=$tree->get_nodes();
    foreach my $nd(@nodes)
    {
        if ($nd->is_Leaf())
        {
            my $id=$nd->id();
            print "$id TAXID ",$corresp{$id},"\n";
           
            my $nodenhx=new Bio::Tree::NodeNHX();
            $nodenhx->nhx_tag({T=>$corresp{$id}});
        }
    }
    $treeout->write_tree($tree);
}


_test_tree.nwk_:
(((((42558930:100.0,42558943:100.0):100.0,(42558969:100.0,(42558981:100.0,
42558942:100.0):100.0):72.0):81.0,(((((90185247:100.0,56405380:100.0):100.0,
(42558987:100.0,148887393:100.0):100.0):90.0,66774197:100.0):100.0,AAEL015662:100.0):100.0,
42558970:100.0):82.0):100.0,(42558929:100.0,42558958:100.0):79.0):100.0,
42558941:100.0);

_seq_taxid.txt:_
AAEL015662      7159
42558969        9606
42558981        10090
42558942        9606
42558970        6239
42558929        10116
42558987        9606
42558930        10116
42558943        9606
148887393       10090
42558958        10090
42558941        9606
56405380        10090
90185247        9606
66774197        6239


_And the tata resulting file:_
(((((42558930:100.0,42558943:100.0):100.0[&&NHX],(42558969:100.0,(42558981:100.0,42558942:100.0):100.0[&&NHX]):72.0[&&NHX]):81.0[&&NHX],
(((((90185247:100.0,56405380:100.0):100.0[&&NHX],(42558987:100.0,148887393:100.0):100.0[&&NHX]):90.0[&&NHX],66774197:100.0):100.0[&&NHX],
AAEL015662:100.0):100.0[&&NHX],42558970:100.0):82.0[&&NHX]):100.0[&&NHX],(42558929:100.0,42558958:100.0):79.0[&&NHX]):100.0[&&NHX],42558941:100.0);




-- 
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan     	   = 
= Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================





From aaron.j.mackey at gsk.com  Thu Jan  3 10:12:22 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Thu, 3 Jan 2008 10:12:22 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <477CF135.9060104@toulouse.inra.fr>
Message-ID: 

Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that 
way, your tree's nodes are already NodeNHX's.  Instead of creating a new 
$nodenhx, you can use the $node variable directly from the tree ...

-Aaron
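A minimal sketch of the script reworked along those lines: the tree is read with the 'nhx' parser and the leaf nodes already in the tree are tagged in place, instead of creating detached NodeNHX objects that never reach write_tree(). It assumes BioPerl is installed, that the 'nhx' reader accepts the plain Newick input as described above, and that nhx_tag() takes a hash reference as in the original script; the subroutine names read_taxid_map and add_taxid_tags are hypothetical.

```perl
#!/usr/bin/perl
# Sketch only: tag each leaf of a Newick tree with an NHX T (taxid) tag.
# Assumes BioPerl is installed and that Bio::TreeIO's 'nhx' reader accepts
# plain Newick input, so the nodes it builds are Bio::Tree::NodeNHX objects.
use strict;
use warnings;
use Bio::TreeIO;
use Getopt::Long;

# Hypothetical helper: read "sequence_id taxid" pairs into a hash reference.
# Splits on any whitespace, so tab- or space-separated files both work.
sub read_taxid_map {
    my ($path) = @_;
    my %taxid_for;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ($seq_id, $taxid) = split ' ', $line;
        $taxid_for{$seq_id} = $taxid;
    }
    close $fh;
    return \%taxid_for;
}

# Hypothetical helper: copy every tree from $in_file to $out_file,
# adding T=<taxid> to each leaf whose id appears in $taxid_for.
sub add_taxid_tags {
    my ($in_file, $out_file, $taxid_for) = @_;
    my $treeio  = Bio::TreeIO->new(-format => 'nhx', -file => $in_file);
    my $treeout = Bio::TreeIO->new(-format => 'nhx', -file => ">$out_file");
    while (my $tree = $treeio->next_tree) {
        for my $node (grep { $_->is_Leaf } $tree->get_nodes) {
            my $taxid = $taxid_for->{ $node->id };
            next unless defined $taxid;    # leave unmapped leaves untagged
            # Tag the node that belongs to $tree; a new, detached NodeNHX
            # would never be written out.
            $node->nhx_tag({ T => $taxid });
        }
        $treeout->write_tree($tree);
    }
    $treeout->close();    # flush and close the output file
    return;
}

# Same command-line flags as the original script; does nothing without them.
my ($tree_file, $outfile, $codefile);
GetOptions('f|file=s' => \$tree_file, 'o|out=s' => \$outfile, 'c|code=s' => \$codefile)
    or die "usage: $0 -f tree.nwk -o out.nhx -c seq_taxid.txt\n";
if (defined $tree_file && defined $outfile && defined $codefile) {
    add_taxid_tags($tree_file, $outfile, read_taxid_map($codefile));
}
```

It would be run with the same command line as before (newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt). The mapping file is split on any whitespace because the sample seq_taxid.txt appears space-aligned in the mail, even though the original script split on tabs.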

bioperl-l-bounces at lists.open-bio.org wrote on 01/03/2008 09:29:09 AM:

> Dear all,
> 
> I am trying to convert a newick tree into an NHX tree, so I can add the 
> taxid tag for each leaf.
> 
> I am using the modules: Bio::TreeIO  & Bio::Tree::NodeNHX
> The idea is
> 1) to read the newick tree
> 2) get the leaf, and get the corresponding taxid for it
> 3) add the nhx species tag
> 4) write the nhx tree
> 
> I was able to do the first 2 steps, and I could create an object 
> node_nhx and add the tag T,
> but I don't know how to write an nhx Tree with the node_nhx previously 
> created...
> 
> Does anyone have an idea? any help are welcome.
> 
> Thanks,
> 
> laurence.
> 
> 
> Here are my code and the samples files for better understanding:
> newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt
> 
> _newick2nhx.pl:_
> use strict;
> use Bio::TreeIO;
> use Bio::Tree::NodeNHX;
> use Getopt::Long;
> 
> 
> my $tree_file;
> my $outfile;
> my $codefile;
> my %corresp;
> 
> GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile, 'c|code:s' =>\$codefile);
> 
> open (CODE, "< $codefile");
> while (<CODE>)
> {
>     chomp;
>     my($a, $b)=split (/\t/);
>     $corresp{$a}=$b;
> }
> 
> 
> my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
> my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");
> 
> while (my $tree= $treeio->next_tree)
> {
>     my @nodes=$tree->get_nodes();
>     foreach my $nd(@nodes)
>     {
>         if ($nd->is_Leaf())
>         {
>             my $id=$nd->id();
>             print "$id TAXID ",$corresp{$id},"\n";
> 
>             my $nodenhx=new Bio::Tree::NodeNHX();
>             $nodenhx->nhx_tag({T=>$corresp{$id}});
>         }
>     }
>     $treeout->write_tree($tree);
> }
> 
> 
> _test_tree.nwk_:
> (((((42558930:100.0,42558943:100.0):100.0,(42558969:100.0,(42558981:100.0,
> 42558942:100.0):100.0):72.0):81.0,(((((90185247:100.0,56405380:100.0):100.0,
> (42558987:100.0,148887393:100.0):100.0):90.0,66774197:100.0):100.0,
> AAEL015662:100.0):100.0,
> 42558970:100.0):82.0):100.0,(42558929:100.0,42558958:100.0):79.0):100.0,
> 42558941:100.0);
> 
> _seq_taxid.txt:_
> AAEL015662      7159
> 42558969        9606
> 42558981        10090
> 42558942        9606
> 42558970        6239
> 42558929        10116
> 42558987        9606
> 42558930        10116
> 42558943        9606
> 148887393       10090
> 42558958        10090
> 42558941        9606
> 56405380        10090
> 90185247        9606
> 66774197        6239
> 
> 
> _And the tata resulting file:_
> (((((42558930:100.0,42558943:100.0):100.0[&&NHX],(42558969:100.0,(42558981:100.0,42558942:100.0):100.0[&&NHX]):72.0[&&NHX]):81.0[&&NHX],(((((
> 90185247:100.0,56405380:100.0):100.0[&&NHX],(42558987:100.0,148887393:100.0):100.0[&&NHX]):90.0[&&NHX],66774197:100.0):100.0[&&NHX],
> AAEL015662:100.0):100.0[&&NHX],42558970:100.0):82.0[&&NHX]):100.0[&&NHX],(42558929:100.0,42558958:100.0):79.0[&&NHX]):100.0[&&NHX],42558941:100.0);
> 
> 
> 
> 
> -- 
> ====================================================================
> = Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan           = 
> = Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
> ====================================================================
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From Laurence.Amilhat at toulouse.inra.fr  Fri Jan  4 03:33:22 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Fri, 04 Jan 2008 09:33:22 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: 
References: 
Message-ID: <477DEF52.20802@toulouse.inra.fr>

Thank you Aaron,

it's working now. I've changed to species instead of taxid, so I can 
color the species on my tree using the ATV viewer.
thanks again,

Regards,

Laurence.



aaron.j.mackey at gsk.com wrote:
> Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that 
> way, your tree's nodes are already NodeNHX's.  Instead of creating a new 
> $nodenhx, you can use the $node variable directly from the tree ...
>
> -Aaron
>
> bioperl-l-bounces at lists.open-bio.org wrote on 01/03/2008 09:29:09 AM:
>
>   
>> Dear all,
>>
>> I am trying to convert a newick tree into an NHX tree, so I can add the 
>> taxid tag for each leaf.
>>
>> I am using the modules: Bio::TreeIO  & Bio::Tree::NodeNHX
>> The idea is
>> 1) to read the newick tree
>> 2) get the leaf, and get the corresponding taxid for it
>> 3) add the nhx species tag
>> 4) write the nhx tree
>>
>> I was able to do the first 2 steps, and I could create an object 
>> node_nhx and add the tag T,
>> but I don't know how to write an nhx Tree with the node_nhx previously 
>> created...
>>
>> Does anyone have an idea? any help are welcome.
>>
>> Thanks,
>>
>> laurence.
>>
>>
>> Here are my code and the samples files for better understanding:
>> newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt
>>
>> _newick2nhx.pl:_
>> use strict;
>> use Bio::TreeIO;
>> use Bio::Tree::NodeNHX;
>> use Getopt::Long;
>>
>>
>> my $tree_file;
>> my $outfile;
>> my $codefile;
>> my %corresp;
>>
>> GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile, 'c|code:s' =>\$codefile);
>>
>> open (CODE, "< $codefile");
>> while (<CODE>)
>> {
>>     chomp;
>>     my($a, $b)=split (/\t/);
>>     $corresp{$a}=$b;
>> }
>>
>>
>> my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
>> my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");
>>
>> while (my $tree= $treeio->next_tree)
>> {
>>     my @nodes=$tree->get_nodes();
>>     foreach my $nd(@nodes)
>>     {
>>         if ($nd->is_Leaf())
>>         {
>>             my $id=$nd->id();
>>             print "$id TAXID ",$corresp{$id},"\n";
>>
>>             my $nodenhx=new Bio::Tree::NodeNHX();
>>             $nodenhx->nhx_tag({T=>$corresp{$id}});
>>         }
>>     }
>>     $treeout->write_tree($tree);
>> }
>>
>>
>> _test_tree.nwk_:
>> (((((42558930:100.0,42558943:100.0):100.0,(42558969:100.0,(42558981:100.0,
>> 42558942:100.0):100.0):72.0):81.0,(((((90185247:100.0,56405380:100.0):100.0,
>> (42558987:100.0,148887393:100.0):100.0):90.0,66774197:100.0):100.0,
>> AAEL015662:100.0):100.0,
>> 42558970:100.0):82.0):100.0,(42558929:100.0,42558958:100.0):79.0):100.0,
>> 42558941:100.0);
>>
>> _seq_taxid.txt:_
>> AAEL015662      7159
>> 42558969        9606
>> 42558981        10090
>> 42558942        9606
>> 42558970        6239
>> 42558929        10116
>> 42558987        9606
>> 42558930        10116
>> 42558943        9606
>> 148887393       10090
>> 42558958        10090
>> 42558941        9606
>> 56405380        10090
>> 90185247        9606
>> 66774197        6239
>>
>>
>> _And the tata resulting file:_
>> (((((42558930:100.0,42558943:100.0):100.0[&&NHX],(42558969:100.0,(42558981:100.0,42558942:100.0):100.0[&&NHX]):72.0[&&NHX]):81.0[&&NHX],(((((
>> 90185247:100.0,56405380:100.0):100.0[&&NHX],(42558987:100.0,148887393:100.0):100.0[&&NHX]):90.0[&&NHX],66774197:100.0):100.0[&&NHX],
>> AAEL015662:100.0):100.0[&&NHX],42558970:100.0):82.0[&&NHX]):100.0[&&NHX],(42558929:100.0,42558958:100.0):79.0[&&NHX]):100.0[&&NHX],42558941:100.0);
>>
>>
>> -- 
>> ====================================================================
>> = Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan           = 
>> = Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
>> ====================================================================


-- 
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan     	   = 
= Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================





From hlapp at gmx.net  Sun Jan  6 22:02:32 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 6 Jan 2008 22:02:32 -0500
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To: 
References: 
Message-ID: <640890C9-2D34-4C70-9179-26A9EAB397D2@gmx.net>

Hi Zhihua, you didn't ever respond to Marc's link to the Persistent  
Bioperl slides - did that help?

	-hilmar

On Dec 6, 2007, at 11:25 PM, zhihuali wrote:

>
> Hi netters,
>
> I've installed BioSQL and bioperl-db, and successfully created and  
> stored a persistent object:
>
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::DB::BioDB;
>
> my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
>                                 -user     => 'annoymous',
>                                 -dbname   => 'bioseqdb');
>
> my $seqobj = Bio::Seq->new(-accession_number => "test",
>                            -id               => "test1",
>                            -seq              => "AGCTAGCT",
>                            -version          => 1);
> my $dbobj = $dbadp->create_persistent($seqobj);
> $dbobj->create;
> $dbobj->commit;
>
> It's successful because I found corresponding rows in the bioseqdb  
> tables.
>
> Now I want to retrieve the object back from the database. There's  
> not much documents available and I've tried find_by_unique_key/ 
> primary_key but all failed. Maybe I didn't use them correctly.  
> Could anyone give me an example as how to retrieve the stored  
> Bio::Seq object?
>
> Thanks a lot!
>
> Zhihua Li

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================








From cain.cshl at gmail.com  Mon Jan  7 12:24:02 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:24:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
Message-ID: <1199726642.6374.10.camel@frissell>

Hello,

I was trying to get bioperl-live this morning from either cvs or svn and
failed.  I was wondering if something was going on with the server.

Here are the things I tried:

  cvs -d:ext:scain at dev.open-bio.org:/home/repository/bioperl co bioperl-live

which resulted in this:

cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
cvs checkout: Updating bioperl-live
cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
cvs [checkout aborted]: read lock failed - giving up

Then I thought I'd try the suggested svn checkout method from the
bioperl wiki:

  svn co svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live

which resulted in

svn: No repository found in 'svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live'

Finally, after looking at the openbio server, I thought I'd try this:

   svn co svn+ssh://scain at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live

which resulted in repeated requests for my password (which I supplied
correctly at least once out of the several requests).

So, what's up?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From hlapp at gmx.net  Mon Jan  7 12:36:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 7 Jan 2008 12:36:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: 

I think we are still migrating to svn. It's probably better to wait  
for the announcement that everything is ready to go. (And then cvs  
won't work anymore except for anonymous checkout - which should  
actually continue to work while this is in progress. Have you tried  
that?)

	-hilmar

On Jan 7, 2008, at 12:24 PM, Scott Cain wrote:

> Hello,
>
> I was trying to get bioperl-live this morning from either cvs or svn and
> failed.  I was wondering if something was going on with the server.
>
> Here are the things I tried:
>
>   cvs -d:ext:scain at dev.open-bio.org:/home/repository/bioperl co bioperl-live
>
> which resulted in this:
>
> cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
> cvs checkout: Updating bioperl-live
> cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
> cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
> cvs [checkout aborted]: read lock failed - giving up
>
> Then I thought I'd try the suggested svn checkout method from the
> bioperl wiki:
>
>   svn co svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live
>
> which resulted in
>
> svn: No repository found in 'svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live'
>
> Finally, I after looking at the openbio server, I thought I'd try this:
>
>    svn co svn+ssh://scain at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live
>
> which resulted in repeated requests for my password (which I supplied
> correctly at least once out of the several requests).
>
> So, what's up?
>
> Thanks much,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From jason at bioperl.org  Mon Jan  7 12:43:18 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Jan 2008 09:43:18 -0800
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>

CVS r/w is locked because we are transitioning to SVN - you can still  
checkout via anonymous CVS on code.open-bio.org.

The SVN is going to be in /home/svn-repositories/bioperl not George's  
directory, but we are still monkeying around with the directory  
structure.  You can try a checkout but be warned it may change a few  
more times if we add another directory layer in there.

You will get requests for your password at least three times. I
strongly suggest you use SSH keys to avoid getting prompted each time.
I don't know why you get asked 3 times; it is an SVN thing, and I
assume it has to make 3 separate requests to do a checkout.

That's what is up for now.  We'll report when the final SVN migration  
is done.

-jason
On Jan 7, 2008, at 9:24 AM, Scott Cain wrote:

> Hello,
>
> I was trying to get bioperl-live this morning from either cvs or svn and
> failed.  I was wondering if something was going on with the server.
>
> Here are the things I tried:
>
>   cvs -d:ext:scain at dev.open-bio.org:/home/repository/bioperl co bioperl-live
>
> which resulted in this:
>
> cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
> cvs checkout: Updating bioperl-live
> cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
> cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
> cvs [checkout aborted]: read lock failed - giving up
>
> Then I thought I'd try the suggested svn checkout method from the
> bioperl wiki:
>
>   svn co svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live
>
> which resulted in
>
> svn: No repository found in 'svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live'
>
> Finally, I after looking at the openbio server, I thought I'd try this:
>
>    svn co svn+ssh://scain at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live
>
> which resulted in repeated requests for my password (which I supplied
> correctly at least once out of the several requests).
>
> So, what's up?
>
> Thanks much,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory



From cain.cshl at gmail.com  Mon Jan  7 12:57:38 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:57:38 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
References: <1199726642.6374.10.camel@frissell>
	<5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
Message-ID: <1199728658.6374.12.camel@frissell>

Hi Hilmar and Jason,

Thanks--for some reason, I thought svn was done.  I'll remain anonymous
for right now (Kind of difficult to do when you announce it publicly :-)

Thanks,
Scott

On Mon, 2008-01-07 at 09:43 -0800, Jason Stajich wrote:
> CVS r/w is locked because we are transitioning to SVN - you can still  
> checkout via anonymous CVS on code.open-bio.org.
> 
> The SVN is going to be in /home/svn-repositories/bioperl not George's  
> directory, but we are still monkeying around with the directory  
> structure.  You can try a checkout but be warned it may change a few  
> more times if we add another directory layer in there.
> 
> You will get requests for your password at least three times - I  
> strongly suggest you use SSH keys to avoid getting prompted each time  
> - I don't know why you get asked 3 times as it is a SVN thing I  
> assume it is having to make 3 separate requests to do a checkout.
> 
> That's what is up for now.  We'll report when the final SVN migration  
> is done.
> 
> -jason
> On Jan 7, 2008, at 9:24 AM, Scott Cain wrote:
> 
> > Hello,
> >
> > I was trying to get bioperl-live this morning from either cvs or svn and
> > failed.  I was wondering if something was going on with the server.
> >
> > Here are the things I tried:
> >
> >   cvs -d:ext:scain at dev.open-bio.org:/home/repository/bioperl co bioperl-live
> >
> > which resulted in this:
> >
> > cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
> > cvs checkout: Updating bioperl-live
> > cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
> > cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
> > cvs [checkout aborted]: read lock failed - giving up
> >
> > Then I thought I'd try the suggested svn checkout method from the
> > bioperl wiki:
> >
> >   svn co svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live
> >
> > which resulted in
> >
> > svn: No repository found in 'svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live'
> >
> > Finally, I after looking at the openbio server, I thought I'd try this:
> >
> >    svn co svn+ssh://scain at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live
> >
> > which resulted in repeated requests for my password (which I supplied
> > correctly at least once out of the several requests).
> >
> > So, what's up?
> >
> > Thanks much,
> > Scott
> >
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > Cold Spring Harbor Laboratory
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From cain.cshl at gmail.com  Mon Jan  7 13:34:25 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 13:34:25 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
Message-ID: <1199730865.6374.18.camel@frissell>

Hello,

I wanted to implement this myself (and probably still will,
assuming it's not already there...) but I am not a Module::Build guru.
Here's what I'd like to do: add a parameter that I can pass when invoking
perl Build.PL so that the default answers will be used when it would
normally ask me a question while running perl Build.PL, something like
this:

  perl Build.PL --yes

Is this sort of thing already built into Module::Build and I can't see
it?  Or can somebody suggest the best way of going about this?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From cjfields at uiuc.edu  Mon Jan  7 17:22:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Jan 2008 16:22:35 -0600
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <31AD254B-DABA-488D-BDA8-D690F949CC39@uiuc.edu>

I agree it would be nice.  Not sure how hard it would be to implement;  
maybe it would be best to have a mode of installation, say if one  
wanted 'minimal' (no optional module installation, no scripts),  
'full', 'dev' (assume minimal install but don't test), and so on,  
falling back to the query-based approach if nothing is indicated.

chris

On Jan 7, 2008, at 12:34 PM, Scott Cain wrote:

> Hello,
>
> I was wanting to implement this myself (and probably still will,
> assuming it's not already there...) but I am not a Module::Build guru.
> Here's what I'd like to do: add a parameter that I can add when evoking
> perl Build.PL so that the default answers will be used when it would
> normally ask me a question while running perl Build.PL, something like
> this:
>
>  perl Build.PL --yes
>
> Is this sort of thing already built into Module::Build and I can't see
> it?  Or can somebody suggest the best way of going about this?
>
> Thanks much,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From bix at sendu.me.uk  Mon Jan  7 17:37:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 07 Jan 2008 22:37:36 +0000
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <4782A9B0.60203@sendu.me.uk>

Scott Cain wrote:
> Hello,
> 
> I was wanting to implement this myself (and probably still will,
> assuming it's not already there...) but I am not a Module::Build guru.
> Here's what I'd like to do: add a parameter that I can add when evoking
> perl Build.PL so that the default answers will be used when it would
> normally ask me a question while running perl Build.PL, something like
> this:
> 
>   perl Build.PL --yes
> 
> Is this sort of thing already built into Module::Build and I can't see
> it?  Or can somebody suggest the best way of going about this?

You should ask on the Module::Build mailing list. If it already exists I 
don't think it is obvious, however.

If your question is BioPerl related, and you're looking for a fast way 
of installing BioPerl without the annoying questions, I'm sure I could 
hack something into ModuleBuildBioperl.pm
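A minimal sketch of the kind of hack being discussed, in plain Perl and independent of Module::Build or ModuleBuildBioperl.pm internals; the --yes switch and the ask() helper are hypothetical. It returns the default answer without prompting when --yes is given, or when PERL_MM_USE_DEFAULT is set, the environment variable that ExtUtils::MakeMaker's prompt() honours for unattended installs.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical "--yes" switch for a Build.PL script. pass_through leaves
# any option this script does not know about in @ARGV for later use.
Getopt::Long::Configure('pass_through');
my $accept_defaults = 0;
GetOptions('yes|y' => \$accept_defaults);

# Ask a question, or return the default without prompting when running
# unattended (--yes given, or PERL_MM_USE_DEFAULT set in the environment).
sub ask {
    my ($question, $default, $unattended) = @_;
    $unattended = $accept_defaults unless defined $unattended;
    return $default if $unattended || $ENV{PERL_MM_USE_DEFAULT};

    print "$question [$default] ";
    my $answer = <STDIN>;
    return $default unless defined $answer;    # EOF on STDIN: use default
    chomp $answer;
    return length $answer ? $answer : $default;
}
```

A Build.PL would then call, for example, ask('Install scripts?', 'n') wherever it currently prompts, and `perl Build.PL --yes` would accept every default. Chris's installation modes could be layered on top by choosing a different set of defaults per mode.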


From cain.cshl at gmail.com  Mon Jan  7 22:04:19 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 22:04:19 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl	Build.PL`
In-Reply-To: <4782A9B0.60203@sendu.me.uk>
References: <1199730865.6374.18.camel@frissell> <4782A9B0.60203@sendu.me.uk>
Message-ID: <1199761459.6017.1.camel@frissell>

Hi Sendu,

I just hacked something up (I only needed to change a few lines--once I
figured out where everything was).  I like Chris' idea though; before I
commit it back (Ha, no rush there), I'll flesh it out a little more to
give more options.

Scott

On Mon, 2008-01-07 at 22:37 +0000, Sendu Bala wrote:
> Scott Cain wrote:
> > Hello,
> > 
> > I was wanting to implement this myself (and probably still will,
> > assuming it's not already there...) but I am not a Module::Build guru.
> > Here's what I'd like to do: add a parameter that I can add when evoking
> > perl Build.PL so that the default answers will be used when it would
> > normally ask me a question while running perl Build.PL, something like
> > this:
> > 
> >   perl Build.PL --yes
> > 
> > Is this sort of thing already built into Module::Build and I can't see
> > it?  Or can somebody suggest the best way of going about this?
> 
> You should ask on the Module::Build mailing list. If it already exists I 
> don't think it is obvious, however.
> 
> If your question is BioPerl related, and you're looking for a fast way 
> of installing BioPerl without the annoying questions, I'm sure I could 
> hack something into ModuleBuildBioperl.pm
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From granjeau at tagc.univ-mrs.fr  Wed Jan  9 03:30:17 2008
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 09 Jan 2008 09:30:17 +0100
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
Message-ID: <47848619.40109@tagc.univ-mrs.fr>

Hello,

I would like to retrieve the human-reviewed annotation of SwissProt 
entries; this information is in the comment section of the sequence 
file. Here is an example:

CC   -!- FUNCTION: Actins are highly conserved proteins that are involved
CC       in various types of cell motility and are ubiquitously expressed
CC       in all eukaryotic cells.
CC   -!- SUBUNIT: Polymerization of globular actin (G-actin) leads to a
CC       structural filament (F-actin) in the form of a two-stranded helix.
CC       Each actin can bind to 4 others. Found in a complex with XPO6,
CC       Ran, ACTB and PFN1. Component of a complex composed at least of
CC       ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with XPO6.
CC   -!- INTERACTION:
CC       Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668;
CC       P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161;
CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.

Is there a specific method to do such a job?

Thanks much,
Samuel

-- 

Samuel GRANJEAUD                   granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC               Tel: +33  (0)491 82 87 24
http://tagc.univ-mrs.fr            Fax: +33  (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique



From robfsouza at gmail.com  Wed Jan  9 08:20:08 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 11:20:08 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs
Message-ID: 

Hello All!

Greetings to everybody, and happy new year to those following a
Western calendar!

I'm starting a new project to store and analyze distinct sets of
sequence annotation data which are related in a way suitable for
representation in a directed (e.g. transcript splicing) or undirected
(e.g. gene product interaction) graph. Analysis will require frequent
queries based on interval overlaps, feature neighbourhood, annotation
and, most importantly, feature relationships and stored paths.

At first, I thought of building an entirely new database structure to store
project-specific data (e.g. alternative splicing or protein interaction),
but as I have some experience with Lincoln's
Bio::DB::SeqFeature::Store, I'm now considering extending it for the
purpose of storing graphs describing relationships among features.

I'm aware that some other bioperl related databases, specifically
BioSQL and Chado, do have components which might be suitable for
storing all or some of these data but, since Lincoln's feature storage
and interval binning implementations in
Bio::DB::SeqFeature::Store::mysql are both clean, simple and very fast,
perhaps extending it in a seemingly modular way is desirable. A good
extension to Lincoln's database could include tables like
feature_relationship and feature_path, for edges and transitive
closures (just like in BioSQL) and feature_stored_path, for exclusion
of biologically irrelevant paths in DAGs, like certain splicing
isoforms. These tables could be used to store sequence assemblies or
EST alignments efficiently, including scaffolds inferred by connecting
contigs.

Before starting, I would like to know if the BioSQL and Chado schemata
have accelerators for querying intervals among billions of features
and feature relationships (some examples using these databases would
also help, if they show that these databases are efficient for such tasks).
If these or other databases are not as suitable as Bio::DB::SeqFeature
for feature retrieval based on interval overlap and attributes, then
again I might consider extending Bio::DB::SeqFeature
and contributing such extensions back to bioperl...
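As a rough illustration of the general idea behind interval binning only, here is the hierarchical scheme popularised by the UCSC Genome Browser (bins of 128 kb, 1 Mb, 8 Mb, 64 Mb and 512 Mb; 0-based, half-open coordinates). It is not necessarily the scheme Bio::DB::SeqFeature::Store::mysql uses, and the function names are hypothetical. Each feature is stored with the smallest bin that contains it, so an overlap query only has to look in the short list of bins that can intersect the query range before comparing coordinates.

```perl
use strict;
use warnings;

# Bin numbering offsets for each level, from the smallest bins (128 kb)
# to the single top-level bin (512 Mb).
my @BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0);
my $FIRST_SHIFT = 17;    # 2**17 bp = 128 kb, the smallest bin
my $NEXT_SHIFT  = 3;     # each level's bins are 8 times larger

# Smallest bin that wholly contains [start, end).
sub bin_for_range {
    my ($start, $end) = @_;
    my $start_bin = $start >> $FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $FIRST_SHIFT;
    for my $offset (@BIN_OFFSETS) {
        return $offset + $start_bin if $start_bin == $end_bin;
        $start_bin >>= $NEXT_SHIFT;
        $end_bin   >>= $NEXT_SHIFT;
    }
    die "Range [$start, $end) exceeds the 512 Mb limit of this scheme\n";
}

# Every bin that may hold a feature overlapping [start, end). Only rows
# in these bins need their coordinates checked.
sub bins_overlapping {
    my ($start, $end) = @_;
    my $start_bin = $start >> $FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $FIRST_SHIFT;
    my @bins;
    for my $offset (@BIN_OFFSETS) {
        push @bins, map { $offset + $_ } $start_bin .. $end_bin;
        $start_bin >>= $NEXT_SHIFT;
        $end_bin   >>= $NEXT_SHIFT;
    }
    return @bins;
}
```

For example, a 1 bp feature at position 0 lands in bin 585, and a query over that base only has to inspect bins 585, 73, 9, 1 and 0.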

Any thoughts?

Best regards,
Robson

PS: sorry if anyone gets two copies of this post, but it took me some
time to realize my new e-mail wasn't subscribed to bioperl-l...


From bix at sendu.me.uk  Wed Jan  9 08:59:08 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 09 Jan 2008 13:59:08 +0000
Subject: [Bioperl-l] bioperl based database infrastucture for directed
 graphs
In-Reply-To: 
References: 
Message-ID: <4784D32C.9070807@sendu.me.uk>

Robson Francisco de Souza wrote:
> Before starting, I would like to know if the BioSQL and Chado schemata
> do have accelerators for quering intervals among billions of features
> and feature relatioships (some examples using these databases would
> also help, if they that these databases are efficient for such tasks).
> If these or other databases are not as suitable as Bio::DB::SeqFeature
> for feature retrieval based on interval overlap and attributes,

I'm using Bio::DB::SeqFeature for that purpose, but just a warning: I 
found that with millions of features it made a db that was too large in 
terms of disc space and too slow in terms of query time. I had to hack 
out its storage of feature objects in the db, instead generating feature 
objects on request from the stored attributes. Doing this turned out to 
be faster than simply unfreezing certain kinds of feature objects!

(I also had to hack in support for retrieval by source, a patch that 
Lincoln hasn't gotten back to me about yet.)

While I can't answer your main questions, I wish you good luck with your 
project and request that you keep us posted with what you achieve.


From bosborne11 at verizon.net  Wed Jan  9 09:46:42 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 09 Jan 2008 09:46:42 -0500
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
In-Reply-To: <47848619.40109@tagc.univ-mrs.fr>
References: <47848619.40109@tagc.univ-mrs.fr>
Message-ID: <3DAEDA67-B9A5-47A4-8108-0915659F1052@verizon.net>

Samuel,

The Feature-Annotation HOWTO addresses this specifically:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
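If you only have the raw flat-file text, you can also split the CC block into topics by hand. This is a minimal pure-Perl sketch, not a BioPerl method. It assumes that each topic starts with "-!- TOPIC:", as in the example quoted below, and that the copyright notice at the end of the block starts with a line of dashes. The helper name parse_cc_topics is hypothetical.

```perl
use strict;
use warnings;

# Split SwissProt "CC" lines into a hash of topic => text.
# Assumes each topic starts with "-!- TOPIC:" and stops at the dashed
# line that opens the copyright notice.
sub parse_cc_topics {
    my @lines = @_;
    my (%topics, $current);
    for my $line (@lines) {
        next unless $line =~ /^CC\s+(.*?)\s*$/;    # drop "CC" prefix and trailing blanks
        my $text = $1;
        last if $text =~ /^-{5,}/;                 # start of the copyright block
        if ($text =~ /^-!-\s+([^:]+):\s*(.*)$/) {  # new topic, e.g. "-!- FUNCTION: ..."
            $current = $1;
            $topics{$current} = $2;
        }
        elsif (defined $current) {                 # continuation line of the current topic
            $topics{$current} .= ($topics{$current} eq '' ? '' : ' ') . $text;
        }
    }
    return \%topics;
}

my $topics = parse_cc_topics(
    'CC   -!- INTERACTION:',
    'CC       Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668;',
    'CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.',
);
print "$_: $topics->{$_}\n" for sort keys %$topics;
```

The HOWTO's route through the sequence object's annotations is usually preferable, because it relies on BioPerl's own parser instead of these formatting assumptions.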


Brian O.


On Jan 9, 2008, at 3:30 AM, Samuel GRANJEAUD - IR/IFR137 wrote:

> Hello,
>
> I would like to retrieve the human-reviewed annotation of SwissProt
> entries; this information is in the comment section of the
> sequence file. Here is an example:
>
> CC   -!- FUNCTION: Actins are highly conserved proteins that are involved
> CC       in various types of cell motility and are ubiquitously expressed
> CC       in all eukaryotic cells.
> CC   -!- SUBUNIT: Polymerization of globular actin (G-actin) leads to a
> CC       structural filament (F-actin) in the form of a two-stranded helix.
> CC       Each actin can bind to 4 others. Found in a complex with XPO6,
> CC       Ran, ACTB and PFN1. Component of a complex composed at least of
> CC       ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with XPO6.
> CC   -!- INTERACTION:
> CC       Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668;
> CC       P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161;
> CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.
>
> Is there a specific method to do such a job?
>
> Thanks much,
> Samuel
>
> -- 
>
> Samuel GRANJEAUD                   granjeau at tagc.univ-mrs.fr
> INSERM - ICIM - TAGC               Tel: +33  (0)491 82 87 24
> http://tagc.univ-mrs.fr            Fax: +33  (0)491 82 87 01
> http://icim.marseille.inserm.fr/proteomique
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From alexanderptok at web.de  Wed Jan  9 10:34:56 2008
From: alexanderptok at web.de (Alexander Ptok)
Date: Wed, 09 Jan 2008 16:34:56 +0100
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN]
Message-ID: <2011210591@web.de>

Hi,

I am a beginner with BioPerl and am working through the Beginners HOWTO.

My BioPerl version is 1.4-1, running on Debian etch.

In the HOWTO, everything worked fine until the section

Retrieving multiple sequences from a database

from which I copied the following script:

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;
 
$query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
$query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',  -query => $query );
 
$gb_obj = Bio::DB::GenBank->new;
 
$stream_obj = $gb_obj->get_Stream_by_query($query_obj);
 
while ($seq_obj = $stream_obj->next_seq) {    
    # do something with the sequence object    
    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}


If I remove the 0:3000[SLEN] clause, it works and returns a lot of sequences; when I alter the query to e.g. 1830[SLEN], it
finds the one sequence that has the length 1830, but I was not able to query a range of lengths.

Does anyone know what I am doing wrong?
Greetings
A. Ptok



From cjm at fruitfly.org  Wed Jan  9 11:52:21 2008
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed, 9 Jan 2008 08:52:21 -0800
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: 
References: 
Message-ID: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>

[cc-d to gmod-schema]

Chado does have some views and pg functions for interval-based  
retrieval. AFAIK there are no accelerators for deep feature graphs,  
as most chado users have relatively shallow gene-model/SO feature  
graphs. It may not be so hard to extend cvterm code for doing this,  
depending on the characteristics of your graphs (the closure of  
feature neighbourhood graphs may be particularly large).

On Jan 9, 2008, at 5:20 AM, Robson Francisco de Souza wrote:

> Hello All!
>
> Greetings to everybody, and happy new year to those following a
> Western calendar!
>
> I'm starting a new project to store and analyze distinct sets of
> sequence annotation data which are related in a way suitable for
> representation in a directed (e.g. transcript splicing) or undirected
> (e.g. gene product interaction) graph. Analysis will require frequent
> queries based on interval overlaps, feature neighbourhood, annotation
> and, most importantly, feature relationships and stored paths.
>
> At first, I thought of building an entirely new database structure to store
> project-specific data (e.g. alternative splicing or protein interaction),
> but as I have some experience with Lincoln's
> Bio::DB::SeqFeature::Store, I'm now considering extending it for the
> purpose of storing graphs describing relationships among features.
>
> I'm aware that some other bioperl related databases, specifically
> BioSQL and Chado, do have components which might be suitable for
> storing all or some of these data but, since Lincoln's feature storage
> and interval binning implementations in
> Bio::DB::SeqFeature::Store::mysql are both clean, simple and very fast,
> perhaps extending it in a seemingly modular way is desirable. A good
> extension to Lincoln's database could include tables like
> feature_relationship and feature_path, for edges and transitive
> closures (just like in BioSQL) and feature_stored_path, for exclusion
> of biologically irrelevant paths in DAGs, like certain splicing
> isoforms. These tables could be used to store sequence assemblies or
> EST alignments efficiently, including scaffolds inferred by connecting
> contigs.
>
> Before starting, I would like to know if the BioSQL and Chado schemata
> do have accelerators for querying intervals among billions of features
> and feature relationships (some examples using these databases would
> also help, if they show that these databases are efficient for such tasks).
> If these or other databases are not as suitable as Bio::DB::SeqFeature
> for feature retrieval based on interval overlap and attributes, then
> again I might consider extending Bio::DB::SeqFeature
> and contributing such extensions back to bioperl...
>
> Any thoughts?
>
> Best regards,
> Robson
>
> PS: sorry if anyone gets two copies of this post, but it took me some
> time to realize my new e-mail wasn't subscribed to bioperl-l...
>



From cjfields at uiuc.edu  Wed Jan  9 10:00:38 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 Jan 2008 09:00:38 -0600
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: <4784D32C.9070807@sendu.me.uk>
References: 
	<4784D32C.9070807@sendu.me.uk>
Message-ID: 


On Jan 9, 2008, at 7:59 AM, Sendu Bala wrote:

> Robson Francisco de Souza wrote:
>> Before starting, I would like to know if the BioSQL and Chado schemata
>> do have accelerators for querying intervals among billions of features
>> and feature relationships (some examples using these databases would
>> also help, if they show that these databases are efficient for such tasks).
>> If these or other databases are not as suitable as Bio::DB::SeqFeature
>> for feature retrieval based on interval overlap and attributes,
>
> I'm using Bio::DB::SeqFeature for that purpose, but just a warning:  
> I found that with millions of features it made a db that was too  
> large in terms of disc space and too slow in terms of query time. I  
> had to hack out its storage of feature objects in the db, instead  
> generating feature objects on request from the stored attributes.  
> Doing this turned out to be faster than simply unfreezing certain  
> kinds of feature objects!

Would this be Bio::SF::Annotated objects? If so I bet Storable is  
storing the OntologyStore object information along with the SF (which  
argues for refactoring the FeatureIO/Bio::SF::Annotated stuff in 1.7).

Not sure what can be done about that beyond your hack, though it might  
be worth exploring whether one can optionally set the DB::Store to  
store the object instance.

> (I also had to hack in support for retrieval by source, a patch that  
> Lincoln hasn't gotten back to me about yet.)
>
> While I can't answer your main questions, I wish you good luck with  
> your project and request that you keep us posted with what you  
> achieve.

You can always try Lincoln on the GBrowse list as well.  I would say  
go ahead and commit the patch if it isn't a big deal.

chris


From cjfields at uiuc.edu  Wed Jan  9 13:12:55 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 Jan 2008 12:12:55 -0600
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: 
References: 
Message-ID: <128517E8-3A2A-45DD-83A0-0014863A25BC@uiuc.edu>

cc'ing the gbrowse list in case Lincoln hasn't seen this.

I believe the primary intent for Bio::DB::SeqFeature::Store was as a  
more GFF3-compatible replacement for Bio::DB::GFF (unlimited feature  
nesting, uses any SeqFeatureI, etc) and was streamlined for faster  
lookups by GBrowse.  I don't think adding tables would affect  
performance dramatically, though maybe Lincoln would have a better idea.

chris

On Jan 9, 2008, at 7:20 AM, Robson Francisco de Souza wrote:

> Hello All!
>
> Greetings to everybody, and happy new year to those following a
> Western calendar!
>
> I'm starting a new project to store and analyze distinct sets of
> sequence annotation data which are related in a way suitable for
> representation in a directed (e.g. transcript splicing) or undirected
> (e.g. gene product interaction) graph. Analysis will require frequent
> queries based on interval overlaps, feature neighbourhood, annotation
> and, most importantly, feature relationships and stored paths.
>
> At first, I thought of building an entirely new database structure to store
> project-specific data (e.g. alternative splicing or protein interaction),
> but as I have some experience with Lincoln's
> Bio::DB::SeqFeature::Store, I'm now considering extending it for the
> purpose of storing graphs describing relationships among features.
>
> I'm aware that some other bioperl related databases, specifically
> BioSQL and Chado, do have components which might be suitable for
> storing all or some of these data but, since Lincoln's feature storage
> and interval binning implementations in
> Bio::DB::SeqFeature::Store::mysql are both clean, simple and very fast,
> perhaps extending it in a seemingly modular way is desirable. A good
> extension to Lincoln's database could include tables like
> feature_relationship and feature_path, for edges and transitive
> closures (just like in BioSQL) and feature_stored_path, for exclusion
> of biologically irrelevant paths in DAGs, like certain splicing
> isoforms. These tables could be used to store sequence assemblies or
> EST alignments efficiently, including scaffolds inferred by connecting
> contigs.
>
> Before starting, I would like to know if the BioSQL and Chado schemata
> do have accelerators for querying intervals among billions of features
> and feature relationships (some examples using these databases would
> also help, if they show that these databases are efficient for such tasks).
> If these or other databases are not as suitable as Bio::DB::SeqFeature
> for feature retrieval based on interval overlap and attributes, then
> again I might consider extending Bio::DB::SeqFeature
> and contributing such extensions back to bioperl...
>
> Any thoughts?
>
> Best regards,
> Robson
>
> PS: sorry if anyone gets two copies of this post, but it took me some
> time to realize my new e-mail wasn't subscribed to bioperl-l...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From bosborne11 at verizon.net  Wed Jan  9 13:29:15 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 09 Jan 2008 13:29:15 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths
	0:3000[SLEN]
In-Reply-To: <2011210591@web.de>
References: <2011210591@web.de>
Message-ID: <0EB96131-7931-4FC3-802F-A8152B474A99@verizon.net>

Alexander,

I don't understand. By using the clause "0:3000[SLEN]" you are
querying for sequences in the length range of 0 to 3000.


Brian O.


On Jan 9, 2008, at 10:34 AM, Alexander Ptok wrote:

> If I remove the 0:3000[SLEN] clause, it works and returns a lot of
> sequences; when I alter the query to e.g. 1830[SLEN], it
> finds the one sequence that has the length 1830, but I was not able
> to query a range of lengths.



From stefan.kirov at bms.com  Wed Jan  9 14:54:07 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 09 Jan 2008 14:54:07 -0500
Subject: [Bioperl-l] pairwise_kaks.PLS: verbose rquired by PAML
Message-ID: <4785265F.6020500@bms.com>

Jason,
Even with this last fix, I still had problems with bp_pairwise_kaks.pl. It
turns out that verbose needs to be on by default for codeml in order for
the sequences to appear in the mlc file.
So, instead of:
    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        (-verbose => $verbose,
         -params => { 'runmode' => -2,
                      'seqtype' => 1,
                  }
         );
we need this:

    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        (-verbose => $verbose,
         -params => { 'runmode' => -2,
                      'seqtype' => 1,
                      'verbose' => 1,   # codeml setting: needed for the sequences to appear in the mlc file
                  }
         );

verbose can be 2 as well. I just got this clarification from Ziheng. He
also offers to change the output to make it easier for us. I plan to
ask him to put the sequence in the mlc header by default.
Stefan



From robfsouza at gmail.com  Wed Jan  9 19:28:25 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 22:28:25 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
References: 
	<199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
Message-ID: 

Hi,

2008/1/9, Chris Mungall :
> [cc-d to gmod-schema]
>
> Chado does have some views and pg functions for interval-based
> retrieval. AFAIK there are no accelerators for deep feature graphs,
> as most chado users have relatively shallow gene-model/SO feature
> graphs. It may not be so hard to extend cvterm code for doing this,
> depending on the characteristics of your graphs (the closure of
> feature neighbourhood graphs may be particularly large)

Great! I'm studying Chado and I will have a look at the interval optimizations.
Have any of you compared BioSQL and Chado for huge feature and feature
graph storage/retrieval efficiency? As Sendu pointed out limitations in
Bio::DB::SeqFeature's schema, I'm wondering which of these platforms
(or maybe another one?) would be best suited for these tasks... for
the moment, I will either extend Sendu's hack of Lincoln's modules or
adapt the binning algorithm of Bio::DB::SeqFeature::DBI::mysql to
Chado, if it turns out to be more efficient than the pg functions.
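For comparison with the pg functions, here is a minimal Perl sketch of hierarchical interval binning. It uses the UCSC-style standard scheme (bins of 128 kb, 1 Mb, 8 Mb, 64 Mb and 512 Mb) as a stand-in. It is not the code in Bio::DB::SeqFeature::Store, whose binning differs in detail. Coordinates are assumed to be 0-based and half-open and below 2**29, and the function names are hypothetical.

```perl
use strict;
use warnings;

# UCSC-style hierarchical binning (standard scheme, coordinates < 2**29).
# Offsets go from the finest level (128 kb bins) to the coarsest (512 Mb).
my @BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0);
my $FIRST_SHIFT = 17;    # finest bins are 2**17 = 131072 bp wide
my $NEXT_SHIFT  = 3;     # each level is 8 times wider than the previous one

# Smallest bin that fully contains the half-open interval [$start, $end).
sub bin_from_range {
    my ($start, $end) = @_;
    my $start_bin = $start >> $FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $FIRST_SHIFT;
    for my $offset (@BIN_OFFSETS) {
        return $offset + $start_bin if $start_bin == $end_bin;
        $start_bin >>= $NEXT_SHIFT;
        $end_bin   >>= $NEXT_SHIFT;
    }
    die "interval [$start, $end) is out of range\n";
}

# Every bin that may hold a feature overlapping [$start, $end);
# an overlap query only needs to scan features stored in these bins.
sub bins_overlapping {
    my ($start, $end) = @_;
    my $start_bin = $start >> $FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $FIRST_SHIFT;
    my @bins;
    for my $offset (@BIN_OFFSETS) {
        push @bins, $offset + $_ for $start_bin .. $end_bin;
        $start_bin >>= $NEXT_SHIFT;
        $end_bin   >>= $NEXT_SHIFT;
    }
    return @bins;
}

print bin_from_range(0, 1000), "\n";              # 585: fits in the first 128 kb bin
print bin_from_range(100_000, 200_000), "\n";     # 73: crosses a 128 kb boundary
print join(',', bins_overlapping(0, 1000)), "\n"; # 585,73,9,1,0
```

In a database, each feature would store its bin_from_range() value in an indexed column. An overlap query would then combine "bin IN (...)", using bins_overlapping() for the query interval, with the usual start/end comparisons.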

Best,
Robson

PS: I could not find the most recent version of gmod by following the
Download link to gmod(Chado) from GMOD's site to the Sourceforge
download page. Did I miss the right link on the download site or is
this unexpected? Is the version available at IUBio's mirror (0.003-10)
the most recent one?


From cain.cshl at gmail.com  Wed Jan  9 22:15:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 09 Jan 2008 22:15:29 -0500
Subject: [Bioperl-l] bioperl based database infrastucture for
	directed	graphs
In-Reply-To: 
References: 
	<199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
	
Message-ID: <1199934929.6229.44.camel@frissell>

Hi Robson,

I seem to be perennially working on the 1.0 release of Chado.  The
schema itself is quite stable but I'm always working on the tools to
make them handle more cases and be as stable as possible.  For the time
being, you need to get Chado from cvs; see 

  http://www.gmod.org/wiki/index.php/Chado_-_Getting_Started#Chado_From_CVS

I removed the 0.003 release from the SourceForge site because the schema
in it is out of date relative to what we've been working on for the last
year.

Scott

On Wed, 2008-01-09 at 22:28 -0200, Robson Francisco de Souza wrote:
> Hi,
> 
> 2008/1/9, Chris Mungall :
> > [cc-d to gmod-schema]
> >
> > Chado does have some views and pg functions for interval-based
> > retrieval. AFAIK there are no accelerators for deep feature graphs,
> > as most chado users have relatively shallow gene-model/SO feature
> > graphs. It may not be so hard to extend cvterm code for doing this,
> > depending on the characteristics of your graphs (the closure of
> > feature neighbourhood graphs may be particularly large)
> 
> Great! I'm studying Chado and I will have a look at the interval optimizations.
> Have any of you compared BioSQL and Chado for huge feature and feature
> graph storage/retrieval efficiency? As Sendu pointed out limitations in
> Bio::DB::SeqFeature's schema, I'm wondering which of these platforms
> (or maybe another one?) would be best suited for these tasks... for
> the moment, I will either extend Sendu's hack of Lincoln's modules or
> adapt the binning algorithm of Bio::DB::SeqFeature::DBI::mysql to
> Chado, if it turns out to be more efficient than the pg functions.
> 
> Best,
> Robson
> 
> PS: I could not find the most recent version of gmod by following the
> Download link to gmod(Chado) from GMOD's site to the Sourceforge
> download page. Did I miss the right link on the download site or is
> this unexpected? Is the version available at IUBio's mirror (0.003-10)
> the most recent one?
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From bosborne11 at verizon.net  Thu Jan 10 09:16:16 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 10 Jan 2008 09:16:16 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths
	0:3000[SLEN]
In-Reply-To: <2013325230@web.de>
References: <2013325230@web.de>
Message-ID: <932550FF-8414-4B3E-92BB-1895FD9658AE@verizon.net>

Alexander,

OK, that is odd (meaning, this did work a while back but it's not  
clear to me what could have changed).

First thing to do: upgrade to BioPerl version 1.5.2. Can you do this?
Version 1.4 is very old and you could run into other problems using it.


Brian O.



On Jan 10, 2008, at 8:54 AM, Alexander Ptok wrote:

> Hello Brian,
>
> thanks for your answer. The principle is clear, but it doesn't work
> as it should on my computer. So maybe I should repeat what I did
> step by step.
>
> 1. I took the following script:
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
> $query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',  -query => $query );
>
> $gb_obj = Bio::DB::GenBank->new;
>
> $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
>
> while ($seq_obj = $stream_obj->next_seq) {
>    # do something with the sequence object
>    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
> }
>
> and then on the terminal
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script1.pl
> sv1494 at r04102:~/Desktop/bioperl$
>
> 2. I took out the 0:3000[SLEN]:
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL]";
>
> and then on the terminal
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script2.pl
> NM_128760       2775
> NM_125788       2874
> NM_124913       3068
> NM_124912       3117
> NM_124775       871
> NM_120360       1655
> NM_111862       2199
> NM_001036386    2734
> NM_119270       3996
> NM_105072       1656
> NM_113294       4824
> NM_180431       1673
> NM_120495       2515
> NM_120493       2050
> NM_112156       1089
> .
> .
> and a lot more hits, and one can clearly see that there are some with
> a length between 0 and 3000
>
> 3. To have a look at [SLEN] on its own, I tried another script with e.g.
> 2199[SLEN]
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 2199[SLEN]";
>
> on the terminal:
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script3.pl
> NM_111862       2199
> sv1494 at r04102:~/Desktop/bioperl$
>
>
>
> I think everything works fine, except that BioPerl, or maybe GenBank,
> doesn't understand the range clause 0:3000, although all the
> documentation says I have to do it that way. Did I misunderstand
> something, or is it just a problem with my computer/BioPerl
> installation? Maybe you can tell me if the script does what it is
> supposed to do on your computer?
>
> Thanks and greetings
>
> Alexander Ptok
>>
>> Alexander,
>>
>> I don't understand. By using the clause "0:3000[SLEN]" you are
>> querying for sequences in the length range of 0 to 3000.
>>
>
>
>




From pmiguel at purdue.edu  Fri Jan 11 11:22:38 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 11:22:38 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
Message-ID: <478797CE.9050202@purdue.edu>

No problem getting sequence from genbank via a myriad of methods. But as 
the volume of non-finished sequence in genbank increases the importance 
of also obtaining quality values for a given sequence increases. Some 
records include quality values.

I typically use bp_fetch.pl to grab a sequence from genbank:

bp_fetch.pl -fmt fasta net::genbank:AC207960

sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't designed 
to pull down quals evidently:

bp_fetch.pl -fmt qual net::genbank:AC207960

gives:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual object to write_seq() as a parameter named "source"
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
STACK: /usr/local/perl/bin/bp_fetch.pl:313
-----------------------------------------------------------

(running under bioperl 1.5.2)

The quality values for this accession are in genbank as these URLs 
demonstrate:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual

What is the best way to pull down these qual values? They aren't present 
in "GenBank(Full)" format. They are present in an ASN.1 format.

Advice would be appreciated.

-- 
Phillip
Purdue Genomics Core Facility






From cjfields at uiuc.edu  Fri Jan 11 12:09:40 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 Jan 2008 11:09:40 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <478797CE.9050202@purdue.edu>
References: <478797CE.9050202@purdue.edu>
Message-ID: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>

I don't think this is possible with the current setup for  
Bio::DB::GenBank (which the script uses).  We'll have to investigate  
whether it is possible to retrieve this data via NCBI's eutils; if so  
we can try adding it in.  If you want you can submit this as an  
enhancement request via bugzilla for tracking:

http://bugzilla.open-bio.org/

chris

On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:

> No problem getting sequence from genbank via a myriad of methods.  
> But as the volume of non-finished sequence in genbank increases the  
> importance of also obtaining quality values for a given sequence  
> increases. Some records include quality values.
>
> I typically use bp_fetch.pl to grab a sequence from genbank:
>
> bp_fetch.pl -fmt fasta net::genbank:AC207960
>
> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't  
> designed to pull down quals evidently:
>
> bp_fetch.pl -fmt qual net::genbank:AC207960
>
> gives:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual object to write_seq() as a parameter named "source"
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> STACK: /usr/local/perl/bin/bp_fetch.pl:313
> -----------------------------------------------------------
>
> (running under bioperl 1.5.2)
>
> The quality values for this accession are in genbank as these URLs  
> demonstrate:
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
>
> What is the best way to pull down these qual values? They aren't  
> present in "GenBank(Full)" format. They are present in an ASN.1  
> format.
>
> Advice would be appreciated.
>
> -- 
> Phillip
> Purdue Genomics Core Facility
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From MEC at stowers-institute.org  Fri Jan 11 14:14:10 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 11 Jan 2008 13:14:10 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID: 

Indeed, eutils is capable of this.

The following use of my ncbi_eutil (attached) script yields what you
want:

ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

It depends on the version of NCBI_PowerScripting.pm, such as is
included in 

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Chris Fields
> Sent: Friday, January 11, 2008 11:10 AM
> To: Phillip San Miguel
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual 
> files from Genbank?
> 
> I don't think this is possible with the current setup for 
> Bio::DB::GenBank (which the script uses).  We'll have to 
> investigate whether it is possible to retrieve this data via 
> NCBI's eutils; if so we can try adding it in.  If you want 
> you can submit this as an enhancement request via bugzilla 
> for tracking:
> 
> http://bugzilla.open-bio.org/
> 
> chris
> 
> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> 
> > No problem getting sequence from genbank via a myriad of methods.  
> > But as the volume of non-finished sequence in genbank increases the 
> > importance of also obtaining quality values for a given sequence 
> > increases. Some records include quality values.
> >
> > I typically use bp_fetch.pl to grab a sequence from genbank:
> >
> > bp_fetch.pl -fmt fasta net::genbank:AC207960
> >
> > sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't 
> > designed to pull down quals evidently:
> >
> > bp_fetch.pl -fmt qual net::genbank:AC207960
> >
> > gives:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual object to write_seq() as a parameter named "source"
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> > STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> > STACK: /usr/local/perl/bin/bp_fetch.pl:313
> > -----------------------------------------------------------
> >
> > (running under bioperl 1.5.2)
> >
> > The quality values for this accession are in genbank as these URLs
> > demonstrate:
> >
> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
> >
> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
> >
> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
> >
> > What is the best way to pull down these qual values? They aren't 
> > present in "GenBank(Full)" format. They are present in an ASN.1 
> > format.
> >
> > Advice would be appreciated.
> >
> > --
> > Phillip
> > Purdue Genomics Core Facility
> >
> >
> >
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 



From pmiguel at purdue.edu  Fri Jan 11 14:33:13 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 14:33:13 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: 
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
Message-ID: <4787C479.8070600@purdue.edu>

Hi Malcolm,
    Looks like your email was (inadvertently?) redacted in some way. (No 
attachment and last sentence truncated.) Would it be possible to get a 
complete version so I can be sure I'm following you?
Thanks,
Phillip

Cook, Malcolm wrote:
> Indeed, eutils is capable of this.
>
> The following use of my ncbi_eutil (attached) script yields what you
> want:
>
> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
>
> It depends on the version of NCBI_PowerScripting.pm, such as is
> included in 
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>   
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org 
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>> Chris Fields
>> Sent: Friday, January 11, 2008 11:10 AM
>> To: Phillip San Miguel
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] Recommended way to download qual 
>> files from Genbank?
>>
>> I don't think this is possible with the current setup for 
>> Bio::DB::GenBank (which the script uses).  We'll have to 
>> investigate whether it is possible to retrieve this data via 
>> NCBI's eutils; if so we can try adding it in.  If you want 
>> you can submit this as an enhancement request via bugzilla 
>> for tracking:
>>
>> http://bugzilla.open-bio.org/
>>
>> chris
>>
>> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
>>
>>     
>>> No problem getting sequence from genbank via a myriad of methods.  
>>> But as the volume of non-finished sequence in genbank increases the 
>>> importance of also obtaining quality values for a given sequence 
>>> increases. Some records include quality values.
>>>
>>> I typically use bp_fetch.pl to grab a sequence from genbank:
>>>
>>> bp_fetch.pl -fmt fasta net::genbank:AC207960
>>>
>>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't 
>>> designed to pull down quals evidently:
>>>
>>> bp_fetch.pl -fmt qual net::genbank:AC207960
>>>
>>> gives:
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual 
>>> object to write_seq() as a parameter named "source"
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
>>> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
>>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
>>> -----------------------------------------------------------
>>>
>>> (running under bioperl 1.5.2)
>>>
>>> The quality values for this accession are in genbank as these URLs
>>> demonstrate:
>>>
>>>
>>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
>>>
>>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
>>>
>>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
>>>
>>> What is the best way to pull down these qual values? They aren't 
>>> present in "GenBank(Full)" format. They are present in an ASN.1 
>>> format.
>>>
>>> Advice would be appreciated.
>>>
>>> --
>>> Phillip
>>> Purdue Genomics Core Facility
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>       
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>     
>
>   



From pmiguel at purdue.edu  Fri Jan 11 14:37:24 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 14:37:24 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID: <4787C574.8020003@purdue.edu>

Hi Chris,
Thanks. I have submitted this as an enhancement request to bugzilla.
Phillip

Chris Fields wrote:
> I don't think this is possible with the current setup for 
> Bio::DB::GenBank (which the script uses).  We'll have to investigate 
> whether it is possible to retrieve this data via NCBI's eutils; if so 
> we can try adding it in.  If you want you can submit this as an 
> enhancement request via bugzilla for tracking:
>
> http://bugzilla.open-bio.org/
>
> chris
>
> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
>
>> No problem getting sequence from genbank via a myriad of methods. But 
>> as the volume of non-finished sequence in genbank increases the 
>> importance of also obtaining quality values for a given sequence 
>> increases. Some records include quality values.
>>
>> I typically use bp_fetch.pl to grab a sequence from genbank:
>>
>> bp_fetch.pl -fmt fasta net::genbank:AC207960
>>
>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't 
>> designed to pull down quals evidently:
>>
>> bp_fetch.pl -fmt qual net::genbank:AC207960
>>
>> gives:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual 
>> object to write_seq() as a parameter named "source"
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw 
>> /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
>> STACK: Bio::SeqIO::qual::write_seq 
>> /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
>> -----------------------------------------------------------
>>
>> (running under bioperl 1.5.2)
>>
>> The quality values for this accession are in genbank as these URLs 
>> demonstrate:
>>
>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
>>
>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta 
>>
>>
>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual 
>>
>>
>> What is the best way to pull down these qual values? They aren't 
>> present in "GenBank(Full)" format. They are present in an ASN.1 format.
>>
>> Advice would be appreciated.
>>
>> -- 
>> Phillip
>> Purdue Genomics Core Facility
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>



From pmiguel at purdue.edu  Fri Jan 11 15:46:59 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 15:46:59 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: 
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
	<4787C479.8070600@purdue.edu>
	
Message-ID: <4787D5C3.1030308@purdue.edu>

Hi Malcolm,
Yes that works great!
Well, one caveat:
    If you download both the fasta and the qual files:
ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=fasta > AC207960.fasta
ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.fasta.qual

The "primary IDs" don't match. The fasta comes out:
 >gi|154937460|gb|AC207960.1|

and the qual comes out:
 >AC207960.1

which seems to choke most programs that use seq and qual (e.g.
cross_match) because they want the primary IDs of the seq and qual files
to match.
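One possible workaround, sketched here in plain Perl (it is not part of BioPerl or of the ncbi_eutil script): rewrite the NCBI-style `>gi|...|gb|ACC.VER|` FASTA header so that its first word is the accession.version that the qual file uses. This assumes the downstream tool (e.g. cross_match) only compares the first word of each header; `normalize_header` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an NCBI-style defline such as
#   >gi|154937460|gb|AC207960.1| optional description
# into
#   >AC207960.1 optional description
# so the FASTA and qual files share the same primary ID.
# Lines that do not match the gi|number|db|accession| pattern
# (including all sequence lines) are returned unchanged.
sub normalize_header {
    my ($line) = @_;
    if ($line =~ /^>gi\|\d+\|[a-z]+\|([^|]+)\|\S*(.*)$/s) {
        return ">$1$2";
    }
    return $line;
}

# Filter mode: only runs when file names are given on the command line.
if (@ARGV) {
    while (my $line = <>) {
        print normalize_header($line);
    }
}
```

For example, `perl fix_ids.pl AC207960.fasta > AC207960.fixed.fasta` (the script name is arbitrary) should rewrite only the header line and pass sequence lines through untouched.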

Otherwise fine, though.
Thanks,
Phillip

Cook, Malcolm wrote:
> Phillip:
>
> Of course - mea culpa - here's the full monty....
>
> Indeed NCBI's eutils can do this:
>
>   
>> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
>
> which uses my script (attached) to wrap NCBI's eutils.
>
> It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip
> by NCBI in their "Jul 24-27, 2007" course found at
> http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html
>
> I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
> very beginning so that trace messages are not printed on STDOUT, such as
> this echoed header:
> 	 Retrieving 1 records from nucleotide...
> ... and footer:
> 	Received records 1 - 1.
> 	Wrote data to -.
>
> (otherwise they are interspersed with downloaded qual files)
>
> It also depends on a recent version of Getopt::Long.
>
> Hope it helps.
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>   
>
>   
>> -----Original Message-----
>> From: Phillip San Miguel [mailto:pmiguel at purdue.edu] 
>> Sent: Friday, January 11, 2008 1:33 PM
>> To: Cook, Malcolm
>> Cc: Chris Fields; bioperl-l
>> Subject: Re: [Bioperl-l] Recommended way to download qual 
>> files from Genbank?
>>
>> Hi Malcolm,
>>     Looks like your email was (inadvertently?) redacted in 
>> some way. (No attachment and last sentence truncated.) Would 
>> it be possible to get a complete version so I can be sure I'm 
>> following you?
>> Thanks,
>> Phillip
>>
>> Cook, Malcolm wrote:
>>     
>>> Indeed eutil is capable of this
>>>
>>> The following use of my ncbi_eutil (attached) script yields what you
>>> want:
>>>
>>> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
>>>
>>> It depends on the version of NCBI_PowerScripting.pm, such as is
>>> included in
>>>
>>> Malcolm Cook
>>> Database Applications Manager - Bioinformatics
>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>> Sent: Friday, January 11, 2008 11:10 AM
>>>> To: Phillip San Miguel
>>>> Cc: bioperl-l
>>>> Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?
>>>>
>>>> I don't think this is possible with the current setup for 
>>>> Bio::DB::GenBank (which the script uses).  We'll have to investigate
>>>> whether it is possible to retrieve this data via NCBI's eutils; if so
>>>> we can try adding it in.  If you want you can submit this as an
>>>> enhancement request via bugzilla for tracking:
>>>>
>>>> http://bugzilla.open-bio.org/
>>>>
>>>> chris
>>>>
>>>> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
>>>>
>>>>> No problem getting sequence from genbank via a myriad of methods.
>>>>> But as the volume of non-finished sequence in genbank increases the
>>>>> importance of also obtaining quality values for a given sequence 
>>>>> increases. Some records include quality values.
>>>>>
>>>>> I typically use bp_fetch.pl to grab a sequence from genbank:
>>>>>
>>>>> bp_fetch.pl -fmt fasta net::genbank:AC207960
>>>>>
>>>>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't 
>>>>> designed to pull down quals evidently:
>>>>>
>>>>> bp_fetch.pl -fmt qual net::genbank:AC207960
>>>>>
>>>>> gives:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual 
>>>>> object to write_seq() as a parameter named "source"
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
>>>>> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
>>>>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
>>>>> -----------------------------------------------------------
>>>>>
>>>>> (running under bioperl 1.5.2)
>>>>>
>>>>> The quality values for this accession are in genbank as these URLs
>>>>> demonstrate:
>>>>>
>>>>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
>>>>>
>>>>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
>>>>>
>>>>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
>>>>>
>>>>> What is the best way to pull down these qual values? They aren't 
>>>>> present in "GenBank(Full)" format. They are present in an ASN.1 
>>>>> format.
>>>>>
>>>>> Advice would be appreciated.
>>>>>
>>>>> --
>>>>> Phillip
>>>>> Purdue Genomics Core Facility
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>       
>>>>>           
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>     
>>>>         
>>>   
>>>       
>>
>>     



From MEC at stowers-institute.org  Fri Jan 11 14:40:14 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 11 Jan 2008 13:40:14 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <4787C479.8070600@purdue.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
	<4787C479.8070600@purdue.edu>
Message-ID: 

Phillip:

Of course - mea culpa - here's the full monty....

Indeed NCBI's eutils can do this:

> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.
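Since the attached script was scrubbed from the archive, here is a minimal sketch of requesting the same qual data by calling EFetch directly with Perl's HTTP::Tiny (bundled with modern Perl; HTTPS additionally needs IO::Socket::SSL and Net::SSLeay). It assumes the present-day E-utilities endpoint `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi` and that EFetch still accepts `rettype=qual` with `retmode=text` for nucleotide records; both are assumptions to check against NCBI's current documentation. `qual_url` and `fetch_qual` are hypothetical helper names.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumed present-day E-utilities endpoint (not the 2008 viewer.fcgi URLs).
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: build an EFetch URL asking for quality scores
# of a single nucleotide record identified by accession or GI.
sub qual_url {
    my ($id) = @_;
    my $query = HTTP::Tiny->new->www_form_urlencode({
        db      => 'nucleotide',
        id      => $id,
        rettype => 'qual',
        retmode => 'text',
    });
    return "$EFETCH?$query";
}

# Hypothetical helper: download the qual text, dying on any HTTP failure.
sub fetch_qual {
    my ($id) = @_;
    my $res = HTTP::Tiny->new->get(qual_url($id));
    die "EFetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Only touches the network when an ID is given on the command line.
if (@ARGV) {
    print fetch_qual($ARGV[0]);
}
```

Usage would be along the lines of `perl fetch_qual.pl AC207960 > AC207960.qual`; NCBI also asks heavy users to add `tool` and `email` parameters and to respect its request-rate limits.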

It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip
by NCBI in their "Jul 24-27, 2007" course found at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html

I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
very beginning so that trace messages are not printed on STDOUT, such as
this echoed header:
	 Retrieving 1 records from nucleotide...
... and footer:
	Received records 1 - 1.
	Wrote data to -.

(otherwise they are interspersed with downloaded qual files)
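For readers unfamiliar with the idiom, a minimal sketch of what a `select STDERR` edit does: `select` changes Perl's default output handle, so every bare `print` after it goes to the newly selected handle, while a print to an explicit handle is unaffected. This is a generic illustration, not the actual code in NCBI_PowerScripting.pm; in-memory handles stand in for STDERR and STDOUT so the effect can be checked, and the record printed is a dummy.

```perl
use strict;
use warnings;

# In-memory handles standing in for STDERR (trace) and STDOUT (data).
my $trace = '';
open my $trace_fh, '>', \$trace or die "cannot open in-memory handle: $!";
my $data = '';
open my $data_fh, '>', \$data or die "cannot open in-memory handle: $!";

# Equivalent of putting `select STDERR;` at the top of a module:
# from here on, a bare print() writes to the selected handle.
my $previous = select $trace_fh;

print "Retrieving 1 records from nucleotide...\n";   # bare print -> trace
print {$data_fh} ">example_id\n10 20 30\n";          # explicit handle -> data

# Restore whatever handle was selected before.
select $previous;

close $trace_fh;
close $data_fh;
```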

It also depends on a recent version of Getopt::Long.

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: Phillip San Miguel [mailto:pmiguel at purdue.edu] 
> Sent: Friday, January 11, 2008 1:33 PM
> To: Cook, Malcolm
> Cc: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual 
> files from Genbank?
> 
> Hi Malcolm,
>     Looks like your email was (inadvertently?) redacted in 
> some way. (No attachment and last sentence truncated.) Would 
> it be possible to get a complete version so I can be sure I'm 
> following you?
> Thanks,
> Phillip
> 
> Cook, Malcolm wrote:
> > Indeed eutil is capable of this
> >
> > The following use of my ncbi_eutil (attached) script yields what you
> > want:
> >
> > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
> >
> > It depends on the version of NCBI_PowerScripting.pm, such as is
> > included in
> >
> > Malcolm Cook
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >   
> >
> >   
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >> Sent: Friday, January 11, 2008 11:10 AM
> >> To: Phillip San Miguel
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?
> >>
> >> I don't think this is possible with the current setup for 
> >> Bio::DB::GenBank (which the script uses).  We'll have to investigate
> >> whether it is possible to retrieve this data via NCBI's eutils; if so
> >> we can try adding it in.  If you want you can submit this as an
> >> enhancement request via bugzilla for tracking:
> >>
> >> http://bugzilla.open-bio.org/
> >>
> >> chris
> >>
> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> >>
> >>     
> >>> No problem getting sequence from genbank via a myriad of methods.
> >>> But as the volume of non-finished sequence in genbank increases the
> >>> importance of also obtaining quality values for a given sequence 
> >>> increases. Some records include quality values.
> >>>
> >>> I typically use bp_fetch.pl to grab a sequence from genbank:
> >>>
> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960
> >>>
> >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't 
> >>> designed to pull down quals evidently:
> >>>
> >>> bp_fetch.pl -fmt qual net::genbank:AC207960
> >>>
> >>> gives:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual 
> >>> object to write_seq() as a parameter named "source"
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> >>> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
> >>> -----------------------------------------------------------
> >>>
> >>> (running under bioperl 1.5.2)
> >>>
> >>> The quality values for this accession are in genbank as these URLs
> >>> demonstrate:
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
> >>>
> >>> What is the best way to pull down these qual values? They aren't 
> >>> present in "GenBank(Full)" format. They are present in an ASN.1 
> >>> format.
> >>>
> >>> Advice would be appreciated.
> >>>
> >>> --
> >>> Phillip
> >>> Purdue Genomics Core Facility
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>       
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>     
> >
> >   
> 
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ncbi_eutil
Type: application/octet-stream
Size: 1854 bytes
Desc: ncbi_eutil
URL: 

From cain.cshl at gmail.com  Mon Jan 14 13:46:39 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 14 Jan 2008 13:46:39 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
Message-ID: <1200336399.6056.12.camel@frissell>

Hi all,

Last month, I got a bug report on the GBrowse bug tracker:

  http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291

about a problem with dumping invalid GenBank files.  GBrowse uses
Bio::SeqIO::genbank to create these dumps.  

In his bug report, he claims that feature names over 15 characters long
are invalid, and provided an example GenBank file where a feature is
named 'BAC_cloned_genomic_insert', which is over 15 characters.  What I
want to know is this: is this truly a restriction on the GenBank format,
or is it a software problem with some other package?  Do we need to fix
genbank.pm?  I'm perfectly willing to do it; I'm just hesitant to
believe this is really a bug.

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From lstein at cshl.edu  Mon Jan 14 13:53:15 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 14 Jan 2008 13:53:15 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <1200336399.6056.12.camel@frissell>
References: <1200336399.6056.12.camel@frissell>
Message-ID: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>

Hi Scott,

He is correct about the limitation, but we deliberately relaxed it because
we were running into situations where we lost information during
roundtripping from other formats into genbank.

Lincoln

On Jan 14, 2008 1:46 PM, Scott Cain  wrote:

> Hi all,
>
> Last month, I got a bug report on the GBrowse bug tracker:
>
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291
>
> about a problem with dumping invalid GenBank files.  GBrowse uses
> Bio::SeqIO::genbank to create these dumps.
>
> In his bug report, he claims that feature names over 15 characters long
> are invalid, and provided an example GenBank file where a feature is
> named 'BAC_cloned_genomic_insert', which is over 15 characters.  What I
> want to know is this: is this truly a restriction on the GenBank format,
> or is it a software problem with some other package?  Do we need to fix
> genbank.pm?  I'm perfectly willing to do it; I'm just hesitant to
> believe this is really a bug.
>
> Thanks,
> Scott
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Mon Jan 14 14:35:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 14 Jan 2008 13:35:46 -0600
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
Message-ID: 

It looks like the keys in the feature table run into the location  
string w/o intervening space, which would probably cause havoc with  
roundtripping from this output.  A few examples:

      BAC_cloned_genomic_insert<1..>1000
      combined_genscanjoin(<1..347,400..498,794..>1000)
      splign_na_dbEST_ncbi<1..>1000

I would think at least a space in between the location and the key  
would be required for round-tripping out of genbank format.
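To make the failure mode concrete, here is a minimal sketch of the fixed-column layout of a GenBank feature table line: columns 1-5 blank, the feature key in columns 6-20, column 21 blank, and the location starting at column 22. This is a hypothetical illustration, not the actual Bio::SeqIO::genbank code: padding the key to a fixed width with no explicit separator works for short keys but runs long keys straight into the location, whereas padding to 15 and always emitting one space keeps them apart.

```perl
use strict;
use warnings;

# Hypothetical run-on variant: pad the key to 16 columns and append the
# location directly. Keys of 16+ characters run into the location.
sub feature_line_runon {
    my ($key, $loc) = @_;
    return sprintf('     %-16s%s', $key, $loc);
}

# Hypothetical fixed variant: pad the key to 15 columns and always emit a
# separating space, so short keys still put the location at column 22 and
# long keys are followed by at least one space.
sub feature_line_spaced {
    my ($key, $loc) = @_;
    return sprintf('     %-15s %s', $key, $loc);
}
```

For keys of 15 characters or fewer both variants produce identical lines; they only differ once a key overflows its field, which is exactly the case in the examples above.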

chris

On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote:

> Hi Scott,
>
> He is correct about the limitation, but we deliberately relaxed it because
> we were running into situations where we lost information during
> roundtripping from other formats into genbank.
>
> Lincoln
>
> On Jan 14, 2008 1:46 PM, Scott Cain  wrote:
>
>> Hi all,
>>
>> Last month, I got a bug report on the GBrowse bug tracker:
>>
>>
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291
>>
>> about a problem with dumping invalid GenBank files.  GBrowse uses
>> Bio::SeqIO::genbank to create these dumps.
>>
>> In his bug report, he claims that feature names over 15 characters long
>> are invalid, and provided an example GenBank file where a feature is
>> named 'BAC_cloned_genomic_insert', which is over 15 characters.  What I
>> want to know is this: is this truly a restriction on the GenBank format,
>> or is it a software problem with some other package?  Do we need to fix
>> genbank.pm?  I'm perfectly willing to do it; I'm just hesitant to
>> believe this is really a bug.
>>
>> Thanks,
>> Scott
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>> Cold Spring Harbor Laboratory
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From lstein at cshl.edu  Mon Jan 14 14:46:20 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 14 Jan 2008 14:46:20 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: 
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
	
Message-ID: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>

That's a new bug. The version I worked on inserted a space after the name.

Lincoln

On Jan 14, 2008 2:35 PM, Chris Fields  wrote:

> It looks like the keys in the feature table run into the location
> string w/o intervening space, which would probably cause havoc with
> roundtripping from this output.  A few examples:
>
>      BAC_cloned_genomic_insert<1..>1000
>      combined_genscanjoin(<1..347,400..498,794..>1000)
>      splign_na_dbEST_ncbi<1..>1000
>
> I would think at least a space in between the location and the key
> would be required for round-tripping out of genbank format.
>
> chris
>
> On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote:
>
> > Hi Scott,
> >
> > He is correct about the limitation, but we deliberately relaxed it because
> > we were running into situations where we lost information during
> > roundtripping from other formats into genbank.
> >
> > Lincoln
> >
> > On Jan 14, 2008 1:46 PM, Scott Cain  wrote:
> >
> >> Hi all,
> >>
> >> Last month, I got a bug report on the GBrowse bug tracker:
> >>
> >>
> >> http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291
> >>
> >> about a problem with dumping invalid GenBank files.  GBrowse uses
> >> Bio::SeqIO::genbank to create these dumps.
> >>
> >> In his bug report, he claims that feature names over 15 characters long
> >> are invalid, and provided an example GenBank file where a feature is
> >> named 'BAC_cloned_genomic_insert', which is over 15 characters.  What I
> >> want to know is this: is this truly a restriction on the GenBank format,
> >> or is it a software problem with some other package?  Do we need to fix
> >> genbank.pm?  I'm perfectly willing to do it; I'm just hesitant to
> >> believe this is really a bug.
> >>
> >> Thanks,
> >> Scott
> >>
> >> --
> >> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> >> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> >> Cold Spring Harbor Laboratory
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From diogoat at gmail.com  Tue Jan 15 08:40:10 2008
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 Jan 2008 11:40:10 -0200
Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS
Message-ID: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>

Hello,

I want to extract the protein_id and the translation from the CDS features
of a genome in GenBank format, but I have one problem: when a CDS in the
file doesn't have a protein_id or a translation, the script gives me this
error:

------------- EXCEPTION  -------------
MSG: asking for tag value that does not exist protein_id
STACK Bio::SeqFeature::Generic::get_tag_values
/usr/share/perl5/Bio/SeqFeature/Generic.pm:504
STACK toplevel parser_cds.pl:25
--------------------------------------

Below I paste the script:

##############################################
use Bio::SeqIO;
use warnings;

my $infile = $ARGV[0];
my $outfile = "$infile.out";
open (OUT, ">>$outfile");

my $seq_in = Bio::SeqIO->new('-file'   => "<$infile",
                             '-format' => 'Genbank');

while (my $inseq = $seq_in->next_seq) {

    for my $feat_object ($inseq->get_SeqFeatures) {
        if ($feat_object->primary_tag eq "CDS") {
            print OUT $feat_object->get_tag_values('protein_id'), " ";
            print OUT $feat_object->get_tag_values('translation'), "\n";
        }
    }
}
###############################################

Can somebody help me?

Thanks,

Diogo Tschoeke


From Marc.Logghe at ablynx.com  Tue Jan 15 09:44:54 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Tue, 15 Jan 2008 15:44:54 +0100
Subject: [Bioperl-l] Problem to extract protein_id and transcript from
	CDS
In-Reply-To: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>
Message-ID: <03C512635899144083CADB0EE2220189013E2BEC@alpaca.lan.ablynx.com>

Hi,
Try testing for existence first using the has_tag() method.
It is provided by Bio::AnnotatableI.

print OUT $feat_object->get_tag_values('protein_id'), " " if
($feat_object->has_tag('protein_id'));
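Applied to both tags, the same guard might look like the sketch below. It is a minimal sketch, not a drop-in replacement for the posted script: `cds_record` is a hypothetical helper, it writes tab-separated output, and it substitutes the arbitrary placeholder "NA" for a missing tag. `has_tag` and `get_tag_values` are the feature methods discussed above; Bio::SeqIO is loaded with `require` at run time so the helper itself has no BioPerl dependency.

```perl
use strict;
use warnings;

# Hypothetical helper: return "protein_id<TAB>translation" for a CDS feature,
# using "NA" for any tag the feature lacks instead of letting
# get_tag_values() throw. Returns nothing (undef in scalar context)
# for non-CDS features.
sub cds_record {
    my ($feat) = @_;
    return unless $feat->primary_tag eq 'CDS';
    my @fields;
    for my $tag (qw(protein_id translation)) {
        push @fields, $feat->has_tag($tag)
            ? join(',', $feat->get_tag_values($tag))
            : 'NA';
    }
    return join("\t", @fields);
}

# Usage: perl parser_cds.pl genome.gb   (appends to genome.gb.out)
if (@ARGV) {
    require Bio::SeqIO;
    my $infile  = $ARGV[0];
    my $outfile = "$infile.out";
    open my $out, '>>', $outfile or die "Cannot open $outfile: $!";
    my $seq_in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
    while (my $seq = $seq_in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            my $line = cds_record($feat);
            print {$out} "$line\n" if defined $line;
        }
    }
    close $out or die "Cannot close $outfile: $!";
}
```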


HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Diogo Tschoeke
> Sent: Tuesday, 15 January 2008 14:40
> To: Bioperl-list
> Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS
> 
> Hello,
> 
> I want to extract the protein_id and the translation from the CDS features
> of a genome in GenBank format, but I have one problem: when a CDS in the
> file doesn't have a protein_id or a translation, the script gives me this
> error:
> 
> ------------- EXCEPTION  -------------
> MSG: asking for tag value that does not exist protein_id
> STACK Bio::SeqFeature::Generic::get_tag_values
> /usr/share/perl5/Bio/SeqFeature/Generic.pm:504
> STACK toplevel parser_cds.pl:25
> --------------------------------------
> 
> Below I paste the script:
> 
> ##############################################
> use Bio::SeqIO;
> use warnings;
> 
> my $infile = $ARGV[0];
> my $outfile = "$infile.out";
> open (OUT, ">>$outfile");
> 
>           my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                                       '-format' => 'Genbank');
> 
>          while (my $inseq = $seq_in->next_seq) {
> 
>         for my $feat_object ($inseq->get_SeqFeatures){
>             if ($feat_object->primary_tag eq "CDS"){
>                 print OUT $feat_object->get_tag_values('protein_id')," ";
>             print OUT $feat_object->get_tag_values('translation'),"\n";
>         }
>     }
> }
> ###############################################
> 
> Somebody can helps me?
> 
> Thank
> 
> Diogo Tschoeke



From cuiw at ncbi.nlm.nih.gov  Tue Jan 15 11:50:53 2008
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Tue, 15 Jan 2008 11:50:53 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
References: <478797CE.9050202@purdue.edu><14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu><4787C479.8070600@purdue.edu>
	
Message-ID: <18C407FD4FFB424292D769FBD68C1987048E95CC@NIHCESMLBX8.nih.gov>

There is an alternative if you can download and compile the NCBI C++ Toolkit (ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/2007/Aug_27_2007/). Then simply call the binary:
 
id1_fetch -fmt quality -gi 13508865
 
Wenwu Cui

________________________________

From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Fri 1/11/2008 2:40 PM
To: Phillip San Miguel
Cc: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?



Phillip:

Of course - mea culpa - here's the full monty....

Indeed NCBI's eutils can do this:

> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.

It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip
by NCBI in their "Jul 24-27, 2007" course found at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html

I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
very beginning so that trace messages are not printed on STDOUT, such as
this echoed header:
         Retrieving 1 records from nucleotide...
... and footer:
        Received records 1 - 1.
        Wrote data to -.

(otherwise they are interspersed with downloaded qual files)

It also depends on a recent version of Getopt::Long.
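
For readers without the attached script, the underlying EFetch request can be composed by hand. The sketch below only builds the URL; it mirrors the rettype=qual option passed by the ncbi_eutil command above. The https efetch.fcgi base address, the retmode=text parameter, and the assumption that EFetch accepts an accession directly in the id parameter are all assumptions to check against NCBI's current E-utilities documentation, and the helper names are hypothetical.

```perl
use strict;
use warnings;

# Assumed E-utilities EFetch endpoint; verify against NCBI's documentation.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Percent-encode everything except RFC 3986 unreserved characters.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build an EFetch URL asking for quality scores
# (rettype=qual) as plain text for a single nucleotide record.
sub qual_efetch_url {
    my ($id) = @_;
    my @pairs = (
        [ db      => 'nucleotide' ],
        [ id      => $id ],
        [ rettype => 'qual' ],
        [ retmode => 'text' ],
    );
    my $query = join '&',
        map { uri_escape_simple($_->[0]) . '=' . uri_escape_simple($_->[1]) } @pairs;
    return "$EFETCH?$query";
}

print qual_efetch_url('AC207960'), "\n";
```

Retrieving the data is then a single HTTP GET on that URL (for example with LWP::Simple's get), which avoids the STDOUT trace-message issue described above because nothing else is printed.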

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
 

> -----Original Message-----
> From: Phillip San Miguel [mailto:pmiguel at purdue.edu]
> Sent: Friday, January 11, 2008 1:33 PM
> To: Cook, Malcolm
> Cc: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual
> files from Genbank?
>
> Hi Malcolm,
>     Looks like your email was (inadvertently?) redacted in
> some way. (No attachment and last sentence truncated.) Would
> it be possible to get a complete version so I can be sure I'm
> following you?
> Thanks,
> Phillip
>
> Cook, Malcolm wrote:
> > Indeed eutil is capable of this
> >
> > The following use of my ncbi_eutil (attached) script yields what you
> > want:
> >
> > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
> >
> > It depends on the version of NCBI_PowerScripting.pm , such as is
> > included in
> >
> > Malcolm Cook
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >  
> >
> >  
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >> Sent: Friday, January 11, 2008 11:10 AM
> >> To: Phillip San Miguel
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?
> >>
> >> I don't think this is possible with the current setup for
> >> Bio::DB::GenBank (which the script uses).  We'll have to investigate
> >> whether it is possible to retrieve this data via NCBI's eutils; if so
> >> we can try adding it in.  If you want you can submit this as an
> >> enhancement request via bugzilla for tracking:
> >>
> >> http://bugzilla.open-bio.org/
> >>
> >> chris
> >>
> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> >>
> >>> No problem getting sequence from genbank via a myriad of methods.
> >>> But as the volume of non-finished sequence in genbank increases the
> >>> importance of also obtaining quality values for a given sequence
> >>> increases. Some records include quality values.
> >>>
> >>> I typically use bp_fetch.pl to grab a sequence from genbank:
> >>>
> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960
> >>>
> >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't
> >>> designed to pull down quals evidently:
> >>>
> >>> bp_fetch.pl -fmt qual net::genbank:AC207960
> >>>
> >>> gives:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual
> >>> object to write_seq() as a parameter named "source"
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> >>> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
> >>> -----------------------------------------------------------
> >>>
> >>> (running under bioperl 1.5.2)
> >>>
> >>> The quality values for this accession are in genbank as these URLs
> >>> demonstrate:
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
> >>>
> >>> What is the best way to pull down these qual values? They aren't
> >>> present in "GenBank(Full)" format. They are present in an ASN.1
> >>> format.
> >>>
> >>> Advice would be appreciated.
> >>>
> >>> --
> >>> Phillip
> >>> Purdue Genomics Core Facility
> >>>
> >>>
> >>>
> >>>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>





From singhal at berkeley.edu  Tue Jan 15 17:50:12 2008
From: singhal at berkeley.edu (Sonal Singhal)
Date: Tue, 15 Jan 2008 14:50:12 -0800
Subject: [Bioperl-l] redundant sequences
Message-ID: 

Hi all,

I am mining a few genomes to find all the genes in a gene family, and
of course multiple BLAST searches of different paralogs are returning
a lot of redundant hits.   I have searched the BioPerl documentation,
and I cannot find an easy way to cluster and then purge redundant
sequences.  Any ideas?

Cheers,
sonal


From MEC at stowers-institute.org  Tue Jan 15 18:21:00 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 15 Jan 2008 17:21:00 -0600
Subject: [Bioperl-l] redundant sequences
In-Reply-To: 
References: 
Message-ID: 

Cd-hit: http://bioinformatics.burnham.org/cd-hi/
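
CD-HIT clusters sequences by a sequence-identity threshold, which is what near-identical paralog hits need. When the redundancy is only identical sequences returned by several BLAST searches, a plain hash keyed on the normalized sequence is often enough. The sketch below is that simpler, swapped-in technique, not CD-HIT: it removes exact duplicates only (compared case-insensitively with whitespace stripped), keeps the first ID seen, and records the others. The function name and sample IDs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [id, sequence] pairs and returns
# (arrayref of unique pairs, hashref of kept_id => [duplicate ids]).
# Sequences are compared after upper-casing and stripping whitespace.
sub purge_exact_duplicates {
    my (@records) = @_;
    my (%first_id_for, %dups_of, @unique);
    for my $rec (@records) {
        my ($id, $seq) = @$rec;
        (my $key = uc $seq) =~ s/\s+//g;
        if (exists $first_id_for{$key}) {
            push @{ $dups_of{ $first_id_for{$key} } }, $id;
            next;
        }
        $first_id_for{$key} = $id;
        push @unique, [ $id, $seq ];
    }
    return (\@unique, \%dups_of);
}

my ($unique, $dups) = purge_exact_duplicates(
    [ hitA => 'MKVLAT' ],
    [ hitB => 'mkvlat' ],
    [ hitC => 'MKVLSS' ],
    [ hitD => 'MKV LAT' ],
);
print join(' ', map { $_->[0] } @$unique), "\n";
print "$_ duplicated by: @{ $dups->{$_} }\n" for sort keys %$dups;
```

With BioPerl, the pairs could be built from a Bio::SeqIO stream as [$seq->display_id, $seq->seq]; anything less than identical still needs a clustering tool such as CD-HIT.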

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Sonal Singhal
> Sent: Tuesday, January 15, 2008 4:50 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] redundant sequences
> 
> Hi all,
> 
> I am mining a few genomes to find all the genes in a gene 
> family, and of course multiple BLAST searches of different 
> paralogs are returning
> a lot of redundant hits.   I have searched the BioPerl documentation,
> and I cannot find an easy way to cluster and then purge 
> redundant sequences.  Any ideas?
> 
> Cheers,
> sonal



From cain.cshl at gmail.com  Tue Jan 15 21:24:50 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 15 Jan 2008 21:24:50 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
	
	<6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
Message-ID: <1200450290.7276.3.camel@frissell>

Hi Chris and Lincoln,

I've attached my suggested patch.  So, can I use svn to check it in?  It
only adds a space after the feature type name; I suspect that will be
enough to fix the file format for most uses.
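
For context on the column layout under discussion: in the GenBank/INSDC feature table the feature key starts in column 6 and the location in column 22, which leaves 15 characters for the key. A minimal sketch of formatting a feature line so that longer keys still get at least one separating space is shown below; it illustrates the idea only and is not the contents of the attached genbank.pm.patch. The subroutine name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 5 leading spaces, the key left-justified in a
# 15-character field (columns 6-20), one space, then the location from
# column 22. A key longer than 15 characters overflows the field but is
# still followed by a space, so key and location remain separable.
sub feature_line {
    my ($key, $location) = @_;
    return sprintf('     %-15s %s', $key, $location);
}

print feature_line('gene', '1..1000'), "\n";
print feature_line('BAC_cloned_genomic_insert', '<1..>1000'), "\n";
print feature_line('combined_genscan', 'join(<1..347,400..498,794..>1000)'), "\n";
```

Short keys keep the standard column-22 location; long keys such as BAC_cloned_genomic_insert no longer run into the location string.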

Scott

On Mon, 2008-01-14 at 14:46 -0500, Lincoln Stein wrote:
> That's a new bug. The version I worked on inserted a space after the name.
> 
> Lincoln
> 
> On Jan 14, 2008 2:35 PM, Chris Fields  wrote:
> 
> > It looks like the keys in the feature table run into the location
> > string w/o intervening space, which would probably cause havoc with
> > roundtripping from this output.  A few examples:
> >
> >      BAC_cloned_genomic_insert<1..>1000
> >      combined_genscanjoin(<1..347,400..498,794..>1000)
> >      splign_na_dbEST_ncbi<1..>1000
> >
> > I would think at least a space in between the location and the key
> > would be required for round-tripping out of genbank format.
> >
> > chris
> >
> > On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote:
> >
> > > Hi Scott,
> > >
> > > He is correct about the limitation, but we deliberately relaxed it
> > > because we were running into situations where we lost information
> > > during roundtripping from other formats into genbank.
> > >
> > > Lincoln
> > >
> > > On Jan 14, 2008 1:46 PM, Scott Cain  wrote:
> > >
> > >> Hi all,
> > >>
> > >> Last month, I got a bug report on the GBrowse bug tracker:
> > >>
> > >> http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291
> > >>
> > >> about a problem with dumping invalid GenBank files.  GBrowse uses
> > >> Bio::SeqIO::genbank to create these dumps.
> > >>
> > >> In his bug report, he claims that feature names over 15 characters
> > >> long are invalid, and provided an example GenBank file where a feature
> > >> is named 'BAC_cloned_genomic_insert', which is over 15 characters.
> > >> What I want to know is this: is this truly a restriction on the
> > >> GenBank format, or is it a software problem with some other package?
> > >> Do we need to fix genbank.pm?  I'm perfectly willing to do it; I'm
> > >> just hesitant to believe this is really a bug.
> > >>
> > >> Thanks,
> > >> Scott
> > >>
> > >> --
> > >> ------------------------------------------------------------------------
> > >> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > >> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > >> Cold Spring Harbor Laboratory
> > >>
> > >>
> > >>
> > >
> > >
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> >
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: genbank.pm.patch
Type: text/x-patch
Size: 1110 bytes
Desc: not available
URL: 

From cjfields at uiuc.edu  Tue Jan 15 22:15:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 Jan 2008 21:15:51 -0600
Subject: [Bioperl-l] Subversion migration complete
Message-ID: 

On behalf of the BioPerl core developers, I am proud to announce that  
the BioPerl SVN migration has been completed.  We would like to thank  
everyone who helped, in particular George Hartzell and Chris  
Dagdigian, both of whom played instrumental roles in the CVS->SVN  
conversion and anonymous SVN setup for BioPerl.

Anonymous SVN checkouts for bioperl-live are now possible using:
svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

Developers can obtain a checkout from:
svn co svn+ssh://USER@dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live/trunk bioperl-live

Browsable repository:
http://code.open-bio.org/svnweb/index.cgi/bioperl/

Basic instructions:
http://www.bioperl.org/wiki/Using_Subversion

We are still in the midst of implementing a few extra details related  
to SVN migration; the status on these can be viewed here:
http://www.bioperl.org/wiki/CVS_to_SVN_Migration

Enjoy!

chris



From bug-bioperl at rt.cpan.org  Wed Jan 16 22:35:30 2008
From: bug-bioperl at rt.cpan.org (Chris Fields via RT)
Date: Wed, 16 Jan 2008 22:35:30 -0500
Subject: [Bioperl-l] [rt.cpan.org #29533] Bio::SeqIO::interpro depends on
	XML::DOM::XPath
In-Reply-To: 
References:   
	
Message-ID: 


       Queue: bioperl
 Ticket 

On Fri Sep 21 10:28:52 2007, support at helpdesk.open-bio.org wrote:
> Hi Mike,
> 
> The proper place to submit this fix is the bioperl-l at lists.open-bio.org
> mailing list or the OBF Bugzilla queue at:
> http://bugzilla.open-bio.org/, this RT system is mainly for sysadmin
> activities rather than for tracking code changes. Would you be so kind
> to re-send your request to one of the places above? Thanks for the heads
> up! :)
> 
> Regards,
> Mauricio.

This has been fixed.  I'll get the CPAN maintainer to close this out.


From vipingjo at gmail.com  Thu Jan 17 03:48:36 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 16:48:36 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via package
	"Bio::Tree::Tree"
Message-ID: <200801171648332965577@gmail.com>

Hi everyone,

I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + Windows XP SP2.
When running the example code from Bio\Tree\Compatible.pm (attached below as t.pl), I got this error:

Can't locate object method "is_compatible" via package "Bio::Tree::Tree"

I replaced "$t1->is_compatible($t2)" with "is_compatible Bio::Tree::Compatible ($t1,$t2)", and the error changed to:
Can't locate object method "get_nodes" via package "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252,  line 1.

I then modified Compatible.pm, changing the "get_nodes" call to "get_nodes Bio::Tree::Tree($self);", and a new error arose:
Can't use string ("Bio::Tree::Tree") as a HASH ref while "strict refs" in use at i:/Perl/site/lib/Bio\Tree\Tree.pm line 198,  line 1.

I gave up. Any help would be deeply appreciated.




# this is the example script in Bio::Tree::Compatible: t.pl
  use Bio::Tree::Compatible;
  use Bio::TreeIO;
  my $input = new Bio::TreeIO('-format' => 'newick',
                              '-file'   => 'input.tre');
  my $t1 = $input->next_tree;
  my $t2 = $input->next_tree;

  my ($incompat, $ilabels, $inodes) = $t1->is_compatible($t2);
  if ($incompat) {
    my %cluster1 = %{ $t1->cluster_representation };
    my %cluster2 = %{ $t2->cluster_representation };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
      foreach my $label (@$ilabels) {
        my $node1 = $t1->find_node(-id => $label);
        my $node2 = $t2->find_node(-id => $label);
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "label $label";
        print " cluster"; map { print " ",$_ } @c1;
        print " cluster"; map { print " ",$_ } @c2; print "\n";
      }
    }
    if (scalar(@$inodes)) {
      while (@$inodes) {
        my $node1 = shift @$inodes;
        my $node2 = shift @$inodes;
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "cluster"; map { print " ",$_ } @c1;
        print " properly intersects cluster";
        map { print " ",$_ } @c2; print "\n";
      }
    }
  } else {
    print "compatible trees\n";
  }

__END__;

# this is the file 'input.tre':
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);

# these are the full messages I got when running it like this: "perl.exe -w t.pl"
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
Can't locate object method "is_compatible" via package "Bio::Tree::Tree" at Z:\bp\t.pl line 8,  line 2.




From bix at sendu.me.uk  Thu Jan 17 06:18:56 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:18:56 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
 package "Bio::Tree::Tree"
In-Reply-To: <200801171648332965577@gmail.com>
References: <200801171648332965577@gmail.com>
Message-ID: <478F39A0.2030508@sendu.me.uk>

viping wrote:
> Hi Everyone,
> 
> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + 
> Windows XP SP2. When running example codes(attched below as t.pl) 
> within Bio\Tree\Compatible.pm , I got this error:
> 
> Can't locate object method "is_compatible" via package 
> "Bio::Tree::Tree"
> 
> I replaced "$t1->is_compatible($t2)" with "is_compatible 
> Bio::Tree::Compatible ($t1,$t2)",

Yup, you had the right idea; unfortunately the synopsis code for
Bio::Tree::Compatible is wrong.
I've now fixed it in svn.


> the error changed: Can't locate object method "get_nodes" via package
>  "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm 
> line 252,  line 1.

I didn't get quite that error; instead I had an issue with TreeIO: for
whatever reason it is only returning one tree from your input file (ie.
$t2 is undefined).

I therefore got "Can't call method "get_nodes" on an undefined value [...]"

Can someone look into/confirm that?



From bix at sendu.me.uk  Thu Jan 17 06:35:57 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:35:57 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
 package "Bio::Tree::Tree"
In-Reply-To: <478F39A0.2030508@sendu.me.uk>
References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk>
Message-ID: <478F3D9D.6050306@sendu.me.uk>

Sendu Bala wrote:
>> the error changed: Can't locate object method "get_nodes" via
>> package "Bio::Tree::Compatible" at
>> i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252,  line 1.
> 
> I didn't get quite that error; instead I had an issue with TreeIO:
> for whatever reason it is only returning one tree from your input
> file (ie. $t2 is undefined).
> 
> I therefore got "Can't call method "get_nodes" on an undefined value
> [...]"
> 
> Can someone look into/confirm that?

... Yeah, I think I'm losing my mind. The code below is 'ok' using the
commented out -fh input for TreeIO, but is 'not ok' using the -file
input, where the specified file contains the exact same data as
__DATA__. Huh?


#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::Tree::Compatible;
use Bio::TreeIO;
my $input = new Bio::TreeIO('-format' => 'newick',
                             #-fh      => \*DATA,
                             -file    => 'input.tre'
                             );
my $t1 = $input->next_tree;
my $t2 = $input->next_tree;

if ($t2) {
    print "ok\n";
}
else {
    print "not ok\n";
}

__DATA__
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);




From vipingjo at gmail.com  Thu Jan 17 08:23:14 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 21:23:14 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
	package"Bio::Tree::Tree"
References: <200801171648332965577@gmail.com>, <478F39A0.2030508@sendu.me.uk>
Message-ID: <200801172123112184046@gmail.com>

I got the latest code modified by Sendu Bala via SVN. It works well when "input.tre" and "t.pl" are in the same directory. Thank you, Sendu Bala.

This is output:
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
incompatible trees
label G cluster G cluster G K
cluster A B C properly intersects cluster A B H
cluster A B C properly intersects cluster A B E G H I J K
cluster A B C D properly intersects cluster A B H
cluster A B C D properly intersects cluster A B E G H I J K
cluster E F G properly intersects cluster G K
cluster E F G properly intersects cluster G I J K
cluster E F G properly intersects cluster A B E G H I J K
cluster A B C D E F G properly intersects cluster A B H
cluster A B C D E F G properly intersects cluster G K
cluster A B C D E F G properly intersects cluster G I J K
cluster A B C D E F G properly intersects cluster A B E G H I J K

# this is the latest code:
  use Bio::Tree::Compatible;
  use Bio::TreeIO;
  my $input = Bio::TreeIO->new('-format' => 'newick',
                               '-file'   => 'input.tre');
  my $t1 = $input->next_tree;
  my $t2 = $input->next_tree;

  my ($incompat, $ilabels, $inodes) = Bio::Tree::Compatible::is_compatible($t1,$t2);
  if ($incompat) {
    my %cluster1 = %{ Bio::Tree::Compatible::cluster_representation($t1) };
    my %cluster2 = %{ Bio::Tree::Compatible::cluster_representation($t2) };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
      foreach my $label (@$ilabels) {
        my $node1 = $t1->find_node(-id => $label);
        my $node2 = $t2->find_node(-id => $label);
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "label $label";
        print " cluster"; map { print " ",$_ } @c1;
        print " cluster"; map { print " ",$_ } @c2; print "\n";
      }
    }
    if (scalar(@$inodes)) {
      while (@$inodes) {
        my $node1 = shift @$inodes;
        my $node2 = shift @$inodes;
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "cluster"; map { print " ",$_ } @c1;
        print " properly intersects cluster";
        map { print " ",$_ } @c2; print "\n";
      }
    }
  } else {
    print "compatible trees\n";
  }


------------------				 
viping
2008-01-17

-------------------------------------------------------------
From: Sendu Bala
Date: 2008-01-17 19:19:30
To: viping
Cc: bioperl-l
Subject: Re: [Bioperl-l] Can't locate object method "is_compatible" via package"Bio::Tree::Tree"

viping wrote:
> Hi Everyone,
> 
> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + 
> Windows XP SP2. When running example codes(attched below as t.pl) 
> within Bio\Tree\Compatible.pm , I got this error:
> 
> Can't locate object method "is_compatible" via package 
> "Bio::Tree::Tree"
> 
> I replaced "$t1->is_compatible($t2)" with "is_compatible 
> Bio::Tree::Compatible ($t1,$t2)",

Yup, you had the right idea; unfortunately the synopsis code for
Bio::Tree::Compatible is wrong.
I've now fixed it in svn.


> the error changed: Can't locate object method "get_nodes" via package
>  "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm 
> line 252,  line 1.

I didn't get quite that error; instead I had an issue with TreeIO: for
whatever reason it is only returning one tree from your input file (ie.
$t2 is undefined).

I therefore got "Can't call method "get_nodes" on an undefined value [...]"

Can someone look into/confirm that?



From cjfields at uiuc.edu  Thu Jan 17 08:25:41 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Jan 2008 07:25:41 -0600
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
	package "Bio::Tree::Tree"
In-Reply-To: <478F39A0.2030508@sendu.me.uk>
References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk>
Message-ID: <7BF3650B-F1D4-4F21-9C59-3AC13CA35945@uiuc.edu>

Probably need to file this as a bug.  There is a similar issue with  
Bio::TreeIO::nexus, but it probably isn't related unless it is using  
the same parsing logic:

http://bugzilla.open-bio.org/show_bug.cgi?id=2356

chris

On Jan 17, 2008, at 5:18 AM, Sendu Bala wrote:

> viping wrote:
>> Hi Everyone,
>>
>> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 +
>> Windows XP SP2. When running example codes(attched below as t.pl)
>> within Bio\Tree\Compatible.pm , I got this error:
>>
>> Can't locate object method "is_compatible" via package
>> "Bio::Tree::Tree"
>>
>> I replaced "$t1->is_compatible($t2)" with "is_compatible
>> Bio::Tree::Compatible ($t1,$t2)",
>
> Yup, you had the right idea; unfortunately the synopsis code for
> Bio::Tree::Compatible is wrong.
> I've now fixed it in svn.
>
>
>> the error changed: Can't locate object method "get_nodes" via package
>> "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm
>> line 252,  line 1.
>
> I didn't get quite that error; instead I had an issue with TreeIO: for
> whatever reason it is only returning one tree from your input file
> (ie. $t2 is undefined).
>
> I therefore got "Can't call method "get_nodes" on an undefined value [...]"
>
> Can someone look into/confirm that?
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






From N.Haigh at sheffield.ac.uk  Fri Jan 18 07:47:48 2008
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 18 Jan 2008 12:47:48 +0000
Subject: [Bioperl-l] Parsing Primer3 output
Message-ID: <1200660468.47909ff498dd0@webmail.shef.ac.uk>

I might be overlooking something, but is it possible to parse primer3 output?

Cheers
Nath



From cjfields at uiuc.edu  Fri Jan 18 08:27:47 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 Jan 2008 07:27:47 -0600
Subject: [Bioperl-l] Parsing Primer3 output
In-Reply-To: <1200660468.47909ff498dd0@webmail.shef.ac.uk>
References: <1200660468.47909ff498dd0@webmail.shef.ac.uk>
Message-ID: <8C8BF818-FC04-42E3-9210-3FE23F92EA8F@uiuc.edu>

Bio::Tools::Primer3.
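A minimal sketch of reading a Primer3 output file with Bio::Tools::Primer3, based on its number_of_results and all_results methods as shown in the module's synopsis; dump_primer3 and the input file name are hypothetical, and the exact result keys depend on the Primer3 version that produced the file:

```perl
use strict;
use warnings;
use Bio::Tools::Primer3;

# Parse one Primer3 (Boulder-IO style KEY=VALUE) output file.
sub dump_primer3 {
    my ($file) = @_;
    my $p3  = Bio::Tools::Primer3->new(-file => $file);
    my $num = $p3->number_of_results;   # how many primer sets were reported
    my $all = $p3->all_results;         # hash ref of all reported values
    return ($num, $all);
}

# Usage: perl dump_primer3.pl primer3_output.txt   (does nothing without an argument)
if (@ARGV) {
    my ($num, $all) = dump_primer3($ARGV[0]);
    print "There were $num results\n";
    for my $key (sort keys %$all) {
        my $value = defined $all->{$key} ? $all->{$key} : '';
        print "$key\t$value\n";
    }
}
```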

chris




From hangsyin at gmail.com  Sat Jan 19 13:25:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sat, 19 Jan 2008 10:25:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined
 value at BIO::DB::GFF.pl
Message-ID: <14971922.post@talk.nabble.com>


Hi, everyone,

I ran into this problem when running this script to extract features
overlapping 4:20,000..25,000. It always fails with "Can't call method
"features" on an undefined value at BIO::DB::GFF.pl line XX".
==============================================================
use Bio::DB::GFF;
use Bio::Tools::GFF;
my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
                           -dsn     => 'dbi:mysql:dmel_gff:localhost',
                           -user    => 'XXXX',
                           -pass    => 'XXXX') || die "database open failed";

my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
my @features = $segment->features(-types => ['gene', 'exon', 'intron',
                                             'five_prime_UTR', 'three_prime_UTR',
                                             'CDS']) or die "no features";
print(scalar(@features)."\n");

================================================================
I am using ActivePerl 5.8.8 and BioPerl 1.5.2 on Win32. I loaded
dmel_5.4.gff into a MySQL database with bulk_load_gff.pl without any errors.
Other methods fail as well.

Any help will be deeply appreciated!

Best,
Jon




From cain.cshl at gmail.com  Sat Jan 19 22:36:44 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Sat, 19 Jan 2008 22:36:44 -0500
Subject: [Bioperl-l] Problem: Can't call method "features" on
	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <14971922.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com>
Message-ID: <1200800204.6069.5.camel@frissell>

Hi Jon,

I think it's funny that you have "or die" on the database opening line
and "or die" on the @features line, but you didn't put one on the $segment
line.  Try adding "or die $!" to the $segment line to see what it says,
and also add a 'print $segment' after you create it and before you try to
get the features from it.

Clearly, the problem is that $segment is not defined (that is, nothing
is in it, not that the wrong thing is in it).  The next trick is to find
out why.  My first guess, without looking at the data set, is that the
arm is not really named '4'.
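One way to check that guess without touching the database is to list the sequence IDs (column 1) the GFF3 file actually uses, along with any ##sequence-region directives. A plain-Perl sketch; gff3_seqids is a hypothetical helper name and the script takes the GFF file path as its only argument:

```perl
use strict;
use warnings;

# Collect the seqids (column 1) and ##sequence-region directives of a GFF3 stream.
sub gff3_seqids {
    my ($fh) = @_;
    my (%count, %regions);
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^##FASTA/;    # sequence section: no more feature lines
        if ($line =~ /^##sequence-region\s+(\S+)\s+(\d+)\s+(\d+)/) {
            $regions{$1} = [$2, $3];
            next;
        }
        next if $line =~ /^#/ || $line !~ /\S/;    # other comments, blank lines
        my ($seqid) = split /\t/, $line;
        $count{$seqid}++;
    }
    return (\%count, \%regions);
}

# Usage: perl gff3_seqids.pl file.gff3   (does nothing without an argument)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my ($count, $regions) = gff3_seqids($fh);
    close $fh;
    for my $id (sort keys %$count) {
        my $r = $regions->{$id};
        print join("\t", $id, "$count->{$id} features",
                   $r ? "sequence-region $r->[0]..$r->[1]" : 'no sequence-region'), "\n";
    }
}
```

The name printed in the first column is what should be passed as -name to $db->segment().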

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hangsyin at gmail.com  Sat Jan 19 22:49:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sat, 19 Jan 2008 19:49:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on
	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <1200800204.6069.5.camel@frissell>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
Message-ID: <14978241.post@talk.nabble.com>


Hi, Scott,

After adding die $!, I can see that something goes wrong at this line:
"my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"

My GFF file looks like this:
##gff-version 3
##sequence-region 4 1 1351857
4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
...
...
I'm really confused. Any further suggestions? Thank you!

Jon








From cain.cshl at gmail.com  Sat Jan 19 23:08:04 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Sat, 19 Jan 2008 23:08:04 -0500
Subject: [Bioperl-l] Problem: Can't call method "features"
	on	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <14978241.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com>
	<1200800204.6069.5.camel@frissell>  <14978241.post@talk.nabble.com>
Message-ID: <1200802084.6069.11.camel@frissell>

Hi Jon,

Well, seeing the error message would be helpful, but without it, here are
a few things you can try:

  * removing the "sequence-region" line from the GFF file, adding a line
like this:

  4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4

and then reloading the database.

  * Or, you may want to consider using Bio::DB::SeqFeature::Store, since
Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that
is, with three levels of features, like gene, mRNA and CDS).
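The first suggestion can be applied mechanically. A plain-Perl sketch that rewrites each ##sequence-region directive as a feature line, assuming the FlyBase and chromosome_arm values from the example above; note that GFF3 columns must be separated by tabs, not spaces, and that the generated ID must not clash with an ID already used in the file. region_to_feature is a hypothetical name:

```perl
use strict;
use warnings;

# Turn "##sequence-region <seqid> <start> <end>" into a tab-separated
# GFF3 feature line; any other line is returned unchanged.
sub region_to_feature {
    my ($line, $source, $type) = @_;
    $source ||= 'FlyBase';          # column 2, as in the example line
    $type   ||= 'chromosome_arm';   # column 3, as in the example line
    return $line
        unless $line =~ /^##sequence-region\s+(\S+)\s+(\d+)\s+(\d+)\s*$/;
    my ($seqid, $start, $end) = ($1, $2, $3);
    return join("\t", $seqid, $source, $type, $start, $end, '.', '.', '.',
                "ID=$seqid;Name=$seqid") . "\n";
}

# Usage: perl regions_to_features.pl in.gff > out.gff   (no-op without arguments)
if (@ARGV) {
    while (my $line = <>) {
        print region_to_feature($line);
    }
}
```

The script name is hypothetical; the rewritten file would then be reloaded with bulk_load_gff.pl as before.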

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hangsyin at gmail.com  Sun Jan 20 10:08:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sun, 20 Jan 2008 07:08:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features"
	on	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <1200802084.6069.11.camel@frissell>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
Message-ID: <14982665.post@talk.nabble.com>


Hi, Scott,
I tried replacing the sequence-region line with
"4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4", but it
doesn't work. "$!" didn't say anything except "died at line 12".

So I went ahead with Bio::DB::SeqFeature::Store. Here is my code to load
dmel-all-r5.4.gff (from FlyBase) into a test database:
=============================================================
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   =>  1 );
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store    => $db,
                                                         -verbose  => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================
I got a bunch of errors like this:
"DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
Line 1316 in mysql.pm reads: $sth->execute($name) or die $sth->errstr;
I checked the test database after the failed load. Only one table had been
created, called 'meta'. I also tried 'grant all on test to XXX at localhost'
and used that -user and -pass to load the GFF, but it didn't work either.

Jon





From cain at cshl.edu  Sun Jan 20 10:25:16 2008
From: cain at cshl.edu (Scott Cain)
Date: Sun, 20 Jan 2008 10:25:16 -0500 (EST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an
 undefined value at BIO::DB::GFF.pl
In-Reply-To: <14982665.post@talk.nabble.com>
Message-ID: 

Jon,

There is a script for loading a SeqFeature database just like the GFF
database, though I don't know what it's called offhand (I'm not at my
normal computer right now).  Be sure to read the documentation; you
will probably want to use the 'fast' option (I don't remember what it is
called either).

Scott


----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain at cshl.edu
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------





From cjfields at uiuc.edu  Sun Jan 20 12:10:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 20 Jan 2008 11:10:27 -0600
Subject: [Bioperl-l] Problem: Can't call method "features" on an
	undefined value at BIO::DB::GFF.pl
In-Reply-To: 
References: 
Message-ID: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>

It's bp_seqfeature_load.pl (if you have the full bioperl core  
distribution, it's in script/Bio-SeqFeature/Store).  I had some  
problems with the fast-loading option but it was likely just my gff  
formatting; example data loaded just fine.

As for the error, you need to use the '-create' flag when initializing  
a database (or wiping data from a current one):

=============================================================
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   =>  1,
                                         -create  => 1);
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
                                                         -verbose => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================
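Once the data is loaded, the original query (features overlapping 4:20,000..25,000) can be run against the SeqFeature store instead of Bio::DB::GFF. A sketch, assuming Bio::DB::SeqFeature::Store's features() accepts -seq_id, -start, -end and -type; count_region_features is a hypothetical helper, and the 'memory' adaptor is used only so the example runs without MySQL. For the MySQL database above you would pass in a store opened with the same DBI::mysql arguments but without -create, which would wipe the loaded data:

```perl
use strict;
use warnings;
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;

# Count the features of the given types that overlap seq_id:start..end.
# $db can be any Bio::DB::SeqFeature::Store (e.g. the DBI::mysql one above).
sub count_region_features {
    my ($db, $seq_id, $start, $end, @types) = @_;
    my @features = $db->features(-seq_id => $seq_id,
                                 -start  => $start,
                                 -end    => $end,
                                 -type   => \@types);
    return scalar @features;
}

# Quick self-contained run: load a GFF3 file into an in-memory store.
# Usage: perl count_region.pl file.gff3   (does nothing without an argument)
if (@ARGV) {
    my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'memory');
    Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => $db)->load($ARGV[0]);
    print count_region_features($db, '4', 20000, 25000,
        qw(gene exon intron five_prime_UTR three_prime_UTR CDS)), "\n";
}
```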

chris

On Jan 20, 2008, at 9:25 AM, Scott Cain wrote:

> Jon,
>
> There is a script for loading a SeqFeature database just like the GFF
> database, though I don't know what it's called off hand (I'm not at my
> normal computer right now).  Be sure to read the documentation and you
> will probably want to use the 'fast' option (I don't remember what  
> it is
> called either).
>
> Scott
>
>
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain at cshl.edu
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
>
>
> On Sun, 20 Jan 2008, Hang wrote:
>
>>
>> Hi, Scott,
>> I tried to change sequence-region line to "4   FlyBase   
>> chromosome_arm  1
>> 1351857 .  .  .  ID=4;Name=4", it doesn't work. "$!" didn't say  
>> anything but
>> "died at line 12".
>>
>> So, I went ahead with the Bio::DB::SeqFeature::Store. Here is my  
>> code to
>> load the dmel-all-r5.4.gff(from Flybase) to a test database:
>> =============================================================
>> use Bio::DB::SeqFeature::Store;
>> use Bio::DB::SeqFeature::Store::GFF3Loader;
>> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>>                                         -dsn     => 'dbi:mysql:test',
>>                                         -user    => 'root',
>>                                         -pass    => 'XXXXX',
>>                                         -write   =>  1 );
>> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store     
>> => $db,
>>                                                         -verbose   
>> => 1);
>> $loader->load(./'dmel-all-r5.4.gff');
>> =============================================================
>> I got bunch of errors like this:
>> "DBD::mysql::execute failed: Table 'test.locationlist' doesn't  
>> exist at
>> C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line  
>> 1316".
>> The line 1316 in mysql.pm looks like this: $sth->execute($name) or  
>> die
>> $sth->errstr;
>> I checked the database test after failed loading. There is only one  
>> table
>> created, which call 'meta'. I also tried 'grant all on test to
>> XXX at localhost' and used that -user and -pass to load gff, it didn't  
>> work
>> either.
>>
>> Jon
>>
>>
>> Scott Cain-3 wrote:
>>>
>>> Hi Jon,
>>>
>>> Well, seeing the error message would be helpful, but my first guess
>>> without is that there are a few things you can try:
>>>
>>>  * removing the "sequence-region" line from the GFF file, adding a  
>>> line
>>> like this:
>>>
>>>  4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4
>>>
>>> and then reloading the database.
>>>
>>>  * Or, you may want to consider using Bio::DB::SeqFeature::Store,  
>>> since
>>> Bio::DB::GFF3 doesn't always behave correctly with complex GFF3  
>>> (that
>>> is, with three levels of features (like gene, mRNA and CDS)).
>>>
>>> Scott
>>>
>>> On Sat, 2008-01-19 at 19:49 -0800, Hang wrote:
>>>> Hi, Scott,
>>>>
>>>> After adding die $!, I can tell that something is wrong at this line:
>>>> "my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"
>>>>
>>>> My GFF file looks like this:
>>>> ##gff-version 3
>>>> ##sequence-region 4 1 1351857
>>>> 4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
>>>> 4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
>>>> 4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
>>>> ...
>>>> ...
>>>> I'm really confused. Any further suggestions? Thank you!
>>>>
>>>> Jon
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Scott Cain-3 wrote:
>>>>>
>>>>> Hi Jon,
>>>>>
>>>>> I think it's funny that you have "or die" on the database opening line
>>>>> and "or die" on the @features line, but you didn't put one on the
>>>>> $segment line.  Try adding "or die $!" to the $segment line to see what
>>>>> it says; also add a 'print $segment' after you create it and before you
>>>>> try to get the features from it.
>>>>>
>>>>> Clearly, the problem is that $segment is not defined (that is, nothing
>>>>> is in it, not that the wrong thing is in it).  The next trick is to find
>>>>> out why.  My first guess, without looking at the data set, is that the
>>>>> arm is not really named '4'.
>>>>> Scott
>>>>>
>>>>> On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
>>>>>> Hi, everyone,
>>>>>>
>>>>>> I ran into this problem when running the script below to extract
>>>>>> features that overlap 4:20,000..25,000. It always fails with "Can't call
>>>>>> method "features" on an undefined value at BIO::DB::GFF.pl line XX".
>>>>>> ==============================================================
>>>>>> use Bio::DB::GFF;
>>>>>> use Bio::Tools::GFF;
>>>>>> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>>>>>>                            -dsn     => 'dbi:mysql:dmel_gff:localhost',
>>>>>>                            -user    => 'XXXX',
>>>>>>                            -pass    => 'XXXX') || die "database open failed";
>>>>>>
>>>>>> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
>>>>>> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
>>>>>>     'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
>>>>>> print(scalar(@features)."\n");
>>>>>>
>>>>>> ================================================================
>>>>>> I am using ActivePerl 5.8.8 and BioPerl 1.5.2 under Win32. I loaded
>>>>>> dmel_5.4.gff into the MySQL database with bulk_load_gff.pl without any
>>>>>> errors. Other methods failed as well.
>>>>>>
>>>>>> Any help will be deeply appreciated!
>>>>>>
>>>>>> Best,
>>>>>> Jon
>>>>>>
>>>>> -- 
>>>>> ------------------------------------------------------------------------
>>>>> Scott Cain, Ph. D.                                         cain at cshl.edu
>>>>> GMOD Coordinator (http://www.gmod.org/)                      216-392-3087
>>>>> Cold Spring Harbor Laboratory
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14982665.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From ykumagai at biken.osaka-u.ac.jp  Mon Jan 21 11:56:53 2008
From: ykumagai at biken.osaka-u.ac.jp (Yutaro Kumagai)
Date: Tue, 22 Jan 2008 01:56:53 +0900
Subject: [Bioperl-l] Problem with Bio::ASN1::EntrezGene::Indexer
Message-ID: <4794CED5.3070307@biken.osaka-u.ac.jp>

Hi, everyone,

I'm working with Bio::ASN1::EntrezGene::Indexer as follows:

###
use Bio::ASN1::EntrezGene::Indexer;
use Bio::ASN1::EntrezGene;
use Bio::SeqIO;

my $inx = Bio::ASN1::EntrezGene::Indexer->new(-filename =>
					      'c:/chrm/asn/entrezgene.idx');

# The index file has already been made successfully. I checked it
# by counting the number of records with $inx -> count_records, etc.

my $seq1 = $inx -> fetch_hash(15969);

# The ID 15969 surely exists: I got no error message, and by
# dumping $seq1 I confirmed that $seq1 contains some data.

my $seq2 = $inx -> fetch(15969);
###

However, the last method returned this error:
"you must pass in a file name or handle through new() or input_file() first
before calling next_seq!
at C:/Perl/site/lib/Bio\SeqIO\entrezgene.pm line 136".

I traced the program in the debugger and found that _fh() in
Bio::Index::AbstractSeq somehow failed to pass the filehandle to fetch.

Now, I have two questions:

1) What's wrong with the code above? Is this a bug, or my own mistake?
If it's my mistake, what am I doing wrong?

2) If I can't get "fetch" to work, how can I extract the sequence data
(position in the genomic contig, strand, etc.) from the data returned
by "fetch_hash"? I don't understand how the data returned by
"fetch_hash" is structured...

Thank you in advance.

Yutaro Kumagai.

-- 
**********************************
Yutaro Kumagai
Dept. of Host Defense
Res. Inst. for Microbial Diseases
Osaka University
Japan
ykumagai at biken.osaka-u.ac.jp
**********************************
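The layout of the structure returned by `fetch_hash` is not described in this thread. A minimal sketch for getting oriented, assuming only that `fetch_hash` returns a reference to nested hashes and arrays: walk the structure and print one path per scalar value. The helper name `leaf_paths` and the sample data are hypothetical and are not part of Bio::ASN1::EntrezGene.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): walk nested hash/array
# references and return one "path = value" string per scalar leaf.
sub leaf_paths {
    my ($data, $path) = @_;
    $path = '' unless defined $path;
    my @out;
    if (ref $data eq 'HASH') {
        for my $key (sort keys %$data) {
            push @out, leaf_paths($data->{$key}, $path . '{' . $key . '}');
        }
    }
    elsif (ref $data eq 'ARRAY') {
        for my $i (0 .. $#$data) {
            push @out, leaf_paths($data->[$i], $path . '[' . $i . ']');
        }
    }
    else {
        # Plain scalar (or a reference of another kind): record it as-is.
        push @out, $path . ' = ' . (defined $data ? $data : 'undef');
    }
    return @out;
}

# Made-up sample structure, NOT real Entrez Gene data.
my $sample = { gene => [ { locus => 'abc', id => 42 } ] };
print "$_\n" for leaf_paths($sample);
# {gene}[0]{id} = 42
# {gene}[0]{locus} = abc
```

With the indexer from the question, the equivalent call would be something like `print "$_\n" for leaf_paths($inx->fetch_hash(15969));`, and grepping that output for the field names of interest may be quicker than reading a full Data::Dumper dump.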


From hangsyin at gmail.com  Mon Jan 21 14:22:55 2008
From: hangsyin at gmail.com (Hang)
Date: Mon, 21 Jan 2008 11:22:55 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an
 undefined value at BIO::DB::GFF.pl
In-Reply-To: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
	<14982665.post@talk.nabble.com>
	
	<3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
Message-ID: <15004412.post@talk.nabble.com>


Hi, Chris:

Following your suggestion, I added the -create flag and the GFF3Loader
started to work. Thanks a lot!
When I load dmel-all-5.4.gff into MySQL with -fast, I get the following
error:
   Data too long for column 'attribute_value' at c:/../../../mysql.pm line 510
If I don't use -fast, it works, apart from the annoyingly slow speed. Do you
have any suggestions on this?

Best,
Hang




Chris Fields wrote:
> 
> It's bp_seqfeature_load.pl (if you have the full bioperl core  
> distribution, it's in script/Bio-SeqFeature/Store).  I had some  
> problems with the fast-loading option but it was likely just my gff  
> formatting; example data loaded just fine.
> 
> As for the error, you need to use the '-create' flag when initializing  
> a database (or wiping data from a current one):
> 
> =============================================================
> use Bio::DB::SeqFeature::Store;
> use Bio::DB::SeqFeature::Store::GFF3Loader;
> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>                                          -dsn     => 'dbi:mysql:test',
>                                          -user    => 'root',
>                                          -pass    => 'XXXXX',
>                                          -write   => 1,
>                                          -create  => 1);
> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
>                                                          -verbose => 1);
> $loader->load('./dmel-all-r5.4.gff');
> =============================================================
> 
> chris
> 
> On Jan 20, 2008, at 9:25 AM, Scott Cain wrote:
> 
>> Jon,
>>
>> There is a script for loading a SeqFeature database just like the GFF
>> database, though I don't know what it's called offhand (I'm not at my
>> normal computer right now).  Be sure to read the documentation; you
>> will probably want to use the 'fast' option (I don't remember what it
>> is called either).
>>
>> Scott
>>
>>
>> ----------------------------------------------------------------------
>> Scott Cain, Ph. D.				 	 cain at cshl.edu
>> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
>> ----------------------------------------------------------------------
>>
>>
>> On Sun, 20 Jan 2008, Hang wrote:
>>
>>>
>>> Hi, Scott,
>>> I tried changing the sequence-region line to "4   FlyBase   chromosome_arm
>>> 1  1351857 .  .  .  ID=4;Name=4", but it doesn't work. "$!" didn't say
>>> anything but "died at line 12".
>>>
>>> So, I went ahead with Bio::DB::SeqFeature::Store. Here is my code to
>>> load dmel-all-r5.4.gff (from FlyBase) into a test database:
>>> =============================================================
>>> use Bio::DB::SeqFeature::Store;
>>> use Bio::DB::SeqFeature::Store::GFF3Loader;
>>> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>>>                                         -dsn     => 'dbi:mysql:test',
>>>                                         -user    => 'root',
>>>                                         -pass    => 'XXXXX',
>>>                                         -write   =>  1 );
>>> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
>>>                                                          -verbose => 1);
>>> $loader->load('./dmel-all-r5.4.gff');
>>> =============================================================
>>> I got a bunch of errors like this:
>>> "DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
>>> C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
>>> Line 1316 in mysql.pm looks like this: $sth->execute($name) or die
>>> $sth->errstr;
>>> I checked the test database after the failed load. Only one table, called
>>> 'meta', had been created. I also tried 'grant all on test to
>>> XXX at localhost' and used that -user and -pass to load the GFF, but it
>>> didn't work either.
>>>
>>> Jon
>>>
>>>
>>> Scott Cain-3 wrote:
>>>>
>>>> Hi Jon,
>>>>
>>>> Well, seeing the error message would be helpful, but without it, my first
>>>> guess is that there are a few things you can try:
>>>>
>>>>  * removing the "sequence-region" line from the GFF file, adding a line
>>>> like this:
>>>>
>>>>  4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4
>>>>
>>>> and then reloading the database.
>>>>
>>>>  * Or, you may want to consider using Bio::DB::SeqFeature::Store, since
>>>> Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that
>>>> is, with three levels of features, like gene, mRNA and CDS).
>>>>
>>>> Scott
>>>>
>>>> On Sat, 2008-01-19 at 19:49 -0800, Hang wrote:
>>>>> Hi, Scott,
>>>>>
>>>>> After adding die $!, I can tell that something is wrong at this line:
>>>>> "my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"
>>>>>
>>>>> My GFF file looks like this:
>>>>> ##gff-version 3
>>>>> ##sequence-region 4 1 1351857
>>>>> 4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
>>>>> 4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
>>>>> 4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
>>>>> ...
>>>>> ...
>>>>> I'm really confused. Any further suggestions? Thank you!
>>>>>
>>>>> Jon
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Scott Cain-3 wrote:
>>>>>>
>>>>>> Hi Jon,
>>>>>>
>>>>>> I think it's funny that you have "or die" on the database opening line
>>>>>> and "or die" on the @features line, but you didn't put one on the
>>>>>> $segment line.  Try adding "or die $!" to the $segment line to see what
>>>>>> it says; also add a 'print $segment' after you create it and before you
>>>>>> try to get the features from it.
>>>>>>
>>>>>> Clearly, the problem is that $segment is not defined (that is, nothing
>>>>>> is in it, not that the wrong thing is in it).  The next trick is to find
>>>>>> out why.  My first guess, without looking at the data set, is that the
>>>>>> arm is not really named '4'.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>> On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
>>>>>>> Hi, everyone,
>>>>>>>
>>>>>>> I ran into this problem when running the script below to extract
>>>>>>> features that overlap 4:20,000..25,000. It always fails with "Can't call
>>>>>>> method "features" on an undefined value at BIO::DB::GFF.pl line XX".
>>>>>>> ==============================================================
>>>>>>> use Bio::DB::GFF;
>>>>>>> use Bio::Tools::GFF;
>>>>>>> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>>>>>>>                            -dsn     => 'dbi:mysql:dmel_gff:localhost',
>>>>>>>                            -user    => 'XXXX',
>>>>>>>                            -pass    => 'XXXX') || die "database open failed";
>>>>>>>
>>>>>>> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
>>>>>>> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
>>>>>>>     'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
>>>>>>> print(scalar(@features)."\n");
>>>>>>>
>>>>>>> ================================================================
>>>>>>> I am using ActivePerl 5.8.8 and BioPerl 1.5.2 under Win32. I loaded
>>>>>>> dmel_5.4.gff into the MySQL database with bulk_load_gff.pl without any
>>>>>>> errors. Other methods failed as well.
>>>>>>>
>>>>>>> Any help will be deeply appreciated!
>>>>>>>
>>>>>>> Best,
>>>>>>> Jon
>>>>>>>
> 

-- 
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p15004412.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
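One way to look for the records behind the "Data too long for column 'attribute_value'" error is to scan the GFF3 file for unusually long attribute values before loading. A minimal sketch, assuming the failures come from long values in column 9; the 255-character threshold is a placeholder assumption (the real width of that column depends on the schema the MySQL adaptor creates), and `long_attributes` is a hypothetical helper, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper: return [line number, tag, length] for every GFF3
# attribute value longer than $max_len characters.
sub long_attributes {
    my ($fh, $max_len) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;    # directives, comments, blank lines
        my @cols = split /\t/, $line;
        next unless @cols >= 9;                    # not a feature line (e.g. FASTA)
        for my $pair (split /;/, $cols[8]) {
            my ($tag, $value) = split /=/, $pair, 2;
            next unless defined $value;
            # GFF3 values are URL-escaped; undo %XX escapes before measuring.
            (my $plain = $value) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
            push @hits, [ $., $tag, length $plain ] if length $plain > $max_len;
        }
    }
    return @hits;
}

# Tiny in-memory sample with one deliberately long Note value.
my $gff = join "\n",
    '##gff-version 3',
    "4\tFlyBase\tgene\t1\t100\t.\t+\t.\tID=g1;Note=" . ('x' x 300),
    '';
open my $demo, '<', \$gff or die "Cannot open in-memory GFF: $!";
printf "line %d: %s has %d characters\n", @$_ for long_attributes($demo, 255);
# line 2: Note has 300 characters
```

On the real file, opening dmel-all-r5.4.gff instead of the in-memory string and passing a threshold matching the actual column width would list the candidate lines.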



From cjfields at uiuc.edu  Mon Jan 21 23:21:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 Jan 2008 22:21:27 -0600
Subject: [Bioperl-l] Problem: Can't call method "features" on an
	undefined value at BIO::DB::GFF.pl
In-Reply-To: <15004412.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
	<14982665.post@talk.nabble.com>
	
	<3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
	<15004412.post@talk.nabble.com>
Message-ID: <8B1956B2-1380-4E73-8F14-F79CA5435697@uiuc.edu>

I'm cc'ing this to the gbrowse list just in case Lincoln or Scott have  
an idea.  My guess is it's a bug in the fast loader.  Could you file  
this in bugzilla?

http://bugzilla.open-bio.org/

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From jason at bioperl.org  Wed Jan 23 03:14:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 23 Jan 2008 00:14:06 -0800
Subject: [Bioperl-l] [Bioperl-guts-l] [14455]
	bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm: fixed up the
	gene glyph so that it works properly with CDS-only genes
In-Reply-To: <200801222048.m0MKmhiI007977@dev.open-bio.org>
References: <200801222048.m0MKmhiI007977@dev.open-bio.org>
Message-ID: <91659EDD-B102-47C8-BF93-92576C2CF324@bioperl.org>

Lincoln -- Thank you, Thank you for this fix!  This takes care of  
inconsistency problems I was having with GFF3 and GFF2 data.  It  
works so much more beautifully now!

-jason
On Jan 22, 2008, at 12:48 PM, Lincoln Stein wrote:

> Revision: 14455
> Author:   lstein
> Date:     2008-01-22 15:48:42 -0500 (Tue, 22 Jan 2008)
>
> Log Message:
> -----------
> fixed up the gene glyph so that it works properly with CDS-only genes
>
> Modified Paths:
> --------------
>     bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
>
> Modified: bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
> ===================================================================
> --- bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm	2008-01-22 00:16:02 UTC (rev 14454)
> +++ bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm	2008-01-22 20:48:42 UTC (rev 14455)
> @@ -44,7 +44,9 @@
>
>  sub bump {
>    my $self = shift;
> -  return 1 if $self->{level} == 0; # top level bumps, other levels don't unless specified in config
> +  return 1
> +    if $self->{level} == 0
> +      && lc $self->feature->primary_tag eq 'gene'; # top level bumps, other levels don't unless specified in config
>    return $self->SUPER::bump;
>  }
>
> @@ -92,12 +94,16 @@
>  sub _subfeat {
>    my $class   = shift;
>    my $feature = shift;
> -  if ($feature->primary_tag eq 'gene') {
> +  if (lc $feature->primary_tag eq 'gene') {
>      my @transcripts;
>      for my $t (qw/mRNA tRNA snRNA snoRNA miRNA ncRNA pseudogene/) {
>        push @transcripts, $feature->get_SeqFeatures($t);
>      }
>      return @transcripts;
> +  } elsif (lc $feature->primary_tag eq 'cds') {
> +    my @parts = $feature->get_SeqFeatures();
> +    return ($feature) if $class->{level} == 0 and !@parts;
> +    return @parts;
>    }
>
>    my @subparts;
>
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
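The patched `_subfeat` logic in the diff above can be exercised outside Bio::Graphics with a small stand-in feature class. This is a simplified sketch, not the glyph code itself: `MockFeature` and `subfeat` are hypothetical names, the level is passed as an argument instead of being read from the glyph object, and the rest of the real method (not shown in the diff) is replaced by simply returning the feature's children.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence feature: a primary tag plus children.
package MockFeature;

sub new {
    my ($class, $tag, @kids) = @_;
    return bless { tag => $tag, kids => [@kids] }, $class;
}

sub primary_tag { return $_[0]{tag} }

# With type arguments, return only children whose primary tag matches one of them.
sub get_SeqFeatures {
    my ($self, @types) = @_;
    my @kids = @{ $self->{kids} };
    return @kids unless @types;
    my %want = map { $_ => 1 } @types;
    return grep { $want{ $_->primary_tag } } @kids;
}

package main;

# Simplified version of the patched dispatch: tag checks are case-insensitive,
# and a top-level CDS with no parts is returned as its own single part.
sub subfeat {
    my ($level, $feature) = @_;
    my $tag = lc $feature->primary_tag;
    if ($tag eq 'gene') {
        my @transcripts;
        for my $t (qw/mRNA tRNA snRNA snoRNA miRNA ncRNA pseudogene/) {
            push @transcripts, $feature->get_SeqFeatures($t);
        }
        return @transcripts;
    }
    elsif ($tag eq 'cds') {
        my @parts = $feature->get_SeqFeatures();
        return ($feature) if $level == 0 && !@parts;
        return @parts;
    }
    return $feature->get_SeqFeatures();    # stand-in for the rest of _subfeat
}

my $cds_only = MockFeature->new('CDS');
my $gene     = MockFeature->new('Gene', MockFeature->new('mRNA'));
print join(',', map { $_->primary_tag } subfeat(0, $cds_only)), "\n";   # CDS
print join(',', map { $_->primary_tag } subfeat(0, $gene)), "\n";       # mRNA
```

The two cases mirror what the commit log describes: a CDS-only gene now yields something to draw at the top level, and a mixed-case 'Gene' tag is still recognized.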



From ste.ghi at libero.it  Thu Jan 24 08:42:49 2008
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Thu, 24 Jan 2008 14:42:49 +0100
Subject: [Bioperl-l] parsing ACE file
Message-ID: 

Dear All,
    Given an assembly .ace file and a list of contigs from that assembly, how can I extract from the .ace file the names of the reads that form each listed contig? Is there a module that does this?

Any suggestions on how to start are welcome...
Cheers

Stefano




From pmiguel at purdue.edu  Thu Jan 24 14:06:35 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Thu, 24 Jan 2008 14:06:35 -0500
Subject: [Bioperl-l] parsing ACE file
In-Reply-To: 
References: 
Message-ID: <4798E1BB.2020809@purdue.edu>

Stefano Ghignone wrote:
> Dear All,
>     Given an assembly .ace file and a list of contigs from that assembly, how can I extract from the .ace file the names of the reads that form each listed contig? Is there a module that does this?
>
> Any suggestions on how to start are welcome...
> Cheers
>
> Stefano
>
>   
 perl -ne 'print if /^(?:CO|RD) /' acefile.ace

will print each contig line (CO) followed by the lines naming the reads
(RD) in that contig, if "acefile.ace" is a phrap ACE file.

There is a BioPerl module for handling phrap ACE files, but I'm not sure
what its current status is. Last time I looked (probably a couple of
years ago), it seemed to have been abandoned half-finished.

-- 
Phillip
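For a specific list of contigs, the CO/RD structure that the one-liner relies on can be turned into a small filter. A minimal sketch, assuming a phrap-style ACE file in which each contig's `CO` line precedes the `RD` lines of its reads; `reads_by_contig` and the sample data are hypothetical, and this does not use the BioPerl ACE module.

```perl
use strict;
use warnings;

# Hypothetical helper: map each wanted contig name to the read names that
# follow its CO line (one RD line per read) in a phrap-style ACE file.
sub reads_by_contig {
    my ($fh, $wanted) = @_;
    my %keep = map { $_ => 1 } @$wanted;
    my (%reads, $contig);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {
            $contig = $1;                           # following RD lines belong here
            $reads{$contig} = [] if $keep{$contig};
        }
        elsif ($line =~ /^RD\s+(\S+)/ && defined($contig) && $keep{$contig}) {
            push @{ $reads{$contig} }, $1;
        }
    }
    return \%reads;
}

# Made-up two-contig sample (simplified; not real assembly data).
my $ace = <<'ACE';
AS 2 3

CO Contig1 8 2 1 U
ACGTACGT

AF readA U 1
AF readB U 3

RD readA 8 0 0
ACGTACGT

RD readB 6 0 0
GTACGT

CO Contig2 4 1 1 U
TTTT

RD readC 4 0 0
TTTT
ACE

open my $in, '<', \$ace or die "Cannot open in-memory ACE: $!";
my $hits = reads_by_contig($in, ['Contig1']);
print "$_: @{ $hits->{$_} }\n" for sort keys %$hits;    # Contig1: readA readB
```

For a real assembly, the filehandle would come from opening the .ace file, and the contig list could be read from a file with one contig name per line.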


From golharam at umdnj.edu  Thu Jan 24 14:36:29 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 24 Jan 2008 14:36:29 -0500
Subject: [Bioperl-l] Wiki inconsistency?
Message-ID: <4798E8BD.7030107@umdnj.edu>

Hi,

I haven't used Bioperl in a while but recently started using it.  I was 
using 1.4.0 but see on the website that 1.5.2 has been released.   If I 
click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2), 
I see two versions:

bioperl-1.5.2_102

and

bioperl-1.5.2_100

However, if I click on the Downloads link on the left toolbar, then 
scroll down, I see 1.5.2 Developer Release.  The tar file here points to 
current_core_unstable.tar.gz.

Is this supposed to be this way?  It seems a bit confusing.  I think it 
might be appropriate to put all the download links in one 
location...just my two cents...

Ryan



From cjfields at uiuc.edu  Thu Jan 24 15:58:25 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 24 Jan 2008 14:58:25 -0600
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <4798E8BD.7030107@umdnj.edu>
References: <4798E8BD.7030107@umdnj.edu>
Message-ID: 

Maybe Sendu can answer more specifically, but I believe the extra  
designation referred to the release candidate (of which bioperl-core  
was the only one with '102').  You definitely want the core package.   
The other ones with '100' are other bioperl-related distributions  
which require the core package but have additional functionality  
(BioSQL-related functions, wrapper modules, etc.).

chris

On Jan 24, 2008, at 1:36 PM, Ryan Golhar wrote:

> Hi,
>
> I haven't used Bioperl in a while but recently started using it.  I  
> was using 1.4.0 but see on the website that 1.5.2 has been  
> released.  If I click on the link for 1.5.2
> (http://www.bioperl.org/wiki/Release_1.5.2), I see two versions:
>
> bioperl-1.5.2_102
>
> and
>
> bioperl-1.5.2_100
>
> However, if I click on the Downloads link on the left toolbar, then
> scroll down, I see 1.5.2 Developer Release.  The tar file here
> points to current_core_unstable.tar.gz.
>
> Is this supposed to be this way?  It seems a bit confusing.  I think  
> it might be appropriate to put all the download links in one  
> location...just my two cents...
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From florent.angly at gmail.com  Thu Jan 24 17:06:29 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 24 Jan 2008 14:06:29 -0800
Subject: [Bioperl-l] parsing ACE file
In-Reply-To: <4798E1BB.2020809@purdue.edu>
References: 
	<4798E1BB.2020809@purdue.edu>
Message-ID: <47990BE5.2010005@gmail.com>

That would be the module Bio::Assembly::IO::ace.
It works fine as far as I know.
To parse an assembly, use Bio::Assembly::IO:
http://doc.bioperl.org/bioperl-live/Bio/Assembly/IO.html
Regards,
Florent

Phillip San Miguel wrote:
> Stefano Ghignone wrote:
>> Dear All,
>>     dealing with an assembly .ace file and a list of contigs (from 
>> that assembly), how can I extract from the .ace file the read names 
>> forming each listed contig? Is there any module doing this job?
>>
>> Any suggestion about how to start is welcome...
>> Cheers
>>
>> Stefano
>>
>>   
> perl -ne 'next unless /^(?:CO|RD) /; print' acefile.ace
>
> will give you a list of each of the contigs followed by the reads in each 
> contig, if "acefile.ace" is a phrap ace file.
>
> There is a bioperl module for handling phrap ace files, but I'm not 
> sure what its current status is. Last time I looked (probably a couple 
> of years ago) it seemed to have been abandoned half-finished.
>



From golharam at umdnj.edu  Thu Jan 24 16:17:14 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 24 Jan 2008 16:17:14 -0500
Subject: [Bioperl-l] GenBank updated sequence not being retrieved
Message-ID: <4799005A.5030204@umdnj.edu>

I'm using Bioperl 1.4 (and tried with 1.5.1).

I'm trying to download GenBank sequences for which I have accession numbers.
One of the sequences has been replaced with a newer version.  I'm
using get_Seq_by_acc, which returns the warning:

-------------------- WARNING ---------------------
MSG: acc (gb|XM_087386) does not exist
---------------------------------------------------

If I check NCBI's website for the sequence, it has indeed been replaced 
by an NM_ sequence.  How can I get BioPerl to retrieve the latest 
version of a sequence?



From johan.nilsson at sh.se  Thu Jan 24 17:33:42 2008
From: johan.nilsson at sh.se (Johan Nilsson)
Date: Thu, 24 Jan 2008 23:33:42 +0100
Subject: [Bioperl-l] Quickest Codon Based MSA?
Message-ID: <47991246.6010106@sh.se>

Hello,

I have a question which might not necessarily be related to Bioperl, 
although I do believe the expertise is available here. I have a couple 
of thousand FASTA files, each containing 20 CDS sequence orthologues of 
rather high sequence similarity. I would like to create a codon-based 
multiple sequence alignment for each of these FASTA files (i.e. a 
nucleotide sequence alignment inferred from alignment of the translated 
peptide sequences, to assure that no frame shifts will occur). I first 
tried running Dialign2, which can perform the 
translation/back-translation in one go, but this turned out to be far 
too slow. I next tried to build protein alignments using ClustalW and 
subsequently built the coding region alignment using EMBOSS 'tranalign', 
but this also was too slow.

Is there any method available which significantly speeds up the 
codon-preserving alignment??? As I mentioned, the sequences to be 
aligned are in general very conserved, so any heuristic taking advantage 
of the low divergence would be very helpful! Also, is there any 
adjustable parameter in dialign2/dialign-T that might speed up the 
program when looking at highly similar sequences?

Best regards
/Johan Nilsson


From e-just at northwestern.edu  Thu Jan 24 18:07:57 2008
From: e-just at northwestern.edu (Eric Just)
Date: Thu, 24 Jan 2008 17:07:57 -0600
Subject: [Bioperl-l] Bioinformatics Job Opening at dictyBase in Chicago
Message-ID: 

Hello everyone,

We have an opening at dictyBase (Northwestern University in Chicago) for a
Bioinformatics Software Engineer.  This job involves writing and maintaining
software for a genome database using Chado/OO-Perl/ Bioperl and many other
state-of-the-art technologies.

For more information please see:
http://dictybase.org/dictybase_jobs.htm

Thanks,
Eric


From bix at sendu.me.uk  Thu Jan 24 18:16:14 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 24 Jan 2008 23:16:14 +0000
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <4798E8BD.7030107@umdnj.edu>
References: <4798E8BD.7030107@umdnj.edu>
Message-ID: <47991C3E.2010908@sendu.me.uk>

Ryan Golhar wrote:
> Hi,
> 
> I haven't used Bioperl in a while but recently started using it.  I was 
> using 1.4.0 but see on the website that 1.5.2 has been released.   If I 
> click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2), 
> I see two versions:
> 
> bioperl-1.5.2_102
> 
> and
> 
> bioperl-1.5.2_100

Where do you see this older version? I did a search on the page and that 
term isn't found. _100 was the first version of 1.5.2 core to go out. 
There were then 2 minor revisions released, as detailed in the 'Updates' 
section of the page.


> However, if I click on the Downloads link on the left toolbar, then 
> scroll down, I see 1.5.2 Developer Release.  The tar file here points to 
> current_core_unstable.tar.gz.

Yes, that is just an alias to bioperl-1.5.2_102, i.e. to whatever the latest 
version happens to be, so that people don't need to worry about the 
actual version; they can just keep one static bookmark.


> Is this supposed to be this way?  It seems a bit confusing.  I think it 
> might be appropriate to put all the download links in one 
> location...just my two cents...

Well the primary page where all the links are found is the Downloads 
page. The Release_1.5.2 page is specific to 1.5.2 and will remain for 
historic reasons (so at some point there will be 1.5.3 or something and 
the appropriate links on the main Downloads page will be updated to 
that, but if someone specifically wants 1.5.2 they can still find the 
1.5.2 downloads on its own dedicated page).


From jason at bioperl.org  Thu Jan 24 21:17:02 2008
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 24 Jan 2008 18:17:02 -0800
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
References: <47991246.6010106@sh.se>
Message-ID: 

I don't know if it is faster or slower than what you have tried, but  
the aa_to_dna_aln function in Bio::Align::Utilities converts a protein  
alignment back into a CDS alignment.  You can see example code of it  
in use in the pairwise_kaks script, scripts/utilities/pairwise_kaks.PLS
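The threading step behind this kind of back-translation is cheap: each aligned protein residue is replaced by the next codon of its CDS and each gap by three gap characters, so it runs in time linear in the alignment length, and the protein alignment itself is probably the expensive part. A minimal pure-Perl sketch of that idea for a single sequence follows; it is not BioPerl's aa_to_dna_aln implementation, the name thread_cds is hypothetical, and it assumes each protein is the exact in-frame translation of its CDS.

```perl
use strict;
use warnings;

# Hypothetical helper: thread an unaligned CDS onto its aligned protein.
# Each residue consumes the next codon of the CDS; each gap ('-' or '.')
# becomes '---'. Codons left over after the last residue (for example a
# trailing stop codon) are dropped.
sub thread_cds {
    my ($aligned_protein, $cds) = @_;
    my ($out, $pos) = ('', 0);
    for my $aa (split //, $aligned_protein) {
        if ($aa eq '-' || $aa eq '.') {
            $out .= '---';
        }
        else {
            die "CDS too short for protein\n" if $pos + 3 > length $cds;
            $out .= substr($cds, $pos, 3);
            $pos += 3;
        }
    }
    return $out;
}
```

Applied to every sequence, e.g. map { $_ => thread_cds($aa_aln{$_}, $cds{$_}) } keys %aa_aln over hypothetical hashes keyed by sequence ID, this yields the codon alignment.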

-jason
On Jan 24, 2008, at 2:33 PM, Johan Nilsson wrote:

> Hello,
>
> I have a question which might not necessarily be related to  
> Bioperl, although I do believe the expertise is available here. I  
> have a couple of thousand FASTA files, each containing 20 CDS  
> sequence orthologues of rather high sequence similarity. I would  
> like to create a codon-based multiple sequence alignment for each  
> of these FASTA files (i.e. a nucleotide sequence alignment inferred  
> from alignment of the translated peptide sequences, to assure that  
> no frame shifts will occur). I first tried running Dialign2, which  
> can perform the translation/back-translation in one go, but this  
> turned out to be far too slow. I next tried to build protein  
> alignments using ClustalW and subsequently built the coding region  
> alignment using EMBOSS 'tranalign', but this also was too slow.
>
> Is there any method available which significantly speeds up the  
> codon-preserving alignment??? As I mentioned, the sequences to be  
> aligned are in general very conserved, so any heuristic taking  
> advantage of the low divergence would be very helpful! Also, is  
> there any adjustable parameter in dialign2/dialign-T that might  
> speed up the program when looking at highly similar sequences?
>
> Best regards
> /Johan Nilsson



From tristan.lefebure at gmail.com  Thu Jan 24 22:07:52 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 24 Jan 2008 22:07:52 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree, and how to combine trees
Message-ID: <200801242207.52991.tristan.lefebure@gmail.com>

Hi,

I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would 
like to merge several "one leaf taxonomic trees" into a taxonomic tree with 
several leaves. For example:

#####BEGINNING#####
#! /usr/bin/perl

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Bio::TreeIO;
use Bio::Tree::Tree;

# The taxonomic database
# You might want to switch to a different flatfile or to Entrez 
my $dbh = new Bio::DB::Taxonomy(-source   => 'flatfile',
                                  -directory=> '/tmp',  
                                  -nodesfile=> '/home/tristan/Documents/db/NCBI/taxonomy/nodes.dmp', 
                                  -namesfile=> '/home/tristan/Documents/db/NCBI/taxonomy/names.dmp');

# Fetch 4 taxa for the example
my $tax_decapoda =  $dbh->get_taxon(-name => 'Decapoda');
my $tax_heteroptera =  $dbh->get_taxon(-name => 'Heteroptera');
my $tax_coleoptera =  $dbh->get_taxon(-name => 'Coleoptera');
my $tax_copepoda =  $dbh->get_taxon(-name => 'Copepoda');

# Transform to tree objects
my $decapoda_tree = new Bio::Tree::Tree(-node => $tax_decapoda);
my $heteroptera_tree = new Bio::Tree::Tree(-node => $tax_heteroptera);
my $coleoptera_tree = new Bio::Tree::Tree(-node => $tax_coleoptera);
my $copepoda_tree = new Bio::Tree::Tree(-node => $tax_copepoda);

# Reduce the number of nodes to the following ranks
my @ranks = qw(kingdom phylum subphylum superclass class subclass superorder 
order family);

$decapoda_tree->splice(-keep_rank => \@ranks);
$heteroptera_tree->splice(-keep_rank => \@ranks);
$coleoptera_tree->splice(-keep_rank => \@ranks);
$copepoda_tree->splice(-keep_rank => \@ranks);

# Print the trees
my $out = new Bio::TreeIO('-format' => 'newick',
                                   '-file'   => ">four.tree");
$out->write_tree($decapoda_tree);
$out->write_tree($heteroptera_tree);
$out->write_tree($coleoptera_tree);
$out->write_tree($copepoda_tree);

#####END#######

This gives the following "trees":
(((((7524)33340)50557)6960)6656)33208;
(((((7041)33340)50557)6960)6656)33208;
((((((6683)6682)72041)6681)6657)6656)33208;
((((6830)72037)6657)6656)33208;

They are really special trees, as they contain only one leaf. I would like to 
combine them and remove the 'unused' nodes to obtain something like that:

((7524,7041)33340,(6683,6830)6657)6656;

or even better:

((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda;

Any suggestions?

Thanks!

-Tristan



From anjan.purkayastha at gmail.com  Thu Jan 24 18:32:20 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Thu, 24 Jan 2008 18:32:20 -0500
Subject: [Bioperl-l] Question from a bioperl newbie
Message-ID: 

Hi,
I recently installed BioPerl on my Mac and tried to use it in a simple
script with a "use Bio::Perl" command. However, I get the error message
"Can't locate Bio/Perl.pm in @INC".
The BioPerl folder is on my desktop, so I tried: use lib
"/Users/anjan/Desktop/bioperl-1.5.2_102/Bio";
This time it returned another error: Undefined subroutine
&main::get_sequence.

So, when BioPerl is installed, which directory does it reside in? (It's not
present in the .cpan/build directory.)

I'd appreciate a prompt reply.

anjan

-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================


From bosborne11 at verizon.net  Thu Jan 24 23:04:50 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 24 Jan 2008 23:04:50 -0500
Subject: [Bioperl-l] Question from a bioperl newbie
In-Reply-To: 
References: 
Message-ID: <3B13E81A-66E1-418A-8915-9E877C2B751D@verizon.net>

Anjan,

use lib "/Users/anjan/Desktop/bioperl-1.5.2_102/";

Brian O.


On Jan 24, 2008, at 6:32 PM, ANJAN PURKAYASTHA wrote:

> Hi,
> I recently installed BioPerl on my Mac and tried to use it in a simple
> script with a "use Bio::Perl" command. However, I get the error message
> "Can't locate Bio/Perl.pm in @INC".
> The BioPerl folder is on my desktop, so I tried: use lib
> "/Users/anjan/Desktop/bioperl-1.5.2_102/Bio";
> This time it returned another error: Undefined subroutine
> &main::get_sequence.
>
> So, when BioPerl is installed, which directory does it reside in? (It's not
> present in the .cpan/build directory.)
>
> I'd appreciate a prompt reply.
>
> anjan
>
> -- 
> ANJAN PURKAYASTHA, PhD.
> Senior Computational Biologist
> ==========================
>
> 1101 King Street, Suite 310,
> Alexandria, VA 22314.
> 703.518.8040 (office)
> 703.740.6939 (mobile)
>
> email:
> anjan at vbi.vt.edu;
> anjan.purkayastha at gmail.com
>
> http://www.vbi.vt.edu
>
> ==========================



From n.haigh at sheffield.ac.uk  Fri Jan 25 02:32:10 2008
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Fri, 25 Jan 2008 07:32:10 +0000
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <47991C3E.2010908@sendu.me.uk>
References: <4798E8BD.7030107@umdnj.edu> <47991C3E.2010908@sendu.me.uk>
Message-ID: <4799907A.9060301@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Sendu,

Have you thought about using a template for the latest stable release and the latest developer release? That way, any article or link that always needs
to point to the latest version simply has to include the correct template. Once a new release is made, you simply update the one template, and the
changes automatically propagate through the wiki; that might save some wiki admin each time there's a new release. You could get more intricate and use a
template to show the latest version of any particular release series, so you could do something like:

{{latest release|series=1.5.x|full=y}}
and
{{latest release|series=1.4.x|full=y}}

or even:

{{latest release|series=stable|full=y}}
and
{{latest release|series=dev|full=y}}

These templates could return 1.5.2_102 if the "full" parameter is set to something, or simply 1.5.2 if the "full" parameter is missing.

Just a thought.
Nath


Sendu Bala wrote:
> Ryan Golhar wrote:
>> Hi,
>>
>> I haven't used Bioperl in a while but recently started using it.  I
>> was using 1.4.0 but see on the website that 1.5.2 has been released.  
>> If I click on the link for 1.5.2
>> (http://www.bioperl.org/wiki/Release_1.5.2), I see two versions:
>>
>> bioperl-1.5.2_102
>>
>> and
>>
>> bioperl-1.5.2_100
> 
> Where do you see this older version? I did a search on the page and that
> term isn't found. _100 was the first version of 1.5.2 core to go out.
> There were then 2 minor revisions released, as detailed in the 'Updates'
> section of the page.
> 
> 
>> However, if I click on the Downloads link on the left toolbar, then
>> scroll down, I see 1.5.2 Developer Release.  The tar file here points
>> to current_core_unstable.tar.gz.
> 
> Yes, that is just an alias to bioperl-1.5.2_102, i.e. to whatever the latest
> version happens to be, so that people don't need to worry about the
> actual version; they can just keep one static bookmark.
> 
> 
>> Is this supposed to be this way?  It seems a bit confusing.  I think
>> it might be appropriate to put all the download links in one
>> location...just my two cents...
> 
> Well the primary page where all the links are found is the Downloads
> page. The Release_1.5.2 page is specific to 1.5.2 and will remain for
> historic reasons (so at some point there will be 1.5.3 or something and
> the appropriate links on the main Downloads page will be updated to
> that, but if someone specifically wants 1.5.2 they can still find the
> 1.5.2 downloads on its own dedicated page).

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHmZB69gTv6QYzVL4RAnRpAJwOyWjZXzD0UJBNFNP8H1Hrn4c66ACfRyzA
NsJEZydsG+aMzNltrBw+Nx4=
=kHt0
-----END PGP SIGNATURE-----


From derek.fairley at belfasttrust.hscni.net  Fri Jan 25 03:31:28 2008
From: derek.fairley at belfasttrust.hscni.net (Fairley, Derek)
Date: Fri, 25 Jan 2008 08:31:28 -0000
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
Message-ID: 

Johan,

There is currently no Bioperl-run wrapper for this program, but you
might want to have a look at Codon Align 2.0 as well:
http://homepage.mac.com/barryghall/CodonAlign.html

Derek

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Johan Nilsson
Sent: 24 January 2008 22:34
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Quickest Codon Based MSA?

Hello,

I have a question which might not necessarily be related to Bioperl, 
although I do believe the expertise is available here. I have a couple 
of thousand FASTA files, each containing 20 CDS sequence orthologues of 
rather high sequence similarity. I would like to create a codon-based 
multiple sequence alignment for each of these FASTA files (i.e. a 
nucleotide sequence alignment inferred from alignment of the translated 
peptide sequences, to assure that no frame shifts will occur). I first 
tried running Dialign2, which can perform the 
translation/back-translation in one go, but this turned out to be far 
too slow. I next tried to build protein alignments using ClustalW and 
subsequently built the coding region alignment using EMBOSS 'tranalign', 
but this also was too slow.

Is there any method available which significantly speeds up the 
codon-preserving alignment??? As I mentioned, the sequences to be 
aligned are in general very conserved, so any heuristic taking advantage 
of the low divergence would be very helpful! Also, is there any 
adjustable parameter in dialign2/dialign-T that might speed up the 
program when looking at highly similar sequences?

Best regards
/Johan Nilsson



From ewijaya at gmail.com  Fri Jan 25 04:26:05 2008
From: ewijaya at gmail.com (Edward Wijaya)
Date: Fri, 25 Jan 2008 17:26:05 +0800
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
Message-ID: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>

Dear Experts,

Suppose I have the following list of gene names and Ensembl IDs.

RBL1	ENSG00000080839
RB1	ENSG00000139687
CDC2	ENSG00000170312
CDC25A	ENSG00000164045
CCNA2	ENSG00000145386
E2F3	ENSG00000112242
E2F2	ENSG00000007968
CDK2	ENSG00000123374
...etc...

Is there a way to extract the gene sequences for this list
and then output them in FASTA format?

- Edward


From bix at sendu.me.uk  Fri Jan 25 05:55:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 25 Jan 2008 10:55:50 +0000
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
References: <47991246.6010106@sh.se>
Message-ID: <4799C036.5060404@sendu.me.uk>

Johan Nilsson wrote:
> Hello,
> 
> I have a question which might not necessarily be related to Bioperl, 
> although I do believe the expertise is available here. I have a couple 
> of thousand FASTA files, each containing 20 CDS sequence orthologues of 
> rather high sequence similarity. I would like to create a codon-based 
> multiple sequence alignment for each of these FASTA files (i.e. a 
> nucleotide sequence alignment inferred from alignment of the translated 
> peptide sequences, to assure that no frame shifts will occur). I first 
> tried running Dialign2, which can perform the 
> translation/back-translation in one go, but this turned out to be far 
> too slow. I next tried to build protein alignments using ClustalW and 
> subsequently built the coding region alignment using EMBOSS 'tranalign', 
> but this also was too slow.
> 
> Is there any method available which significantly speeds up the 
> codon-preserving alignment??? As I mentioned, the sequences to be 
> aligned are in general very conserved, so any heuristic taking advantage 
> of the low divergence would be very helpful! Also, is there any 
> adjustable parameter in dialign2/dialign-T that might speed up the 
> program when looking at highly similar sequences?

Do you know which is the slow part? For example, when using ClustalW, 
is the alignment step slower than creating the codon alignment from the 
protein alignment?

If ClustalW is the problem, you can try using other alignment programs 
famous for their speed, such as Muscle. If it's the protein->codon bit 
that's slow, try using other programs to do that, like Pal2Nal or the 
BioPerl method.


From David.Messina at sbc.su.se  Fri Jan 25 06:35:16 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 25 Jan 2008 12:35:16 +0100
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
Message-ID: <628aabb70801250335l2a2754efn3e73e44a9dae6a35@mail.gmail.com>

Hi Edward,

I don't think there's a direct BioPerl interface to Ensembl, but BioMart at
Ensembl itself will get you sequences (and lots of other things if you want)
given a list of Ensembl IDs.

http://www.ensembl.org/biomart/martview

Note that as of this writing, the Ensembl BioMart server appears to be down
temporarily.

If you want to be able to get Ensembl sequences from a program, there's the
Ensembl API:

http://www.ensembl.org/info/using/api/core/core_tutorial.html
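Whichever source supplies the sequences, turning the two-column list into FASTA is straightforward. The sketch below assumes tab-separated "symbol<TAB>Ensembl ID" lines as in Edward's list, and leaves the actual retrieval to a caller-supplied code reference ($fetch, hypothetical) that could wrap the Ensembl API or look up sequences already downloaded from BioMart; the name ids_to_fasta is also hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read "SYMBOL<TAB>ENSEMBL_ID" lines from $in and write
# one FASTA record per gene to $out. $fetch is a caller-supplied code ref
# mapping an Ensembl gene ID to a sequence string (or undef if unknown).
sub ids_to_fasta {
    my ($in, $out, $fetch) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my ($symbol, $gene_id) = split /\t/, $line;
        next unless defined $gene_id && length $gene_id;   # skip blank/malformed lines
        my $seq = $fetch->($gene_id);
        if (!defined $seq) {
            warn "No sequence returned for $gene_id\n";
            next;
        }
        print {$out} ">$gene_id $symbol\n";
        # wrap the sequence at 60 residues per line
        print {$out} "$1\n" while $seq =~ /(.{1,60})/g;
    }
    return;
}
```

Keeping the fetch step separate means the same formatting code works whether the sequences come from a live API query or from a local file.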



Dave


From bix at sendu.me.uk  Fri Jan 25 06:07:42 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 25 Jan 2008 11:07:42 +0000
Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree,
 and how to combine trees
In-Reply-To: <200801242207.52991.tristan.lefebure@gmail.com>
References: <200801242207.52991.tristan.lefebure@gmail.com>
Message-ID: <4799C2FE.8080700@sendu.me.uk>

Tristan Lefebure wrote:
> Hi,
> 
> I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would 
> like to merge several "one leaf taxonomic trees" into a taxonomic tree with 
> several leaves.
[...]
> or even better:
> 
> ((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda;

The BioPerl script taxonomy2tree.pl generates:

(((Decapoda,Copepoda)Crustacea,(Heteroptera,Coleoptera)Neoptera)Pancrustacea)"cellular organisms";

I think you can modify it in a similar way to your own script so that it 
only outputs the classes you're interested in.



http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/taxa/taxonomy2tree.PLS
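The merge itself can also be sketched without BioPerl: write each taxon's lineage as a root-to-leaf list of IDs (as read off the single-leaf trees in Tristan's message), merge the lists into one tree, and skip every node that has exactly one child. The sketch below is a hypothetical illustration of that idea, not the taxonomy2tree.pl implementation; lineages_to_newick and _newick are made-up names.

```perl
use strict;
use warnings;

# Hypothetical helper: merge root-to-leaf lineages (array refs of node
# labels sharing the same root) into one tree and return it as Newick.
sub lineages_to_newick {
    my @lineages = @_;
    my (%kids, %seen_edge);
    my $root = $lineages[0][0];
    for my $path (@lineages) {
        die "lineages do not share a root\n" unless $path->[0] eq $root;
        for my $i (1 .. $#$path) {
            my ($parent, $child) = @{$path}[$i - 1, $i];
            # record each parent->child edge once, in first-seen order
            push @{ $kids{$parent} }, $child
                unless $seen_edge{"$parent\t$child"}++;
        }
    }
    return _newick($root, \%kids) . ';';
}

# Recursive Newick writer that drops nodes with a single child, since they
# carry no branching information (this also trims the shared root chain).
sub _newick {
    my ($node, $kids) = @_;
    my @children = @{ $kids->{$node} || [] };
    return $node unless @children;                      # leaf
    return _newick($children[0], $kids) if @children == 1;
    return '(' . join(',', map { _newick($_, $kids) } @children) . ')' . $node;
}
```

For Tristan's four lineages (33208 6656 6960 50557 33340 7524, and so on) this returns ((7524,7041)33340,(6683,6830)6657)6656;, the tree he asked for. Passing scientific names instead of IDs gives the named version, although names containing spaces would need Newick quoting.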


From bosborne11 at verizon.net  Fri Jan 25 08:53:36 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 25 Jan 2008 08:53:36 -0500
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
Message-ID: <9CE20DF3-ED5F-4432-A191-4123896E5815@verizon.net>

Edward,

Various approaches are discussed here:

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

Since you have ENSEMBL ids I'd think that would be the way to go.


Brian O.

On Jan 25, 2008, at 4:26 AM, Edward Wijaya wrote:

> Dear Experts,
>
> Suppose I have the following list of gene names and Ensembl IDs.
>
> RBL1	ENSG00000080839
> RB1	ENSG00000139687
> CDC2	ENSG00000170312
> CDC25A	ENSG00000164045
> CCNA2	ENSG00000145386
> E2F3	ENSG00000112242
> E2F2	ENSG00000007968
> CDK2	ENSG00000123374
> ...etc...
>
> Is there a way to extract the gene sequences for this list
> and then output them in FASTA format?
>
> - Edward



From snoze.pa at gmail.com  Fri Jan 25 18:30:56 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Fri, 25 Jan 2008 17:30:56 -0600
Subject: [Bioperl-l] bioperl DB error
Message-ID: <10f848910801251530j6eacfcb0x81780ae312cf19c5@mail.gmail.com>

Dear Users,
 I am using BioPerl/BioSQL and trying to install the NCBI taxonomy, but I am
getting the following error message.
Any help? Thanks in advance.

perl load_ncbi_taxonomy.pl -download -driver mysql -dbname bioseqdb -dbuser root
Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568.


From snoze.pa at gmail.com  Fri Jan 25 18:49:28 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Fri, 25 Jan 2008 17:49:28 -0600
Subject: [Bioperl-l] bioseqDB error
Message-ID: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>

Hi, does anyone know why I am getting this error message?

Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568


From wkath83 at vbi.vt.edu  Thu Jan 24 13:19:06 2008
From: wkath83 at vbi.vt.edu (Katherine Wendelsdorf)
Date: Thu, 24 Jan 2008 13:19:06 -0500 (EST)
Subject: [Bioperl-l] bioperl on mac
Message-ID: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>

Dear one who knows,

I have a MacBook with OS X Leopard and I am having trouble running scripts
that call BioPerl modules.

Here is my history: using Fink, I installed bioperl-pm586 version 1.5.2-4
and bioperl-run-pm586 version 1.5.2-1. When I type "which bioperl-pm586" at
the command line I get nothing. Spotlight says that the path is
/sw/share/bioperl-pm586. The same goes for bioperl-run-pm586.

1. I tried to run the test2.pl script, literally copied and pasted from
the HOWTO manual, but it wouldn't run. The two attached docs are the
script I tried to run and the output (which is nonexistent). I read
something that said to "go in to" BioPerl to execute a command. I could
not enter the bioperl directory when it was in the /sw/share directory, so
I copied the bioperl folder to the Desktop just so I could try executing
the script inside bioperl. Where am I going wrong here?

Should I place these folders (bioperl-pm586 and bioperl-run-pm586)
somewhere else on my computer? Should they be in the same directory as
perl (/usr/bin/perl)?

2. How do I know which modules are included in the bioperl-pm586 package I
downloaded? Specifically, I want to use Bio::SeqIO.

3. What is the best way to download and install new modules as I need them?


Any answers you could give me for any of these questions would be greatly
appreciated!

Thank you so much, kind volunteer!
-Kate
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test2.pl
URL: 

From bosborne11 at verizon.net  Sat Jan 26 11:14:13 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 26 Jan 2008 11:14:13 -0500
Subject: [Bioperl-l] bioperl on mac
In-Reply-To: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
Message-ID: 

Katherine,

Perl keeps the addresses of all the module directories in its @INC  
array. What do you see when you do:

perl -le 'print for @INC'

?

If '/sw/share/bioperl-pm586' is not in @INC then you need to put it  
there, perhaps by adding something like:

setenv PERL5LIB ${PERL5LIB}:/sw/share/bioperl-pm586

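to the .tcshrc file in your home directory (if you use tcsh, that is;  
most people use bash these days, where the equivalent is to add  
export PERL5LIB=${PERL5LIB}:/sw/share/bioperl-pm586  
to .bashrc or .bash_profile).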

You asked some other questions, the general answer is that all the  
modules you'll need are in the 2 packages you've installed, and you  
don't need to move them from /sw.
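To check whether a particular module such as Bio::SeqIO is visible, one can ask where perl would find it: "use Bio::SeqIO" makes perl look for Bio/SeqIO.pm under each @INC directory in turn. The helper below (find_module is a hypothetical name) performs the same lookup and lists every directory that holds a copy. It also shows why pointing "use lib" at a .../Bio directory fails: the Bio/ prefix has to sit under the directory that is added.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the directories (default: @INC) that contain
# the file perl would load for $module, e.g. Bio::SeqIO -> Bio/SeqIO.pm.
sub find_module {
    my ($module, @dirs) = @_;
    @dirs = grep { !ref } @INC unless @dirs;   # skip code-ref hooks in @INC
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    return grep { -f File::Spec->catfile($_, @parts) } @dirs;
}

# Report every copy of Bio::SeqIO visible to this perl.
my @hits = find_module('Bio::SeqIO');
if (@hits) {
    print "Bio::SeqIO found under: $_\n" for @hits;
}
else {
    print "Bio::SeqIO not found in \@INC; check PERL5LIB\n";
}
```

If nothing is reported, adding the right directory to PERL5LIB as described above should make the module appear.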


Brian O.


On Jan 24, 2008, at 1:19 PM, Katherine Wendelsdorf wrote:

> Dear one who knows,
>
> I have a MacBook with OS X Leopard and I am having trouble running scripts
> that call BioPerl modules.
>
> Here is my history: using Fink, I installed bioperl-pm586 version 1.5.2-4
> and bioperl-run-pm586 version 1.5.2-1. When I type "which bioperl-pm586" at
> the command line I get nothing. Spotlight says that the path is
> /sw/share/bioperl-pm586. The same goes for bioperl-run-pm586.
>
> 1. I tried to run the test2.pl script, literally copied and pasted from
> the HOWTO manual, but it wouldn't run. The two attached docs are the
> script I tried to run and the output (which is nonexistent). I read
> something that said to "go in to" BioPerl to execute a command. I could
> not enter the bioperl directory when it was in the /sw/share directory, so
> I copied the bioperl folder to the Desktop just so I could try executing
> the script inside bioperl. Where am I going wrong here?
>
> Should I place these folders (bioperl-pm586 and bioperl-run-pm586)
> somewhere else on my computer? Should they be in the same directory as
> perl (/usr/bin/perl)?
>
> 2. How do I know which modules are included in the bioperl-pm586 package I
> downloaded? Specifically, I want to use Bio::SeqIO.
>
> 3. What is the best way to download and install new modules as I need them?
>
>
> Any answers you could give me for any of these questions would be greatly
> appreciated!
>
> Thank you so much, kind volunteer!
> -Kate
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From jason at bioperl.org  Sat Jan 26 15:30:11 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 26 Jan 2008 12:30:11 -0800
Subject: [Bioperl-l] bioperl on mac
In-Reply-To: 
References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
	
Message-ID: 

Usually Fink does this by adding a line to your .tcshrc (if you are  
running that shell) or to your .bash_profile or .bashrc.

On my machine I have this at the top of my .bash_profile file:
test -r /sw/bin/init.sh && . /sw/bin/init.sh

If that line is not there, you need to add it to ensure that all the  
Fink tools are set up properly.

On Jan 26, 2008, at 8:14 AM, Brian Osborne wrote:

> Katherine,
>
> Perl keeps the locations of all its module directories in its @INC  
> array. What do you see when you do:
>
> perl -le 'print for @INC'
>
> ?
>
> If '/sw/share/bioperl-pm586' is not in @INC then you need to put it  
> there, perhaps by adding something like:
>
> setenv PERL5LIB ${PERL5LIB}:/sw/share/bioperl-pm586
>
> to the .tcshrc file in your home directory (if you use tcsh, that  
> is; most people use bash these days, with .bashrc and 'export' instead of 'setenv').
>
> You asked some other questions; the general answer is that all the  
> modules you'll need are in the 2 packages you've installed, and you  
> don't need to move them from /sw.
>
>
> Brian O.



From jason at bioperl.org  Sat Jan 26 19:14:45 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 26 Jan 2008 16:14:45 -0800
Subject: [Bioperl-l] a question on "move_id_to_bootstrap" usage
In-Reply-To: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com>
References: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com>
Message-ID: <8273f6c20801261614p312886d5x562593aa0cde60da@mail.gmail.com>

I'm not sure why you still have the __DATA__ block if you are reading data
in from a file. Or are you trying to send an example of the code but forgot
to specify a different input point?

If you are reading from a file that looks like the tree in the __DATA__
block, you'll notice that the bootstrap info is encoded as the branch_length,
NOT the id; move_id_to_bootstrap only moves the ID to the BOOTSTRAP.
You'll have to write a custom routine, or just run a simple loop over your
tree, to move the data to the bootstrap. It would look just like
move_id_to_bootstrap except you'd use branch_length instead of id to get the
data that you want to set in the bootstrap. I leave it as an exercise for
the reader, but if you can't figure it out let us know.
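The loop Jason describes might look like the sketch below. It assumes the usual Bio::Tree::Tree and Bio::Tree::Node methods (`get_nodes`, `is_Leaf`, `branch_length`, `bootstrap`); `move_branch_length_to_bootstrap` is a hypothetical name, not a BioPerl method. Leaves are skipped because in Anand's consense tree only the internal-node values are support counts (every leaf carries 100.0).

```perl
use strict;
use warnings;

# Hypothetical helper: copy each internal node's branch length into its
# bootstrap slot, since consense writes support values where Newick
# branch lengths normally go. Leaves and nodes without a value are skipped.
sub move_branch_length_to_bootstrap {
    my ($tree) = @_;
    my $moved = 0;
    for my $node ($tree->get_nodes) {
        next if $node->is_Leaf;
        my $bl = $node->branch_length;
        next unless defined $bl;    # e.g. the root usually has none
        $node->bootstrap($bl);
        $moved++;
    }
    return $moved;    # number of nodes updated
}
```

In Anand's script it would be called as `move_branch_length_to_bootstrap($tree);` right after `next_tree`, before the loop that prints `$node->bootstrap`.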


In the future please ask your questions on the mailing list as I don't have
much time to answer questions individually when someone else can help.

-jason

On Jan 23, 2008 1:57 PM, Anand  wrote:

> Hi Jason,
>
> Thanks a lot. I followed your suggestion and updated both the modules.
>
> I followed the code example on http://www.bioperl.org/wiki/HOWTO:Trees and
> tried to extract bootstrap values for my tree (which is output after
> seqboot, protdist, fitch and consense)
>
> When I try running my script, I am not able to print the bootstrap
> values...and it doesn't throw any error messages. Am I missing something?
>
> ====START of Code====
> #!/usr/bin/perl -w
> use strict;
> use lib "/home/anand/myperlmodules/lib/perl5/";
> use Bio::TreeIO;
> # $usage: $0 
>
> my $infile = shift;
>
> my $treeio = Bio::TreeIO->new(-format => 'newick',
>                          -file => $infile,
>                          -internal_node_id => 'bootstrap',
>                          );
>
> while( my $tree = $treeio->next_tree ) {
>    for my $node ( $tree->get_nodes ) {
>        printf "id: %s bootstrap: %s\n", $node->id || '', $node->bootstrap || '';
>    }
> }
> __END__
> ((5815_1:100.0,(((5815_5:100.0,5815_7:100.0):100.0,5815_6:100.0):97.0
> ,5815_8:100.0):
> 98.0,5815_4:100.0,5815_2:100.0):100.0,5815_3:100.0);
> ====END of Code====
>
> Thanks in advance for your time and help,
>
> Anand
>
> PS: Just to preserve formatting, I have attached the consense_output_file
>
> On Jan 22, 2008 8:02 AM, Jason Stajich  wrote:
>
> > I suspect you may want to update everything in Bio/TreeIO and Bio/Tree
> > to be safe; I'm not exactly sure what was changed. You can look at the
> > commit logs to see what else changed at the time: http://code.open-bio.org/
> > You can also use that same server to grab a fresh checkout of what is
> > the current state of the code base.
> >
> > -jason
> > On Jan 22, 2008, at 12:59 AM, Anand wrote:
> >
> > > Hi Jason
> > >
> > > I have a question on the method "move_id_to_bootstrap". From this
> > > post:
> > > http://portal.open-bio.org/pipermail/bioperl-guts-l/2007-May/025718.html
> > >
> > > it looks like it has been added very recently. As luck would have
> > > it, the TreeFunctionsI.pm in my bioperl installation is missing that
> > > method.
> > >
> > > My question: What is the best method to update TreeFunctionsI.pm so
> > > that it can have the "move_id_to_bootstrap" method? Does it have other
> > > update dependencies?
> > >
> > > Thanks in advance for your help and time,
> > >
> > > Anand
> >
> >
>



-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From hlapp at duke.edu  Mon Jan 28 00:27:34 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 28 Jan 2008 00:27:34 -0500
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
References: <4795292E.4030401@sdsc.edu>
Message-ID: 

Some folks may remember that CIPRES (http://www.phylo.org) released  
their portal with access to remote execution of several phylogenetic  
tree reconstruction programs in spring last year.

It took a while, but they have now also built a really nice REST-based  
API that makes the service fully programmable instead of  
screen-scraping 5 pages:

http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API)

It should be relatively straightforward to build the equivalent of  
RemoteBlast on top of this. Would anyone be keen to take this on?

	-hilmar

P.S. Sorry for the cross-posting - I thought this is relevant to both  
communities. When responding in a project-specific way please make  
sure you remove the list that is no longer pertinent.


Begin forwarded message:

> From: Lucie Chan 
> Date: January 21, 2008 6:22:22 PM EST
> To: Hilmar Lapp 
> Cc: Mark Miller , Rutger Vos ,  
> Terri Liebowitz , Paul Hoover ,  
> mtholder at ku.edu
> Subject: Re: REST APIs for Cipres Web Portal
> Reply-To: lcchan at sdsc.edu
>
> Hilmar, et al.,
>
> I just released the first version of our REST Web Services API for  
> job submission, job status query, and job result file retrieval.
> I'd like to get some feedback (issues, problems, improvements,  
> suggestions, etc.) from you. For documentation on how to access the  
> services, check it out at:
>
> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST  
> API" below the "CIPRES PORTAL" banner.
>
> Lucie
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================





From cjfields at uiuc.edu  Mon Jan 28 01:04:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 00:04:46 -0600
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: 
References: <4795292E.4030401@sdsc.edu>
	
Message-ID: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>

We can certainly add it to the to-do list; just need to sort out the  
details (how often to allow posts, etc).  I guess we would want this  
in the Bio::Tools::Run namespace, same as RemoteBlast?

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From hlapp at duke.edu  Mon Jan 28 08:42:39 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 28 Jan 2008 08:42:39 -0500
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
References: <4795292E.4030401@sdsc.edu>
	
	<7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
Message-ID: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>

Yep that's what I was thinking.

BTW, the API needs multipart/form-data encoding for input (due to file  
upload); I'm assuming that's well supported in LWP, but if anyone  
knows where to start digging for that, a pointer would be appreciated.

	-hilmar






From cjfields at uiuc.edu  Mon Jan 28 08:50:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 07:50:08 -0600
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
References: <4795292E.4030401@sdsc.edu>
	
	<7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
	<2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
Message-ID: 

Googled it. This is taken from
http://www.issociate.de/board/post/258535/LWP_-_multipart/form-data_file_upload_from_scalar_rather_than_local_file.html :

use LWP::UserAgent;
use HTTP::Request::Common qw(POST);   # provides the POST request builder

my $ua = LWP::UserAgent->new;
# [undef, $FILENAME, Content => $CONTENTS] uploads $CONTENTS from memory
my $response = $ua->request(POST $URL,
    Content_Type => 'multipart/form-data',
    Content      => [ $PARAM => [ undef, $FILENAME, Content => $CONTENTS ] ]);

Where $PARAM is the name of the parameter, $FILENAME is what you want
to call the file, and $CONTENTS is a scalar holding the contents of the
file.

Could probably use HTTP::Request in there, but whatever works.

chris






From shandar at nibio.go.jp  Sun Jan 27 01:50:40 2008
From: shandar at nibio.go.jp (Shandar Ahmad)
Date: Sun, 27 Jan 2008 15:50:40 +0900
Subject: [Bioperl-l] PRIB 2008
Message-ID: <1201416640.31793.7.camel@boe>

******* Our apologies if you received multiple copies ***********
If you wish not to receive PRIB 2008 related emails, please write to
Madhu Chetty 
and CC to me at shandar at nibio.go.jp
******************************************************************



PRELIMINARY CALL FOR PAPERS AND INVITED SESSIONS

********************************************************************************************
Third IAPR International Conference on Pattern Recognition in 
Bioinformatics (PRIB 2008)
October 15-17, 2008
Melbourne, Australia

http://www.infotech.monash.edu.au/prib08
********************************************************************************************

PRIB 2008 is aimed at bringing together top researchers, practitioners, 
and students from around the world to discuss the applications of 
pattern recognition methods in the field of bioinformatics to solve 
problems in life sciences. Pattern recognition techniques of interest 
include: statistical, syntactic, and structural approaches, Bayesian, 
hidden Markov and graphical models, neural networks, fuzzy and genetic 
algorithms, data mining, and their hybrids. Papers in areas of (but not 
limited to) bio-sequence analysis, gene and protein expression analysis, 
structure prediction, protein folding, docking, metabolic pathway 
analysis and regulatory networks, systems biology, drug design, and 
bioimaging, are solicited for presentation at the conference.

All papers will be peer reviewed and accepted papers will be published 
in the conference proceedings as an edited volume in Lecture Notes in 
Bioinformatics by Springer. Submission of papers will be electronic and 
through the conference website. Proposals for special sessions and 
tutorials at the conference are also invited in all related areas of 
research. Authors of selected papers presented at the conference will 
also be invited for publication in Special Issues of reputed journals.

Location:
Melbourne is a sophisticated city in the south-east corner of mainland 
Australia. It is known for its attractive sightseeing spots, great 
events, passion for food and wine, and fabulous scenery. A style-setter, 
Melbourne is home to a continuous program of festivals, art 
exhibitions and musical extravaganzas. Warning: you might never want to 
go home.

For the latest information on PRIB 2008, visit the conference web site:
http://www.infotech.monash.edu.au/prib08

or email the secretariat at prib2008.melb at infotech.monash.edu.au

Important Deadlines
Paper submission: 15 April 2008
Proposals for Special Sessions/Tutorials: 15 March 2008
Author notification: 15 May 2008
Camera-ready papers: 15 June 2008


Organising Committee, PRIB 2008



From snoze.pa at gmail.com  Mon Jan 28 16:07:37 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Mon, 28 Jan 2008 15:07:37 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
Message-ID: <10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>

I am still getting the same error message.

My question is:

Do I need to install bioperl-db for BioSQL?

When I use BioSQL and load the NCBI taxonomy, it works fine. But when
I install bioperl-db, loading the NCBI taxonomy gives me the following
error message.

Any help?



Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry
'10090' for key 2 at load_ncbi_taxonomy.pl line 568


From susantoroy at gmail.com  Mon Jan 28 16:05:49 2008
From: susantoroy at gmail.com (Susanta Roy)
Date: Tue, 29 Jan 2008 02:35:49 +0530
Subject: [Bioperl-l] Please remove my letter from your site
Message-ID: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>

Dear Sir,
Please remove my letter appearing at the URLs below:
http://bioperl.org/pipermail/bioperl-l/2007-December/027004.html
http://bioperl.org/pipermail/bioperl-l/2007-December.txt
http://www.nabble.com/Enquiry-about-bioperl-project-td14522622.html


It is not supposed to appear online.
Thanks in advance.

Regards
Suisanta


From cjfields at uiuc.edu  Mon Jan 28 16:53:33 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 15:53:33 -0600
Subject: [Bioperl-l] Please remove my letter from your site
In-Reply-To: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
References: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
Message-ID: 

Um, you posted to a public mailing list (hence the list is open to the  
public, for searching, indexing via Google, etc).  Terms of usage are  
here:

http://lists.open-bio.org/mailman/listinfo/bioperl-l

with more info here:

http://www.bioperl.org/wiki/Mailing_lists

BTW, this post will also appear.  C'est la vie!

chris






From snoze.pa at gmail.com  Tue Jan 29 12:15:41 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 29 Jan 2008 11:15:41 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
Message-ID: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>

Dear Users,
I tried to refresh the installation and it seems to be working. But when I
load sequences, it gives me the following warning messages. Am I doing this
right, or am I missing a huge chunk of sequences? Thanks in advance,
s

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values
were ("","1") FKs (27,3,4)
Duplicate entry '27-3-4-1' for key 2
---------------------------------------------------
...
...
and so on


From tristan.lefebure at gmail.com  Tue Jan 29 12:19:23 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 29 Jan 2008 12:19:23 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
Message-ID: <200801291219.23172.tristan.lefebure@gmail.com>

Hello,

I would like to download a large number of sequences from GenBank (122,146 to be exact) following a list of accession numbers.
I first looked into Bio::DB::EUtilities, but got lost and finally used Bio::DB::GenBank.
My script works well for short requests, but it gives the following error with the long request:

 ------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
500 short write
Content-Type: text/plain
Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
Client-Warning: Internal response

500 short write

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:58
---------------------------------------------------------

Does that mean that we can only fetch 500 sequences at a time?
Should I split my list into 500-ID fragments and submit them one after the other?

Any suggestions are very welcome...
Thanks,
-Tristan


Here is the script:

##################################
use strict;
use warnings;
use Bio::DB::GenBank;
# use Bio::DB::EUtilities;
use Bio::SeqIO;
use Getopt::Long;

# 2008-01-22 T Lefebure
# I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
# The following procedure is not ideal, as the stream is first copied to a temporary file
# and then re-used by BioPerl to generate the final file.

my $db = 'nucleotide';
my $format = 'genbank';
my $help= '';
my $dformat = 'gb';

GetOptions(
	'help|?' => \$help,
	'format=s'  => \$format,
	'database=s'	=> \$db,
);


my $printhelp = "\nUsage: $0 [options]  

Will download the corresponding data from GenBank. BioPerl is required.

Options:
	-h
		print this help
	-format: genbank|fasta|...
		give output format (default=genbank)
	-database: nucleotide|genome|protein|...
		define the database to search in (default=nucleotide)

The full description of the options can be found at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";

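if ($help || @ARGV < 2) {
	print $printhelp;
	exit;
}

# Read one accession per line, stripping trailing newlines.
open my $list_fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
chomp(my @list = <$list_fh>);
close $list_fh;

if ($format eq 'fasta') { $dformat = 'fasta' }

my $gb = Bio::DB::GenBank->new(	-retrievaltype => 'tempfile',
				-format => $dformat,
				-db => $db,
			);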
my $seqio = $gb->get_Stream_by_acc(\@list);

my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
				-format => $format,
			);
while (my $seqo = $seqio->next_seq ) {
	print $seqo->id, "\n";
	$seqout->write_seq($seqo);
}


From cjfields at uiuc.edu  Tue Jan 29 13:06:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Jan 2008 12:06:08 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>

Yes, you can only retrieve ~500 sequences at a time using either  
Bio::DB::GenBank or Bio::DB::EUtilities. Both modules interact with  
NCBI's EUtilities (the former returns Bio::Seq/Bio::SeqIO objects,  
the latter returns raw data from the URL to be processed later).

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
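Since the limit is per request, one way to handle 122,146 accessions is to split the list into batches and fetch them one after another. This is an illustrative sketch, not part of BioPerl: `chunk_ids` and `fetch_in_batches` are hypothetical helper names, the 500-ID batch size follows the figure above, and the pause between requests is an arbitrary choice. It reuses the `Bio::DB::GenBank` and `Bio::SeqIO` calls from Tristan's script.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into array refs of at most $size IDs.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    while (my @batch = splice(@ids, 0, $size)) {
        push @batches, \@batch;    # a fresh @batch each pass, so refs differ
    }
    return @batches;
}

# Hypothetical driver: fetch each batch with Bio::DB::GenBank and append the
# records to one output file. BioPerl is loaded with require (at run time)
# so chunk_ids stays usable without it.
sub fetch_in_batches {
    my %arg = @_;    # -ids => \@acc, -outfile, -db, -dformat, -format, -size
    require Bio::DB::GenBank;
    require Bio::SeqIO;
    my $outfile = $arg{-outfile};
    my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile',
                                   -format        => $arg{-dformat},
                                   -db            => $arg{-db});
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => $arg{-format});
    for my $batch (chunk_ids($arg{-size} || 500, @{ $arg{-ids} })) {
        my $seqio = $gb->get_Stream_by_acc($batch);
        while (my $seq = $seqio->next_seq) {
            $out->write_seq($seq);
        }
        sleep 3;    # arbitrary pause between requests (assumption, not an NCBI rule)
    }
    $out->close;
}
```

With the variables from Tristan's script this would be called as `fetch_in_batches(-ids => \@list, -outfile => $ARGV[1], -db => $db, -dformat => $dformat, -format => $format);`.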

You can usually post more IDs using epost and fetch sequences referring  
to the WebEnv/key combo (batch posting). I try to make this a bit  
easier with EUtilities, but it is woefully lacking in documentation (my  
fault); there is some code up on the wiki which should work.
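The epost/WebEnv route can be sketched at the URL level: post all IDs once to `epost.fcgi`, read the `WebEnv` and `QueryKey` from the XML reply, then page through `efetch.fcgi` with `retstart`/`retmax`. The helper names are hypothetical, and the base URL is the current E-utilities address (it may differ from the one in use in 2008); the query parameters are NCBI's documented E-utilities ones. This does not show the Bio::DB::EUtilities interface itself.

```perl
use strict;
use warnings;

# Current E-utilities base URL (assumption: may differ from the 2008 host).
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical helper: pull WebEnv and QueryKey out of an epost.fcgi XML reply.
sub parse_epost {
    my ($xml) = @_;
    my ($key)    = $xml =~ m{<QueryKey>\s*(\d+)\s*</QueryKey>};
    my ($webenv) = $xml =~ m{<WebEnv>\s*(\S+?)\s*</WebEnv>};
    die "no WebEnv/QueryKey in epost reply\n"
        unless defined $key && defined $webenv;
    return ($webenv, $key);
}

# Hypothetical helper: one efetch.fcgi URL per page of $retmax records
# from the posted set, using retstart/retmax to step through it.
sub efetch_urls {
    my %arg = @_;    # -db, -webenv, -key, -count, -rettype, -retmax
    my ($db, $webenv, $key, $count, $rettype) =
        @arg{qw(-db -webenv -key -count -rettype)};
    my $retmax = $arg{-retmax} || 500;
    my @urls;
    for (my $start = 0; $start < $count; $start += $retmax) {
        push @urls, "$EUTILS/efetch.fcgi?db=$db&WebEnv=$webenv"
                  . "&query_key=$key&retstart=$start&retmax=$retmax"
                  . "&rettype=$rettype&retmode=text";
    }
    return @urls;
}
```

The epost request itself would be an HTTP POST of `db=nucleotide&id=` followed by the comma-separated IDs to `$EUTILS/epost.fcgi` (for example with LWP::UserAgent's `post` method); check whether your database accepts accession numbers there or needs GI numbers. WebEnv strings are interpolated without URL-escaping on the assumption that they contain only URL-safe characters.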

chris

On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:

> Hello,
>
> I would like to download a large number of sequences from GenBank  
> (122,146 to be exact) following a list of accession numbers.
> I first investigated around Bio::DB::EUtilities, but got lost and  
> finally used Bio::DB::GenBank.
> My script works well for short request, but it gives the following  
> error with the long request:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> 500 short write
> Content-Type: text/plain
> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> Client-Warning: Internal response
>
> 500 short write
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/ 
> Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/ 
> DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/ 
> 5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/ 
> 5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:58
> ---------------------------------------------------------
>
> Does that mean that we can only fetch 500 sequences at a time?
> Should I split my list in 500 ids framents and submit them one after  
> the other?
>
> Any suggestions very welcomed...
> Thanks,
> -Tristan
>
>
> Here is the script:
>
> ##################################
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> # use Bio::DB::EUtilities;
> use Bio::SeqIO;
> use Getopt::Long;
>
> # 2008-01-22 T Lefebure
> # I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
> # The following procedure is not ideal, as the stream is first copied to a temporary file
> # and then re-used by BioPerl to generate the final file.
>
> my $db = 'nucleotide';
> my $format = 'genbank';
> my $help= '';
> my $dformat = 'gb';
>
> GetOptions(
> 	'help|?' => \$help,
> 	'format=s'  => \$format,
> 	'database=s'	=> \$db,
> );
>
>
> my $printhelp = "\nUsage: $0 [options]  
>
> Will download the corresponding data from GenBank. BioPerl is required.
>
> Options:
> 	-h
> 		print this help
> 	-format: genbank|fasta|...
> 		give output format (default=genbank)
> 	-database: nucleotide|genome|protein|...
> 		define the database to search in (default=nucleotide)
>
> The full description of the options can be found at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
>
> if ($#ARGV<1) {
> 	print $printhelp;
> 	exit;
> }
>
> open LIST, $ARGV[0];
> my @list = <LIST>;
>
> if ($format eq 'fasta') { $dformat = 'fasta' }
>
> my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 				-format => $dformat,
> 				-db => $db,
> 			);
> my $seqio = $gb->get_Stream_by_acc(\@list);
>
> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> 				-format => $format,
> 			);
> while (my $seqo = $seqio->next_seq ) {
> 	print $seqo->id, "\n";
> 	$seqout->write_seq($seqo);
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From snoze.pa at gmail.com  Tue Jan 29 13:22:56 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 29 Jan 2008 12:22:56 -0600
Subject: [Bioperl-l] loading sequence error bioseq
Message-ID: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>

Dear User,

After successfully creating the database bioseqdb and loading ncbi_taxonomy,
I am getting the following error message while loading sequences into the
database.

load_seqdatabase.pl -host localhost -dbname bioseqdb .....etc

MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","31") FKs
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were
MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were

Column 'dbname' cannot be null

STACK: /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl:620
-----------------------------------------------------------

 at /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl line 633

Any idea?

Thanks in advance
s


From cjfields at uiuc.edu  Tue Jan 29 13:44:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Jan 2008 12:44:16 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <479F7149.1010203@atgc.org>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
	<479F7149.1010203@atgc.org>
Message-ID: 

Forgot about that one; it's definitely a better way to do it if you  
have the GI/accessions.

chris

On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:

> you don't need to use bioperl to accomplish this task, to download  
> several thousand sequences based on accession ID list.
>
> NCBI batch Entrez can do that:
> http://www.ncbi.nlm.nih.gov/sites/batchentrez
>
> just submit a large list of IDs, select database, and download.
>
> you can submit ~50,000 IDs in one file usually without problems.
> it may not return results if a list is larger than ~100,000 IDs

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From akozik at atgc.org  Tue Jan 29 13:32:41 2008
From: akozik at atgc.org (Alexander Kozik)
Date: Tue, 29 Jan 2008 10:32:41 -0800
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
Message-ID: <479F7149.1010203@atgc.org>

you don't need BioPerl to accomplish this task (downloading several
thousand sequences from a list of accession IDs).

NCBI Batch Entrez can do that:
http://www.ncbi.nlm.nih.gov/sites/batchentrez

just submit a large list of IDs, select the database, and download.

you can usually submit ~50,000 IDs in one file without problems;
it may not return results if the list is larger than ~100,000 IDs.

--
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 Health Sciences Drive
Genome Center, 4-th floor, room 4302
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/





From hlapp at gmx.net  Tue Jan 29 16:31:47 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 29 Jan 2008 16:31:47 -0500
Subject: [Bioperl-l] loading sequence error bioseq
In-Reply-To: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>
References: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>
Message-ID: 

This looks suspiciously like a data error. Can you please give the
full command line? This should also show which format your sequences
are in.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From hlapp at gmx.net  Tue Jan 29 16:40:21 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 29 Jan 2008 16:40:21 -0500
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
	<10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
Message-ID: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net>

This would mean that two or more seqfeatures with the same type for  
the same sequence exist in the input data, each with rank 1.

Normally the rank will be incremented for each seqfeature of a  
sequence, so I'm not sure how this is happening here w/o seeing the  
data.
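Hilmar's explanation fits the warning quoted below. In `Duplicate entry '27-3-4-1'`, the first three numbers match the FKs (27,3,4) and the last is rank 1. Below is a minimal pure-Perl sketch of that reasoning. It assumes the colliding key is (bioentry, type term, source term, rank), which would make the collision one of type, source and rank within a single sequence. `duplicate_feature_keys` and `assign_ranks` are hypothetical helpers, not part of bioperl-db.

```perl
use strict;
use warnings;

# Return each (type, source, rank) combination that appears more than once
# among the features of ONE sequence. Assumption: this is the part of the
# BioSQL seqfeature unique key (bioentry_id, type_term_id, source_term_id,
# rank) that can collide within a single bioentry.
sub duplicate_feature_keys {
    my @features = @_;    # hash refs: { type => ..., source => ..., rank => ... }
    my (%seen, @dups);
    for my $f (@features) {
        my $key = join '-', @{$f}{qw(type source rank)};
        push @dups, $key if ++$seen{$key} == 2;    # report each key once
    }
    return @dups;
}

# Give every feature of the sequence its own rank (1, 2, 3, ...), as in
# the "incremented for each seqfeature" behaviour, which keeps keys unique.
sub assign_ranks {
    my @features = @_;
    my $rank = 0;
    $_->{rank} = ++$rank for @features;
    return @features;
}
```

Running `duplicate_feature_keys` on the features parsed from one input record can show whether the input itself carries repeated rank-1 entries before anything reaches the database.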

	-hilmar

On Jan 29, 2008, at 12:15 PM, snoze pa wrote:

> Dear Users,
> I tried refreshing the installation and it seems to be working. But when I
> load sequences, it gives me the following warning messages. Am I doing
> this right, or am I missing a huge chunk of sequences? Thanks in advance
> s
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values were ("","1") FKs (27,3,4)
> Duplicate entry '27-3-4-1' for key 2
> ...
> ...
> and so on
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From avilella at gmail.com  Wed Jan 30 04:28:34 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 30 Jan 2008 09:28:34 +0000
Subject: [Bioperl-l] fetch dna seqs from genbank protein ids
Message-ID: <358f4d650801300128q44cf95a0va11799908c4f26a0@mail.gmail.com>

Hi bioperlers,

Got a question here:

>I have a bunch of protein sequences in multi-FastA with their
>accession numbers in the header and I want to retrieve their
>corresponding nucleotide sequences and nucleotide accession numbers.
>I can't seem to find a way to do it. I am looking at eUtils on the
>NCBI site, but they only do really simple stuff.

I had a look at the fetch example scripts, and I could fetch proteins
from GenBank, but I don't see a clear connection between the protein
sequence and the DNA sequence.
Is this a DBlink? Which type?

Cheers,

    Albert.


From tristan.lefebure at gmail.com  Wed Jan 30 09:56:07 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 30 Jan 2008 09:56:07 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: 
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
Message-ID: <200801300956.07849.tristan.lefebure@gmail.com>

Thank you both!

Just in case it might be useful for someone else, here are my ramblings:

1. I first tried to adapt my script to fetch 500 sequences at a time. It worked, except that ~40% of the time NCBI returned the following error and my script crashed:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
[...]
    The proxy server received an invalid
    response from an upstream server.
[...]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:68
-----------------------------------------------------------

I tried to modify the script so that when the retrieval of a 500-sequence block crashes, it continues with the other blocks, but I was unsuccessful. It probably needs a better understanding of BioPerl errors...
Here is the section of the script that was modified:
#########
my $n_seq = scalar @list;
my @aborted;

for (my $i=1; $i<=$n_seq; $i += 500) {
	print "Fetching sequences $i to ", $i+499, ": ";
	my $start = $i -1;
	my $end = $i + 500 -1;
	my @red_list = @list[$start .. $end]; 
	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
					-format => $dformat,
					-db => $db,
				);

	my $seqio;
	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
		print "Aborted, resubmit latter\n";
		push @aborted, @red_list;
		next;
	}
	
	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
					-format => $format,
				);
	while (my $seqo = $seqio->next_seq ) {
# 		print $seqo->id, "\n";
		$seqout->write_seq($seqo);
	}
	print "Done\n";
}

if (@aborted) {
	open OUT, ">aborted_fetching.AN";
	foreach (@aborted) { print OUT $_ };
}
##########


2. So I moved to the second solution and tried Batch Entrez. I cut my 120,000-line AN list into 10,000-line pieces using split:
split -l 10000 full_list.AN splitted_list_

and then submitted the 13 lists one by one. I must say that I don't really like using a web interface to fetch data, and here the most annoying part is that you end up with a regular Entrez/GenBank web page: select your format, export to file, choose a file name... and you have to do it many times.
It is too prone to human and web-browser errors for my taste, but it worked.
Nevertheless, there are some caveats:
- some downloaded files were incomplete (~10%) and had to be restarted
- you can't submit several lists at the same time (otherwise the same cookie will be used and you'll end up with several identical files)

-Tristan





From cjfields at uiuc.edu  Wed Jan 30 10:10:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 09:10:14 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: <7143A650-AA84-4331-B55A-A66C3F5BBAB0@uiuc.edu>

You can use an eval {} block to catch the error, then redo the loop  
(so you don't iterate to the next block) or use next and skip the  
current block if an error occurs.  If you use redo then you should use  
a counter to exit the loop after several tries.
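Here is a minimal sketch of that eval/redo/counter pattern applied to the 500-ID block loop. It is a pure-Perl illustration: `fetch_in_blocks` is a hypothetical helper, and `$fetch` stands in for the real work. In the real script, `$fetch` would build the Bio::DB::GenBank object, call `get_Stream_by_acc($block)` and write each `next_seq` to a Bio::SeqIO file, as in the loop above. The `eval` matters because Bio::Root::Root::throw dies rather than returning false, so a plain `unless ($seqio = ...)` test never sees the failure. The sketch also clamps the last slice so it carries no `undef` padding.

```perl
use strict;
use warnings;

# Fetch @ids in blocks of $size, trying each block up to $max_tries times
# in total before recording it as aborted. $fetch is a code ref that gets
# an array ref of IDs and must die on failure (as BioPerl's throw() does).
sub fetch_in_blocks {
    my ($fetch, $size, $max_tries, @ids) = @_;
    my (@done, @aborted);
    my $tries = 0;
    for (my $start = 0; $start < @ids; $start += $size) {
        # Clamp the slice so the last block has no undef padding.
        my $end = $start + $size - 1;
        $end = $#ids if $end > $#ids;
        my @block = @ids[$start .. $end];

        my $ok = eval { $fetch->(\@block); 1 };
        unless ($ok) {
            my $err = $@ || "unknown error\n";
            if (++$tries < $max_tries) {
                warn "block at $start failed (try $tries): $err";
                redo;    # retry the same block; $start is not advanced
            }
            warn "giving up on block at $start after $tries tries\n";
            push @aborted, @block;
            $tries = 0;
            next;        # move on to the following block
        }
        push @done, $start;
        $tries = 0;
    }
    return (\@done, \@aborted);
}
```

The aborted IDs come back as a list, so they can be written to a file and resubmitted later. In real use, a short `sleep` before each `redo` is kinder to NCBI's servers.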

chris

On Jan 30, 2008, at 8:56 AM, Tristan Lefebure wrote:

> Thank you both!
>
> Just in case it might be usefull for someone else, here are my  
> ramblings:
>
> 1. I first tried to adapt my script and fetch 500 sequences at a  
> time. It works, except that ~40% of the time NCBI gives the  
> following error and my script crashed:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> [...]
>    The proxy server received an invalid
>    response from an upstream server.
> [...]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/ 
> Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/ 
> DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/ 
> 5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/ 
> 5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:68
> -----------------------------------------------------------
>
> I tried to modify the script so that when the retrieval of a 500  
> sequence block crashes, it continues with the other blocks, but I  
> was unsuccessfull. It probably needs some better understanding of  
> BioPerl errors...
> Here is the section of the script that was modified:
> #########
> my $n_seq = scalar @list;
> my @aborted;
>
> for (my $i=1; $i<=$n_seq; $i += 500) {
> 	print "Fetching sequences $i to ", $i+499, ": ";
> 	my $start = $i -1;
> 	my $end = $i + 500 -1;
> 	my @red_list = @list[$start .. $end];
> 	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 					-format => $dformat,
> 					-db => $db,
> 				);
>
> 	my $seqio;
> 	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
> 		print "Aborted, resubmit latter\n";
> 		push @aborted, @red_list;
> 		next;
> 	}
> 	
> 	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
> 					-format => $format,
> 				);
> 	while (my $seqo = $seqio->next_seq ) {
> # 		print $seqo->id, "\n";
> 		$seqout->write_seq($seqo);
> 	}
> 	print "Done\n";
> }
>
> if (@aborted) {
> 	open OUT, ">aborted_fetching.AN";
> 	foreach (@aborted) { print OUT $_ };
> }
> ##########
>
>
> 2. So I moved to the second solution and tried batchentrez. I cut my  
> 120,000 long AN list into 10,000 long pieces using split:
> split -l 10000 full_list.AN splitted_list_
>
> and then submitted the 13 lists one by one. I must say that I don't  
> really like using a web-interface to fetch data, and here the most  
> ennoying part is that you end up with a regular Entrez/GenBank  
> webpage: select your format, export to file, chosse file name... and  
> have to do it many times.
> It is too much prone to human and web-browser errors for my taste,  
> but it worked.
> Nevertheless there is some caveats:
> - some downloaded files were incomplete (~10%) and you have to  
> restart it
> - you can't submit several lists in the same time (otherwise the  
> same cookie will be used and you'll end up with several identical  
> files)
>
> -Tristan
>
> On Tuesday 29 January 2008 13:44:16 you wrote:
>> Forgot about that one; it's definitely a better way to do it if you
>> have the GI/accessions.
>>
>> chris
>>
>> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
>>> you don't need to use bioperl to accomplish this task, to download
>>> several thousand sequences based on accession ID list.
>>>
>>> NCBI batch Entrez can do that:
>>> http://www.ncbi.nlm.nih.gov/sites/batchentrez
>>>
>>> just submit a large list of IDs, select database, and download.
>>>
>>> you can submit ~50,000 IDs in one file usually without problems.
>>> it may not return results if a list is larger than ~100,000 IDs
>>>
>>> --
>>> Alexander Kozik
>>> Bioinformatics Specialist
>>> Genome and Biomedical Sciences Facility
>>> 451 Health Sciences Drive
>>> Genome Center, 4-th floor, room 4302
>>> University of California
>>> Davis, CA 95616-8816
>>> Phone: (530) 754-9127
>>> email#1: akozik at atgc.org
>>> email#2: akozik at gmail.com
>>> web: http://www.atgc.org/
>>>
>>> Chris Fields wrote:
>>>> Yes, you can only retrieve ~500 sequences at a time using
>>>> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
>>>> interact with NCBI's EUtilities (the former module returns
>>>> Bio::Seq/Bio::SeqIO objects, the latter returns raw data from the
>>>> URL to be processed later).
>>>> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
>>>> You can usually post more IDs using epost and fetch sequences
>>>> referring to the WebEnv/key combo (batch posting).  I try to make
>>>> this a bit easier with EUtilities, but it is woefully lacking in
>>>> documentation (my fault); there is some code up on the wiki
>>>> which should work.
>>>> chris
>>>>
>>>> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
>>>>> Hello,
>>>>>
>>>>> I would like to download a large number of sequences from GenBank
>>>>> (122,146 to be exact) following a list of accession numbers.
>>>>> I first investigated around Bio::DB::EUtilities, but got lost and
>>>>> finally used Bio::DB::GenBank.
>>>>> My script works well for short requests, but it gives the following
>>>>> error with long requests:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: WebDBSeqI Request Error:
>>>>> 500 short write
>>>>> Content-Type: text/plain
>>>>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
>>>>> Client-Warning: Internal response
>>>>>
>>>>> 500 short write
>>>>>
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>>>>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
>>>>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
>>>>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
>>>>> STACK: ./fetch_from_genbank.pl:58
>>>>> ---------------------------------------------------------
>>>>>
>>>>> Does that mean that we can only fetch 500 sequences at a time?
>>>>> Should I split my list into 500-ID fragments and submit them one
>>>>> after the other?
>>>>>
>>>>> Any suggestions are very welcome...
>>>>> Thanks,
>>>>> -Tristan
>>>>>
>>>>>
>>>>> Here is the script:
>>>>>
>>>>> ##################################
>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::DB::GenBank;
>>>>> # use Bio::DB::EUtilities;
>>>>> use Bio::SeqIO;
>>>>> use Getopt::Long;
>>>>>
>>>>> # 2008-01-22 T Lefebure
>>>>> # I tried to use Bio::DB::EUtilities without much success and went
>>>>> # back to Bio::DB::GenBank.
>>>>> # The following procedure is not really good as the stream is
>>>>> # first copied to a temporary file,
>>>>> # and then re-used by BioPerl to generate the final file.
>>>>>
>>>>> my $db = 'nucleotide';
>>>>> my $format = 'genbank';
>>>>> my $help= '';
>>>>> my $dformat = 'gb';
>>>>>
>>>>> GetOptions(
>>>>>   'help|?' => \$help,
>>>>>   'format=s'  => \$format,
>>>>>   'database=s'    => \$db,
>>>>> );
>>>>>
>>>>>
>>>>> my $printhelp = "\nUsage: $0 [options]   
>>>>> 
>>>>>
>>>>> Will download the corresponding data from GenBank. BioPerl is
>>>>> required.
>>>>>
>>>>> Options:
>>>>>   -h
>>>>>       print this help
>>>>>   -format: genbank|fasta|...
>>>>>       give output format (default=genbank)
>>>>>   -database: nucleotide|genome|protein|...
>>>>>       define the database to search in (default=nucleotide)
>>>>>
>>>>> The full description of the options can be found at
>>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
>>>>> \n";
>>>>>
>>>>> if ($#ARGV<1) {
>>>>>   print $printhelp;
>>>>>   exit;
>>>>> }
>>>>>
>>>>> open LIST, $ARGV[0];
>>>>> my @list = <LIST>;
>>>>>
>>>>> if ($format eq 'fasta') { $dformat = 'fasta' }
>>>>>
>>>>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
>>>>>               -format => $dformat,
>>>>>               -db => $db,
>>>>>           );
>>>>> my $seqio = $gb->get_Stream_by_acc(\@list);
>>>>>
>>>>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>>>>>               -format => $format,
>>>>>           );
>>>>> while (my $seqo = $seqio->next_seq ) {
>>>>>   print $seqo->id, "\n";
>>>>>   $seqout->write_seq($seqo);
>>>>> }
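Reading the list with `my @list = <LIST>;` keeps the trailing newline on every accession, plus a carriage return for files saved on Windows. The two-argument `open` is also never checked for failure. Below is a small sketch of a stricter reader. It assumes one accession per line, and `read_id_list` is a hypothetical helper, not part of BioPerl. It also drops blank lines and repeated IDs, so batch counts reflect unique accessions.

```perl
use strict;
use warnings;

# Read one accession per line, dropping line endings, surrounding
# whitespace, blank lines and repeated IDs (first occurrence kept).
sub read_id_list {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my (%seen, @ids);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # also strips "\n" and any "\r"
        next if $line eq '';
        push @ids, $line unless $seen{$line}++;
    }
    close $fh;
    return @ids;
}
```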
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
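In the batching loop quoted above, `$start = $i - 1` and `$end = $i + 500 - 1` select indices `$i-1 .. $i+499`. That is 501 IDs per batch, so consecutive batches share one accession, and the last slice can run past the end of `@list` and pick up `undef` entries. The minimal sketch below shows a batching helper that clamps the final batch. `make_batches` is a hypothetical name, and 500 is simply the batch size used in this thread.

```perl
use strict;
use warnings;

# Split a list of IDs into consecutive batches of at most $size elements.
# The final batch is just shorter, so no ID is repeated or left undef.
sub make_batches {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    for (my $start = 0; $start <= $#ids; $start += $size) {
        my $end = $start + $size - 1;
        $end = $#ids if $end > $#ids;    # clamp the last batch
        push @batches, [ @ids[$start .. $end] ];
    }
    return @batches;
}

# Example with 1,201 made-up accessions.
my @accs    = map { "ACC$_" } 1 .. 1201;
my @batches = make_batches(500, @accs);
printf "%d batches, sizes: %s\n", scalar @batches,
    join(',', map { scalar @$_ } @batches);
# prints: 3 batches, sizes: 500,500,201
```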





From snoze.pa at gmail.com  Wed Jan 30 12:34:24 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 11:34:24 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
	<10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
	<31534016-91B3-45C0-995D-CE5A82466303@gmx.net>
Message-ID: <10f848910801300934q57e5d45cpbf0e17b45640e3f9@mail.gmail.com>

Hilmar,

The command I am using is the following:

load_seqdatabase.pl -host localhost -namespace bioperl -dbname bioseqdb
-dbuser root -format genbank sequences.txt

I have no idea why I am getting that error.

Thanks in advance


On Jan 29, 2008 3:40 PM, Hilmar Lapp  wrote:

> This would mean that two or more seqfeatures with the same type for
> the same sequence exist in the input data, each with rank 1.
>
> Normally the rank will be incremented for each seqfeature of a
> sequence, so I'm not sure how this is happening here w/o seeing the
> data.
>
>        -hilmar
> On Jan 29, 2008, at 12:15 PM, snoze pa wrote:
>
> > Dear Users,
> > I tried to refresh the installation and it seems to be working. But
> > when I load sequences, it gives me the following warning messages.
> > Am I doing this right, or am I missing a huge chunk of sequences?
> > Thanks in advance
> > s
> >
> > -------------------- WARNING ---------------------
> > MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed,
> > values
> > were ("","1") FKs (27,3,4)
> > Duplicate entry '27-3-4-1' for key 2
> > ---------------------------------------------------
> > ...
> > ...
> > and so on
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
>
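Hilmar's explanation can be checked on the input side before anything is loaded. The warning reports the foreign keys (27,3,4) and rank 1, and the duplicate key '27-3-4-1' looks like those three values followed by the rank. Assuming that is the unique key, two features of one sequence with the same type, source and rank will collide. The sketch below works on plain hashrefs, which is a hypothetical representation. With BioPerl feature objects, the type and source would presumably come from `primary_tag` and `source_tag`. It first lists collisions and then renumbers ranks within each type/source pair.

```perl
use strict;
use warnings;

# Features of ONE sequence, each { type => ..., source => ..., rank => ... }.
# Returns the "type\tsource\trank" keys that occur more than once.
sub find_rank_collisions {
    my @features = @_;
    my %count;
    $count{ join "\t", @{$_}{qw(type source rank)} }++ for @features;
    my @dups = grep { $count{$_} > 1 } keys %count;
    return sort @dups;
}

# Renumber ranks 1, 2, 3, ... within each (type, source) pair, keeping
# input order. The hashrefs are modified in place.
sub renumber_ranks {
    my @features = @_;
    my %next;
    for my $f (@features) {
        $f->{rank} = ++$next{"$f->{type}\t$f->{source}"};
    }
    return @features;
}
```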


From snoze.pa at gmail.com  Wed Jan 30 13:01:46 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 12:01:46 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: <10f848910801301001k681e1291we0ce468e96d88f57@mail.gmail.com>

You can use a one-line LWP script to grab the sequences.

On Jan 29, 2008 11:19 AM, Tristan Lefebure 
wrote:

> Hello,
>
> I would like to download a large number of sequences from GenBank (122,146
> to be exact) following a list of accession numbers.
> I first investigated around Bio::DB::EUtilities, but got lost and finally
> used Bio::DB::GenBank.
> My script works well for short requests, but it gives the following error
> with long requests:
>
>  ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> 500 short write
> Content-Type: text/plain
> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> Client-Warning: Internal response
>
> 500 short write
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request
> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:58
> ---------------------------------------------------------
>
> Does that mean that we can only fetch 500 sequences at a time?
> Should I split my list into 500-ID fragments and submit them one after the
> other?
>
> Any suggestions are very welcome...
> Thanks,
> -Tristan
>
>
> Here is the script:
>
> ##################################
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> # use Bio::DB::EUtilities;
> use Bio::SeqIO;
> use Getopt::Long;
>
> # 2008-01-22 T Lefebure
> # I tried to use Bio::DB::EUtilities without much success and went back to
> # Bio::DB::GenBank.
> # The following procedure is not really good as the stream is first copied
> # to a temporary file,
> # and then re-used by BioPerl to generate the final file.
>
> my $db = 'nucleotide';
> my $format = 'genbank';
> my $help= '';
> my $dformat = 'gb';
>
> GetOptions(
>        'help|?' => \$help,
>        'format=s'  => \$format,
>        'database=s'    => \$db,
> );
>
>
> my $printhelp = "\nUsage: $0 [options]  
>
> Will download the corresponding data from GenBank. BioPerl is required.
>
> Options:
>        -h
>                print this help
>        -format: genbank|fasta|...
>                give output format (default=genbank)
>        -database: nucleotide|genome|protein|...
>                define the database to search in (default=nucleotide)
>
> The full description of the options can be found at
> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n
> ";
>
> if ($#ARGV<1) {
>        print $printhelp;
>        exit;
> }
>
> open LIST, $ARGV[0];
> my @list = <LIST>;
>
> if ($format eq 'fasta') { $dformat = 'fasta' }
>
> my $gb = new Bio::DB::GenBank(  -retrievaltype => 'tempfile',
>                                -format => $dformat,
>                                -db => $db,
>                        );
> my $seqio = $gb->get_Stream_by_acc(\@list);
>
> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>                                -format => $format,
>                        );
> while (my $seqo = $seqio->next_seq ) {
>        print $seqo->id, "\n";
>        $seqout->write_seq($seqo);
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
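The one-line LWP approach suggested above amounts to requesting EFetch URLs directly. This sketch makes several assumptions. The endpoint is NCBI's E-utilities EFetch script at its current https address, whereas this thread predates the move to https. `efetch_url` and `fetch_batch` are hypothetical helpers. Accessions are assumed to contain only URL-safe characters. Batches should stay small, as discussed in this thread, and for long ID lists a POST request is generally safer than a very long URL.

```perl
use strict;
use warnings;

# Assumed E-utilities endpoint; check NCBI's current documentation.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build an EFetch URL for one (small) batch of accessions.
sub efetch_url {
    my ($db, $rettype, @ids) = @_;
    return "$EFETCH?db=$db&rettype=$rettype&retmode=text&id=" . join(',', @ids);
}

# Fetch one batch as raw text; LWP::Simple is loaded only when used.
sub fetch_batch {
    my $url = efetch_url(@_);
    require LWP::Simple;
    my $text = LWP::Simple::get($url);
    die "No response for $url\n" unless defined $text;
    return $text;
}
```

For example, `fetch_batch('nucleotide', 'gb', @batch)` would return the GenBank text for one batch, ready to be written to a file or parsed with Bio::SeqIO.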


From snoze.pa at gmail.com  Wed Jan 30 13:38:12 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 12:38:12 -0600
Subject: [Bioperl-l] load_seqdatabase help
Message-ID: <10f848910801301038t1ae296c2o2453728b68dc81f8@mail.gmail.com>

Dear Users,
 Is there any alternative way to load the following sequence into the
BioSQL schema? I am trying to use load_seqdatabase.pl, but it is not working
in my case and shows a number of warning/error messages. I have tried
everything but have not been able to load it yet.

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb

Any help loading the above sequence into my bioseqdb database would be
appreciated.

Thanks in advance
s


From snoze.pa at gmail.com  Wed Jan 30 14:30:22 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 13:30:22 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
Message-ID: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>

Hi Hilmar,

 After spending a lot of time, I figured out the error. I am able to load
sequences if they do not contain the following entry:

xrefs (non-sequence databases):

If a GenBank sequence has this entry, the script load_seqdatabase.pl
crashes. I tried a couple of sequences and found that this line of the
GenBank format is the culprit. But this line is important, as it contains a
lot of information, so I am wondering how to solve this problem.

Any help?

Thanks in advance
s


From Russell.Smithies at agresearch.co.nz  Wed Jan 30 14:34:44 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 31 Jan 2008 08:34:44 +1300
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com><479F7149.1010203@atgc.org>
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: 

Take a look at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
Ebot is an interactive tool that generates a Perl script that implements
an E-utility pipeline.
You can probably hack the resulting script to introduce the required
BioPerly bits.
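The batch-posting route Chris describes in this thread works in two steps. First the IDs are posted with EPost, and then the stored set is fetched by WebEnv and query_key, paging through it with EFetch's `retstart` and `retmax` parameters. The sketch below covers only the paging arithmetic. It makes no HTTP request, the WebEnv value in the test is a placeholder rather than real session data, and `paged_efetch_urls` is a hypothetical helper. For 122,146 records and 500 per request it yields 245 URLs.

```perl
use strict;
use warnings;

# One EFetch URL per page of a history-server result set. webenv and
# query_key would come from an earlier EPost (or ESearch) response.
sub paged_efetch_urls {
    my (%arg) = @_;
    die "retmax must be positive\n" unless $arg{retmax} > 0;
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
    my @urls;
    for (my $retstart = 0; $retstart < $arg{count}; $retstart += $arg{retmax}) {
        push @urls, "$base?db=$arg{db}&WebEnv=$arg{webenv}"
                  . "&query_key=$arg{query_key}&retstart=$retstart"
                  . "&retmax=$arg{retmax}&rettype=$arg{rettype}&retmode=text";
    }
    return @urls;
}
```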

Russell Smithies 

Bioinformatics Software Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-
> bio.org] On Behalf Of Tristan Lefebure
> Sent: Thursday, 31 January 2008 3:56 a.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::GenBank and large number of requests
> 
> Thank you both!
> 
> Just in case it might be useful for someone else, here are my ramblings:
> 
> 1. I first tried to adapt my script and fetch 500 sequences at a time. It works,
> except that ~40% of the time NCBI gives the following error and my script crashed:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> [...]
>     The proxy server received an invalid
>     response from an upstream server.
> [...]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:68
> -----------------------------------------------------------
> 
> I tried to modify the script so that when the retrieval of a 500-sequence block
> crashes, it continues with the other blocks, but I was unsuccessful. It probably
> needs some better understanding of BioPerl errors...
> Here is the section of the script that was modified:
> #########
> my $n_seq = scalar @list;
> my @aborted;
> 
> for (my $i=1; $i<=$n_seq; $i += 500) {
> 	print "Fetching sequences $i to ", $i+499, ": ";
> 	my $start = $i -1;
> 	my $end = $i + 500 -1;
> 	my @red_list = @list[$start .. $end];
> 	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 					-format => $dformat,
> 					-db => $db,
> 				);
> 
> 	my $seqio;
> 	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
> 		print "Aborted, resubmit later\n";
> 		push @aborted, @red_list;
> 		next;
> 	}
> 
> 	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
> 					-format => $format,
> 				);
> 	while (my $seqo = $seqio->next_seq ) {
> # 		print $seqo->id, "\n";
> 		$seqout->write_seq($seqo);
> 	}
> 	print "Done\n";
> }
> 
> if (@aborted) {
> 	open OUT, ">aborted_fetching.AN";
> 	foreach (@aborted) { print OUT $_ };
> }
> ##########
> 
> 
> 2. So I moved to the second solution and tried batchentrez. I cut my 120,000-long
> AN list into 10,000-line pieces using split:
> split -l 10000 full_list.AN splitted_list_
> 
> and then submitted the 13 lists one by one. I must say that I don't really like using
> a web interface to fetch data, and here the most annoying part is that you end up
> with a regular Entrez/GenBank webpage: select your format, export to file, choose a
> file name... and you have to do it many times.
> It is too prone to human and web-browser errors for my taste, but it worked.
> Nevertheless, there are some caveats:
> - some downloaded files were incomplete (~10%) and you have to restart them
> - you can't submit several lists at the same time (otherwise the same cookie will be
> used and you'll end up with several identical files)
> 
> -Tristan
> 
> On Tuesday 29 January 2008 13:44:16 you wrote:
> > Forgot about that one; it's definitely a better way to do it if you
> > have the GI/accessions.
> >
> > chris
> >
> > On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
> > > you don't need to use bioperl to accomplish this task, to download
> > > several thousand sequences based on accession ID list.
> > >
> > > NCBI batch Entrez can do that:
> > > http://www.ncbi.nlm.nih.gov/sites/batchentrez
> > >
> > > just submit a large list of IDs, select database, and download.
> > >
> > > you can submit ~50,000 IDs in one file usually without problems.
> > > it may not return results if a list is larger than ~100,000 IDs
> > >
> > > --
> > > Alexander Kozik
> > > Bioinformatics Specialist
> > > Genome and Biomedical Sciences Facility
> > > 451 Health Sciences Drive
> > > Genome Center, 4-th floor, room 4302
> > > University of California
> > > Davis, CA 95616-8816
> > > Phone: (530) 754-9127
> > > email#1: akozik at atgc.org
> > > email#2: akozik at gmail.com
> > > web: http://www.atgc.org/
> > >
> > > Chris Fields wrote:
> > >> Yes, you can only retrieve ~500 sequences at a time using
> > >> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
> > >> interact with NCBI's EUtilities (the former module returns
> > >> Bio::Seq/Bio::SeqIO objects, the latter returns raw data from the
> > >> URL to be processed later).
> > >> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
> > >> You can usually post more IDs using epost and fetch sequences
> > >> referring to the WebEnv/key combo (batch posting).  I try to make
> > >> this a bit easier with EUtilities, but it is woefully lacking in
> > >> documentation (my fault); there is some code up on the wiki
> > >> which should work.
> > >> chris
> > >>
> > >> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
> > >>> Hello,
> > >>>
> > >>> I would like to download a large number of sequences from GenBank
> > >>> (122,146 to be exact) following a list of accession numbers.
> > >>> I first investigated around Bio::DB::EUtilities, but got lost and
> > >>> finally used Bio::DB::GenBank.
> > >>> My script works well for short requests, but it gives the following
> > >>> error with long requests:
> > >>>
> > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>> MSG: WebDBSeqI Request Error:
> > >>> 500 short write
> > >>> Content-Type: text/plain
> > >>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> > >>> Client-Warning: Internal response
> > >>>
> > >>> 500 short write
> > >>>
> > >>> STACK: Error::throw
> > >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> > >>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> > >>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> > >>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> > >>> STACK: ./fetch_from_genbank.pl:58
> > >>> ---------------------------------------------------------
> > >>>
> > >>> Does that mean that we can only fetch 500 sequences at a time?
> > >>> Should I split my list into 500-ID fragments and submit them one
> > >>> after the other?
> > >>>
> > >>> Any suggestions are very welcome...
> > >>> Thanks,
> > >>> -Tristan
> > >>>
> > >>>
> > >>> Here is the script:
> > >>>
> > >>> ##################################
> > >>> use strict;
> > >>> use warnings;
> > >>> use Bio::DB::GenBank;
> > >>> # use Bio::DB::EUtilities;
> > >>> use Bio::SeqIO;
> > >>> use Getopt::Long;
> > >>>
> > >>> # 2008-01-22 T Lefebure
> > >>> # I tried to use Bio::DB::EUtilities without much success and went
> > >>> # back to Bio::DB::GenBank.
> > >>> # The following procedure is not really good as the stream is
> > >>> # first copied to a temporary file,
> > >>> # and then re-used by BioPerl to generate the final file.
> > >>>
> > >>> my $db = 'nucleotide';
> > >>> my $format = 'genbank';
> > >>> my $help= '';
> > >>> my $dformat = 'gb';
> > >>>
> > >>> GetOptions(
> > >>>    'help|?' => \$help,
> > >>>    'format=s'  => \$format,
> > >>>    'database=s'    => \$db,
> > >>> );
> > >>>
> > >>>
> > >>> my $printhelp = "\nUsage: $0 [options]
> > >>>
> > >>>
> > >>> Will download the corresponding data from GenBank. BioPerl is
> > >>> required.
> > >>>
> > >>> Options:
> > >>>    -h
> > >>>        print this help
> > >>>    -format: genbank|fasta|...
> > >>>        give output format (default=genbank)
> > >>>    -database: nucleotide|genome|protein|...
> > >>>        define the database to search in (default=nucleotide)
> > >>>
> > >>> The full description of the options can be found at
> > >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
> > >>> \n";
> > >>>
> > >>> if ($#ARGV<1) {
> > >>>    print $printhelp;
> > >>>    exit;
> > >>> }
> > >>>
> > >>> open LIST, $ARGV[0];
> > >>> my @list = <LIST>;
> > >>>
> > >>> if ($format eq 'fasta') { $dformat = 'fasta' }
> > >>>
> > >>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
> > >>>                -format => $dformat,
> > >>>                -db => $db,
> > >>>            );
> > >>> my $seqio = $gb->get_Stream_by_acc(\@list);
> > >>>
> > >>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> > >>>                -format => $format,
> > >>>            );
> > >>> while (my $seqo = $seqio->next_seq ) {
> > >>>    print $seqo->id, "\n";
> > >>>    $seqout->write_seq($seqo);
> > >>> }
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>
> > >> Christopher Fields
> > >> Postdoctoral Researcher
> > >> Lab of Dr. Robert Switzer
> > >> Dept of Biochemistry
> > >> University of Illinois Urbana-Champaign
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================



From cjfields at uiuc.edu  Wed Jan 30 15:04:18 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:04:18 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: <0BA39C27-1871-441B-B2DE-F7FECF8570D7@uiuc.edu>

Sounds like a bug in the GenBank parser.  Could you post a bug report  
with an example sequence record and your script?

http://bugzilla.open-bio.org/

chris
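To prepare the example record requested here, the offending entries can be pulled out of a multi-record file. This is a minimal sketch. It assumes records end with a `//` line and that the file uses Unix line endings. `records_matching` is a hypothetical helper, and the marker string is the line reported in the message quoted below.

```perl
use strict;
use warnings;

# Return every record in a multi-record GenBank/GenPept file that
# contains $marker. Records are read one at a time by setting the
# input record separator to the "//" terminator line.
sub records_matching {
    my ($file, $marker) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/ = "\n//\n";
    my @hits;
    while (my $record = <$fh>) {
        push @hits, $record if index($record, $marker) >= 0;
    }
    close $fh;
    return @hits;
}
```

For example, `records_matching('sequences.txt', 'xrefs (non-sequence databases):')` returns the records that could be attached to the bug report.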

On Jan 30, 2008, at 1:30 PM, snoze pa wrote:

> Hi Hilmar,
>
> After spending a lot of time, I figured out the error. I am able to load
> sequences if they do not contain the following entry:
>
> xrefs (non-sequence databases):
>
> If a GenBank sequence has this entry, the script
> load_seqdatabase.pl crashes. I tried a couple of sequences and
> found that this line of the GenBank format is the culprit. But this line
> is important, as it contains a lot of information, so I am wondering how
> to solve this problem.
>
> Any help?
>
> Thanks in advance
> s
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From cjfields at uiuc.edu  Wed Jan 30 15:42:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:42:14 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: <29768205-F511-4EDB-84D2-BCC36DBA92C7@uiuc.edu>

When using Bio::DB::EUtilities (from bioperl-live) this works for me:

use strict;
use warnings;
use Bio::DB::EUtilities;

# get array of IDs somehow, in @ids
my @ids;

my ($start, $chunk, $last) = (0, 100, $#ids);

my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'protein',
                                       -rettype => 'genbank');

my $ct    = 1; # used to denote separate files
my $tries = 0; # server attempts

while ($start <= $last) {
     # want seqs in chunk size of 100 (set above)
     my $end = ($start + $chunk - 1) < $last ? ($start + $chunk - 1) : $last;
     # grab slice of IDs
     my @sub = @ids[$start..$end];

     # pass to agent
     $factory->set_parameters(-id => \@sub);

     eval {
         # check server response, if good send to file
         $factory->get_Response(-file => ">seqs_$ct.gb");
     };

     # ERROR!
     if ($@) {
         $tries++;
         if ($tries <= 10) {
             warn("Server problem on attempt $tries: $@\nTrying again...\n");
             redo;
         } else {
             die("Repeated server issues after $tries attempts.\n");
             # could warn and just skip this batch of accs using 'next'
         }
     }

     $start = $end + 1;
     $ct++;
     $tries = 0;
}



chris
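The `eval`/`redo` loop above retries a batch and dies after ten failures. BioPerl reports server errors by throwing an exception, as the `Error::throw` frames in the traces in this thread show. A test such as `unless ($seqio = $gb->get_Stream_by_acc(...))` therefore never sees the failure, and the call has to be wrapped in `eval`. The sketch below applies the same idea as a reusable helper. Instead of dying, it records the batches that never succeed so the run can continue with the remaining blocks, which is what the quoted message below was after. The names are hypothetical. The fetcher is passed as a code reference so the logic can be exercised with a stub. In real use it might call `set_parameters` and `get_Response` as in the loop above, and it must return a defined value on success. A short pause between attempts (the optional `$pause`) is probably kind to NCBI's servers.

```perl
use strict;
use warnings;

# Call $fetch->($batch) up to $max_tries times. The fetcher signals
# failure by dying (as BioPerl's throw does) and must return a defined
# value on success. Returns that value, or undef if every attempt failed.
sub fetch_with_retry {
    my ($fetch, $batch, $max_tries, $pause) = @_;
    for my $attempt (1 .. $max_tries) {
        my $result = eval { $fetch->($batch) };
        return $result unless $@;
        warn "Attempt $attempt failed: $@";
        sleep $pause if $pause && $attempt < $max_tries;
    }
    return;
}

# Process every batch; collect results and the IDs of batches that failed.
sub fetch_all {
    my ($fetch, $max_tries, @batches) = @_;
    my (@done, @aborted);
    for my $batch (@batches) {
        my $result = fetch_with_retry($fetch, $batch, $max_tries);
        if (defined $result) { push @done, $result }
        else                 { push @aborted, @$batch }
    }
    return (\@done, \@aborted);
}
```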

On Jan 30, 2008, at 8:56 AM, Tristan Lefebure wrote:

> Thank you both!
>
> Just in case it might be useful for someone else, here are my
> ramblings:
>
> 1. I first tried to adapt my script and fetch 500 sequences at a  
> time. It works, except that ~40% of the time NCBI gives the  
> following error and my script crashed:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> [...]
>    The proxy server received an invalid
>    response from an upstream server.
> [...]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:68
> -----------------------------------------------------------
>
> I tried to modify the script so that when the retrieval of a
> 500-sequence block crashes, it continues with the other blocks, but I
> was unsuccessful. It probably needs some better understanding of
> BioPerl errors...
> Here is the section of the script that was modified:
> #########
> my $n_seq = scalar @list;
> my @aborted;
>
> for (my $i=1; $i<=$n_seq; $i += 500) {
> 	print "Fetching sequences $i to ", $i+499, ": ";
> 	my $start = $i -1;
> 	my $end = $i + 500 -1;
> 	my @red_list = @list[$start .. $end];
> 	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 					-format => $dformat,
> 					-db => $db,
> 				);
>
> 	my $seqio;
> 	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
> 		print "Aborted, resubmit later\n";
> 		push @aborted, @red_list;
> 		next;
> 	}
> 	
> 	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
> 					-format => $format,
> 				);
> 	while (my $seqo = $seqio->next_seq ) {
> # 		print $seqo->id, "\n";
> 		$seqout->write_seq($seqo);
> 	}
> 	print "Done\n";
> }
>
> if (@aborted) {
> 	open OUT, ">aborted_fetching.AN";
> 	foreach (@aborted) { print OUT $_ };
> }
> ##########
>
>
> 2. So I moved to the second solution and tried batchentrez. I cut my
> 120,000-long AN list into 10,000-line pieces using split:
> split -l 10000 full_list.AN splitted_list_
>
> and then submitted the 13 lists one by one. I must say that I don't
> really like using a web interface to fetch data, and here the most
> annoying part is that you end up with a regular Entrez/GenBank
> webpage: select your format, export to file, choose a file name... and
> you have to do it many times.
> It is too prone to human and web-browser errors for my taste,
> but it worked.
> Nevertheless, there are some caveats:
> - some downloaded files were incomplete (~10%) and you have to
> restart them
> - you can't submit several lists at the same time (otherwise the
> same cookie will be used and you'll end up with several identical
> files)
>
> -Tristan
>
> On Tuesday 29 January 2008 13:44:16 you wrote:
>> Forgot about that one; it's definitely a better way to do it if you
>> have the GI/accessions.
>>
>> chris
>>
>> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
>>> you don't need to use bioperl to accomplish this task, to download
>>> several thousand sequences based on accession ID list.
>>>
>>> NCBI batch Entrez can do that:
>>> http://www.ncbi.nlm.nih.gov/sites/batchentrez
>>>
>>> just submit a large list of IDs, select database, and download.
>>>
>>> you can submit ~50,000 IDs in one file usually without problems.
>>> it may not return results if a list is larger than ~100,000 IDs
>>>
>>> --
>>> Alexander Kozik
>>> Bioinformatics Specialist
>>> Genome and Biomedical Sciences Facility
>>> 451 Health Sciences Drive
>>> Genome Center, 4-th floor, room 4302
>>> University of California
>>> Davis, CA 95616-8816
>>> Phone: (530) 754-9127
>>> email#1: akozik at atgc.org
>>> email#2: akozik at gmail.com
>>> web: http://www.atgc.org/
>>>
>>> Chris Fields wrote:
>>>> Yes, you can only retrieve ~500 sequences at a time using
>>>> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
>>>> interact with NCBI's EUtilities (the former module returns
>>>> Bio::Seq/Bio::SeqIO objects, the latter returns raw data from the
>>>> URL to be processed later).
>>>> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
>>>> You can usually post more IDs using epost and fetch sequences
>>>> referring to the WebEnv/key combo (batch posting).  I try to make
>>>> this a bit easier with EUtilities, but it is woefully lacking in
>>>> documentation (my fault); there is some code up on the wiki
>>>> which should work.
>>>> chris
>>>>
>>>> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
>>>>> Hello,
>>>>>
>>>>> I would like to download a large number of sequences from GenBank
>>>>> (122,146 to be exact) following a list of accession numbers.
>>>>> I first investigated around Bio::DB::EUtilities, but got lost and
>>>>> finally used Bio::DB::GenBank.
>>>>> My script works well for short requests, but it gives the following
>>>>> error with the long request:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: WebDBSeqI Request Error:
>>>>> 500 short write
>>>>> Content-Type: text/plain
>>>>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
>>>>> Client-Warning: Internal response
>>>>>
>>>>> 500 short write
>>>>>
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>>>>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
>>>>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
>>>>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
>>>>> STACK: ./fetch_from_genbank.pl:58
>>>>> ---------------------------------------------------------
>>>>>
>>>>> Does that mean that we can only fetch 500 sequences at a time?
>>>>> Should I split my list into 500-ID fragments and submit them one
>>>>> after the other?
>>>>>
>>>>> Any suggestions are very welcome...
>>>>> Thanks,
>>>>> -Tristan
>>>>>
>>>>>
>>>>> Here is the script:
>>>>>
>>>>> ##################################
>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::DB::GenBank;
>>>>> # use Bio::DB::EUtilities;
>>>>> use Bio::SeqIO;
>>>>> use Getopt::Long;
>>>>>
>>>>> # 2008-01-22 T Lefebure
>>>>> # I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
>>>>> # The following procedure is not really good as the stream is first copied to a temporary file,
>>>>> # and then re-used by BioPerl to generate the final file.
>>>>>
>>>>> my $db = 'nucleotide';
>>>>> my $format = 'genbank';
>>>>> my $help= '';
>>>>> my $dformat = 'gb';
>>>>>
>>>>> GetOptions(
>>>>>   'help|?' => \$help,
>>>>>   'format=s'  => \$format,
>>>>>   'database=s'    => \$db,
>>>>> );
>>>>>
>>>>>
>>>>> my $printhelp = "\nUsage: $0 [options]   
>>>>> 
>>>>>
>>>>> Will download the corresponding data from GenBank. BioPerl is
>>>>> required.
>>>>>
>>>>> Options:
>>>>>   -h
>>>>>       print this help
>>>>>   -format: genbank|fasta|...
>>>>>       give output format (default=genbank)
>>>>>   -database: nucleotide|genome|protein|...
>>>>>       define the database to search in (default=nucleotide)
>>>>>
>>>>> The full description of the options can be found at
>>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
>>>>> \n";
>>>>>
>>>>> if ($#ARGV<1) {
>>>>>   print $printhelp;
>>>>>   exit;
>>>>> }
>>>>>
>>>>> open LIST, $ARGV[0] or die "Cannot open $ARGV[0]: $!";
>>>>> chomp(my @list = <LIST>);
>>>>>
>>>>> if ($format eq 'fasta') { $dformat = 'fasta' }
>>>>>
>>>>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
>>>>>               -format => $dformat,
>>>>>               -db => $db,
>>>>>           );
>>>>> my $seqio = $gb->get_Stream_by_acc(\@list);
>>>>>
>>>>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>>>>>               -format => $format,
>>>>>           );
>>>>> while (my $seqo = $seqio->next_seq ) {
>>>>>   print $seqo->id, "\n";
>>>>>   $seqout->write_seq($seqo);
>>>>> }
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
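Chris's suggestion (post the IDs once with EPost, then fetch the records back by referring to the WebEnv/query_key pair) can be illustrated with a small sketch that talks to the E-utilities CGIs directly through LWP::UserAgent. This is not the Bio::DB::EUtilities interface. The base URL, the assumption that EPost accepts accession numbers for the nucleotide database, and the helper names parse_epost, efetch_url and post_and_fetch are all illustrative; very large ID lists may still have to be posted in several pieces.

```perl
use strict;
use warnings;

# Assumed base URL of NCBI's E-utilities CGIs (illustrative, not from the thread).
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical helper: pull the WebEnv and QueryKey values out of an EPost XML response.
sub parse_epost {
    my ($xml) = @_;
    my ($key)    = $xml =~ m{<QueryKey>\s*(\d+)\s*</QueryKey>};
    my ($webenv) = $xml =~ m{<WebEnv>\s*(\S+?)\s*</WebEnv>};
    die "no WebEnv/QueryKey in EPost response\n"
        unless defined $key && defined $webenv;
    return ($webenv, $key);
}

# Hypothetical helper: build an EFetch URL for one slice (retstart/retmax)
# of the previously posted ID set.
sub efetch_url {
    my (%p) = @_;
    return sprintf('%s/efetch.fcgi?db=%s&query_key=%d&WebEnv=%s'
                 . '&rettype=%s&retmode=text&retstart=%d&retmax=%d',
        $EUTILS, @p{qw(db query_key webenv rettype retstart retmax)});
}

# Hypothetical driver: post all IDs once, then fetch the records back
# $batch at a time and print the raw text to $out_fh.
# Needs network access and the LWP library.
sub post_and_fetch {
    my ($db, $ids, $rettype, $batch, $out_fh) = @_;
    require LWP::UserAgent;
    my $ua   = LWP::UserAgent->new;
    my $resp = $ua->post("$EUTILS/epost.fcgi",
                         { db => $db, id => join(',', @$ids) });
    die $resp->status_line, "\n" unless $resp->is_success;
    my ($webenv, $key) = parse_epost($resp->decoded_content);

    for (my $start = 0; $start < @$ids; $start += $batch) {
        my $r = $ua->get(efetch_url(
            db => $db, query_key => $key, webenv => $webenv,
            rettype => $rettype, retstart => $start, retmax => $batch));
        die $r->status_line, "\n" unless $r->is_success;
        print {$out_fh} $r->decoded_content;
        sleep 1;    # pause between requests to go easy on NCBI's servers
    }
}
```

For example, post_and_fetch('nucleotide', \@list, 'gb', 500, \*STDOUT) would write the GenBank records for every ID in @list to standard output, 500 records per request. Keeping the XML parsing and URL building in separate subroutines means they can be checked offline.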





From georg.otto at tuebingen.mpg.de  Thu Jan 31 04:34:31 2008
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Thu, 31 Jan 2008 10:34:31 +0100
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: 

Hi,

I succeeded with a similar task using the SeqHound database. I had a
list of > 200,000 GI numbers, but I guess it can work in a similar
fashion with accession numbers. Here is the script:

#!/usr/bin/perl

use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::SeqHound;

my $sh = Bio::DB::SeqHound->new();

my $USAGE = "$0 id_file\n\n";

unless (@ARGV) {
	print $USAGE;
	exit;
}

my $id_file = $ARGV[0];

open ID_FILE, "<$id_file" or die "error: $!";

# One FASTA writer for the whole run; with no -file it writes to STDOUT
my $out = Bio::SeqIO->new(-format => 'fasta');

# Read one GI number per line
while (<ID_FILE>) {
  chomp;
  my $id = $_;
  # Skip IDs for which no sequence object is returned
  if (defined(my $seq_obj = $sh->get_Seq_by_gi($id))) {
    $out->write_seq($seq_obj);
  }
}


Best,

Georg


Tristan Lefebure  writes:

> Hello,
>
> I would like to download a large number of sequences from GenBank (122,146 to be exact) following a list of accession numbers.
> I first investigated around Bio::DB::EUtilities, but got lost and finally used Bio::DB::GenBank. 
> My script works well for short requests, but it gives the following error with the long request:
>
>  ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> 500 short write
> Content-Type: text/plain
> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> Client-Warning: Internal response
>
> 500 short write
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:58
> ---------------------------------------------------------
>
> Does that mean that we can only fetch 500 sequences at a time?
> Should I split my list into 500-ID fragments and submit them one after the other?
>
> Any suggestions are very welcome...
> Thanks,
> -Tristan
>
>
> Here is the script:
>
> ##################################
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> # use Bio::DB::EUtilities;
> use Bio::SeqIO;
> use Getopt::Long;
>
> # 2008-01-22 T Lefebure
> # I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
> # The following procedure is not really good as the stream is first copied to a temporary file,
> # and then re-used by BioPerl to generate the final file.
>
> my $db = 'nucleotide';
> my $format = 'genbank';
> my $help= '';
> my $dformat = 'gb';
>
> GetOptions(
> 	'help|?' => \$help,
> 	'format=s'  => \$format,
> 	'database=s'	=> \$db,
> );
>
>
> my $printhelp = "\nUsage: $0 [options]  
>
> Will download the corresponding data from GenBank. BioPerl is required.
>
> Options:
> 	-h
> 		print this help
> 	-format: genbank|fasta|...
> 		give output format (default=genbank)
> 	-database: nucleotide|genome|protein|...
> 		define the database to search in (default=nucleotide)
>
> The full description of the options can be found at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
>
> if ($#ARGV<1) {
> 	print $printhelp;
> 	exit;
> }
>
> open LIST, $ARGV[0] or die "Cannot open $ARGV[0]: $!";
> chomp(my @list = <LIST>);
>
> if ($format eq 'fasta') { $dformat = 'fasta' }
>
> my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 				-format => $dformat,
> 				-db => $db,
> 			);
> my $seqio = $gb->get_Stream_by_acc(\@list);
>
> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> 				-format => $format,
> 			);
> while (my $seqo = $seqio->next_seq ) {
> 	print $seqo->id, "\n";
> 	$seqout->write_seq($seqo);
> }
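Tristan's fallback of splitting the accession list into 500-ID fragments and submitting them one after the other can be sketched as a small helper. The batch size of 500 comes from the limit discussed in this thread, and the name batch_ids is hypothetical; the commented loop assumes the $gb and $seqout objects from Tristan's script.

```perl
use strict;
use warnings;

# Hypothetical helper: split an array reference of IDs into consecutive
# batches of at most $size IDs each (default 500, the per-request figure
# discussed in this thread). Returns a list of array references, in order.
sub batch_ids {
    my ($ids, $size) = @_;
    $size ||= 500;
    my @batches;
    for (my $i = 0; $i < @$ids; $i += $size) {
        my $end = $i + $size - 1;
        $end = $#$ids if $end > $#$ids;    # the last batch may be shorter
        push @batches, [ @{$ids}[$i .. $end] ];
    }
    return @batches;
}

# Hypothetical use with the $gb and $seqout objects from Tristan's script:
#
#   for my $batch (batch_ids(\@list, 500)) {
#       my $seqio = $gb->get_Stream_by_acc($batch);
#       while (my $seq = $seqio->next_seq) {
#           $seqout->write_seq($seq);
#       }
#   }
```

Each request then stays within the size that worked for short lists, and all batches are appended to the same output file.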



From bernd.web at gmail.com  Thu Jan 31 05:48:15 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 31 Jan 2008 11:48:15 +0100
Subject: [Bioperl-l] searchio/blast
Message-ID: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>

Hi,

I noticed that the HTMLWriter output for a BLAST report may not be
correct if more than one sequence was "blasted".

After the BLAST report for the first sequence, the report ends with:
Search Parameters
Parameter	Value

Search Statistics
Statistic	Value

Produced by Bioperl module Bio::SearchIO::Writer::HTMLResultWriter on
Thu Jan 31 11:35:51 2008
Revision: $Id: HTMLResultWriter.pm,v 1.41 2006/10/02 04:45:37 tseemann Exp $

Then the second HTML blast report follows.
Although users requiring HTML output generally BLAST only one
sequence, this may be worth fixing.
Also, for the HTML writer of FastA reports the statistics section is empty.

An additional issue with HTMLWriter output containing more than one
BLAST report is the following:
When a sequence ID occurs more than once, the link (on the E-value)
points to the first occurrence, since it is not report-specific.

If the above is regarded as unwanted behavior, I'd be happy to put
together a concise example with code.


Best regards,
Bernd


From cjfields at uiuc.edu  Thu Jan 31 07:39:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 31 Jan 2008 06:39:46 -0600
Subject: [Bioperl-l] searchio/blast
In-Reply-To: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
References: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
Message-ID: 

The easiest way to take care of these (so we don't forget about them  
and can track changes) is to add them as BioPerl bugs/enhancement  
requests to bugzilla, along with example reports and code.

chris

On Jan 31, 2008, at 4:48 AM, Bernd Web wrote:

> Hi,
>
> I noticed that the HTMLWriter output for a BLAST report may not be
> correct if more than one sequence was "blasted".
>
> After the BLAST report for the first sequence, the report ends with:
> Search Parameters
> Parameter	Value
>
> Search Statistics
> Statistic	Value
>
> Produced by Bioperl module Bio::SearchIO::Writer::HTMLResultWriter on
> Thu Jan 31 11:35:51 2008
> Revision: $Id: HTMLResultWriter.pm,v 1.41 2006/10/02 04:45:37 tseemann Exp $
>
> Then the second HTML blast report follows.
> Although users requiring HTML output generally BLAST only one
> sequence, this may be worth fixing.
> Also, for the HTML writer of FastA reports the statistics section is empty.
>
> An additional issue with HTMLWriter output containing more than one
> BLAST report is the following:
> When a sequence ID occurs more than once, the link (on the E-value)
> points to the first occurrence, since it is not report-specific.
>
> If the above is regarded as unwanted behavior, I'd be happy to put
> together a concise example with code.
>
>
> Best regards,
> Bernd

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From hlapp at gmx.net  Thu Jan 31 08:12:25 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 31 Jan 2008 08:12:25 -0500
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: 


On Jan 30, 2008, at 2:30 PM, snoze pa wrote:

> Hi Hilmar,
>
>  After spending a lot of time I figured out the error. I am able to load
> sequences if they do not have the following entry
>
> xrefs (non-sequence databases):

Is this the literal value? I am asking because I can't find this in  
the file at

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb

which you said was giving you grief. So does the genbank file above  
now load, or how can I identify the critical line in there?

	-hilmar
>
> If a GenBank sequence has this entry, the load_seqdatabase.pl script
> crashes. I tried it on a couple of sequences and found that this is the
> culprit line in the GenBank format. But this line is important as it
> contains lots of information... so I am wondering how to solve this problem.
>
> Any help?
>
> Thanks in advance
> s

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From snoze.pa at gmail.com  Thu Jan 31 13:46:24 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Thu, 31 Jan 2008 12:46:24 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: 
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
Message-ID: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>

The link I sent was related to my tutorial; I was following that website.
A typical example is the following record, which has an xrefs
(non-sequence databases): line.
thanks
s
LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
            protein); prM; Peptide pr; Small envelope protein M (Matrix
            protein); Envelope protein E; Non-structural protein 1 (NS1)].
ACCESSION   P27912
VERSION     P27912.1  GI:130422
DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
            class: standard.
            created: Aug 1, 1992.
            sequence updated: Aug 1, 1992.
            annotation updated: Jan 15, 2008.
            xrefs: D00502.1, BAA00394.1, B32401
            xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
            GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
            InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
            InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
            Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
            Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
            reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
            Transmembrane; Viral nucleoprotein; Virion.
SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
  ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
            Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
            Flavivirus; Dengue virus group.
REFERENCE   1  (residues 1 to 792)
  AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
  TITLE     Genetic relatedness among structural protein genes of dengue 1
            virus strains
  JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
   PUBMED   2738579
  REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
            [FUNCTION] Protein C packages viral RNA to form a viral
            nucleocapsid, and promotes virion budding (By similarity).
            [FUNCTION] prM acts as a chaperone for envelope protein E during
            intracellular virion assembly by masking and inactivating envelope
            protein E fusion peptide. prM is matured in the last step of virion
            assembly, presumably to avoid catastrophic activation of the viral
            fusion peptide induced by the acidic pH of the trans-Golgi network.
            After cleavage by host furin, the pr peptide is released in the
            extracellular medium and small envelope protein M and envelope
            protein E homodimers are dissociated (By similarity).
            [FUNCTION] Envelope protein E binds cell surface receptor and is
            involved in membrane fusion between virion and target cell.
            Synthesized as an homodimer with prM which acts as a chaperone for
            envelope protein E. After cleavage of prM, envelope protein E
            dissociate from small envelope protein M and homodimerizes (By
            similarity).
            [FUNCTION] Non-structural protein 1 is slowly secreted from
            mammalian cells, but not from mosquito cells. Secreted form elicits
            protective immune response and plays an essential role in RNA
            replication. Soluble and membrane-associated NS1 may activate human
            complement and induce host vascular leakage. This effect might
            explain the clinical manifestations of dengue hemorrhagic fever and
            dengue shock syndrome (By similarity).
            [SUBUNIT] prM and envelope protein E form heterodimers in the
            endoplasmic reticulum and Golgi. Envelope protein E forms
            homodimers. NS1 forms homodimers as well as homohexamers when
            secreted. NS1 may interact with NS4A (By similarity).
            [SUBCELLULAR LOCATION] Note=The virion is assembled in the
            endoplasmic reticulum lumen, transported by vesicles to the Golgi,
            then transported again to the cell membrane where it is released
            outside the cell.
            [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
            [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
            [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
            Endoplasmic reticulum membrane; Peripheral membrane protein;
            Lumenal side (By similarity).
            [DOMAIN] Transmembrane domains of the small envelope protein M and
            envelope protein E contains an endoplasmic reticulum retention
            signals (By similarity).
            [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
            The nascent protein C contains a C-terminal hydrophobic domain that
            act as a signal sequence for translocation of prM into the lumen of
            the ER. Mature protein C is cleaved at a site upstream of this
            hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
            a host furin, releasing the mature small envelope protein M, and
            peptide pr (By similarity).
            [PTM] Envelope protein E and non-structural protein 1 are
            N-glycosylated (By similarity).
FEATURES             Location/Qualifiers
     source          1..792
                     /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
                     /specific_host="Aedes aegypti (Yellowfever mosquito)"
                     /specific_host="Homo sapiens (Human)"
                     /db_xref="taxon:11057"
     Protein         1..>792
                     /product="Genome polyprotein [Contains: Protein C"
     Region          1..101
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          1..100
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Protein C. /FTId=PRO_0000037884."
     Region          5..114
                     /region_name="Flavi_capsid"
                     /note="Flavivirus capsid protein C. Flaviviruses are small
                     enveloped viruses with virions comprised of 3 proteins
                     called C, M and E. Multiple copies of the C protein form
                     the nucleocapsid, which contains the ssRNA molecule;
                     pfam01003"
                     /db_xref="CDD:85176"
     Site            100..101
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by serine protease NS3 (By similarity)."
     Region          101..114
                     /region_name="Propeptide"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="ER anchor for the protein C, removed in mature form
                     by serine protease NS3. /FTId=PRO_0000037885."
     Region          102..122
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            114..115
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          115..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="prM. /FTId=PRO_0000264649."
     Region          115..205
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Peptide pr. /FTId=PRO_0000264650."
     Region          119..204
                     /region_name="Flavi_propep"
                     /note="Flavivirus polyprotein propeptide. The flaviviruses
                     are small enveloped animal viruses containing a single
                     positive strand genomic RNA. The genome encodes one large
                     ORF a polyprotein which undergos proteolytic processing
                     into mature viral peptide chains; pfam01570"
                     /db_xref="CDD:65376"
     Region          123..238
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            183
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Site            205..206
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host furin (By similarity)."
     Region          206..280
                     /region_name="Flavi_M"
                     /note="Flavivirus envelope glycoprotein M. Flaviviruses
                     are small enveloped viruses with virions comprised of 3
                     proteins called C, M and E. The envelope glycoprotein M is
                     made as a precursor, called prM; pfam01004"
                     /db_xref="CDD:85177"
     Region          206..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Small envelope protein M. /FTId=PRO_0000037886."
     Region          239..259
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          260..265
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          266..286
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            280..281
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          281..775
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Envelope protein E. /FTId=PRO_0000037887."
     Region          281..576
                     /region_name="Flavi_glycoprot"
                     /note="Flavivirus glycoprotein, central and dimerisation
                     domains; pfam00869"
                     /db_xref="CDD:85082"
     Bond            bond(283,310)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          287..725
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Bond            bond(340,401)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            347
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(354,385)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Bond            bond(372,396)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            433
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(465,565)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          578..673
                     /region_name="Flavi_glycop_C"
                     /note="Flavivirus glycoprotein, immunoglobulin-like
                     domain; pfam02832"
                     /db_xref="CDD:66513"
     Bond            bond(582,613)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          726..746
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          747..752
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          753..773
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          774..>792
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            775..776
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          776..>792
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Non-structural protein 1. /FTId=PRO_0000037888."
ORIGIN
        1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
       61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
      121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
      181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
      241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
      301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
      361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
      421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
      481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
      541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
      601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
      661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
      721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
      781 inwkgkelkc gs
//


On Jan 31, 2008 7:12 AM, Hilmar Lapp  wrote:

>
> On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
>
> > Hi Hilmar,
> >
> >  After spending a lot of time I figured out the error. I am able to load
> > sequences if they do not have the following entry
> >
> > xrefs (non-sequence databases):
>
> Is this the literal value? I am asking because I can't find this in
> the file at
>
> http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
>
> which you said was giving you grief. So does the genbank file above
> now load, or how can I identify the critical line in there?
>
>        -hilmar
> >
> > If a GenBank sequence has this entry, the load_seqdatabase.pl script
> > crashes. I tried it on a couple of sequences and found that this is the
> > culprit line in the GenBank format. But this line is important as it
> > contains lots of information... so I am wondering how to solve this problem.
> >
> > Any help?
> >
> > Thanks in advance
> > s
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


From hlapp at gmx.net  Thu Jan 31 15:10:35 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 31 Jan 2008 15:10:35 -0500
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
	<10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
Message-ID: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>

I see. Note that the sequence below is really a UniProt sequence that
has been reformatted into GenBank format, and hence isn't in your
typical GenBank sequence format (which usually lacks DBSOURCE, for
example). (The joys of data integration.)

If you load the same sequence from UniProt, does it still fail to
parse or to load?

Also, does this mean that the sequences at the link you sent load
without error? I.e., can I close that issue report, or is there a bug
in bioperl-db?

	-hilmar

On Jan 31, 2008, at 1:46 PM, snoze pa wrote:

> The link I sent was related to the tutorial I was following on that
> website. A typical example is the following, which has an
> "xrefs (non-sequence databases):" line.
> thanks
> s
>
> [GenBank record P27912 snipped]

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From snoze.pa at gmail.com  Thu Jan 31 15:21:18 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Thu, 31 Jan 2008 14:21:18 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
	<10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
	<3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>
Message-ID: <10f848910801311221q2a9f0d02x6c4600048f05adab@mail.gmail.com>

Thanks Hilmar,

I also thought that they had been translated into GenBank format. My problem
is that I have downloaded tons of sequences from NCBI in GenBank format, and
many of the sequences in my flat file are in this form, so I am unable to
load them into a local database with the load_seqdatabase.pl script. So far
all I get is warnings and errors. Is there any solution to this problem?
Otherwise I will try to write some code to load all the sequences into the
local database. But it seems like it would be easy to modify the parsing code
so that these sequences can be loaded.

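If the non-sequence cross-references really are the trigger (this thread reports it but does not establish why load_seqdatabase.pl fails), one stop-gap short of patching the parser is to pre-filter the flat file before loading it. The following is a minimal core-Perl sketch, not part of BioPerl or bioperl-db; the sub and script names are hypothetical, and it assumes the `xrefs (non-sequence databases):` entry is the last entry of the DBSOURCE block, as it is in the P27912 record.

```perl
use strict;
use warnings;

# Hypothetical workaround (not a BioPerl/bioperl-db API): copy GenBank
# flat-file text from $in to $out, dropping the "xrefs (non-sequence
# databases):" entry of each DBSOURCE block. All other lines are copied as is.
# ASSUMPTION: that entry is the last one in DBSOURCE, so it runs until the
# next line that starts with a keyword in column 1 (e.g. KEYWORDS).
sub strip_nonseq_xrefs {
    my ($in, $out) = @_;
    my $in_dbsource = 0;   # true while inside a DBSOURCE block
    my $skipping    = 0;   # true while dropping the non-sequence xrefs entry
    while (my $line = <$in>) {
        if ($line =~ /^(\S+)/) {
            # A keyword in column 1 (LOCUS, DBSOURCE, KEYWORDS, //, ...)
            # starts a new block and ends any skipping.
            $in_dbsource = ($1 eq 'DBSOURCE');
            $skipping    = 0;
        }
        elsif ($in_dbsource
               && $line =~ /^\s+xrefs \(non-sequence databases\):/) {
            $skipping = 1;
        }
        print {$out} $line unless $skipping;
    }
}

# Command-line use (hypothetical script name):
#   perl strip_nonseq_xrefs.pl in.gb > cleaned.gb
if (!caller && @ARGV) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        strip_nonseq_xrefs($fh, \*STDOUT);
        close $fh;
    }
}
```

The cleaned file would then be given to load_seqdatabase.pl in place of the original. Whether that alone makes the load succeed is untested here, and the GO, InterPro, Pfam and other non-sequence cross-references are simply discarded, so this trades information for loadability.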

> format (which usually lacks DBSOURCE, for example

I think that in the NCBI GenBank format DBSOURCE is common when the
three-dimensional structure of the protein is known. I agree with you: the
joys of integration.

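To see how many downloaded records are affected before choosing between filtering and changing the parser, a small scanner can report, per LOCUS, whether a record carries a DBSOURCE block and whether that block has the non-sequence xrefs entry. This is a hypothetical core-Perl sketch, not a BioPerl API; it only inspects line prefixes and does not validate the records.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl/bioperl-db API): for each LOCUS in a
# GenBank flat file, note whether it has a DBSOURCE block and whether that
# block contains an "xrefs (non-sequence databases):" entry.
# Returns a hash ref: LOCUS name => { dbsource => 0|1, nonseq_xrefs => 0|1 }.
sub scan_genbank {
    my ($fh) = @_;
    my %report;
    my $locus;             # name of the record currently being read
    my $in_dbsource = 0;   # true while inside a DBSOURCE block
    while (my $line = <$fh>) {
        if ($line =~ /^(\S+)/) {
            # A keyword in column 1 (LOCUS, DBSOURCE, KEYWORDS, //, ...)
            # starts a new block.
            my $keyword = $1;
            $in_dbsource = ($keyword eq 'DBSOURCE');
            if ($keyword eq 'LOCUS' && $line =~ /^LOCUS\s+(\S+)/) {
                $locus = $1;
                $report{$locus} = { dbsource => 0, nonseq_xrefs => 0 };
            }
        }
        next unless defined $locus;
        $report{$locus}{dbsource} = 1 if $in_dbsource;
        $report{$locus}{nonseq_xrefs} = 1
            if $in_dbsource
            && $line =~ /^\s+xrefs \(non-sequence databases\):/;
    }
    return \%report;
}

# Command-line use (hypothetical script name):
#   perl scan_genbank.pl file1.gb [file2.gb ...]
if (!caller && @ARGV) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my $report = scan_genbank($fh);
        close $fh;
        for my $name (sort keys %$report) {
            printf "%s\t%s\tDBSOURCE=%d\tnonseq_xrefs=%d\n", $file, $name,
                @{ $report->{$name} }{qw(dbsource nonseq_xrefs)};
        }
    }
}
```

Records with `nonseq_xrefs=1` are the ones this thread associates with the load failures; if a file contains duplicate LOCUS names, the later record's flags overwrite the earlier ones.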
The link was related to the tutorial I was using, so you can close that issue.

Thanks for looking into the matter.
s

On Jan 31, 2008 2:10 PM, Hilmar Lapp  wrote:

> I see. Note that the sequence below is really a UniProt sequence that has
> been reformatted into GenBank format, and hence isn't in your typical
> GenBank sequence format (which usually lacks DBSOURCE, for example). (The
> joys of data integration.)
> If you load the same sequence from UniProt, does it still fail to parse or
> to load?
>
> Also, does this mean that the sequences at the link you sent load
> without error? I.e., can I close that issue report, or is there a bug in
> bioperl-db?
>
> -hilmar
>
> On Jan 31, 2008, at 1:46 PM, snoze pa wrote:
>
> The link I sent was related to the tutorial I was following on that website.
> A typical example is the following, which has an "xrefs
> (non-sequence databases):" line.
> thanks
> s
> LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
> DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
>             protein); prM; Peptide pr; Small envelope protein M (Matrix
>             protein); Envelope protein E; Non-structural protein 1 (NS1)].
> ACCESSION   P27912
> VERSION     P27912.1  GI:130422
> DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
>             class: standard.
>             created: Aug 1, 1992.
>             sequence updated: Aug 1, 1992.
>             annotation updated: Jan 15, 2008.
>             xrefs: D00502.1, BAA00394.1, B32401
>             xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
>             GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
>             InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
>             InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
>             Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
>             Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
> KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
>             reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
>             Transmembrane; Viral nucleoprotein; Virion.
> SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
>   ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
>             Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
>             Flavivirus; Dengue virus group.
> REFERENCE   1  (residues 1 to 792)
>   AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
>   TITLE     Genetic relatedness among structural protein genes of dengue 1
>             virus strains
>   JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
>    PUBMED   2738579
>   REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
> COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
>             [FUNCTION] Protein C packages viral RNA to form a viral
>             nucleocapsid, and promotes virion budding (By similarity).
>             [FUNCTION] prM acts as a chaperone for envelope protein E during
>             intracellular virion assembly by masking and inactivating envelope
>             protein E fusion peptide. prM is matured in the last step of virion
>             assembly, presumably to avoid catastrophic activation of the viral
>             fusion peptide induced by the acidic pH of the trans-Golgi network.
>             After cleavage by host furin, the pr peptide is released in the
>             extracellular medium and small envelope protein M and envelope
>             protein E homodimers are dissociated (By similarity).
>             [FUNCTION] Envelope protein E binds cell surface receptor and is
>             involved in membrane fusion between virion and target cell.
>             Synthesized as an homodimer with prM which acts as a chaperone for
>             envelope protein E. After cleavage of prM, envelope protein E
>             dissociate from small envelope protein M and homodimerizes (By
>             similarity).
>             [FUNCTION] Non-structural protein 1 is slowly secreted from
>             mammalian cells, but not from mosquito cells. Secreted form elicits
>             protective immune response and plays an essential role in RNA
>             replication. Soluble and membrane-associated NS1 may activate human
>             complement and induce host vascular leakage. This effect might
>             explain the clinical manifestations of dengue hemorrhagic fever and
>             dengue shock syndrome (By similarity).
>             [SUBUNIT] prM and envelope protein E form heterodimers in the
>             endoplasmic reticulum and Golgi. Envelope protein E forms
>             homodimers. NS1 forms homodimers as well as homohexamers when
>             secreted. NS1 may interact with NS4A (By similarity).
>             [SUBCELLULAR LOCATION] Note=The virion is assembled in the
>             endoplasmic reticulum lumen, transported by vesicles to the Golgi,
>             then transported again to the cell membrane where it is released
>             outside the cell.
>             [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
>             [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
>             [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
>             Endoplasmic reticulum membrane; Peripheral membrane protein;
>             Lumenal side (By similarity).
>             [DOMAIN] Transmembrane domains of the small envelope protein M and
>             envelope protein E contains an endoplasmic reticulum retention
>             signals (By similarity).
>             [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
>             The nascent protein C contains a C-terminal hydrophobic domain that
>             act as a signal sequence for translocation of prM into the lumen of
>             the ER. Mature protein C is cleaved at a site upstream of this
>             hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
>             a host furin, releasing the mature small envelope protein M, and
>             peptide pr (By similarity).
>             [PTM] Envelope protein E and non-structural protein 1 are
>             N-glycosylated (By similarity).
> FEATURES             Location/Qualifiers
>      source          1..792
>                      /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
>                      /specific_host="Aedes aegypti (Yellowfever mosquito)"
>                      /specific_host="Homo sapiens (Human)"
>                      /db_xref="taxon:11057"
>      Protein         1..>792
>                      /product="Genome polyprotein [Contains: Protein C"
>      Region          1..101
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          1..100
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Protein C. /FTId=PRO_0000037884."
>      Region          5..114
>                      /region_name="Flavi_capsid"
>                      /note="Flavivirus capsid protein C. Flaviviruses are small
>                      enveloped viruses with virions comprised of 3 proteins
>                      called C, M and E. Multiple copies of the C protein form
>                      the nucleocapsid, which contains the ssRNA molecule;
>                      pfam01003"
>                      /db_xref="CDD:85176"
>      Site            100..101
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by serine protease NS3 (By similarity)."
>      Region          101..114
>                      /region_name="Propeptide"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="ER anchor for the protein C, removed in mature form
>                      by serine protease NS3. /FTId=PRO_0000037885."
>      Region          102..122
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            114..115
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          115..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="prM. /FTId=PRO_0000264649."
>      Region          115..205
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Peptide pr. /FTId=PRO_0000264650."
>      Region          119..204
>                      /region_name="Flavi_propep"
>                      /note="Flavivirus polyprotein propeptide. The flaviviruses
>                      are small enveloped animal viruses containing a single
>                      positive strand genomic RNA. The genome encodes one large
>                      ORF a polyprotein which undergos proteolytic processing
>                      into mature viral peptide chains; pfam01570"
>                      /db_xref="CDD:65376"
>      Region          123..238
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            183
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Site            205..206
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host furin (By similarity)."
>      Region          206..280
>                      /region_name="Flavi_M"
>                      /note="Flavivirus envelope glycoprotein M. Flaviviruses
>                      are small enveloped viruses with virions comprised of 3
>                      proteins called C, M and E. The envelope glycoprotein M is
>                      made as a precursor, called prM; pfam01004"
>                      /db_xref="CDD:85177"
>      Region          206..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Small envelope protein M. /FTId=PRO_0000037886."
>      Region          239..259
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          260..265
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          266..286
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            280..281
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          281..775
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Envelope protein E. /FTId=PRO_0000037887."
>      Region          281..576
>                      /region_name="Flavi_glycoprot"
>                      /note="Flavivirus glycoprotein, central and dimerisation
>                      domains; pfam00869"
>                      /db_xref="CDD:85082"
>      Bond            bond(283,310)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          287..725
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Bond            bond(340,401)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            347
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(354,385)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Bond            bond(372,396)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            433
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(465,565)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          578..673
>                      /region_name="Flavi_glycop_C"
>                      /note="Flavivirus glycoprotein, immunoglobulin-like
>                      domain; pfam02832"
>                      /db_xref="CDD:66513"
>      Bond            bond(582,613)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          726..746
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          747..752
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          753..773
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          774..>792
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            775..776
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          776..>792
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Non-structural protein 1. /FTId=PRO_0000037888."
> ORIGIN
>         1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
>        61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
>       121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
>       181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
>       241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
>       301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
>       361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
>       421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
>       481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
>       541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
>       601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
>       661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
>       721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
>       781 inwkgkelkc gs
> //
>
>
> On Jan 31, 2008 7:12 AM, Hilmar Lapp  wrote:
>
> >
> > On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
> >
> > > Hi Hilmar,
> > >
> > > After spending lots of time I figured out the error. I am able to load
> > > sequences if they do not have the following entry:
> > >
> > > xrefs (non-sequence databases):
> >
> > Is this the literal value? I am asking because I can't find this in
> > the file at
> >
> > http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
> >
> > which you said was giving you grief. So does the genbank file above
> > now load, or how can I identify the critical line in there?
> >
> >        -hilmar
> > >
> > > If the GenBank sequence has this entry, then the script
> > > load_seqdatabase.pl crashes. I tried it on a couple of sequences and
> > > found that this is the culprit line in the GenBank format. But this
> > > line is important as it contains lots of information... so I am
> > > wondering how to solve this problem.
> > >
> > > Any help?
> > >
> > > Thanks in advance
> > > s
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================


From Laurence.Amilhat at toulouse.inra.fr  Thu Jan  3 14:29:09 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Thu, 03 Jan 2008 15:29:09 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
Message-ID: <477CF135.9060104@toulouse.inra.fr>

Dear all,

I am trying to convert a newick tree into an NHX tree, so I can add the 
taxid tag for each leaf.

I am using the modules: Bio::TreeIO  & Bio::Tree::NodeNHX
The idea is
1) to read the newick tree
2) get the leaf, and get the corresponding taxid for it
3) add the nhx species tag
4) write the nhx tree

I was able to do the first 2 steps, and I could create an object 
node_nhx and add the tag T,
but I don't know how to write an nhx Tree with the node_nhx previously 
created...

Does anyone have an idea? Any help is welcome.

Thanks,

laurence.


Here are my code and the sample files for better understanding:
newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt

_newick2nhx.pl:_
use strict;
use Bio::TreeIO;
use Bio::Tree::NodeNHX;
use Getopt::Long;


my $tree_file;
my $outfile;
my $codefile;
my %corresp;

GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile, 'c|code:s' 
=>\$codefile);

open (CODE, "< $codefile") or die "Cannot open $codefile: $!";
while (<CODE>)
{
    chomp;
    my ($seq_id, $taxid) = split (/\t/);
    $corresp{$seq_id} = $taxid;
}
close (CODE);


my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");

while (my $tree= $treeio->next_tree)
{
    my @nodes=$tree->get_nodes();
    foreach my $nd(@nodes)
    {
        if ($nd->is_Leaf())
        {
            my $id=$nd->id();
            print "$id TAXID ",$corresp{$id},"\n";
           
            my $nodenhx=new Bio::Tree::NodeNHX();
            $nodenhx->nhx_tag({T=>$corresp{$id}});
        }
    }
    $treeout->write_tree($tree);
}


_test_tree.nwk_:
(((((42558930:100.0,42558943:100.0):100.0,(42558969:100.0,(42558981:100.0,
42558942:100.0):100.0):72.0):81.0,(((((90185247:100.0,56405380:100.0):100.0,
(42558987:100.0,148887393:100.0):100.0):90.0,66774197:100.0):100.0,AAEL015662:100.0):100.0,
42558970:100.0):82.0):100.0,(42558929:100.0,42558958:100.0):79.0):100.0,
42558941:100.0);

_seq_taxid.txt:_
AAEL015662      7159
42558969        9606
42558981        10090
42558942        9606
42558970        6239
42558929        10116
42558987        9606
42558930        10116
42558943        9606
148887393       10090
42558958        10090
42558941        9606
56405380        10090
90185247        9606
66774197        6239


_And the tata resulting file:_
(((((42558930:100.0,42558943:100.0):100.0[&&NHX],(42558969:100.0,(42558981:100.0,42558942:100.0):100.0[&&NHX]):72.0[&&NHX]):81.0[&&NHX],(((((
90185247:100.0,56405380:100.0):100.0[&&NHX],(42558987:100.0,148887393:100.0):100.0[&&NHX]):90.0[&&NHX],66774197:100.0):100.0[&&NHX],
AAEL015662:100.0):100.0[&&NHX],42558970:100.0):82.0[&&NHX]):100.0[&&NHX],(42558929:100.0,42558958:100.0):79.0[&&NHX]):100.0[&&NHX],42558941:100.0);




-- 
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan     	   = 
= Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================





From aaron.j.mackey at gsk.com  Thu Jan  3 15:12:22 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Thu, 3 Jan 2008 10:12:22 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <477CF135.9060104@toulouse.inra.fr>
Message-ID: 

Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that 
way, your tree's nodes are already NodeNHX objects.  Instead of creating a new 
$nodenhx, you can use the $nd variable directly from the tree ...

-Aaron
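A minimal sketch of Aaron's suggestion (not code posted in this thread): read the tree with the nhx parser so its leaves already accept nhx_tag, set the tag on the tree's own leaf nodes, then write that same tree object. It assumes, as Aaron describes, that Bio::TreeIO's nhx format parses the plain Newick input and returns Bio::Tree::NodeNHX nodes, and that get_leaf_nodes is available on the tree. The helper names read_taxid_map, tag_leaves and run are hypothetical; BioPerl is only loaded when the script is run with arguments.

```perl
#!/usr/bin/perl
# Sketch: tag the leaves of a Newick tree with NHX tags using BioPerl.
use strict;
use warnings;
use Getopt::Long;

# Read a tab-separated "leaf_id<TAB>taxid" file (like seq_taxid.txt).
sub read_taxid_map {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;              # skip blank lines
        my ($leaf_id, $taxid) = split /\t/, $line;
        $taxid_for{$leaf_id} = $taxid;
    }
    close $fh;
    return \%taxid_for;
}

# Tag every leaf whose id is in the map. The tag goes on the tree's own
# node objects (not a detached NodeNHX), so write_tree() will emit it.
# Returns the number of leaves tagged.
sub tag_leaves {
    my ($tree, $taxid_for, $tag) = @_;
    $tag = 'T' unless defined $tag;            # 'T' = NHX taxonomy ID tag
    my $tagged = 0;
    for my $node ($tree->get_leaf_nodes) {
        my $id = $node->id;
        next unless defined $id && exists $taxid_for->{$id};
        $node->nhx_tag({ $tag => $taxid_for->{$id} });
        $tagged++;
    }
    return $tagged;
}

sub run {
    my ($tree_file, $outfile, $codefile);
    my $usage = "usage: $0 -f tree.nwk -o out.nhx -c seq_taxid.txt\n";
    GetOptions('f|file=s' => \$tree_file,
               'o|out=s'  => \$outfile,
               'c|code=s' => \$codefile) or die $usage;
    die $usage unless $tree_file && $outfile && $codefile;

    require Bio::TreeIO;                       # loaded only when actually run
    my $taxid_for = read_taxid_map($codefile);

    # Assumption: the 'nhx' reader accepts plain Newick and yields NodeNHX nodes.
    my $in  = Bio::TreeIO->new(-format => 'nhx', -file => $tree_file);
    my $out = Bio::TreeIO->new(-format => 'nhx', -file => ">$outfile");
    while (my $tree = $in->next_tree) {
        tag_leaves($tree, $taxid_for);
        $out->write_tree($tree);
    }
}

run() if @ARGV;
```

It would be invoked like the original script, e.g. perl newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt. Passing 'S' as the third argument of tag_leaves, together with a leaf-to-species-name map, would produce the NHX species tag instead of T.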





From Laurence.Amilhat at toulouse.inra.fr  Fri Jan  4 08:33:22 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Fri, 04 Jan 2008 09:33:22 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: 
References: 
Message-ID: <477DEF52.20802@toulouse.inra.fr>

Thank you Aaron,

it's working now. I've changed to species instead of taxid, so I can 
color the species on my tree using the ATV viewer.
thanks again,

Regards,

Laurence.



aaron.j.mackey at gsk.com wrote:
> Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that 
> way, your tree's nodes are already NodeNHX's.  Instead of creating a new 
> $nodenhx, you can use the $node variable directly from the tree ...
>
> -Aaron


-- 
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan     	   = 
= Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================





From hlapp at gmx.net  Mon Jan  7 03:02:32 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 6 Jan 2008 22:02:32 -0500
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To: 
References: 
Message-ID: <640890C9-2D34-4C70-9179-26A9EAB397D2@gmx.net>

Hi Zhihua, you didn't ever respond to Marc's link to the Persistent  
Bioperl slides - did that help?

	-hilmar

On Dec 6, 2007, at 11:25 PM, zhihuali wrote:

>
> Hi netters,
>
> I've installed BioSQL and bioperl-db, and successfully created and  
> stored a persistent object:
>
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::DB::BioDB;
>
> my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
>                                 -user     => 'annoymous',
>                                 -dbname   => 'bioseqdb');
>
> my $seqobj = Bio::Seq->new(-accession_number => "test",
>                            -id               => "test1",
>                            -seq              => "AGCTAGCT",
>                            -version          => 1);
> my $dbobj = $dbadp->create_persistent($seqobj);
> $dbobj->create;
> $dbobj->commit;
>
> It's successful because I found corresponding rows in the bioseqdb  
> tables.
>
> Now I want to retrieve the object back from the database. There's
> not much documentation available, and I've tried find_by_unique_key/
> primary_key but all failed. Maybe I didn't use them correctly.
> Could anyone give me an example of how to retrieve the stored
> Bio::Seq object?
>
> Thanks a lot!
>
> Zhihua Li

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
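A minimal sketch of one way to answer the quoted retrieval question, assuming bioperl-db's adaptor interface: get_object_adaptor('Bio::SeqI') on the Bio::DB::BioDB handle, then find_by_unique_key (the method Zhihua mentions) called with a template Bio::Seq that carries only the unique-key fields. The namespace is an assumption: the stored object did not set one, so pass whatever biodatabase name the entry actually ended up under (check the biodatabase table). The helper names fetch_by_accession and run are hypothetical; the connection parameters are copied from the question.

```perl
#!/usr/bin/perl
# Sketch: look up a stored Bio::Seq in BioSQL by its unique key (bioperl-db).
use strict;
use warnings;

# Build a template Bio::Seq holding only the key fields and ask the sequence
# adaptor for the matching entry. Returns undef if nothing matches.
sub fetch_by_accession {
    my ($dbadp, %key) = @_;
    require Bio::Seq;
    my $template = Bio::Seq->new(
        -accession_number => $key{accession},
        -version          => $key{version},
        -namespace        => $key{namespace},  # biodatabase name (assumed)
    );
    my $seq_adaptor = $dbadp->get_object_adaptor('Bio::SeqI');
    return $seq_adaptor->find_by_unique_key($template);
}

sub run {
    require Bio::DB::BioDB;
    # Connection parameters copied from the question.
    my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                    -user     => 'annoymous',
                                    -dbname   => 'bioseqdb');
    my $dbseq = fetch_by_accession($dbadp,
                                   accession => 'test',
                                   version   => 1,
                                   namespace => $ARGV[0]);
    if ($dbseq) {
        print join("\t", $dbseq->accession_number, $dbseq->seq), "\n";
    } else {
        print "not found\n";
    }
}

run() if @ARGV;   # e.g. perl fetch_seq.pl <namespace>
```

Because find_by_unique_key returns undef when no entry matches, a namespace or version in the template that differs from what was stored is one plausible reason such lookups appear to fail.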








From cain.cshl at gmail.com  Mon Jan  7 17:24:02 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:24:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
Message-ID: <1199726642.6374.10.camel@frissell>

Hello,

I was trying to get bioperl-live this morning from either cvs or svn and
failed.  I was wondering if something was going on with the server.

Here are the things I tried:

  cvs -d:ext:scain at dev.open-bio.org:/home/repository/bioperl co bioperl-live

which resulted in this:

cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
cvs checkout: Updating bioperl-live
cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
cvs [checkout aborted]: read lock failed - giving up

Then I thought I'd try the suggested svn checkout method from the
bioperl wiki:

  svn co svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live

which resulted in

svn: No repository found in 'svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live'

Finally, after looking at the open-bio server, I thought I'd try this:

   svn co svn+ssh://scain at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live

which resulted in repeated requests for my password (which I supplied
correctly at least once out of the several requests).

So, what's up?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From hlapp at gmx.net  Mon Jan  7 17:36:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 7 Jan 2008 12:36:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: 

I think we are still migrating to svn. It's probably better to wait  
for the announcement that everything is ready to go. (And then cvs  
won't work anymore except for anonymous checkout - which should  
actually continue to work while this is in progress. Have you tried  
that?)

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From jason at bioperl.org  Mon Jan  7 17:43:18 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Jan 2008 09:43:18 -0800
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>

CVS r/w is locked because we are transitioning to SVN - you can still  
checkout via anonymous CVS on code.open-bio.org.

The SVN is going to be in /home/svn-repositories/bioperl not George's  
directory, but we are still monkeying around with the directory  
structure.  You can try a checkout but be warned it may change a few  
more times if we add another directory layer in there.

You will get requests for your password at least three times - I
strongly suggest you use SSH keys to avoid getting prompted each time.
I don't know why you get asked 3 times; since it is an SVN thing, I
assume it has to make 3 separate requests to do a checkout.

That's what is up for now.  We'll report when the final SVN migration  
is done.

-jason



From cain.cshl at gmail.com  Mon Jan  7 17:57:38 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:57:38 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
References: <1199726642.6374.10.camel@frissell>
	<5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
Message-ID: <1199728658.6374.12.camel@frissell>

Hi Hilmar and Jason,

Thanks. For some reason, I thought svn was done. I'll remain anonymous
for right now (kind of difficult to do when you announce it publicly :-))

Thanks,
Scott

On Mon, 2008-01-07 at 09:43 -0800, Jason Stajich wrote:
> CVS r/w is locked because we are transitioning to SVN - you can still  
> checkout via anonymous CVS on code.open-bio.org.
> 
> The SVN is going to be in /home/svn-repositories/bioperl not George's  
> directory, but we are still monkeying around with the directory  
> structure.  You can try a checkout but be warned it may change a few  
> more times if we add another directory layer in there.
> 
> You will get requests for your password at least three times. I
> strongly suggest you use SSH keys to avoid getting prompted each time.
> I don't know why you get asked three times; it is an SVN thing, and I
> assume it has to make three separate requests to do a checkout.
> 
> That's what is up for now.  We'll report when the final SVN migration  
> is done.
> 
> -jason
> On Jan 7, 2008, at 9:24 AM, Scott Cain wrote:
> 
> > Hello,
> >
> > I was trying to get bioperl-live this morning from either cvs or  
> > svn and
> > failed.  I was wondering if something was going on with the server.
> >
> > Here are the things I tried:
> >
> >   cvs -d:ext:scain@dev.open-bio.org:/home/repository/bioperl co bioperl-live
> >
> > which resulted in this:
> >
> > cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
> > cvs checkout: Updating bioperl-live
> > cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
> > cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
> > cvs [checkout aborted]: read lock failed - giving up
> >
> > Then I thought I'd try the suggested svn checkout method from the
> > bioperl wiki:
> >
> >   svn co svn+ssh://scain@dev.open-bio.org/home/hartzell/bioperl/bioperl-live
> >
> > which resulted in
> >
> > svn: No repository found in 'svn+ssh://scain@dev.open-bio.org/home/hartzell/bioperl/bioperl-live'
> >
> > Finally, after looking at the open-bio server, I thought I'd try
> > this:
> >
> >    svn co svn+ssh://scain@dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live
> >
> > which resulted in repeated requests for my password (which I supplied
> > correctly at least once out of the several requests).
> >
> > So, what's up?
> >
> > Thanks much,
> > Scott
> >
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > Cold Spring Harbor Laboratory
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From cain.cshl at gmail.com  Mon Jan  7 18:34:25 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 13:34:25 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
Message-ID: <1199730865.6374.18.camel@frissell>

Hello,

I wanted to implement this myself (and probably still will,
assuming it's not already there...), but I am not a Module::Build guru.
Here's what I'd like to do: add a parameter that I can pass when invoking
perl Build.PL so that the default answers are used whenever it would
normally ask me a question, something like
this:

  perl Build.PL --yes

Is this sort of thing already built into Module::Build and I can't see
it?  Or can somebody suggest the best way of going about this?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
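One way to get this behaviour without changing Module::Build itself is sketched below. It assumes a hypothetical --yes switch that Build.PL strips from @ARGV before Module::Build parses the rest, and a hypothetical helper ask_with_default() used in place of direct prompts. It also treats a true PERL_MM_USE_DEFAULT environment variable, or a non-interactive STDIN, as "accept the default". ExtUtils::MakeMaker's prompt() already follows the PERL_MM_USE_DEFAULT convention, and I believe recent Module::Build releases do the same in prompt() and y_n(), so that variable may already be enough.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical: remove a --yes (or -y) switch from @$args and report whether
# it was present. pass_through leaves every other option in place so that
# Module::Build can still parse them afterwards.
sub extract_yes_flag {
    my ($args) = @_;
    my $yes   = 0;
    my $saved = Getopt::Long::Configure('pass_through');
    GetOptionsFromArray($args, 'yes|y' => \$yes);
    Getopt::Long::Configure($saved);    # restore the previous configuration
    return $yes;
}

# Hypothetical helper: return $default without asking when --yes was given,
# when PERL_MM_USE_DEFAULT is true, or when STDIN is not a terminal.
sub ask_with_default {
    my ($question, $default, $assume_yes) = @_;
    return $default
        if $assume_yes || $ENV{PERL_MM_USE_DEFAULT} || !-t STDIN;
    print "$question [$default] ";
    my $answer = <STDIN>;
    return $default unless defined $answer;    # EOF: fall back to the default
    chomp $answer;
    return length $answer ? $answer : $default;
}
```

In Build.PL one would call extract_yes_flag(\@ARGV) before Module::Build->new(...) and pass the result to each question.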




From cjfields at uiuc.edu  Mon Jan  7 22:22:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Jan 2008 16:22:35 -0600
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <31AD254B-DABA-488D-BDA8-D690F949CC39@uiuc.edu>

I agree it would be nice.  Not sure how hard it would be to implement;
maybe it would be best to have installation modes, say 'minimal'
(no optional module installation, no scripts), 'full', 'dev' (assume
minimal install but don't test), and so on, falling back to the
query-based approach if no mode is given.

chris

On Jan 7, 2008, at 12:34 PM, Scott Cain wrote:

> Hello,
>
> I wanted to implement this myself (and probably still will,
> assuming it's not already there...), but I am not a Module::Build guru.
> Here's what I'd like to do: add a parameter that I can pass when
> invoking perl Build.PL so that the default answers are used whenever
> it would normally ask me a question, something like
> this:
>
>  perl Build.PL --yes
>
> Is this sort of thing already built into Module::Build and I can't see
> it?  Or can somebody suggest the best way of going about this?
>
> Thanks much,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
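The installation modes could be expressed as a small lookup table consulted before any question is asked. This is only an illustration: the mode names come from the message above, but the setting names (optional_modules, scripts, run_tests) and the helper settings_for_mode() are hypothetical, not existing ModuleBuildBioperl.pm options. An unknown or missing mode returns undef, so the caller can fall back to the interactive questions.

```perl
use strict;
use warnings;

# Hypothetical install modes; the setting names are illustrative only.
my %MODES = (
    minimal => { optional_modules => 0, scripts => 0, run_tests => 1 },
    full    => { optional_modules => 1, scripts => 1, run_tests => 1 },
    dev     => { optional_modules => 0, scripts => 0, run_tests => 0 },
);

# Return a copy of the settings for $mode, or undef when no (known) mode
# was given, in which case the caller asks its usual questions instead.
sub settings_for_mode {
    my ($mode) = @_;
    return undef unless defined $mode && exists $MODES{$mode};
    return { %{ $MODES{$mode} } };    # copy, so callers cannot alter %MODES
}
```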





From bix at sendu.me.uk  Mon Jan  7 22:37:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 07 Jan 2008 22:37:36 +0000
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <4782A9B0.60203@sendu.me.uk>

Scott Cain wrote:
> Hello,
> 
> I wanted to implement this myself (and probably still will,
> assuming it's not already there...), but I am not a Module::Build guru.
> Here's what I'd like to do: add a parameter that I can pass when invoking
> perl Build.PL so that the default answers are used whenever it would
> normally ask me a question, something like
> this:
> 
>   perl Build.PL --yes
> 
> Is this sort of thing already built into Module::Build and I can't see
> it?  Or can somebody suggest the best way of going about this?

You should ask on the Module::Build mailing list. If it already exists I 
don't think it is obvious, however.

If your question is BioPerl related, and you're looking for a fast way 
of installing BioPerl without the annoying questions, I'm sure I could 
hack something into ModuleBuildBioperl.pm.


From cain.cshl at gmail.com  Tue Jan  8 03:04:19 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 22:04:19 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <4782A9B0.60203@sendu.me.uk>
References: <1199730865.6374.18.camel@frissell> <4782A9B0.60203@sendu.me.uk>
Message-ID: <1199761459.6017.1.camel@frissell>

Hi Sendu,

I just hacked something up (I only needed to change a few lines, once I
figured out where everything was). I like Chris' idea, though; before I
commit it back (ha, no rush there), I'll flesh it out a little more to
give more options.

Scott

On Mon, 2008-01-07 at 22:37 +0000, Sendu Bala wrote:
> Scott Cain wrote:
> > Hello,
> > 
> > I wanted to implement this myself (and probably still will,
> > assuming it's not already there...), but I am not a Module::Build guru.
> > Here's what I'd like to do: add a parameter that I can pass when invoking
> > perl Build.PL so that the default answers are used whenever it would
> > normally ask me a question, something like
> > this:
> > 
> >   perl Build.PL --yes
> > 
> > Is this sort of thing already built into Module::Build and I can't see
> > it?  Or can somebody suggest the best way of going about this?
> 
> You should ask on the Module::Build mailing list. If it already exists I 
> don't think it is obvious, however.
> 
> If your question is BioPerl related, and you're looking for a fast way 
> of installing BioPerl without the annoying questions, I'm sure I could 
> hack something into ModuleBuildBioperl.pm
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From granjeau at tagc.univ-mrs.fr  Wed Jan  9 08:30:17 2008
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 09 Jan 2008 09:30:17 +0100
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
Message-ID: <47848619.40109@tagc.univ-mrs.fr>

Hello,

I would like to retrieve the human-reviewed annotation of SwissProt
entries; this information is in the comment section of the sequence
file. Here is an example:

CC   -!- FUNCTION: Actins are highly conserved proteins that are involved
CC       in various types of cell motility and are ubiquitously expressed
CC       in all eukaryotic cells.
CC   -!- SUBUNIT: Polymerization of globular actin (G-actin) leads to a
CC       structural filament (F-actin) in the form of a two-stranded helix.
CC       Each actin can bind to 4 others. Found in a complex with XPO6,
CC       Ran, ACTB and PFN1. Component of a complex composed at least of
CC       ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with XPO6.
CC   -!- INTERACTION:
CC       Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668;
CC       P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161;
CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.

Is there a specific method to do such a job?

Thanks much,
Samuel

-- 

Samuel GRANJEAUD                   granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC               Tel: +33  (0)491 82 87 24
http://tagc.univ-mrs.fr            Fax: +33  (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique



From robfsouza at gmail.com  Wed Jan  9 13:20:08 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 11:20:08 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs
Message-ID: 

Hello All!

Greetings to everybody, and happy new year to those following a
Western calendar!

I'm starting a new project to store and analyze distinct sets of
sequence annotation data which are related in a way suitable for
representation in a directed (e.g. transcript splicing) or undirected
(e.g. gene product interaction) graph. Analysis will require frequent
queries based on interval overlaps, feature neighbourhood, annotation
and, most importantly, feature relationships and stored paths.

At first, I thought of building an entirely new database structure to store
project-specific data (e.g. alternative splicing or protein interaction),
but as I have some experience with Lincoln's
Bio::DB::SeqFeature::Store, I'm now considering extending it for the
purpose of storing graphs describing relationships among features.

I'm aware that some other BioPerl-related databases, specifically
BioSQL and Chado, do have components which might be suitable for
storing all or some of these data, but since Lincoln's feature storage
and interval binning implementations in
Bio::DB::SeqFeature::Store::DBI::mysql are clean, simple and very fast,
extending it in a modular way seems desirable. A good
extension to Lincoln's database could include tables like
feature_relationship and feature_path, for edges and transitive
closures (just like in BioSQL), and feature_stored_path, for excluding
biologically irrelevant paths in DAGs, like certain splicing
isoforms. These tables could be used to store sequence assemblies or
EST alignments efficiently, including scaffolds inferred by connecting
contigs.
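For the feature_path idea, the closure rows can be derived from the direct edges in feature_relationship. A minimal sketch, assuming edges are given as [subject_id, object_id] pairs and that only the shortest distance per pair is kept; the function name transitive_closure() and the row layout are hypothetical (BioSQL's term_path table is the model, but this is not BioSQL code). An undirected relationship, such as a protein interaction, would be entered as two directed edges.

```perl
use strict;
use warnings;

# Compute closure rows [subject, object, distance] from direct edges by a
# breadth-first search from every node that has outgoing edges.
sub transitive_closure {
    my (@edges) = @_;
    my %out;
    push @{ $out{ $_->[0] } }, $_->[1] for @edges;

    my @rows;
    for my $start (sort keys %out) {
        my %dist  = ($start => 0);
        my @queue = ($start);
        while (@queue) {
            my $node = shift @queue;
            for my $next (@{ $out{$node} || [] }) {
                next if exists $dist{$next};    # reached already by a shorter path
                $dist{$next} = $dist{$node} + 1;
                push @queue, $next;
            }
        }
        for my $target (sort keys %dist) {
            next if $target eq $start;          # no reflexive rows
            push @rows, [ $start, $target, $dist{$target} ];
        }
    }
    return @rows;
}
```

Because %dist records every node already visited, cycles (for example from undirected edges) do not cause infinite loops.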

Before starting, I would like to know whether the BioSQL and Chado schemata
have accelerators for querying intervals among billions of features
and feature relationships (examples showing that these databases
are efficient for such tasks would also help).
If these or other databases are not as suitable as Bio::DB::SeqFeature
for feature retrieval based on interval overlap and attributes, then
I might consider extending Bio::DB::SeqFeature
and contributing such extensions back to BioPerl...

Any thoughts?

Best regards,
Robson

PS: sorry if anyone gets two copies of this post; it took me some
time to realize my new e-mail address wasn't subscribed to bioperl-l...


From bix at sendu.me.uk  Wed Jan  9 13:59:08 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 09 Jan 2008 13:59:08 +0000
Subject: [Bioperl-l] bioperl based database infrastucture for directed
 graphs
In-Reply-To: 
References: 
Message-ID: <4784D32C.9070807@sendu.me.uk>

Robson Francisco de Souza wrote:
> Before starting, I would like to know whether the BioSQL and Chado schemata
> have accelerators for querying intervals among billions of features
> and feature relationships (examples showing that these databases
> are efficient for such tasks would also help).
> If these or other databases are not as suitable as Bio::DB::SeqFeature
> for feature retrieval based on interval overlap and attributes,

I'm using Bio::DB::SeqFeature for that purpose, but just a warning: I 
found that with millions of features it made a db that was too large in 
terms of disc space and too slow in terms of query time. I had to hack 
out its storage of feature objects in the db, instead generating feature 
objects on request from the stored attributes. Doing this turned out to 
be faster than simply unfreezing certain kinds of feature objects!
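The general idea of building features on request from stored columns, rather than thawing a serialized object, looks roughly like this. It is not the actual patch; the class MiniFeature, the column list and both helper functions are hypothetical stand-ins for a real SeqFeature class and the store's attribute tables.

```perl
use strict;
use warnings;

# Hypothetical lightweight feature class standing in for a real SeqFeature.
package MiniFeature;
sub new   { my ($class, %attr) = @_; return bless {%attr}, $class }
sub start { return $_[0]{start} }
sub end   { return $_[0]{end} }

package main;

# Only these columns are stored; no frozen object is kept in the database.
my @COLUMNS = qw(seq_id start end strand primary_tag source);

# Flatten a feature into a row of column values (what would be INSERTed).
sub feature_to_row {
    my ($feature) = @_;
    return [ @{$feature}{@COLUMNS} ];
}

# Rebuild a feature object from a stored row only when it is requested.
sub row_to_feature {
    my ($row) = @_;
    my %attr;
    @attr{@COLUMNS} = @$row;
    return MiniFeature->new(%attr);
}
```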

(I also had to hack in support for retrieval by source, a patch that 
Lincoln hasn't gotten back to me about yet.)

While I can't answer your main questions, I wish you good luck with your 
project and request that you keep us posted with what you achieve.


From bosborne11 at verizon.net  Wed Jan  9 14:46:42 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 09 Jan 2008 09:46:42 -0500
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
In-Reply-To: <47848619.40109@tagc.univ-mrs.fr>
References: <47848619.40109@tagc.univ-mrs.fr>
Message-ID: <3DAEDA67-B9A5-47A4-8108-0915659F1052@verizon.net>

Samuel,

The Feature-Annotation HOWTO addresses this specifically:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
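If you want each CC topic separately, the comment block can be split on its "-!- TOPIC:" markers. Below is a minimal pure-Perl sketch that works on the raw CC lines of an entry; the function name parse_cc_topics() is hypothetical. As far as I know, BioPerl's swiss parser also keeps this text as Bio::Annotation::Comment objects, reachable with $seq->annotation->get_Annotations('comment') and ->text, but the exact formatting of that text should be checked before splitting it the same way.

```perl
use strict;
use warnings;

# Split raw Swiss-Prot CC lines into a hash of topic => text.
# A topic starts at "-!- TOPIC:"; following CC lines are continuations.
sub parse_cc_topics {
    my (@lines) = @_;                       # copies, safe to modify
    my (%topic, $current);
    for my $line (@lines) {
        $line =~ s/\s+$//;                  # drop newline and trailing blanks
        next unless $line =~ s/^CC\s+//;    # only CC lines, prefix removed
        last if $line =~ /^-{5,}/;          # copyright block ends the topics
        if ($line =~ /^-!-\s+([^:]+):\s*(.*)$/) {
            $current = $1;
            $topic{$current} = $2;
        }
        elsif (defined $current) {
            # join continuation lines with a single space
            $topic{$current} .= length $topic{$current} ? " $line" : $line;
        }
    }
    return \%topic;
}
```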


Brian O.


On Jan 9, 2008, at 3:30 AM, Samuel GRANJEAUD - IR/IFR137 wrote:

> Hello,
>
> I would like to retrieve the human-reviewed annotation of SwissProt
> entries; this information is in the comment section of the
> sequence file. Here is an example:
>
> CC   -!- FUNCTION: Actins are highly conserved proteins that are involved
> CC       in various types of cell motility and are ubiquitously expressed
> CC       in all eukaryotic cells.
> CC   -!- SUBUNIT: Polymerization of globular actin (G-actin) leads to a
> CC       structural filament (F-actin) in the form of a two-stranded helix.
> CC       Each actin can bind to 4 others. Found in a complex with XPO6,
> CC       Ran, ACTB and PFN1. Component of a complex composed at least of
> CC       ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with XPO6.
> CC   -!- INTERACTION:
> CC       Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668;
> CC       P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161;
> CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.
>
> Is there a specific method to do such a job?
>
> Thanks much,
> Samuel
>
> -- 
>
> Samuel GRANJEAUD                   granjeau at tagc.univ-mrs.fr
> INSERM - ICIM - TAGC               Tel: +33  (0)491 82 87 24
> http://tagc.univ-mrs.fr            Fax: +33  (0)491 82 87 01
> http://icim.marseille.inserm.fr/proteomique
>



From alexanderptok at web.de  Wed Jan  9 15:34:56 2008
From: alexanderptok at web.de (Alexander Ptok)
Date: Wed, 09 Jan 2008 16:34:56 +0100
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN]
Message-ID: <2011210591@web.de>

Hi,

I am new to BioPerl and am working through the Beginners HOWTO.

My BioPerl version is 1.4-1, running on Debian etch.

In the HOWTO everything worked fine until the section

Retrieving multiple sequences from a database

from which I copied the following script:

use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
my $query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                             -query => $query);

my $gb_obj = Bio::DB::GenBank->new;

# fetch all matching entries as a stream of sequence objects
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

while (my $seq_obj = $stream_obj->next_seq) {
    # do something with the sequence object
    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}


If I remove the 0:3000[SLEN] clause, the query works and returns a lot of sequences; when I change it to e.g. 1830[SLEN], it
finds the one sequence that has length 1830, but I was not able to query a range of lengths.

Does anyone know what I am doing wrong?
Greetings
A. Ptok



From cjm at fruitfly.org  Wed Jan  9 16:52:21 2008
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed, 9 Jan 2008 08:52:21 -0800
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: 
References: 
Message-ID: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>

[cc-d to gmod-schema]

Chado does have some views and pg functions for interval-based  
retrieval. AFAIK there are no accelerators for deep feature graphs,  
as most chado users have relatively shallow gene-model/SO feature  
graphs. It may not be so hard to extend cvterm code for doing this,  
depending on the characteristics of your graphs (the closure of  
feature neighbourhood graphs may be particularly large).

On Jan 9, 2008, at 5:20 AM, Robson Francisco de Souza wrote:

> Hello All!
>
> Greetings to everybody, and happy new year to those following a
> Western calendar!
>
> I'm starting a new project to store and analyze distinct sets of
> sequence annotation data which are related in a way suitable for
> representation in a directed (e.g. transcript splicing) or undirected
> (e.g. gene product interaction) graph. Analysis will require frequent
> queries based on interval overlaps, feature neighbourhood, annotation
> and, most importantly, feature relationships and stored paths.
>
> At first, I thought of building an entirely new database structure to store
> project-specific data (e.g. alternative splicing or protein interaction),
> but as I have some experience with Lincoln's
> Bio::DB::SeqFeature::Store, I'm now considering extending it for the
> purpose of storing graphs describing relationships among features.
>
> I'm aware that some other BioPerl-related databases, specifically
> BioSQL and Chado, do have components which might be suitable for
> storing all or some of these data, but since Lincoln's feature storage
> and interval binning implementations in
> Bio::DB::SeqFeature::Store::DBI::mysql are clean, simple and very fast,
> extending it in a modular way seems desirable. A good
> extension to Lincoln's database could include tables like
> feature_relationship and feature_path, for edges and transitive
> closures (just like in BioSQL), and feature_stored_path, for excluding
> biologically irrelevant paths in DAGs, like certain splicing
> isoforms. These tables could be used to store sequence assemblies or
> EST alignments efficiently, including scaffolds inferred by connecting
> contigs.
>
> Before starting, I would like to know whether the BioSQL and Chado schemata
> have accelerators for querying intervals among billions of features
> and feature relationships (examples showing that these databases
> are efficient for such tasks would also help).
> If these or other databases are not as suitable as Bio::DB::SeqFeature
> for feature retrieval based on interval overlap and attributes, then
> I might consider extending Bio::DB::SeqFeature
> and contributing such extensions back to BioPerl...
>
> Any thoughts?
>
> Best regards,
> Robson
>
> PS: sorry if anyone gets two copies of this post; it took me some
> time to realize my new e-mail address wasn't subscribed to bioperl-l...
>



From cjfields at uiuc.edu  Wed Jan  9 15:00:38 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 Jan 2008 09:00:38 -0600
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: <4784D32C.9070807@sendu.me.uk>
References: 
	<4784D32C.9070807@sendu.me.uk>
Message-ID: 


On Jan 9, 2008, at 7:59 AM, Sendu Bala wrote:

> Robson Francisco de Souza wrote:
>> Before starting, I would like to know whether the BioSQL and Chado schemata
>> have accelerators for querying intervals among billions of features
>> and feature relationships (examples showing that these databases
>> are efficient for such tasks would also help).
>> If these or other databases are not as suitable as Bio::DB::SeqFeature
>> for feature retrieval based on interval overlap and attributes,
>
> I'm using Bio::DB::SeqFeature for that purpose, but just a warning:  
> I found that with millions of features it made a db that was too  
> large in terms of disc space and too slow in terms of query time. I  
> had to hack out its storage of feature objects in the db, instead  
> generating feature objects on request from the stored attributes.  
> Doing this turned out to be faster than simply unfreezing certain  
> kinds of feature objects!

Are these Bio::SF::Annotated objects? If so, I bet Storable is  
storing the OntologyStore object information along with the SF (which  
argues for refactoring the FeatureIO/Bio::SF::Annotated stuff in 1.7).

Not sure what can be done about that beyond your hack, though it might  
be worth exploring whether one can optionally set the DB::Store to  
store the object instance.

> (I also had to hack in support for retrieval by source, a patch that  
> Lincoln hasn't gotten back to me about yet.)
>
> While I can't answer your main questions, I wish you good luck with  
> your project and request that you keep us posted with what you  
> achieve.

You can always try Lincoln on the GBrowse list as well.  I would say  
go ahead and commit the patch if it isn't a big deal.

chris


From cjfields at uiuc.edu  Wed Jan  9 18:12:55 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 9 Jan 2008 12:12:55 -0600
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: 
References: 
Message-ID: <128517E8-3A2A-45DD-83A0-0014863A25BC@uiuc.edu>

cc'ing the gbrowse list in case Lincoln hasn't seen this.

I believe the primary intent for Bio::DB::SeqFeature::Store was as a  
more GFF3-compatible replacement for Bio::DB::GFF (unlimited feature  
nesting, uses any SeqFeatureI, etc) and was streamlined for faster  
lookups by GBrowse.  I don't think adding tables would affect  
performance dramatically, though maybe Lincoln would have a better idea.

chris

On Jan 9, 2008, at 7:20 AM, Robson Francisco de Souza wrote:

> Hello All!
>
> Greetings to everybody, and happy new year to those following a
> Western calendar!
>
> I'm starting a new project to store and analyze distinct sets of
> sequence annotation data which are related in a way suitable for
> representation in a directed (e.g. transcript splicing) or undirected
> (e.g. gene product interaction) graph. Analysis will require frequent
> queries based on interval overlaps, feature neighbourhood, annotation
> and, most importantly, feature relationships and stored paths.
>
> At first, I thought of building an entirely new database structure to store
> project-specific data (e.g. alternative splicing or protein interaction),
> but as I have some experience with Lincoln's
> Bio::DB::SeqFeature::Store, I'm now considering extending it for the
> purpose of storing graphs describing relationships among features.
>
> I'm aware that some other BioPerl-related databases, specifically
> BioSQL and Chado, do have components which might be suitable for
> storing all or some of these data, but since Lincoln's feature storage
> and interval binning implementations in
> Bio::DB::SeqFeature::Store::DBI::mysql are clean, simple and very fast,
> extending it in a modular way seems desirable. A good
> extension to Lincoln's database could include tables like
> feature_relationship and feature_path, for edges and transitive
> closures (just like in BioSQL), and feature_stored_path, for excluding
> biologically irrelevant paths in DAGs, like certain splicing
> isoforms. These tables could be used to store sequence assemblies or
> EST alignments efficiently, including scaffolds inferred by connecting
> contigs.
>
> Before starting, I would like to know whether the BioSQL and Chado schemata
> have accelerators for querying intervals among billions of features
> and feature relationships (examples showing that these databases
> are efficient for such tasks would also help).
> If these or other databases are not as suitable as Bio::DB::SeqFeature
> for feature retrieval based on interval overlap and attributes, then
> I might consider extending Bio::DB::SeqFeature
> and contributing such extensions back to BioPerl...
>
> Any thoughts?
>
> Best regards,
> Robson
>
> PS: sorry if anyone gets two copies of this post; it took me some
> time to realize my new e-mail address wasn't subscribed to bioperl-l...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From bosborne11 at verizon.net  Wed Jan  9 18:29:15 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 09 Jan 2008 13:29:15 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths
	0:3000[SLEN]
In-Reply-To: <2011210591@web.de>
References: <2011210591@web.de>
Message-ID: <0EB96131-7931-4FC3-802F-A8152B474A99@verizon.net>

Alexander,

I don't understand. By using the clause "0:3000[SLEN] " you are  
querying for sequences in the length range of 0 to 3000.
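When such queries are built in code, the terms can be assembled programmatically. A minimal sketch; the helper build_length_query() and its argument names are hypothetical. It joins terms with upper-case AND, which is how NCBI's Entrez help writes Boolean operators. The HOWTO query uses a lower-case "and" before the SLEN clause; whether that matters with BioPerl 1.4 is not verified here.

```perl
use strict;
use warnings;

# Hypothetical helper: build an Entrez query from field values plus an
# optional sequence-length range such as 0:3000[SLEN].
sub build_length_query {
    my (%arg) = @_;
    my @terms = map { $arg{$_} . "[$_]" } grep { defined $arg{$_} } qw(ORGN TITL);
    if (defined $arg{min_len} && defined $arg{max_len}) {
        push @terms, "$arg{min_len}:$arg{max_len}" . '[SLEN]';
    }
    return join ' AND ', @terms;    # Boolean operators in upper case
}
```

The resulting string can be passed as -query to Bio::DB::Query::GenBank->new, as in the script above.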


Brian O.


On Jan 9, 2008, at 10:34 AM, Alexander Ptok wrote:

> If I remove the 0:3000[SLEN] clause, the query works and returns a lot of
> sequences; when I change it to e.g. 1830[SLEN], it finds the one
> sequence that has length 1830, but I was not able to query a range of lengths.



From stefan.kirov at bms.com  Wed Jan  9 19:54:07 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 09 Jan 2008 14:54:07 -0500
Subject: [Bioperl-l] pairwise_kaks.PLS: verbose rquired by PAML
Message-ID: <4785265F.6020500@bms.com>

Jason,
Even with this last fix, I still had problems with bp_pairwise_kaks.pl. It
turns out that verbose needs to be turned on by default for codeml in order
for the sequences to appear in the mlc file.
So, instead of:
    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        (-verbose => $verbose,
         -params => { 'runmode' => -2,
                      'seqtype' => 1,
                  }
         );
we need this:

    $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
        (-verbose => $verbose,
         -params => { 'runmode' => -2,
                      'seqtype' => 1,
                      'verbose' => 1,
                  }
         );

verbose can be 2 as well. I just got this clarification from Ziheng. He
has also offered to change the output to make it easier for us. I plan to
ask him to put the sequences in the mlc header by default.
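To keep this default from being lost again, the script could build its codeml parameters in one place. A minimal sketch; codeml_params() is a hypothetical helper, not part of bp_pairwise_kaks.pl or BioPerl-run. Later keys in a hash list override earlier ones, so callers can still pass verbose => 2 or any other setting.

```perl
use strict;
use warnings;

# Hypothetical helper: codeml parameters with 'verbose' on by default
# (needed for the sequences to appear in the mlc file); any key passed
# in %override replaces the default of the same name.
sub codeml_params {
    my (%override) = @_;
    my %params = (
        runmode => -2,
        seqtype => 1,
        verbose => 1,    # codeml's own verbose option; 2 also works
    );
    return { %params, %override };
}
```

The constructor call would then use -params => codeml_params().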
Stefan



From robfsouza at gmail.com  Thu Jan 10 00:28:25 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 22:28:25 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed
	graphs
In-Reply-To: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
References: 
	<199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
Message-ID: 

Hi,

2008/1/9, Chris Mungall :
> [cc-d to gmod-schema]
>
> Chado does have some views and pg functions for interval-based
> retrieval. AFAIK there are no accelerators for deep feature graphs,
> as most chado users have relatively shallow gene-model/SO feature
> graphs. It may not be so hard to extend cvterm code for doing this,
> depending on the characteristics of your graphs (the closure of
> feature neighbourhood graphs may be particularly large)

Great! I'm studying Chado and I will have a look at the interval optimizations.
Has any of you compared BioSQL and Chado for huge feature and feature
graph storage/retrieval efficiency? As Sendu pointed out limitations in
Bio::DB::SeqFeature's schema, I'm wondering which of these platforms
(or maybe another one?) would be best suited for these tasks... For
the moment, I will either extend Sendu's hack of Lincoln's modules or
adapt the binning algorithm of Bio::DB::SeqFeature::Store::DBI::mysql to
Chado, if it turns out to be more efficient than the pg functions.
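For reference, the hierarchical binning idea behind such interval indexes is sketched below using the UCSC Genome Browser's scheme (five levels of bins, from 128 kb up to 512 Mb). This is a swapped-in illustration: the bin layout used by Bio::DB::SeqFeature::Store or by Chado's functions may differ, and the function names are hypothetical. A feature is stored with the smallest bin that contains it; an overlap query only has to scan the bins returned by bins_overlapping() before checking coordinates.

```perl
use strict;
use warnings;

# UCSC-style binning. Coordinates are 0-based, half-open [start, end);
# sequences up to 512 Mb are covered.
my @BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0);
my $FIRST_SHIFT = 17;    # smallest bins are 128 kb
my $NEXT_SHIFT  = 3;     # each level up is 8 times larger

# Smallest bin that fully contains [start, end).
sub bin_from_range {
    my ($start, $end) = @_;
    my $start_bin = $start >> $FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $FIRST_SHIFT;
    for my $offset (@BIN_OFFSETS) {
        return $offset + $start_bin if $start_bin == $end_bin;
        $start_bin >>= $NEXT_SHIFT;
        $end_bin   >>= $NEXT_SHIFT;
    }
    die "range [$start, $end) is too large to bin\n";
}

# Every bin that may hold a feature overlapping [start, end); a query would
# restrict on these bins first, then compare start/end coordinates.
sub bins_overlapping {
    my ($start, $end) = @_;
    my $start_bin = $start >> $FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $FIRST_SHIFT;
    my @bins;
    for my $offset (@BIN_OFFSETS) {
        push @bins, $offset + $_ for $start_bin .. $end_bin;
        $start_bin >>= $NEXT_SHIFT;
        $end_bin   >>= $NEXT_SHIFT;
    }
    return @bins;
}
```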

Best,
Robson

PS: I could not find the most recent version of gmod by following the
Download link for gmod (Chado) from GMOD's site to the SourceForge
download page. Did I miss the right link on the download site, or is
this unexpected? Is the version available at IUBio's mirror (0.003-10)
the most recent one?


From cain.cshl at gmail.com  Thu Jan 10 03:15:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 09 Jan 2008 22:15:29 -0500
Subject: [Bioperl-l] bioperl based database infrastucture for
	directed	graphs
In-Reply-To: 
References: 
	<199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org>
	
Message-ID: <1199934929.6229.44.camel@frissell>

Hi Robson,

I seem to be perennially working on the 1.0 release of Chado.  The
schema itself is quite stable but I'm always working on the tools to
make them handle more cases and be as stable as possible.  For the time
being, you need to get Chado from cvs; see 

  http://www.gmod.org/wiki/index.php/Chado_-_Getting_Started#Chado_From_CVS

I removed the 0.003 release from the SourceForge site because the schema
in it is out of date relative to what we've been working on for the last
year.

Scott

On Wed, 2008-01-09 at 22:28 -0200, Robson Francisco de Souza wrote:
> Hi,
> 
> 2008/1/9, Chris Mungall :
> > [cc-d to gmod-schema]
> >
> > Chado does have some views and pg functions for interval-based
> > retrieval. AFAIK there are no accelerators for deep feature graphs,
> > as most chado users have relatively shallow gene-model/SO feature
> > graphs. It may not be so hard to extend cvterm code for doing this,
> > depending on the characteristics of your graphs (the closure of
> > feature neighbourhood graphs may be particularly large)
> 
> Great! I'm studying Chado and I will have a look at the interval optimizations.
> Have any of you compared BioSQL and Chado for huge feature and feature
> graph storage/retrieval efficiency? As Sendu pointed out limitations in
> Bio::DB::SeqFeature's schema, I'm wondering which of these platforms
> (or maybe another one?) would be best suited for these tasks... For
> the moment, I will either extend Sendu's hack of Lincoln's modules or
> adapt the binning algorithm of Bio::DB::SeqFeature::Store::DBI::mysql to
> Chado, if it turns out to be more efficient than the pg functions.
> 
> Best,
> Robson
> 
> PS: I could not find the most recent version of gmod by following the
> Download link to gmod(Chado) from GMOD's site to the Sourceforge
> download page. Did I miss the right link on the download site or is
> this unexpected? Is the version available at IUBio's mirror (0.003-10)
> the most recent one?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From bosborne11 at verizon.net  Thu Jan 10 14:16:16 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 10 Jan 2008 09:16:16 -0500
Subject: [Bioperl-l] Beginners HOWTO query a range of lengths
	0:3000[SLEN]
In-Reply-To: <2013325230@web.de>
References: <2013325230@web.de>
Message-ID: <932550FF-8414-4B3E-92BB-1895FD9658AE@verizon.net>

Alexander,

OK, that is odd (meaning, this did work a while back but it's not  
clear to me what could have changed).

First thing to do: upgrade to Bioperl version 1.5.2. Can you do this?
Version 1.4 is very old and you could run into other problems using it.


Brian O.



On Jan 10, 2008, at 8:54 AM, Alexander Ptok wrote:

> Hello Brian,
>
> thanks for your answer. The principle is clear, but it doesn't work
> as it should on my computer, so let me repeat what I did
> step by step.
>
> 1. I took the following script:
>
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
> my $query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
>                                              -query => $query );
>
> my $gb_obj = Bio::DB::GenBank->new;
>
> my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
>
> while (my $seq_obj = $stream_obj->next_seq) {
>    # do something with the sequence object
>    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
> }
>
> and then on the terminal
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script1.pl
> sv1494 at r04102:~/Desktop/bioperl$
>
> 2. I took out the 0:3000[SLEN]:
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL]";
>
> and then on the terminal
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script2.pl
> NM_128760       2775
> NM_125788       2874
> NM_124913       3068
> NM_124912       3117
> NM_124775       871
> NM_120360       1655
> NM_111862       2199
> NM_001036386    2734
> NM_119270       3996
> NM_105072       1656
> NM_113294       4824
> NM_180431       1673
> NM_120495       2515
> NM_120493       2050
> NM_112156       1089
> .
> .
> and many more hits, and one can clearly see that some of them have
> a length between 0 and 3000.
>
> 3. To check the [SLEN] field on its own, I tried another script with,
> e.g., 2199[SLEN]:
>
> $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 2199[SLEN]";
>
> on the terminal:
>
> sv1494 at r04102:~/Desktop/bioperl$ perl script3.pl
> NM_111862       2199
> sv1494 at r04102:~/Desktop/bioperl$
>
>
>
> I think everything works fine, except that BioPerl, or maybe
> GenBank, doesn't understand the range clause 0:3000, even though
> all the documentation says I have to do it that way. Did I
> misunderstand something, or is it just a problem with my computer/
> BioPerl installation? Maybe you can tell me whether the script does
> what it is supposed to do on your computer?
>
> Thanks and greetings
>
> Alexander Ptok
>>
>> Alexander,
>>
>> I don't understand. By using the clause "0:3000[SLEN] " you are
>> querying for sequences in the length range of 0 to 3000.
>>
>
>
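
If the [SLEN] range still has no effect after upgrading, one possible workaround (our assumption, not something proposed in the thread) is to drop the range from the query, as in Alexander's second script, and filter by length on the client side. A minimal Perl sketch; filter_by_length is a hypothetical helper that relies only on the length() method the sequence objects in the script above already provide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Client-side fallback for a length range: keep only sequence objects whose
# length() lies within [$min, $max] (inclusive). In scalar context this
# returns the number of objects kept, so it also works as a boolean test.
sub filter_by_length {
    my ($min, $max, @seqs) = @_;
    return grep { my $len = $_->length; $len >= $min && $len <= $max } @seqs;
}

# Hypothetical usage with the stream from the script above (needs BioPerl
# and network access), querying without the [SLEN] clause:
#   while (my $seq_obj = $stream_obj->next_seq) {
#       next unless filter_by_length(0, 3000, $seq_obj);
#       print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
#   }
```

This downloads every matching record and discards the long ones locally, so it costs more bandwidth than a working server-side [SLEN] clause.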




From pmiguel at purdue.edu  Fri Jan 11 16:22:38 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 11:22:38 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
Message-ID: <478797CE.9050202@purdue.edu>

No problem getting sequence from genbank via a myriad of methods. But as 
the volume of non-finished sequence in genbank increases the importance 
of also obtaining quality values for a given sequence increases. Some 
records include quality values.

I typically use bp_fetch.pl to grab a sequence from genbank:

bp_fetch.pl -fmt fasta net::genbank:AC207960

sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't designed 
to pull down quals evidently:

bp_fetch.pl -fmt qual net::genbank:AC207960

gives:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual object 
to write_seq() as a parameter named "source"
STACK: Error::throw
STACK: Bio::Root::Root::throw 
/usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::qual::write_seq 
/usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
STACK: /usr/local/perl/bin/bp_fetch.pl:313
-----------------------------------------------------------

(running under bioperl 1.5.2)

The quality values for this accession are in genbank as these URLs 
demonstrate:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual

What is the best way to pull down these qual values? They aren't present 
in "GenBank(Full)" format. They are present in an ASN.1 format.

Advice would be appreciated.

-- 
Phillip
Purdue Genomics Core Facility






From cjfields at uiuc.edu  Fri Jan 11 17:09:40 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 11 Jan 2008 11:09:40 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <478797CE.9050202@purdue.edu>
References: <478797CE.9050202@purdue.edu>
Message-ID: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>

I don't think this is possible with the current setup for  
Bio::DB::GenBank (which the script uses).  We'll have to investigate  
whether it is possible to retrieve this data via NCBI's eutils; if so  
we can try adding it in.  If you want you can submit this as an  
enhancement request via bugzilla for tracking:

http://bugzilla.open-bio.org/

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From MEC at stowers-institute.org  Fri Jan 11 19:14:10 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 11 Jan 2008 13:14:10 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID: 

Indeed, eutils is capable of this.

The following use of my ncbi_eutil (attached) script yields what you
want:

ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

It depends on the version of NCBI_PowerScripting.pm, such as is
included in 

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  




From pmiguel at purdue.edu  Fri Jan 11 19:33:13 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 14:33:13 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: 
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
Message-ID: <4787C479.8070600@purdue.edu>

Hi Malcolm,
    Looks like your email was (inadvertently?) redacted in some way. (No 
attachment and last sentence truncated.) Would it be possible to get a 
complete version so I can be sure I'm following you?
Thanks,
Phillip




From pmiguel at purdue.edu  Fri Jan 11 19:37:24 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 14:37:24 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
Message-ID: <4787C574.8020003@purdue.edu>

Hi Chris,
Thanks. I have submitted this as an enhancement request to bugzilla.
Phillip




From pmiguel at purdue.edu  Fri Jan 11 20:46:59 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 11 Jan 2008 15:46:59 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: 
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
	<4787C479.8070600@purdue.edu>
	
Message-ID: <4787D5C3.1030308@purdue.edu>

Hi Malcolm,
Yes that works great!
Well, one caveat:
    If you download both the fasta and the qual files:
ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=fasta > AC207960.fasta
ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.fasta.qual

The "primary IDs" don't match. The fasta comes out:
 >gi|154937460|gb|AC207960.1|

and the qual comes out:
 >AC207960.1

which seems to choke most programs that use seq and qual (eg 
cross_match) because they want the primary IDs of the seq and qual files 
to match.
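
One way around this mismatch is to rewrite the qual headers so they carry the same ID as the FASTA file. The Perl sketch below assumes NCBI-style FASTA IDs such as gi|154937460|gb|AC207960.1| (accession.version in the fourth pipe-delimited field) and qual headers that begin with the bare accession.version, as in the output above; accession_map, sync_qual_headers and the script name sync_qual.pl are hypothetical, not part of BioPerl or ncbi_eutil.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map accession.version (e.g. "AC207960.1") to the full FASTA ID
# (e.g. "gi|154937460|gb|AC207960.1|"). Assumes NCBI-style pipe-delimited
# IDs with the accession.version in the fourth field.
sub accession_map {
    my @fasta_lines = @_;
    my %map;
    for my $line (@fasta_lines) {
        next unless $line =~ /^>(\S+)/;
        my $id     = $1;
        my @fields = split /\|/, $id;
        $map{ $fields[3] } = $id if defined $fields[3] && length $fields[3];
    }
    return %map;
}

# Rewrite qual headers whose first word is a known accession.version so that
# they use the FASTA ID; quality-value lines pass through unchanged.
sub sync_qual_headers {
    my ($map, @qual_lines) = @_;
    my @out;
    for my $line (@qual_lines) {
        my $fixed = $line;
        if ($line =~ /^>(\S+)(.*)\z/s && exists $map->{$1}) {
            $fixed = ">$map->{$1}$2";
        }
        push @out, $fixed;
    }
    return @out;
}

# Usage: perl sync_qual.pl AC207960.fasta AC207960.fasta.qual > fixed.qual
if (!caller && @ARGV == 2) {
    my ($fasta_file, $qual_file) = @ARGV;
    open my $fa, '<', $fasta_file or die "Cannot open $fasta_file: $!\n";
    chomp(my @fasta = <$fa>);
    close $fa;
    open my $qu, '<', $qual_file or die "Cannot open $qual_file: $!\n";
    chomp(my @qual = <$qu>);
    close $qu;
    my %map = accession_map(@fasta);
    print "$_\n" for sync_qual_headers(\%map, @qual);
}
```

Rewriting the qual file rather than the FASTA file keeps the gi number in both headers, which is what tools that pair seq and qual by their first word (such as cross_match) need.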

Otherwise fine, though.
Thanks,
Phillip




From MEC at stowers-institute.org  Fri Jan 11 19:40:14 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 11 Jan 2008 13:40:14 -0600
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
In-Reply-To: <4787C479.8070600@purdue.edu>
References: <478797CE.9050202@purdue.edu>
	<14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu>
	
	<4787C479.8070600@purdue.edu>
Message-ID: 

Phillip:

Of course - mea culpa - here's the full monty....

Indeed, NCBI's eutils can do this:

    ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.

It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip
by NCBI in their "Jul 24-27, 2007" course found at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html
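
For readers without NCBI_PowerScripting.pm, a rough equivalent of the qual fetch can be made against the EFetch endpoint directly with the core HTTP::Tiny module. This is a sketch under several unverified assumptions: that the HTTPS EFetch URL below is current, that EFetch accepts the accession directly as id (Malcolm's script runs a search first instead), and that NCBI still honours rettype=qual, the value used in the command above. HTTPS with HTTP::Tiny also needs IO::Socket::SSL installed. efetch_url and efetch_qual.pl are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumed base URL of NCBI's EFetch utility.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build an EFetch URL; passing an array ref keeps the parameters in order.
sub efetch_url {
    my (%args) = @_;
    my $query = HTTP::Tiny->new->www_form_urlencode([
        db      => $args{db},
        id      => $args{id},
        rettype => $args{rettype},
        retmode => 'text',
    ]);
    return "$EFETCH?$query";
}

# Usage: perl efetch_qual.pl AC207960 > AC207960.qual
if (!caller && @ARGV == 1) {
    my $url = efetch_url(db => 'nucleotide', id => $ARGV[0], rettype => 'qual');
    my $response = HTTP::Tiny->new->get($url);
    die "EFetch failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    print $response->{content};
}
```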

I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
very beginning so that trace messages are not printed on STDOUT, such as
this echoed header:
	 Retrieving 1 records from nucleotide...
... and footer:
	Received records 1 - 1.
	Wrote data to -.

(otherwise they are interspersed with downloaded qual files)
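
The 'select STDERR' edit works because select() changes Perl's default output handle, so every bare print in the module then goes to STDERR. If you would rather not edit NCBI_PowerScripting.pm, a similar effect can be had by switching the default handle only around the calls into it. A minimal sketch: with_default_stderr is a hypothetical helper, the code ref stands in for whichever module call prints the progress messages, and prints that name STDOUT explicitly are not redirected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run $code with STDERR as the default output handle, so bare print()
# calls inside it (e.g. progress messages) stay off STDOUT. The previous
# default handle is restored even if $code dies.
sub with_default_stderr {
    my ($code) = @_;
    my $previous = select(STDERR);
    my @result = eval { $code->() };
    my $error  = $@;
    select($previous);
    die $error if $error;
    return @result;
}

# Hypothetical usage: the progress line goes to STDERR, the data to STDOUT.
my ($data) = with_default_stderr(sub {
    print "Retrieving 1 records from nucleotide...\n";
    return "quality values\n";
});
print $data;
```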

It also depends on a recent version of Getopt::Long.

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ncbi_eutil
Type: application/octet-stream
Size: 1854 bytes
Desc: ncbi_eutil
URL: 

From cain.cshl at gmail.com  Mon Jan 14 18:46:39 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 14 Jan 2008 13:46:39 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
Message-ID: <1200336399.6056.12.camel@frissell>

Hi all,

Last month, I got a bug report on the GBrowse bug tracker:

  http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291

about a problem with dumping invalid GenBank files.  GBrowse uses
Bio::SeqIO::genbank to create these dumps.  

In his bug report, he claims that feature names over 15 characters long
are invalid, and provided an example GenBank file where a feature is
named 'BAC_cloned_genomic_insert', which is over 15 characters.  What I
want to know is this: is this truly a restriction on the GenBank format,
or is it a software problem with some other package?  Do we need to fix
genbank.pm?  I'm perfectly willing to do it; I'm just hesitant to
believe this is really a bug.

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




From lstein at cshl.edu  Mon Jan 14 18:53:15 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 14 Jan 2008 13:53:15 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <1200336399.6056.12.camel@frissell>
References: <1200336399.6056.12.camel@frissell>
Message-ID: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>

Hi Scott,

He is correct about the limitation, but we deliberately relaxed it because
we were running into situations where we lost information during
roundtripping from other formats into genbank.

Lincoln




-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Mon Jan 14 19:35:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 14 Jan 2008 13:35:46 -0600
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
Message-ID: 

It looks like the keys in the feature table run into the location  
string w/o intervening space, which would probably cause havoc with  
roundtripping from this output.  A few examples:

      BAC_cloned_genomic_insert<1..>1000
      combined_genscanjoin(<1..347,400..498,794..>1000)
      splign_na_dbEST_ncbi<1..>1000

I would think at least a space in between the location and the key  
would be required for round-tripping out of genbank format.
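For reference, here is a minimal Perl sketch of the layout under discussion, assuming the standard GenBank/INSDC feature-table columns (key in columns 6-20, location starting at column 22). The `feature_line` helper is hypothetical and is not the code in Bio::SeqIO::genbank; it only shows how padding the key with `%-15s` and always emitting one literal space keeps an over-long key from running into the location.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::SeqIO::genbank implementation):
# 5 spaces, the key left-justified in a 15-character field (columns 6-20),
# one literal space (column 21), then the location from column 22.
# sprintf's %-15s pads short keys but never truncates long ones, so the
# literal space is what keeps an over-long key separate from the location.
sub feature_line {
    my ($key, $location) = @_;
    return sprintf("     %-15s %s", $key, $location);
}

print feature_line('CDS', '<1..>1000'), "\n";
print feature_line('BAC_cloned_genomic_insert', '<1..>1000'), "\n";
```

With this layout the second line comes out as `     BAC_cloned_genomic_insert <1..>1000`: no longer column-aligned, but a parser can still split the key from the location on whitespace.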

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From lstein at cshl.edu  Mon Jan 14 19:46:20 2008
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 14 Jan 2008 14:46:20 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: 
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
	
Message-ID: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>

That's a new bug. The version I worked on inserted a space after the name.

Lincoln



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From diogoat at gmail.com  Tue Jan 15 13:40:10 2008
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 15 Jan 2008 11:40:10 -0200
Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS
Message-ID: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>

Hello,

I want to extract the protein_id and translation tags from the CDS features of a genome in
GenBank format, but I have one problem: when a CDS in the file doesn't
have the protein_id or the translation tag, the script gives me this error:

------------- EXCEPTION  -------------
MSG: asking for tag value that does not exist protein_id
STACK Bio::SeqFeature::Generic::get_tag_values
/usr/share/perl5/Bio/SeqFeature/Generic.pm:504
STACK toplevel parser_cds.pl:25
--------------------------------------

Below I paste the script:

##############################################
use strict;
use warnings;
use Bio::SeqIO;

my $infile  = $ARGV[0];
my $outfile = "$infile.out";
open(OUT, ">>$outfile") or die "Cannot open $outfile: $!";

my $seq_in = Bio::SeqIO->new('-file'   => "<$infile",
                             '-format' => 'Genbank');

while (my $inseq = $seq_in->next_seq) {
    for my $feat_object ($inseq->get_SeqFeatures) {
        if ($feat_object->primary_tag eq "CDS") {
            # get_tag_values() throws if this CDS lacks the requested tag
            print OUT $feat_object->get_tag_values('protein_id'), " ";
            print OUT $feat_object->get_tag_values('translation'), "\n";
        }
    }
}
###############################################

Can somebody help me?

Thanks

Diogo Tschoeke


From Marc.Logghe at ablynx.com  Tue Jan 15 14:44:54 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Tue, 15 Jan 2008 15:44:54 +0100
Subject: [Bioperl-l] Problem to extract protein_id and transcript from
	CDS
In-Reply-To: <638512560801150540m108db442r227d82c709a954@mail.gmail.com>
Message-ID: <03C512635899144083CADB0EE2220189013E2BEC@alpaca.lan.ablynx.com>

Hi,
Try testing for existence first using the has_tag() method.
It is provided by Bio::AnnotatableI.

print OUT $feat_object->get_tag_values('protein_id'), " "
    if $feat_object->has_tag('protein_id');
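A slightly fuller sketch of the same idea, assuming you want to skip any CDS that lacks either tag rather than print a partial line. `cds_line` is a hypothetical helper name; it relies only on `primary_tag`, `has_tag` and `get_tag_values`, which Bio::SeqFeature::Generic provides.

```perl
use strict;
use warnings;

# Hypothetical helper: return "protein_id translation" for a CDS feature,
# or undef (empty return) if the feature is not a CDS or lacks either tag.
# Works with any object offering primary_tag/has_tag/get_tag_values,
# e.g. the Bio::SeqFeature::Generic objects returned by Bio::SeqIO.
sub cds_line {
    my ($feat) = @_;
    return unless $feat->primary_tag eq 'CDS';
    for my $tag (qw(protein_id translation)) {
        # checking first avoids the get_tag_values() exception
        return unless $feat->has_tag($tag);
    }
    my ($protein_id)  = $feat->get_tag_values('protein_id');
    my ($translation) = $feat->get_tag_values('translation');
    return "$protein_id $translation";
}

# Inside the original loop this would be used as:
#   for my $feat_object ($inseq->get_SeqFeatures) {
#       my $line = cds_line($feat_object);
#       print OUT "$line\n" if defined $line;
#   }
```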


HTH,
Marc




From cuiw at ncbi.nlm.nih.gov  Tue Jan 15 16:50:53 2008
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Tue, 15 Jan 2008 11:50:53 -0500
Subject: [Bioperl-l] Recommended way to download qual files from Genbank?
References: <478797CE.9050202@purdue.edu><14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu><4787C479.8070600@purdue.edu>
	
Message-ID: <18C407FD4FFB424292D769FBD68C1987048E95CC@NIHCESMLBX8.nih.gov>

There is an alternative, if you can download and compile the NCBI C++ Toolkit (ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/2007/Aug_27_2007/). Simply call the binary like this:
 
id1_fetch -fmt quality -gi 13508865
 
Wenwu Cui

________________________________

From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Fri 1/11/2008 2:40 PM
To: Phillip San Miguel
Cc: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?



Phillip:

Of course - mea culpa - here's the full monty....

Indeed NCBI's eutils can do this:

> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.
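For readers who would rather call EFetch directly than go through a wrapper script, here is a minimal sketch. It assumes the endpoint `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi` still honours `rettype=qual` for the nucleotide database, as the command above implies it did; that is an assumption to verify. `qual_url`, `fetch_qual` and `parse_qual` are hypothetical helper names, the GI 154937460 comes from the viewer URLs in this thread, and HTTP::Tiny needs IO::Socket::SSL installed for https.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumed EFetch endpoint; rettype=qual support is an assumption to verify.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build the EFetch URL for the quality scores of one nucleotide record.
sub qual_url {
    my ($id) = @_;
    return "$EFETCH?db=nucleotide&id=$id&rettype=qual&retmode=text";
}

# Download the qual text for one id; dies on any HTTP failure.
sub fetch_qual {
    my ($id) = @_;
    my $res = HTTP::Tiny->new->get(qual_url($id));
    die "EFetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Parse FASTA-style qual text into { id => [scores...] }.
sub parse_qual {
    my ($text) = @_;
    my (%qual, $id);
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            $qual{$id} = [];
        }
        elsif (defined $id) {
            # split on whitespace and drop empty leading fields
            push @{ $qual{$id} }, grep { length } split /\s+/, $line;
        }
    }
    return \%qual;
}

# Only touch the network when an id is given on the command line,
# e.g.  perl fetch_qual.pl 154937460 > AC207960.qual
if (@ARGV) {
    print fetch_qual($ARGV[0]);
}
```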

It depends upon NCBI_PowerScripting.pm, distributed in PowerFiles_0707.zip
by NCBI in their "Jul 24-27, 2007" course found at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html

I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
very beginning so that trace messages are not printed on STDOUT, such as
this echoed header:
         Retrieving 1 records from nucleotide...
... and footer:
        Received records 1 - 1.
        Wrote data to -.

(otherwise they are interspersed with downloaded qual files)

It also depends on a recent version of Getopt::Long.

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
 

> -----Original Message-----
> From: Phillip San Miguel [mailto:pmiguel at purdue.edu]
> Sent: Friday, January 11, 2008 1:33 PM
> To: Cook, Malcolm
> Cc: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual
> files from Genbank?
>
> Hi Malcolm,
>     Looks like your email was (inadvertently?) redacted in
> some way. (No attachment and last sentence truncated.) Would
> it be possible to get a complete version so I can be sure I'm
> following you?
> Thanks,
> Phillip
>
> Cook, Malcolm wrote:
> > Indeed eutil is capable of this
> >
> > The following use of my ncbi_eutil (attached) script yields what you
> > want:
> >
> > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
> >
> > It depends on the version of NCBI_PowerScripting.pm , such as is
> > included in
> >
> > Malcolm Cook
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >  
> >
> >  
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris
> >> Fields
> >> Sent: Friday, January 11, 2008 11:10 AM
> >> To: Phillip San Miguel
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?
> >>
> >> I don't think this is possible with the current setup for
> >> Bio::DB::GenBank (which the script uses).  We'll have to investigate
> >> whether it is possible to retrieve this data via NCBI's eutils; if so
> >> we can try adding it in.  If you want you can submit this as an
> >> enhancement request via bugzilla for tracking:
> >>
> >> http://bugzilla.open-bio.org/
> >>
> >> chris
> >>
> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> >>
> >>    
> >>> No problem getting sequence from genbank via a myriad of methods.
> >>> But as the volume of non-finished sequence in genbank increases the
> >>> importance of also obtaining quality values for a given sequence
> >>> increases. Some records include quality values.
> >>>
> >>> I typically use bp_fetch.pl to grab a sequence from genbank:
> >>>
> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960
> >>>
> >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't
> >>> designed to pull down quals evidently:
> >>>
> >>> bp_fetch.pl -fmt qual net::genbank:AC207960
> >>>
> >>> gives:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual
> >>> object to write_seq() as a parameter named "source"
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> >>> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
> >>> -----------------------------------------------------------
> >>>
> >>> (running under bioperl 1.5.2)
> >>>
> >>> The quality values for this accession are in genbank as these URLs
> >>> demonstrate:
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
> >>>
> >>> What is the best way to pull down these qual values? They aren't
> >>> present in "GenBank(Full)" format. They are present in an ASN.1
> >>> format.
> >>>
> >>> Advice would be appreciated.
> >>>
> >>> --
> >>> Phillip
> >>> Purdue Genomics Core Facility
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>      
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>    
> >
> >  
>
>
>





From singhal at berkeley.edu  Tue Jan 15 22:50:12 2008
From: singhal at berkeley.edu (Sonal Singhal)
Date: Tue, 15 Jan 2008 14:50:12 -0800
Subject: [Bioperl-l] redundant sequences
Message-ID: 

Hi all,

I am mining a few genomes to find all the genes in a gene family, and
of course multiple BLAST searches of different paralogs are returning
a lot of redundant hits.   I have searched the BioPerl documentation,
and I cannot find an easy way to cluster and then purge redundant
sequences.  Any ideas?

Cheers,
sonal


From MEC at stowers-institute.org  Tue Jan 15 23:21:00 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 15 Jan 2008 17:21:00 -0600
Subject: [Bioperl-l] redundant sequences
In-Reply-To: 
References: 
Message-ID: 

Cd-hit: http://bioinformatics.burnham.org/cd-hi/
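CD-HIT clusters sequences at a chosen identity threshold. For the narrower case where several BLAST searches return the very same sequence, a minimal Perl sketch like the following can drop exact duplicates. It is not clustering; `dedup_exact` is a hypothetical helper, and it expects `[id, sequence]` pairs, which could be built from Bio::SeqIO objects with `[$seq->display_id, $seq->seq]`.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first record for each distinct sequence
# (compared case-insensitively, ignoring whitespace). Input is a list of
# [id, sequence] array refs; returns (\@kept, \%dropped), where %dropped maps
# each kept id to the ids of the identical sequences that were removed.
sub dedup_exact {
    my @records = @_;
    my (%first_id_for, %dropped, @kept);
    for my $rec (@records) {
        my ($id, $seq) = @$rec;
        (my $key = uc $seq) =~ s/\s+//g;
        if (exists $first_id_for{$key}) {
            push @{ $dropped{ $first_id_for{$key} } }, $id;
        }
        else {
            $first_id_for{$key} = $id;
            push @kept, $rec;
        }
    }
    return (\@kept, \%dropped);
}
```

Near-identical hits (different lengths, a few mismatches) will survive this filter, which is where a real clustering tool such as CD-HIT is needed.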

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  




From cain.cshl at gmail.com  Wed Jan 16 02:24:50 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 15 Jan 2008 21:24:50 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell>
	<6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com>
	
	<6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
Message-ID: <1200450290.7276.3.camel@frissell>

Hi Chris and Lincoln,

I've attached my suggested patch.  So, can I use svn to check it in?  It
only adds a space after the feature type name; I suspect that will be
enough to fix the file format for most uses.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: genbank.pm.patch
Type: text/x-patch
Size: 1110 bytes
Desc: not available
URL: 

From cjfields at uiuc.edu  Wed Jan 16 03:15:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 Jan 2008 21:15:51 -0600
Subject: [Bioperl-l] Subversion migration complete
Message-ID: 

On behalf of the BioPerl core developers, I am proud to announce that  
the BioPerl SVN migration has been completed.  We would like to thank  
everyone who helped, in particular George Hartzell and Chris  
Dagdigian, both of whom played instrumental roles in the CVS->SVN
conversion and anonymous SVN setup for BioPerl.

Anonymous SVN checkouts for bioperl-live are now possible using:
svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

Developers can obtain a checkout from:
svn co svn+ssh://USER@dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live/trunk bioperl-live

Browsable repository:
http://code.open-bio.org/svnweb/index.cgi/bioperl/

Basic instructions:
http://www.bioperl.org/wiki/Using_Subversion

We are still in the midst of implementing a few extra details related  
to SVN migration; the status on these can be viewed here:
http://www.bioperl.org/wiki/CVS_to_SVN_Migration

Enjoy!

chris



From bug-bioperl at rt.cpan.org  Thu Jan 17 03:35:30 2008
From: bug-bioperl at rt.cpan.org (Chris Fields via RT)
Date: Wed, 16 Jan 2008 22:35:30 -0500
Subject: [Bioperl-l] [rt.cpan.org #29533] Bio::SeqIO::interpro depends on
	XML::DOM::XPath
In-Reply-To: 
References:   
	
Message-ID: 


       Queue: bioperl
 Ticket 

On Fri Sep 21 10:28:52 2007, support at helpdesk.open-bio.org wrote:
> Hi Mike,
> 
> The proper place to submit this fix is the bioperl-l at lists.open-bio.org
> mailing list or the OBF Bugzilla queue at:
> http://bugzilla.open-bio.org/, this RT system is mainly for sysadmin
> activities rather than for tracking code changes. Would you be so kind as
> to re-send your request to one of the places above? Thanks for the heads
> up! :)
> 
> Regards,
> Mauricio.

This has been fixed.  I'll get the CPAN maintainer to close this out.


From vipingjo at gmail.com  Thu Jan 17 08:48:36 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 16:48:36 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via package
	"Bio::Tree::Tree"
Message-ID: <200801171648332965577@gmail.com>

Hi everyone,

I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + Windows XP SP2.
When running the example code (attached below as t.pl) from Bio\Tree\Compatible.pm, I got this error:

Can't locate object method "is_compatible" via package "Bio::Tree::Tree"

I replaced "$t1->is_compatible($t2)" with "is_compatible Bio::Tree::Compatible ($t1,$t2)", and the error changed to:
Can't locate object method "get_nodes" via package "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252,  line 1.

I then modified Compatible.pm, changing the call to "get_nodes" to "get_nodes Bio::Tree::Tree($self);", and a new error arose:
Can't use string ("Bio::Tree::Tree") as a HASH ref while "strict refs" in use at i:/Perl/site/lib/Bio\Tree\Tree.pm line 198,  line 1.

I gave up. Any help will be deeply appreciated.




# this is the example script from Bio::Tree::Compatible: t.pl
  use Bio::Tree::Compatible;
  use Bio::TreeIO;
  my $input = new Bio::TreeIO('-format' => 'newick',
                              '-file'   => 'input.tre');
  my $t1 = $input->next_tree;
  my $t2 = $input->next_tree;

  my ($incompat, $ilabels, $inodes) = $t1->is_compatible($t2);
  if ($incompat) {
    my %cluster1 = %{ $t1->cluster_representation };
    my %cluster2 = %{ $t2->cluster_representation };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
      foreach my $label (@$ilabels) {
        my $node1 = $t1->find_node(-id => $label);
        my $node2 = $t2->find_node(-id => $label);
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "label $label";
        print " cluster"; map { print " ",$_ } @c1;
        print " cluster"; map { print " ",$_ } @c2; print "\n";
      }
    }
    if (scalar(@$inodes)) {
      while (@$inodes) {
        my $node1 = shift @$inodes;
        my $node2 = shift @$inodes;
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "cluster"; map { print " ",$_ } @c1;
        print " properly intersects cluster";
        map { print " ",$_ } @c2; print "\n";
      }
    }
  } else {
    print "compatible trees\n";
  }

__END__;

# this is the file 'input.tre':
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);

# these are the full messages I got running it like this: "perl.exe -w t.pl"
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
Can't locate object method "is_compatible" via package "Bio::Tree::Tree" at Z:\bp\t.pl line 8,  line 2.




From bix at sendu.me.uk  Thu Jan 17 11:18:56 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:18:56 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
 package "Bio::Tree::Tree"
In-Reply-To: <200801171648332965577@gmail.com>
References: <200801171648332965577@gmail.com>
Message-ID: <478F39A0.2030508@sendu.me.uk>

viping wrote:
> Hi Everyone,
> 
> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + 
> Windows XP SP2. When running the example code (attached below as t.pl) 
> within Bio\Tree\Compatible.pm, I got this error:
> 
> Can't locate object method "is_compatible" via package 
> "Bio::Tree::Tree"
> 
> I replaced "$t1->is_compatible($t2)" with "is_compatible 
> Bio::Tree::Compatible ($t1,$t2)",

Yup, you had the right idea; unfortunately the synopsis code for
Bio::Tree::Compatible is wrong.
I've now fixed it in svn.


> the error changed: Can't locate object method "get_nodes" via package
>  "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm 
> line 252,  line 1.

I didn't get quite that error; instead I had an issue with TreeIO: for
whatever reason it is only returning one tree from your input file (ie.
$t2 is undefined).

I therefore got "Can't call method "get_nodes" on an undefined value [...]"

Can someone look into/confirm that?



From bix at sendu.me.uk  Thu Jan 17 11:35:57 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:35:57 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
 package "Bio::Tree::Tree"
In-Reply-To: <478F39A0.2030508@sendu.me.uk>
References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk>
Message-ID: <478F3D9D.6050306@sendu.me.uk>

Sendu Bala wrote:
>> the error changed: Can't locate object method "get_nodes" via
>> package "Bio::Tree::Compatible" at
>> i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252,  line 1.
> 
> I didn't get quite that error; instead I had an issue with TreeIO:
> for whatever reason it is only returning one tree from your input
> file (ie. $t2 is undefined).
> 
> I therefore got "Can't call method "get_nodes" on an undefined value
> [...]"
> 
> Can someone look into/confirm that?

... Yeah, I think I'm losing my mind. The code below is 'ok' using the
commented out -fh input for TreeIO, but is 'not ok' using the -file
input, where the specified file contains the exact same data as
__DATA__. Huh?


#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::Tree::Compatible;
use Bio::TreeIO;
my $input = new Bio::TreeIO('-format' => 'newick',
                             #-fh      => \*DATA,
                             -file    => 'input.tre'
                             );
my $t1 = $input->next_tree;
my $t2 = $input->next_tree;

if ($t2) {
    print "ok\n";
}
else {
    print "not ok\n";
}

__DATA__
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);
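
To confirm that input.tre itself holds two trees, independently of Bio::TreeIO, a minimal plain-Perl sketch can split the file on the Newick terminator ';'. It assumes that no quoted label or [comment] contains a semicolon or meaningful whitespace; split_newick is a hypothetical helper, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split Newick text into one string per tree. Assumes every tree ends with ';'
# and that no quoted label or [comment] contains ';' or meaningful whitespace.
sub split_newick {
    my ($text) = @_;
    $text =~ s/\s+//g;                    # drop newlines (LF or CRLF) and spaces
    return map { "$_;" } grep { length } split /;/, $text;
}

# Usage: perl count_trees.pl input.tre
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my @trees = split_newick($text);
    printf "%d tree(s) in %s\n", scalar @trees, $file;
    print "  $_\n" for @trees;
}
```

If this reports two trees while the -file version of the script above still prints 'not ok', the file contents are fine and the difference lies in how the parser reads the file.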




From vipingjo at gmail.com  Thu Jan 17 13:23:14 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 21:23:14 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
	package"Bio::Tree::Tree"
References: <200801171648332965577@gmail.com>, <478F39A0.2030508@sendu.me.uk>
Message-ID: <200801172123112184046@gmail.com>

I got the latest code modified by Sendu Bala via SVN. It works well when "input.tre" and "t.pl" are in the same directory. Thank you, Sendu Bala.  

This is the output:
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
incompatible trees
label G cluster G cluster G K
cluster A B C properly intersects cluster A B H
cluster A B C properly intersects cluster A B E G H I J K
cluster A B C D properly intersects cluster A B H
cluster A B C D properly intersects cluster A B E G H I J K
cluster E F G properly intersects cluster G K
cluster E F G properly intersects cluster G I J K
cluster E F G properly intersects cluster A B E G H I J K
cluster A B C D E F G properly intersects cluster A B H
cluster A B C D E F G properly intersects cluster G K
cluster A B C D E F G properly intersects cluster G I J K
cluster A B C D E F G properly intersects cluster A B E G H I J K

# this is the latest code:
  use Bio::Tree::Compatible;
  use Bio::TreeIO;
  my $input = Bio::TreeIO->new('-format' => 'newick',
                               '-file'   => 'input.tre');
  my $t1 = $input->next_tree;
  my $t2 = $input->next_tree;

  my ($incompat, $ilabels, $inodes) = Bio::Tree::Compatible::is_compatible($t1,$t2);
  if ($incompat) {
    my %cluster1 = %{ Bio::Tree::Compatible::cluster_representation($t1) };
    my %cluster2 = %{ Bio::Tree::Compatible::cluster_representation($t2) };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
      foreach my $label (@$ilabels) {
        my $node1 = $t1->find_node(-id => $label);
        my $node2 = $t2->find_node(-id => $label);
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "label $label";
        print " cluster"; map { print " ",$_ } @c1;
        print " cluster"; map { print " ",$_ } @c2; print "\n";
      }
    }
    if (scalar(@$inodes)) {
      while (@$inodes) {
        my $node1 = shift @$inodes;
        my $node2 = shift @$inodes;
        my @c1 = sort @{ $cluster1{$node1} };
        my @c2 = sort @{ $cluster2{$node2} };
        print "cluster"; map { print " ",$_ } @c1;
        print " properly intersects cluster";
        map { print " ",$_ } @c2; print "\n";
      }
    }
  } else {
    print "compatible trees\n";
  }


------------------				 
viping
2008-01-17

-------------------------------------------------------------
From: Sendu Bala
Date: 2008-01-17 19:19:30
To: viping
Cc: bioperl-l
Subject: Re: [Bioperl-l] Can't locate object method "is_compatible" via package"Bio::Tree::Tree"




From cjfields at uiuc.edu  Thu Jan 17 13:25:41 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 17 Jan 2008 07:25:41 -0600
Subject: [Bioperl-l] Can't locate object method "is_compatible" via
	package "Bio::Tree::Tree"
In-Reply-To: <478F39A0.2030508@sendu.me.uk>
References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk>
Message-ID: <7BF3650B-F1D4-4F21-9C59-3AC13CA35945@uiuc.edu>

Probably need to file this as a bug.  There is a similar issue with  
Bio::TreeIO::nexus, but it probably isn't related unless it is using  
the same parsing logic:

http://bugzilla.open-bio.org/show_bug.cgi?id=2356

chris

On Jan 17, 2008, at 5:18 AM, Sendu Bala wrote:

> viping wrote:
>> Hi Everyone,
>>
>> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 +
>> Windows XP SP2. When running the example code (attached below as t.pl)
>> within Bio\Tree\Compatible.pm, I got this error:
>>
>> Can't locate object method "is_compatible" via package
>> "Bio::Tree::Tree"
>>
>> I replaced "$t1->is_compatible($t2)" with "is_compatible
>> Bio::Tree::Compatible ($t1,$t2)",
>
> Yup, you had the right idea; unfortunately the synopsis code for
> Bio::Tree::Compatible is wrong.
> I've now fixed it in svn.
>
>
>> the error changed: Can't locate object method "get_nodes" via package
>> "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm
>> line 252,  line 1.
>
> I didn't get quite that error; instead I had an issue with TreeIO: for
> whatever reason it is only returning one tree from your input file  
> (ie.
> $t2 is undefined).
>
> I therefore got "Can't call method "get_nodes" on an undefined value  
> [...]"
>
> Can someone look into/confirm that?
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






From N.Haigh at sheffield.ac.uk  Fri Jan 18 12:47:48 2008
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 18 Jan 2008 12:47:48 +0000
Subject: [Bioperl-l] Parsing Primer3 output
Message-ID: <1200660468.47909ff498dd0@webmail.shef.ac.uk>

I might be overlooking something, but is it possible to parse primer3 output?

Cheers
Nath



From cjfields at uiuc.edu  Fri Jan 18 13:27:47 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 18 Jan 2008 07:27:47 -0600
Subject: [Bioperl-l] Parsing Primer3 output
In-Reply-To: <1200660468.47909ff498dd0@webmail.shef.ac.uk>
References: <1200660468.47909ff498dd0@webmail.shef.ac.uk>
Message-ID: <8C8BF818-FC04-42E3-9210-3FE23F92EA8F@uiuc.edu>

Bio::Tools::Primer3.

chris

On Jan 18, 2008, at 6:47 AM, Nathan S. Haigh wrote:

> I might be overlooking something, but is it possible to parse  
> primer3 output?
>
> Cheers
> Nath
>
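
Bio::Tools::Primer3 is the BioPerl parser for this output. For a quick look at the raw records without BioPerl, a minimal sketch follows; it assumes primer3's Boulder-IO layout (one TAG=VALUE pair per line, with a line containing only '=' closing each record). read_boulder is a hypothetical helper, not part of Bio::Tools::Primer3, and the tag names in the test data are only examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse Boulder-IO records (TAG=VALUE lines; a lone '=' ends a record) from a
# filehandle. Returns a list of hashrefs, one per record.
sub read_boulder {
    my ($fh) = @_;
    my (@records, %current);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;               # strip LF or CRLF
        if ($line eq '=') {                 # end of the current record
            push @records, {%current};
            %current = ();
            next;
        }
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;         # ignore lines that are not TAG=VALUE
        $current{$tag} = $value;
    }
    push @records, {%current} if %current;  # tolerate a missing final '='
    return @records;
}

# Usage: perl read_primer3.pl primer3_output.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    for my $rec (read_boulder($fh)) {
        print "$_=$rec->{$_}\n" for sort keys %$rec;
        print "=\n";
    }
}
```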



From hangsyin at gmail.com  Sat Jan 19 18:25:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sat, 19 Jan 2008 10:25:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an undefined
 value at BIO::DB::GFF.pl
Message-ID: <14971922.post@talk.nabble.com>


Hi, everyone,

I ran into this problem when running this script to extract features
overlapping 4:20,000..25,000. It always fails with "Can't call method
"features" on an undefined value at BIO::DB::GFF.pl line XX".
==============================================================
use Bio::DB::GFF;
use Bio::Tools::GFF;
my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
                           -dsn     => 'dbi:mysql:dmel_gff:localhost',
                           -user    => 'XXXX',
                           -pass    => 'XXXX') || die "database open failed";

my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
my @features = $segment->features(-types => ['gene', 'exon', 'intron',
                                             'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
print(scalar(@features)."\n");

================================================================
I am using ActivePerl 5.8.8 and BioPerl 1.5.2 under Win32. I loaded
dmel_5.4.gff into a MySQL database with bulk_load_gff.pl without any errors.
Other methods failed as well. 

Any help will be deeply appreciated!

Best,
Jon

-- 
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14971922.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.



From cain.cshl at gmail.com  Sun Jan 20 03:36:44 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Sat, 19 Jan 2008 22:36:44 -0500
Subject: [Bioperl-l] Problem: Can't call method "features" on
	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <14971922.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com>
Message-ID: <1200800204.6069.5.camel@frissell>

Hi Jon,

I think it's funny that you have "or die" on the database opening line,
"or die" on the @features line, but you didn't put one on the $segment
line.  Try adding "or die: $!" to the $segment line to see what it says,
also add a 'print $segment' after you create it and before you try to
get the features from it.  

Clearly, the problem is that $segment is not defined (that is, nothing
is in it, not that the wrong thing is in it).  The next trick is to find
out why.  My first guess, without looking at the data set, is that the
arm is not really named '4'.

Scott

On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
> Hi, everyone,
> 
> I met this problem when I was running this script to extract features
> overlaps with 4:20,000..25,000. It always responds like "Can't call method
> "features" on an undefined value at BIO::DB::GFF.pl line XX".
> ==============================================================
> use Bio::DB::GFF;
> use Bio::Tools::GFF;
> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>                                         -dsn => 'dbi:mysql:dmel_gff:localhost',
>                                         -user => 'XXXX',
>                                         -pass => 'XXXX') || die "database open failed";
> 
> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
> 'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
> print(scalar(@features)."\n");
> 
> ================================================================
> I am using activeperl 5.8.8 and bioperl 1.5.2 under Win32. I loaded
> dmel_5.4.gff into mysql database by bulk_load_gff.pl without any error.
> Other methods failed also. 
> 
> Any help will be deeply appreciated!
> 
> Best,
> Jon
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
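
One quick way to test the guess that the arm is not really named '4' is to list the sequence IDs (column 1) that actually occur in the GFF file. A minimal plain-Perl sketch, assuming a tab-delimited GFF file; gff_seqids is a hypothetical helper, it reads the file rather than the loaded database, and it only counts feature lines (the ##sequence-region directive is skipped).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return { seqid => [min_start, max_end] } over the feature lines of a GFF
# stream, skipping comment/directive lines and blank lines.
sub gff_seqids {
    my ($fh) = @_;
    my %span;
    while (my $line = <$fh>) {
        last if $line =~ /^##FASTA/;          # embedded sequence section starts
        next if $line =~ /^#/ or $line !~ /\S/;
        chomp $line;
        my ($seqid, undef, undef, $start, $end) = split /\t/, $line;
        next unless defined $end and $start =~ /^\d+$/ and $end =~ /^\d+$/;
        my $s = $span{$seqid} ||= [$start, $end];
        $s->[0] = $start if $start < $s->[0];
        $s->[1] = $end   if $end   > $s->[1];
    }
    return \%span;
}

# Usage: perl gff_seqids.pl dmel_5.4.gff
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $span = gff_seqids($fh);
    printf "%s\t%d..%d\n", $_, @{ $span->{$_} } for sort keys %$span;
}
```

If '4' is listed here, the name in the file matches the one passed to segment(), and the cause lies elsewhere.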



From hangsyin at gmail.com  Sun Jan 20 03:49:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sat, 19 Jan 2008 19:49:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on
	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <1200800204.6069.5.camel@frissell>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
Message-ID: <14978241.post@talk.nabble.com>


Hi, Scott,

After adding die $!, I can see that something goes wrong at this line:
"my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"

My GFF file looks like this:
##gff-version 3
##sequence-region 4 1 1351857
4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
...
...
I'm really confused. Any further suggestions? Thank you!

Jon





Scott Cain-3 wrote:
> 
> Hi Jon,
> 
> I think it's funny that you have "or die" on the database opening line,
> "or die" on the @features line, but you didn't put one on the $segment
> line.  Try adding "or die: $!" to the $segment line to see what it says,
> also add a 'print $segment' after you create it and before you try to
> get the features from it.  
> 
> Clearly, the problem is that $segment is not defined (that is, nothing
> is in it, not that the wrong thing is in it).  The next trick is to find
> out why.  My first guess, without looking at the data set, is that the
> arm is not really named '4'.
> 
> Scott
> 
> On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
>> Hi, everyone,
>> 
>> I met this problem when I was running this script to extract features
>> overlaps with 4:20,000..25,000. It always responds like "Can't call method
>> "features" on an undefined value at BIO::DB::GFF.pl line XX".
>> ==============================================================
>> use Bio::DB::GFF;
>> use Bio::Tools::GFF;
>> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>>                                         -dsn => 'dbi:mysql:dmel_gff:localhost',
>>                                         -user => 'XXXX',
>>                                         -pass => 'XXXX') || die "database open failed";
>> 
>> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
>> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
>> 'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
>> print(scalar(@features)."\n");
>> 
>> ================================================================
>> I am using activeperl 5.8.8 and bioperl 1.5.2 under Win32. I loaded
>> dmel_5.4.gff into mysql database by bulk_load_gff.pl without any error.
>> Other methods failed also. 
>> 
>> Any help will be deeply appreciated!
>> 
>> Best,
>> Jon
>> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Problem%3A-Can%27t-call-method-%22features%22-on-an-undefined-value-at-BIO%3A%3ADB%3A%3AGFF.pl-tp14971922p14978241.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.



From cain.cshl at gmail.com  Sun Jan 20 04:08:04 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Sat, 19 Jan 2008 23:08:04 -0500
Subject: [Bioperl-l] Problem: Can't call method "features"
	on	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <14978241.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com>
	<1200800204.6069.5.camel@frissell>  <14978241.post@talk.nabble.com>
Message-ID: <1200802084.6069.11.camel@frissell>

Hi Jon,

Well, seeing the error message would be helpful, but without it, here
are a few things you can try:

  * removing the "sequence-region" line from the GFF file and adding a line
like this:

  4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4

and then reloading the database.

  * Or, you may want to consider using Bio::DB::SeqFeature::Store, since
Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that
is, with three levels of features (like gene, mRNA and CDS)).

Scott

On Sat, 2008-01-19 at 19:49 -0800, Hang wrote:
> Hi, Scott,
> 
> After adding die $!, I know something is wrong at line:
> "my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"
> 
> my gff file is like this:
> ##gff-version 3
> ##sequence-region 4 1 1351857
> 4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
> 4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
> 4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
> ...
> ...
> I really got confused. Any further suggestion? Thank you!
> 
> Jon
> 
> 
> 
> 
> 
> Scott Cain-3 wrote:
> > 
> > Hi Jon,
> > 
> > I think it's funny that you have "or die" on the database opening line,
> > "or die" on the @features line, but you didn't put one on the $segment
> > line.  Try adding "or die: $!" to the $segment line to see what it says,
> > also add a 'print $segment' after you create it and before you try to
> > get the features from it.  
> > 
> > Clearly, the problem is that $segment is not defined (that is, nothing
> > is in it, not that the wrong thing is in it).  The next trick is to find
> > out why.  My first guess, without looking at the data set, is that the
> > arm is not really named '4'.
> > 
> > Scott
> > 
> > On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
> >> Hi, everyone,
> >> 
> >> I met this problem when I was running this script to extract features
> >> overlaps with 4:20,000..25,000. It always responds like "Can't call method
> >> "features" on an undefined value at BIO::DB::GFF.pl line XX".
> >> ==============================================================
> >> use Bio::DB::GFF;
> >> use Bio::Tools::GFF;
> >> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
> >>                                         -dsn => 'dbi:mysql:dmel_gff:localhost',
> >>                                         -user => 'XXXX',
> >>                                         -pass => 'XXXX') || die "database open failed";
> >> 
> >> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
> >> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
> >> 'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
> >> print(scalar(@features)."\n");
> >> 
> >> ================================================================
> >> I am using activeperl 5.8.8 and bioperl 1.5.2 under Win32. I loaded
> >> dmel_5.4.gff into mysql database by bulk_load_gff.pl without any error.
> >> Other methods failed also. 
> >> 
> >> Any help will be deeply appreciated!
> >> 
> >> Best,
> >> Jon
> >> 
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                         cain at cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > Cold Spring Harbor Laboratory
> > 
> > 
> > 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
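
For a large file, the first suggestion above (replacing the ##sequence-region directive with an explicit feature line) can be scripted rather than done by hand. A minimal sketch, assuming the directive has the form "##sequence-region SEQID START END"; the source ('FlyBase') and type ('chromosome_arm') are taken from the example line and are assumptions for other data sets, and region_to_feature is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a "##sequence-region SEQID START END" directive into an explicit,
# tab-delimited feature line; any other line is returned unchanged.
sub region_to_feature {
    my ($line, $source, $type) = @_;
    return $line
        unless $line =~ /^##sequence-region\s+(\S+)\s+(\d+)\s+(\d+)/;
    my ($seqid, $start, $end) = ($1, $2, $3);
    return join("\t", $seqid, $source, $type, $start, $end, '.', '.', '.',
                "ID=$seqid;Name=$seqid") . "\n";
}

# Usage: perl region2feature.pl dmel.gff > dmel.fixed.gff
if (@ARGV) {
    while (my $line = <>) {
        print region_to_feature($line, 'FlyBase', 'chromosome_arm');
    }
}
```

The rewritten file still has to be reloaded into the database afterwards.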



From hangsyin at gmail.com  Sun Jan 20 15:08:59 2008
From: hangsyin at gmail.com (Hang)
Date: Sun, 20 Jan 2008 07:08:59 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features"
	on	an	undefined value at BIO::DB::GFF.pl
In-Reply-To: <1200802084.6069.11.camel@frissell>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
Message-ID: <14982665.post@talk.nabble.com>


Hi, Scott,
I tried replacing the sequence-region line with
"4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4", but it
doesn't work. "$!" didn't say anything except "died at line 12".

So I went ahead with Bio::DB::SeqFeature::Store. Here is my code to
load dmel-all-r5.4.gff (from FlyBase) into a test database:
=============================================================
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   =>  1 );
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store    => $db,
                                                         -verbose  => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================
I got a bunch of errors like this:
"DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
Line 1316 in mysql.pm looks like this: $sth->execute($name) or die
$sth->errstr;
When I checked the test database after the failed load, only one table
had been created, called 'meta'. I also tried 'grant all on test to
XXX at localhost' and used those -user and -pass values to load the GFF,
but it didn't work either.

Jon


Scott Cain-3 wrote:
> 
> Hi Jon,
> 
> Well, seeing the error message would be helpful, but without it, here
> are a few things you can try:
> 
>   * removing the "sequence-region" line from the GFF file, adding a line
> like this:
> 
>   4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4
> 
> and then reloading the database.
> 
>   * Or, you may want to consider using Bio::DB::SeqFeature::Store, since
> Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that
> is, with three levels of features (like gene, mRNA and CDS)).
> 
> Scott
> 
> On Sat, 2008-01-19 at 19:49 -0800, Hang wrote:
>> Hi, Scott,
>> 
>> After adding die $!, I know something is wrong at line:
>> "my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"
>> 
>> my gff file is like this:
>> ##gff-version 3
>> ##sequence-region 4 1 1351857
>> 4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
>> 4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
>> 4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
>> ...
>> ...
>> I really got confused. Any further suggestion? Thank you!
>> 
>> Jon
>> 
>> 
>> 
>> 
>> 
>> Scott Cain-3 wrote:
>> > 
>> > Hi Jon,
>> > 
>> > I think it's funny that you have "or die" on the database opening line,
>> > "or die" on the @features line, but you didn't put one on the $segment
>> > line.  Try adding "or die: $!" to the $segment line to see what it says,
>> > also add a 'print $segment' after you create it and before you try to
>> > get the features from it.  
>> > 
>> > Clearly, the problem is that $segment is not defined (that is, nothing
>> > is in it, not that the wrong thing is in it).  The next trick is to find
>> > out why.  My first guess, without looking at the data set, is that the
>> > arm is not really named '4'.
>> > 
>> > Scott
>> > 
>> > On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
>> >> Hi, everyone,
>> >> 
>> >> I met this problem when I was running this script to extract features
>> >> overlaps with 4:20,000..25,000. It always responds like "Can't call method
>> >> "features" on an undefined value at BIO::DB::GFF.pl line XX".
>> >> ==============================================================
>> >> use Bio::DB::GFF;
>> >> use Bio::Tools::GFF;
>> >> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>> >>                                         -dsn => 'dbi:mysql:dmel_gff:localhost',
>> >>                                         -user => 'XXXX',
>> >>                                         -pass => 'XXXX') || die "database open failed";
>> >> 
>> >> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
>> >> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
>> >> 'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
>> >> print(scalar(@features)."\n");
>> >> 
>> >> ================================================================
>> >> I am using activeperl 5.8.8 and bioperl 1.5.2 under Win32. I loaded
>> >> dmel_5.4.gff into mysql database by bulk_load_gff.pl without any
>> error.
>> >> Other methods failed also. 
>> >> 
>> >> Any help will be deeply appreciated!
>> >> 
>> >> Best,
>> >> Jon
>> >> 
>> > -- 
>> >
>> ------------------------------------------------------------------------
>> > Scott Cain, Ph. D.                                        
>> cain at cshl.edu
>> > GMOD Coordinator (http://www.gmod.org/)                    
>> 216-392-3087
>> > Cold Spring Harbor Laboratory
>> > 
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> > 
>> > 
>> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory



From cain at cshl.edu  Sun Jan 20 15:25:16 2008
From: cain at cshl.edu (Scott Cain)
Date: Sun, 20 Jan 2008 10:25:16 -0500 (EST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an
 undefined value at BIO::DB::GFF.pl
In-Reply-To: <14982665.post@talk.nabble.com>
Message-ID: 

Jon,

There is a script for loading a SeqFeature database, just like the one for
the GFF database, though I don't know offhand what it's called (I'm not at my
normal computer right now).  Be sure to read its documentation; you
will probably want to use the 'fast' option (I don't remember what it is
called either).

Scott


----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain at cshl.edu
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------
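A note on the `$!` question that came up earlier in this thread: `or die $!` reported only something like "Died at ... line 12" because `$!` reflects the C `errno` value, which is meaningful only right after a failed system call such as `open`. A method that simply returns `undef`, as `$db->segment(...)` did here, does not set it, and `die` with an empty message prints just "Died at FILE line N." A more useful pattern is to test `defined` and report what was requested. The minimal sketch below shows that pattern; `lookup_segment`, `require_segment` and `%REF_LENGTH` are hypothetical stand-ins, not Bio::DB::GFF calls, so it runs without a database.

```perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for $db->segment(): like a lookup that finds
# nothing, it returns undef without touching $!.
my %REF_LENGTH = ('4' => 1_351_857);    # invented: reference name => length

sub lookup_segment {
    my ($name, $start, $end) = @_;
    return undef unless exists $REF_LENGTH{$name};
    return { name => $name, start => $start, end => $end };
}

# Check definedness and say what was requested. "or die $!" would usually
# print only "Died at FILE line N." because $! is meaningful only right
# after a failed system call (open, read, ...), not after a method that
# returns undef.
sub require_segment {
    my ($name, $start, $end) = @_;
    my $segment = lookup_segment($name, $start, $end);
    defined $segment
        or die "no segment '$name:$start..$end' found in the database\n";
    return $segment;
}

my $segment = require_segment('4', 20_000, 25_000);
print "found $segment->{name}:$segment->{start}..$segment->{end}\n";
```

With a real Bio::DB::GFF handle, the same `defined $segment or die ...` check would go directly after the `$db->segment(...)` call.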





From cjfields at uiuc.edu  Sun Jan 20 17:10:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 20 Jan 2008 11:10:27 -0600
Subject: [Bioperl-l] Problem: Can't call method "features" on an
	undefined value at BIO::DB::GFF.pl
In-Reply-To: 
References: 
Message-ID: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>

It's bp_seqfeature_load.pl (if you have the full bioperl core
distribution, it's in script/Bio-SeqFeature/Store).  I had some
problems with the fast-loading option, but that was likely just my GFF
formatting; the example data loaded just fine.

As for the error, you need to use the '-create' flag when initializing  
a database (or wiping data from a current one):

=============================================================
use strict;
use warnings;
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature::Store::GFF3Loader;

# -create => 1 (re)initializes the tables, erasing any existing data
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test',
                                         -user    => 'root',
                                         -pass    => 'XXXXX',
                                         -write   => 1,
                                         -create  => 1);
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
                                                         -verbose => 1);
$loader->load('./dmel-all-r5.4.gff');
=============================================================
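A comma must follow `-write => 1` in this argument list. Without it, Perl does not report a syntax error: it reads `1 -create` as the subtraction `1 - 'create'` (the word is auto-quoted by the following `=>`), so the constructor receives `-write => 1, 1` and never sees `-create`. The standalone snippet below shows the difference with plain lists, independent of BioPerl.

```perl
use strict;
use warnings;

# Missing comma: "1 -create => 1" parses as (1 - 'create'), 1
# which evaluates to 1, 1 ("create" is numerically 0).
my @missing_comma = do {
    no warnings 'numeric';    # silence "Argument "create" isn't numeric"
    (-write => 1
     -create => 1);
};

# Correct: four elements, both options present.
my @with_comma = (-write => 1,
                  -create => 1);

print scalar(@missing_comma), " elements: @missing_comma\n";   # 3 elements: -write 1 1
print scalar(@with_comma),    " elements: @with_comma\n";      # 4 elements: -write 1 -create 1
```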

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From ykumagai at biken.osaka-u.ac.jp  Mon Jan 21 16:56:53 2008
From: ykumagai at biken.osaka-u.ac.jp (Yutaro Kumagai)
Date: Tue, 22 Jan 2008 01:56:53 +0900
Subject: [Bioperl-l] Problem with Bio::ASN1::EntrezGene::Indexer
Message-ID: <4794CED5.3070307@biken.osaka-u.ac.jp>

Hi, everyone,

I'm using Bio::ASN1::EntrezGene::Indexer as follows:

###
use strict;
use warnings;
use Bio::ASN1::EntrezGene::Indexer;
use Bio::ASN1::EntrezGene;
use Bio::SeqIO;

my $inx = Bio::ASN1::EntrezGene::Indexer->new(
    -filename => 'c:/chrm/asn/entrezgene.idx');

# The index file has already been built successfully; I checked it
# by counting the records with $inx->count_records, etc.

my $seq1 = $inx->fetch_hash(15959);

# The ID 15969 surely exists: I got no error message, and by
# dumping $seq1 I confirmed that it contains some data.

my $seq2 = $inx->fetch(15969);
###

However, the last call fails with this error:
"you must pass in a file name or handle through new() or input_file() first
before calling next_seq!
at C:/Perl/site/lib/Bio\SeqIO\entrezgene.pm line 136".

I traced the program in the debugger and found that, somehow, _fh()
in Bio::Index::AbstractSeq fails to pass the filehandle to fetch().

Now, I have two questions:

1) What is wrong with the code above? Is this a bug, or my own
mistake? If the latter, what am I doing wrong?

2) If I can't get "fetch" to work, how can I extract the sequence data
(position in the genomic contig, strand, etc.) from the data returned
by "fetch_hash"? I can't work out what the data structure returned
by "fetch_hash" looks like...
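For question 2, one generic way to learn the layout of whatever `fetch_hash` returns is to print the path to every scalar leaf. The poster's note that dumping `$seq1` showed data suggests it is an ordinary nested Perl structure, but its actual Entrez Gene keys are not shown here, so the sketch below makes no assumptions about them; the `$data` structure in it is invented for illustration, and `leaf_paths` is a hypothetical helper. Searching the output for a known value (for example a contig accession) reveals the exact chain of keys and indices needed to reach it.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Recursively collect "path = value" strings for every scalar leaf of a
# nested hash/array structure. reftype() also looks inside blessed
# objects. Assumes the structure has no circular references.
sub leaf_paths {
    my ($node, $path, $out) = @_;
    $path = '$data' unless defined $path;
    $out  = []      unless defined $out;
    my $type = reftype($node);
    $type = '' unless defined $type;
    if ($type eq 'HASH') {
        leaf_paths($node->{$_}, "$path\->{$_}", $out) for sort keys %$node;
    }
    elsif ($type eq 'ARRAY') {
        leaf_paths($node->[$_], "$path\->[$_]", $out) for 0 .. $#$node;
    }
    else {
        push @$out, "$path = " . (defined $node ? $node : 'undef');
    }
    return $out;
}

# Invented example data; a real fetch_hash result will look different.
my $data = { gene => [ { locus => 'Abc1' } ], ids => [ 1, 2 ] };
print "$_\n" for @{ leaf_paths($data) };
```

Calling `leaf_paths($seq1)` instead of `leaf_paths($data)` would list the real paths. `Data::Dumper` with `$Data::Dumper::Sortkeys = 1` gives a similar, if less greppable, view.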

Thank you in advance.

Yutaro Kumagai.

-- 
**********************************
Yutaro Kumagai
Dept. of Host Defense
Res. Inst. for Microbial Diseases
Osaka University
Japan
ykumagai at biken.osaka-u.ac.jp
**********************************


From hangsyin at gmail.com  Mon Jan 21 19:22:55 2008
From: hangsyin at gmail.com (Hang)
Date: Mon, 21 Jan 2008 11:22:55 -0800 (PST)
Subject: [Bioperl-l] Problem: Can't call method "features" on an
 undefined value at BIO::DB::GFF.pl
In-Reply-To: <3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
	<14982665.post@talk.nabble.com>
	
	<3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
Message-ID: <15004412.post@talk.nabble.com>


Hi, Chris:

Following your suggestion, I added the -create flag and the GFF3Loader started
to work. Thanks a lot!
When I load dmel-all-5.4.gff into MySQL with -fast, I get the following
error:
   Data too long for column 'attribute_value' at c:/../../../mysql.pm line 510
Without -fast it loads fine, just annoyingly slowly. Do you have any
suggestions for this?

Best,
Hang
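One way to narrow this down (a diagnostic sketch, not a fix for the loader) is to look for unusually long values in column 9 of the GFF3 file. The script below uses only core Perl: it percent-decodes each `tag=value` entry, splits multi-valued tags on commas, and lists every value longer than a byte limit. The default limit of 65,535 bytes is an assumption (the size of a MySQL `TEXT` column); check the real type of the `attribute_value` column in your schema, for example with MySQL's `SHOW CREATE TABLE`, and pass that limit instead. It is also an assumption, not confirmed here, that the loader stores each comma-separated value separately. The script name `long_attrs.pl` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode GFF3 percent-escapes such as "%2C" -> ",".
sub gff3_unescape {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;
    return $s;
}

# Scan GFF3 lines from $fh and return [line number, tag, length] for every
# attribute value longer than $limit bytes, longest first.
sub long_attribute_values {
    my ($fh, $limit) = @_;
    my @hits;
    while (my $line = <$fh>) {
        last if $line =~ /^##FASTA/;               # sequence section follows
        next if $line =~ /^#/ || $line !~ /\t/;    # directives, comments, blanks
        $line =~ s/\r?\n\z//;
        my @col = split /\t/, $line;
        next unless @col >= 9;
        for my $pair (split /;/, $col[8]) {
            my ($tag, $values) = split /=/, $pair, 2;
            next unless defined $values;
            for my $value (split /,/, $values) {   # multi-valued tags
                my $len = length(gff3_unescape($value));
                push @hits, [ $., $tag, $len ] if $len > $limit;
            }
        }
    }
    return sort { $b->[2] <=> $a->[2] } @hits;
}

# Usage: perl long_attrs.pl FILE.gff [LIMIT]
if (@ARGV) {
    my ($file, $limit) = @ARGV;
    $limit = 65_535 unless defined $limit;   # ASSUMED limit (MySQL TEXT); verify
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    printf "line %d  %s  %d bytes\n", @$_ for long_attribute_values($fh, $limit);
}
```

If nothing exceeds the limit, the GFF data itself is probably not at fault.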







From cjfields at uiuc.edu  Tue Jan 22 04:21:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 21 Jan 2008 22:21:27 -0600
Subject: [Bioperl-l] Problem: Can't call method "features" on an
	undefined value at BIO::DB::GFF.pl
In-Reply-To: <15004412.post@talk.nabble.com>
References: <14971922.post@talk.nabble.com> <1200800204.6069.5.camel@frissell>
	<14978241.post@talk.nabble.com> <1200802084.6069.11.camel@frissell>
	<14982665.post@talk.nabble.com>
	
	<3DBA08D4-5FFC-4C66-8676-756F4A25EAEE@uiuc.edu>
	<15004412.post@talk.nabble.com>
Message-ID: <8B1956B2-1380-4E73-8F14-F79CA5435697@uiuc.edu>

I'm cc'ing this to the gbrowse list just in case Lincoln or Scott have  
an idea.  My guess is it's a bug in the fast loader.  Could you file  
this in bugzilla?

http://bugzilla.open-bio.org/

chris

On Jan 21, 2008, at 1:22 PM, Hang wrote:

>
> Hi, Chris:
>
> Following your suggestion, I added -create flag and the GFF3loader started
> to work. Thanks alot!
> When I load dmel-all-5.4.gff into mysql with -fast, I had the following
> error:
>   Data too long for column 'attribute_value' at c:/../../../mysql.pm line 510
> If I don't use -fast, it is OK, except for the annoying slow speed. Do you
> have any suggestion on this?
>
> Best,
> Hang
>
>
>
>
> Chris Fields wrote:
>>
>> It's bp_seqfeature_load.pl (if you have the full bioperl core
>> distribution, it's in script/Bio-SeqFeature/Store).  I had some
>> problems with the fast-loading option but it was likely just my gff
>> formatting; example data loaded just fine.
>>
>> As for the error, you need to use the '-create' flag when initializing
>> a database (or wiping data from a current one):
>>
>> =============================================================
>> use Bio::DB::SeqFeature::Store;
>> use Bio::DB::SeqFeature::Store::GFF3Loader;
>> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>>                                         -dsn     => 'dbi:mysql:test',
>>                                         -user    => 'root',
>>                                         -pass    => 'XXXXX',
>>                                         -write   =>  1,
>>                                         -create  => 1);
>> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
>>                                                          -verbose => 1);
>> $loader->load('./dmel-all-r5.4.gff');
>> =============================================================
>>
>> chris
>>
>> On Jan 20, 2008, at 9:25 AM, Scott Cain wrote:
>>
>>> Jon,
>>>
>>> There is a script for loading a SeqFeature database just like the GFF
>>> database, though I don't know what it's called offhand (I'm not at my
>>> normal computer right now).  Be sure to read the documentation; you
>>> will probably want to use the 'fast' option (I don't remember what it
>>> is called either).
>>>
>>> Scott
>>>
>>>
>>> ----------------------------------------------------------------------
>>> Scott Cain, Ph. D.				 	 cain at cshl.edu
>>> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
>>> ----------------------------------------------------------------------
>>>
>>>
>>> On Sun, 20 Jan 2008, Hang wrote:
>>>
>>>>
>>>> Hi, Scott,
>>>> I tried changing the sequence-region line to "4   FlyBase
>>>> chromosome_arm  1  1351857 .  .  .  ID=4;Name=4", but it doesn't work.
>>>> "$!" didn't say anything but "died at line 12".
>>>>
>>>> So, I went ahead with Bio::DB::SeqFeature::Store. Here is my code to
>>>> load dmel-all-r5.4.gff (from FlyBase) into a test database:
>>>> =============================================================
>>>> use Bio::DB::SeqFeature::Store;
>>>> use Bio::DB::SeqFeature::Store::GFF3Loader;
>>>> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>>>>                                        -dsn     => 'dbi:mysql:test',
>>>>                                        -user    => 'root',
>>>>                                        -pass    => 'XXXXX',
>>>>                                        -write   =>  1 );
>>>> my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
>>>>                                                         -verbose => 1);
>>>> $loader->load('./dmel-all-r5.4.gff');
>>>> =============================================================
>>>> I got a bunch of errors like this:
>>>> "DBD::mysql::execute failed: Table 'test.locationlist' doesn't exist at
>>>> C:\Biology\perl\site\lib\Bio\DB\SeqFeature\Store\DBI\mysql.pm line 1316".
>>>> Line 1316 in mysql.pm looks like this: $sth->execute($name) or die
>>>> $sth->errstr;
>>>> I checked the test database after the failed load. Only one table had
>>>> been created, called 'meta'. I also tried 'grant all on test to
>>>> XXX at localhost' and used that -user and -pass to load the GFF; it
>>>> didn't work either.
>>>>
>>>> Jon
>>>>
>>>>
>>>> Scott Cain-3 wrote:
>>>>>
>>>>> Hi Jon,
>>>>>
>>>>> Well, seeing the error message would be helpful, but my first guess
>>>>> without it is that there are a few things you can try:
>>>>>
>>>>> * removing the "sequence-region" line from the GFF file, adding a line
>>>>> like this:
>>>>>
>>>>> 4   FlyBase  chromosome_arm  1  1351857 .  .  .  ID=4;Name=4
>>>>>
>>>>> and then reloading the database.
>>>>>
>>>>> * Or, you may want to consider using Bio::DB::SeqFeature::Store, since
>>>>> Bio::DB::GFF doesn't always behave correctly with complex GFF3 (that
>>>>> is, with three levels of features, like gene, mRNA and CDS).
>>>>>
>>>>> Scott
>>>>>
>>>>> On Sat, 2008-01-19 at 19:49 -0800, Hang wrote:
>>>>>> Hi, Scott,
>>>>>>
>>>>>> After adding die $!, I know something is wrong at this line:
>>>>>> "my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);"
>>>>>>
>>>>>> My GFF file looks like this:
>>>>>> ##gff-version 3
>>>>>> ##sequence-region 4 1 1351857
>>>>>> 4	FlyBase	transposable_element	2	611	.	+	.	ID=FBti0062890;Name=ninja-Dsim-like{}4829;Dbxref=FlyBase_Annotation_IDs:TE62890;derived_cyto_location=-;
>>>>>> 4	repeatmasker_dummy	match	2	347	.	+	.	ID=:1395923_repeatmasker_dummy;Name=1%2C347-AE003845.4-dummy-RepeatMasker;
>>>>>> 4	repeatmasker_dummy	match_part	2	347	2367	+	.	ID=:5142029_dummy;Name=:5142029;Parent=:1395923_repeatmasker_dummy;target_type=so;Target=D83207 5860 6210 +;
>>>>>> ...
>>>>>> ...
>>>>>> I'm really confused. Any further suggestions? Thank you!
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Scott Cain-3 wrote:
>>>>>>>
>>>>>>> Hi Jon,
>>>>>>>
>>>>>>> I think it's funny that you have "or die" on the database opening
>>>>>>> line and "or die" on the @features line, but you didn't put one on
>>>>>>> the $segment line.  Try adding 'or die "$!"' to the $segment line to
>>>>>>> see what it says; also add a 'print $segment' after you create it and
>>>>>>> before you try to get the features from it.
>>>>>>>
>>>>>>> Clearly, the problem is that $segment is not defined (that is, nothing
>>>>>>> is in it, not that the wrong thing is in it).  The next trick is to
>>>>>>> find out why.  My first guess, without looking at the data set, is
>>>>>>> that the arm is not really named '4'.
>>>>>>>
>>>>>>> Scott
>>>>>>>
>>>>>>> On Sat, 2008-01-19 at 10:25 -0800, Hang wrote:
>>>>>>>> Hi, everyone,
>>>>>>>>
>>>>>>>> I ran into this problem when running this script to extract features
>>>>>>>> overlapping 4:20,000..25,000. It always responds with "Can't call
>>>>>>>> method "features" on an undefined value at BIO::DB::GFF.pl line XX".
>>>>>>>> ==============================================================
>>>>>>>> use Bio::DB::GFF;
>>>>>>>> use Bio::Tools::GFF;
>>>>>>>> my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>>>>>>>>                            -dsn     => 'dbi:mysql:dmel_gff:localhost',
>>>>>>>>                            -user    => 'XXXX',
>>>>>>>>                            -pass    => 'XXXX') || die "database open failed";
>>>>>>>>
>>>>>>>> my $segment = $db->segment(-name => '4', -start => 20000, -end => 25000);
>>>>>>>> my @features = $segment->features(-types => ['gene', 'exon', 'intron',
>>>>>>>>     'five_prime_UTR', 'three_prime_UTR', 'CDS']) or die "no features";
>>>>>>>> print(scalar(@features)."\n");
>>>>>>>>
>>>>>>>> ==============================================================
>>>>>>>> I am using ActivePerl 5.8.8 and BioPerl 1.5.2 under Win32. I loaded
>>>>>>>> dmel_5.4.gff into a MySQL database with bulk_load_gff.pl without any
>>>>>>>> error. Other methods failed too.
>>>>>>>>
>>>>>>>> Any help will be deeply appreciated!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jon
>>>>>>>>
>>>>>>> -- 
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>> Scott Cain, Ph. D.                                   cain at cshl.edu
>>>>>>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>>>>>>> Cold Spring Harbor Laboratory
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>> -- 
>>>>> ------------------------------------------------------------------------
>>>>> Scott Cain, Ph. D.
>>>>> cain at cshl.edu
>>>>> GMOD Coordinator (http://www.gmod.org/)
>>>>> 216-392-3087
>>>>> Cold Spring Harbor Laboratory
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From jason at bioperl.org  Wed Jan 23 08:14:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 23 Jan 2008 00:14:06 -0800
Subject: [Bioperl-l] [Bioperl-guts-l] [14455]
	bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm: fixed up the
	gene glyph so that it works properly with CDS-only genes
In-Reply-To: <200801222048.m0MKmhiI007977@dev.open-bio.org>
References: <200801222048.m0MKmhiI007977@dev.open-bio.org>
Message-ID: <91659EDD-B102-47C8-BF93-92576C2CF324@bioperl.org>

Lincoln -- Thank you, thank you for this fix!  This takes care of
inconsistency problems I was having with GFF3 and GFF2 data.  It  
works so much more beautifully now!

-jason
On Jan 22, 2008, at 12:48 PM, Lincoln Stein wrote:

> Revision: 14455
> Author:   lstein
> Date:     2008-01-22 15:48:42 -0500 (Tue, 22 Jan 2008)
>
> Log Message:
> -----------
> fixed up the gene glyph so that it works properly with CDS-only genes
>
> Modified Paths:
> --------------
>     bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
>
> Modified: bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm
> ===================================================================
> --- bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm	2008-01-22 00:16:02 UTC (rev 14454)
> +++ bioperl-live/trunk/Bio/Graphics/Glyph/gene.pm	2008-01-22 20:48:42 UTC (rev 14455)
> @@ -44,7 +44,9 @@
>
>  sub bump {
>    my $self = shift;
> -  return 1 if $self->{level} == 0; # top level bumps, other levels don't unless specified in config
> +  return 1
> +    if $self->{level} == 0
> +      && lc $self->feature->primary_tag eq 'gene'; # top level bumps, other levels don't unless specified in config
>    return $self->SUPER::bump;
>  }
>
> @@ -92,12 +94,16 @@
>  sub _subfeat {
>    my $class   = shift;
>    my $feature = shift;
> -  if ($feature->primary_tag eq 'gene') {
> +  if (lc $feature->primary_tag eq 'gene') {
>      my @transcripts;
>      for my $t (qw/mRNA tRNA snRNA snoRNA miRNA ncRNA pseudogene/) {
>        push @transcripts, $feature->get_SeqFeatures($t);
>      }
>      return @transcripts;
> +  } elsif (lc $feature->primary_tag eq 'cds') {
> +    my @parts = $feature->get_SeqFeatures();
> +    return ($feature) if $class->{level} == 0 and !@parts;
> +    return @parts;
>    }
>
>    my @subparts;
>
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l



From ste.ghi at libero.it  Thu Jan 24 13:42:49 2008
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Thu, 24 Jan 2008 14:42:49 +0100
Subject: [Bioperl-l] parsing ACE file
Message-ID: 

Dear All,
    dealing with an assembly .ace file and a list of contigs (from that assembly), how can I extract from the .ace file the read names forming each listed contig? Is there any module doing this job?

Any suggestion about how to start is welcome...
Cheers

Stefano




From pmiguel at purdue.edu  Thu Jan 24 19:06:35 2008
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Thu, 24 Jan 2008 14:06:35 -0500
Subject: [Bioperl-l] parsing ACE file
In-Reply-To: 
References: 
Message-ID: <4798E1BB.2020809@purdue.edu>

Stefano Ghignone wrote:
> Dear All,
>     dealing with an assembly .ace file and a list of contigs (from that assembly), how can I extract from the .ace file the read names forming each listed contig? Is there any module doing this job?
>
> Any suggestion about how to start is welcome...
> Cheers
>
> Stefano
>
>   
perl -ne 'print if /^(?:CO|RD)\s/' acefile.ace

will give you a list of the contigs, each followed by the reads in that
contig, if "acefile.ace" is a phrap ACE file.
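To restrict the output to a given list of contigs, as originally asked, the same CO/RD idea can be wrapped in a small core-Perl function. This is only a sketch: reads_by_contig is a hypothetical name, and it assumes the phrap ACE layout in which a "CO <contig> ..." line opens a contig and each following "RD <read> ..." line names a read in that contig. BioPerl's Bio::Assembly::IO::ace parser is the fuller alternative.

```perl
use strict;
use warnings;

# Collect the read names of each contig in a phrap-style ACE file.
# Assumes "CO <contig> ..." opens a contig and every later "RD <read> ..."
# line (until the next CO line) names a read belonging to that contig.
# If $wanted (array ref of contig names) is given, other contigs are skipped.
sub reads_by_contig {
    my ($fh, $wanted) = @_;
    my %keep = map { ($_ => 1) } @{ $wanted || [] };
    my (%reads, $contig);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {
            $contig = $1;
        }
        elsif ($line =~ /^RD\s+(\S+)/ && defined $contig) {
            next if %keep && !$keep{$contig};
            push @{ $reads{$contig} }, $1;
        }
    }
    return \%reads;
}

# Hypothetical usage:
# open my $fh, '<', 'acefile.ace' or die "Cannot open acefile.ace: $!";
# my $reads = reads_by_contig($fh, [qw(Contig1 Contig7)]);
# print "$_\t@{ $reads->{$_} }\n" for sort keys %$reads;
```

Called without a list, it returns every contig in the file.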

There is a BioPerl module for handling phrap ACE files, but I'm not sure
what its current status is. Last time I looked (probably a couple of 
years ago) it seemed to have been abandoned half-finished.

-- 
Phillip


From golharam at umdnj.edu  Thu Jan 24 19:36:29 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 24 Jan 2008 14:36:29 -0500
Subject: [Bioperl-l] Wiki inconsistency?
Message-ID: <4798E8BD.7030107@umdnj.edu>

Hi,

I haven't used BioPerl in a while but recently started using it again.  I was 
using 1.4.0 but see on the website that 1.5.2 has been released.   If I 
click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2), 
I see two versions:

bioperl-1.5.2_102

and

bioperl-1.5.2_100

However, if I click on the Downloads link on the left toolbar and then
scroll down, I see 1.5.2 Developer Release.  The tar file there points to
current_core_unstable.tar.gz.

Is this supposed to be this way?  It seems a bit confusing.  I think it 
might be appropriate to put all the download links in one 
location...just my two cents...

Ryan



From cjfields at uiuc.edu  Thu Jan 24 20:58:25 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 24 Jan 2008 14:58:25 -0600
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <4798E8BD.7030107@umdnj.edu>
References: <4798E8BD.7030107@umdnj.edu>
Message-ID: 

Maybe Sendu can answer more specifically, but I believe the extra  
designation referred to the release candidate (of which bioperl-core  
was the only one with '102').  You definitely want the core package.   
The other ones with '100' are other bioperl-related distributions  
which require the core package but have additional functionality  
(BioSQL-related functions, wrapper modules, etc.).

chris

On Jan 24, 2008, at 1:36 PM, Ryan Golhar wrote:

> Hi,
>
> I haven't used Bioperl in a while but recently started using it.  I  
> was using 1.4.0 but see on the website that 1.5.2 has been  
> released.   If I click on the link for 1.5.2
> (http://www.bioperl.org/wiki/Release_1.5.2), I see two versions:
>
> bioperl-1.5.2_102
>
> and
>
> bioperl-1.5.2_100
>
> However, If I click on the Downloads link on the left toolbar, then  
> scroll down, I see 1.5.2 Developer Release.  The tar file here  
> points to  current_core_unstable.tar.gz.
>
> Is this supposed to be this way?  It seems a bit confusing.  I think  
> it might be appropriate to put all the download links in one  
> location...just my two cents...
>
> Ryan
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From florent.angly at gmail.com  Thu Jan 24 22:06:29 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 24 Jan 2008 14:06:29 -0800
Subject: [Bioperl-l] parsing ACE file
In-Reply-To: <4798E1BB.2020809@purdue.edu>
References: 
	<4798E1BB.2020809@purdue.edu>
Message-ID: <47990BE5.2010005@gmail.com>

That would be the module Bio::Assembly::IO::ace. It works fine as far as
I know. To parse an assembly, use Bio::Assembly::IO:
http://doc.bioperl.org/bioperl-live/Bio/Assembly/IO.html
Regards,
Florent

Phillip San Miguel wrote:
> Stefano Ghignone wrote:
>> Dear All,
>>     dealing with an assembly .ace file and a list of contigs (from 
>> that assembly), how can I extract from the .ace file the read names 
>> forming each listed contig? Is there any module doing this job?
>>
>> Any suggestion about how to start is welcome...
>> Cheers
>>
>> Stefano
>>
>>   
> perl -ne 'print if /^(?:CO|RD)\s/' acefile.ace
>
> will give you a list of the contigs, each followed by the reads in that
> contig, if "acefile.ace" is a phrap ACE file.
>
> There is a bioperl module for handling phrap ace file, but I'm not 
> sure what its current status is. Last time I looked (probably a couple 
> of years ago) it seemed to have been abandoned half-finished.
>



From golharam at umdnj.edu  Thu Jan 24 21:17:14 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 24 Jan 2008 16:17:14 -0500
Subject: [Bioperl-l] GenBank updated sequence not being retrieved
Message-ID: <4799005A.5030204@umdnj.edu>

I'm using Bioperl 1.4 (and tried with 1.5.1).

I'm trying to download GenBank sequences for which I have accession numbers.
One of the sequences has been replaced with a newer version.  I'm
using get_Seq_by_acc, which returns the warning:

-------------------- WARNING ---------------------
MSG: acc (gb|XM_087386) does not exist
---------------------------------------------------

If I check NCBI's website for the sequence, it has indeed been replaced 
by an NM_ sequence.  How can I get BioPerl to retrieve the latest 
version of a sequence?



From johan.nilsson at sh.se  Thu Jan 24 22:33:42 2008
From: johan.nilsson at sh.se (Johan Nilsson)
Date: Thu, 24 Jan 2008 23:33:42 +0100
Subject: [Bioperl-l] Quickest Codon Based MSA?
Message-ID: <47991246.6010106@sh.se>

Hello,

I have a question which might not necessarily be related to Bioperl, 
although I do believe the expertise is available here. I have a couple 
of thousand FASTA files, each containing 20 CDS sequence orthologues of 
rather high sequence similarity. I would like to create a codon-based 
multiple sequence alignment for each of these FASTA files (i.e. a 
nucleotide sequence alignment inferred from alignment of the translated 
peptide sequences, to ensure that no frame shifts will occur). I first 
tried running Dialign2, which can perform the 
translation/back-translation in one go, but this turned out to be far 
too slow. I next tried to build protein alignments using ClustalW and 
subsequently built the coding region alignment using EMBOSS 'tranalign', 
but this was also too slow.

Is there any method available that significantly speeds up 
codon-preserving alignment? As I mentioned, the sequences to be 
aligned are in general very conserved, so any heuristic taking advantage 
of the low divergence would be very helpful! Also, is there any 
adjustable parameter in dialign2/dialign-T that might speed up the 
program when looking at highly similar sequences?

Best regards
/Johan Nilsson


From e-just at northwestern.edu  Thu Jan 24 23:07:57 2008
From: e-just at northwestern.edu (Eric Just)
Date: Thu, 24 Jan 2008 17:07:57 -0600
Subject: [Bioperl-l] Bioinformatics Job Opening at dictyBase in Chicago
Message-ID: 

Hello everyone,

We have an opening at dictyBase (Northwestern University in Chicago) for a
Bioinformatics Software Engineer.  This job involves writing and maintaining
software for a genome database using Chado/OO-Perl/BioPerl and many other
state-of-the-art technologies.

For more information please see:
http://dictybase.org/dictybase_jobs.htm

Thanks,
Eric


From bix at sendu.me.uk  Thu Jan 24 23:16:14 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 24 Jan 2008 23:16:14 +0000
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <4798E8BD.7030107@umdnj.edu>
References: <4798E8BD.7030107@umdnj.edu>
Message-ID: <47991C3E.2010908@sendu.me.uk>

Ryan Golhar wrote:
> Hi,
> 
> I haven't used Bioperl in a while but recently started using it.  I was 
> using 1.4.0 but see on the website that 1.5.2 has been released.   If I 
> click on the link for 1.5.2 (http://www.bioperl.org/wiki/Release_1.5.2), 
> I see a two versions:
> 
> bioperl-1.5.2_102
> 
> and
> 
> bioperl-1.5.2_100

Where do you see this older version? I did a search on the page and that 
term isn't found. _100 was the first version of 1.5.2 core to go out. 
There were then 2 minor revisions released, as detailed in the 'Updates' 
section of the page.


> However, If I click on the Downloads link on the left toolbar, then 
> scroll down, I see 1.5.2 Developer Release.  The tar file here points to 
> current_core_unstable.tar.gz.

Yes, that is just an alias to bioperl-1.5.2_102, i.e. whatever the latest
version happens to be, so that people don't need to worry about the
actual version; they can just have one static bookmark.


> Is this supposed to be this way?  It seems a bit confusing.  I think it 
> might be appropriate to put all the download links in one 
> location...just my two cents...

Well the primary page where all the links are found is the Downloads 
page. The Release_1.5.2 page is specific to 1.5.2 and will remain for 
historic reasons (so at some point there will be 1.5.3 or something and 
the appropriate links on the main Downloads page will be updated to 
that, but if someone specifically wants 1.5.2 they can still find the 
1.5.2 downloads on its own dedicated page).


From jason at bioperl.org  Fri Jan 25 02:17:02 2008
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 24 Jan 2008 18:17:02 -0800
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
References: <47991246.6010106@sh.se>
Message-ID: 

I don't know if it is faster or slower than what you have tried, but
the aa_to_dna_aln function in Bio::Align::Utilities translates a protein
alignment back to a CDS alignment.  You can see example code of it in use
in the pairwise_kaks script in scripts/utilities/pairwise_kaks.PLS

-jason
On Jan 24, 2008, at 2:33 PM, Johan Nilsson wrote:

> Hello,
>
> I have a question which might not necessarily be related to  
> Bioperl, although I do believe the expertise is available here. I  
> have a couple of thousand FASTA files, each containing 20 CDS  
> sequence orthologues of rather high sequence similarity. I would  
> like to create a codon-based multiple sequence alignment for each  
> of these FASTA files (i.e. a nucleotide sequence alignment inferred  
> from alignment of the translated peptide sequences, to ensure that
> no frame shifts will occur). I first tried running Dialign2, which
> can perform the translation/back-translation in one go, but this
> turned out to be far too slow. I next tried to build protein
> alignments using ClustalW and subsequently built the coding region
> alignment using EMBOSS 'tranalign', but this was also too slow.
>
> Is there any method available that significantly speeds up
> codon-preserving alignment? As I mentioned, the sequences to be
> aligned are in general very conserved, so any heuristic taking  
> advantage of the low divergence would be very helpful! Also, is  
> there any adjustable parameter in dialign2/dialign-T that might  
> speed up the program when looking at highly similar sequences?
>
> Best regards
> /Johan Nilsson



From tristan.lefebure at gmail.com  Fri Jan 25 03:07:52 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 24 Jan 2008 22:07:52 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree, and how to combine trees
Message-ID: <200801242207.52991.tristan.lefebure@gmail.com>

Hi,

I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would
like to merge several "one-leaf taxonomic trees" into a taxonomic tree with
several leaves. For example:

#####BEGINNING#####
#! /usr/bin/perl

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Bio::TreeIO;

# The taxonomic database
# You might want to switch to a different flatfile or to Entrez 
my $dbh = new Bio::DB::Taxonomy(-source   => 'flatfile',
                                  -directory=> '/tmp',  
                                  -nodesfile=> '/home/tristan/Documents/db/NCBI/taxonomy/nodes.dmp', 
                                  -namesfile=> '/home/tristan/Documents/db/NCBI/taxonomy/names.dmp');

# Fetch 4 taxa for the example
my $tax_decapoda =  $dbh->get_taxon(-name => 'Decapoda');
my $tax_heteroptera =  $dbh->get_taxon(-name => 'Heteroptera');
my $tax_coleoptera =  $dbh->get_taxon(-name => 'Coleoptera');
my $tax_copepoda =  $dbh->get_taxon(-name => 'Copepoda');

# Transform to tree objects
my $decapoda_tree = new Bio::Tree::Tree(-node => $tax_decapoda);
my $heteroptera_tree = new Bio::Tree::Tree(-node => $tax_heteroptera);
my $coleoptera_tree = new Bio::Tree::Tree(-node => $tax_coleoptera);
my $copepoda_tree = new Bio::Tree::Tree(-node => $tax_copepoda);

# Reduce the number of nodes to the following ranks
my @ranks = qw(kingdom phylum subphylum superclass class subclass superorder 
order family);

$decapoda_tree->splice(-keep_rank => \@ranks);
$heteroptera_tree->splice(-keep_rank => \@ranks);
$coleoptera_tree->splice(-keep_rank => \@ranks);
$copepoda_tree->splice(-keep_rank => \@ranks);

# Print the trees
my $out = new Bio::TreeIO('-format' => 'newick',
                                   '-file'   => ">four.tree");
$out->write_tree($decapoda_tree);
$out->write_tree($heteroptera_tree);
$out->write_tree($coleoptera_tree);
$out->write_tree($copepoda_tree);

#####END#######

This gives the following "trees":
(((((7524)33340)50557)6960)6656)33208;
(((((7041)33340)50557)6960)6656)33208;
((((((6683)6682)72041)6681)6657)6656)33208;
((((6830)72037)6657)6656)33208;

They are really special trees, as they contain only one leaf. I would like to 
combine them and remove the 'unused' nodes to obtain something like that:

((7524,7041)33340,(6683,6830)6657)6656;

or even better:

((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda;

Any suggestions?

Thanks!

-Tristan
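The merge-and-prune step can be illustrated without BioPerl objects. This core-Perl sketch is not the Bio::Tree API: merge_lineages and to_newick are hypothetical names, and it assumes each single-leaf tree has already been turned into a root-to-leaf list of taxon ids (how to obtain those lists from Bio::DB::Taxonomy is left out). Lineages are merged on shared ids, and any node with exactly one child is replaced by that child, which is what drops the "unused" ranks.

```perl
use strict;
use warnings;

# Merge root-to-leaf lineages (array refs of taxon ids, root first) into one
# tree stored as parent => [children], keeping first-seen child order.
sub merge_lineages {
    my @lineages = @_;
    my (%children, %edge_seen, $root);
    for my $lin (@lineages) {
        $root = $lin->[0] unless defined $root;
        die "lineages do not share a root\n" unless $lin->[0] eq $root;
        for my $i (1 .. $#$lin) {
            my ($parent, $child) = @{$lin}[ $i - 1, $i ];
            push @{ $children{$parent} }, $child
                unless $edge_seen{"$parent\0$child"}++;
        }
    }
    return ($root, \%children);
}

# Write Newick, contracting linear paths: a node with exactly one child is
# replaced by that child, so only leaves and branching nodes remain.
sub to_newick {
    my ($node, $children) = @_;
    my @kids = @{ $children->{$node} || [] };
    while (@kids == 1) {
        $node = $kids[0];
        @kids = @{ $children->{$node} || [] };
    }
    return $node unless @kids;
    return '(' . join(',', map { to_newick($_, $children) } @kids) . ")$node";
}
```

Fed the four lineages read off the trees above (33208 6656 6960 50557 33340 7524, and so on), to_newick($root, $children) . ';' yields ((7524,7041)33340,(6683,6830)6657)6656; using scientific names instead of ids would give a named version.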



From anjan.purkayastha at gmail.com  Thu Jan 24 23:32:20 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Thu, 24 Jan 2008 18:32:20 -0500
Subject: [Bioperl-l] Question from a bioperl newbie
Message-ID: 

Hi,
I recently installed BioPerl on my Mac.
I tried to use it in a simple script with a "use Bio::Perl" statement; however,
I get the error message "Can't locate Bio/Perl.pm in @INC".
The BioPerl folder is on my desktop, so I tried: use lib
"/Users/anjan/Desktop/bioperl-1.5.2_102/Bio";
This time it returned another error: Undefined subroutine
&main::get_sequence.

So, when BioPerl is installed, which directory does it reside in? (It's not
present in the .cpan/build directory.)

I'd appreciate a prompt reply.

anjan

-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================


From bosborne11 at verizon.net  Fri Jan 25 04:04:50 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 24 Jan 2008 23:04:50 -0500
Subject: [Bioperl-l] Question from a bioperl newbie
In-Reply-To: 
References: 
Message-ID: <3B13E81A-66E1-418A-8915-9E877C2B751D@verizon.net>

Anjan,

use lib "/Users/anjan/Desktop/bioperl-1.5.2_102/";
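Why the parent directory is needed: use lib DIR prepends DIR to @INC, and use Bio::Perl then looks for the relative path Bio/Perl.pm under each @INC entry, so DIR must be the directory that contains Bio/. The core-Perl sketch below mimics that lookup; find_module_dir is a hypothetical helper, not part of Perl or BioPerl.

```perl
use strict;
use warnings;
use File::Spec;

# Mimic how "use Module::Name" searches @INC: the module name is turned into
# a relative path (Bio::Perl -> Bio/Perl.pm) and looked up under each directory.
sub find_module_dir {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir (@dirs) {
        return $dir if -f File::Spec->catfile($dir, @parts);
    }
    return;    # not found in any of the directories
}

# With the paths from the question, assuming the unpacked distribution layout:
#   find_module_dir('Bio::Perl', '/Users/anjan/Desktop/bioperl-1.5.2_102/Bio')
# finds nothing, since it would need .../Bio/Bio/Perl.pm, whereas
#   find_module_dir('Bio::Perl', '/Users/anjan/Desktop/bioperl-1.5.2_102')
# finds .../bioperl-1.5.2_102/Bio/Perl.pm.
```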

Brian O.


On Jan 24, 2008, at 6:32 PM, ANJAN PURKAYASTHA wrote:

> Hi,
> I recently installed BioPerl on my Mac.
> I tried to use it in a simple script with a "use Bio::Perl" statement;
> however,
> I get the error message "Can't locate Bio/Perl.pm in @INC".
> The BioPerl folder is on my desktop, so I tried: use lib
> "/Users/anjan/Desktop/bioperl-1.5.2_102/Bio";
> This time it returned another error: Undefined subroutine
> &main::get_sequence.
>
> So, when BioPerl is installed, which directory does it reside in?
> (It's not
> present in the .cpan/build directory.)
>
> I'd appreciate a prompt reply.
>
> anjan
>
> -- 
> ANJAN PURKAYASTHA, PhD.
> Senior Computational Biologist
> ==========================
>
> 1101 King Street, Suite 310,
> Alexandria, VA 22314.
> 703.518.8040 (office)
> 703.740.6939 (mobile)
>
> email:
> anjan at vbi.vt.edu;
> anjan.purkayastha at gmail.com
>
> http://www.vbi.vt.edu
>
> ==========================



From n.haigh at sheffield.ac.uk  Fri Jan 25 07:32:10 2008
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Fri, 25 Jan 2008 07:32:10 +0000
Subject: [Bioperl-l] Wiki inconsistency?
In-Reply-To: <47991C3E.2010908@sendu.me.uk>
References: <4798E8BD.7030107@umdnj.edu> <47991C3E.2010908@sendu.me.uk>
Message-ID: <4799907A.9060301@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Sendu,

Have you thought about using a template for the latest stable release and the latest developer release? That way, any article/link that always needs
to point to the latest version simply has to include the correct template. So once a new release is made, you simply update the one template, and
changes automatically propagate through the wiki, which might save some wiki admin work each time there's a new release. You could get more intricate, and use a
template to show the latest version of any particular release series so you could do something like:

{{latest release|series=1.5.x|full=y}}
and
{{latest release|series=1.4.x|full=y}}

or even:

{{latest release|series=stable|full=y}}
and
{{latest release|series=dev|full=y}}

these templates could return 1.5.2_102 if the "full" param is set to something or simply 1.5.2 if the "full" param is missing.

Just a thought.
Nath


Sendu Bala wrote:
> Ryan Golhar wrote:
>> Hi,
>>
>> I haven't used Bioperl in a while but recently started using it.  I
>> was using 1.4.0 but see on the website that 1.5.2 has been released.  
>> If I click on the link for 1.5.2
>> (http://www.bioperl.org/wiki/Release_1.5.2), I see two versions:
>>
>> bioperl-1.5.2_102
>>
>> and
>>
>> bioperl-1.5.2_100
> 
> Where do you see this older version? I did a search on the page and that
> term isn't found. _100 was the first version of 1.5.2 core to go out.
> There were then 2 minor revisions released, as detailed in the 'Updates'
> section of the page.
> 
> 
>> However, If I click on the Downloads link on the left toolbar, then
>> scroll down, I see 1.5.2 Developer Release.  The tar file here points
>> to current_core_unstable.tar.gz.
> 
> Yes, that is just an alias to bioperl-1.5.2_102, i.e. whatever the latest
> version happens to be, so that people don't need to worry about the
> actual version; they can just have one static bookmark.
> 
> 
>> Is this supposed to be this way?  It seems a bit confusing.  I think
>> it might be appropriate to put all the download links in one
>> location...just my two cents...
> 
> Well the primary page where all the links are found is the Downloads
> page. The Release_1.5.2 page is specific to 1.5.2 and will remain for
> historic reasons (so at some point there will be 1.5.3 or something and
> the appropriate links on the main Downloads page will be updated to
> that, but if someone specifically wants 1.5.2 they can still find the
> 1.5.2 downloads on its own dedicated page).
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From derek.fairley at belfasttrust.hscni.net  Fri Jan 25 08:31:28 2008
From: derek.fairley at belfasttrust.hscni.net (Fairley, Derek)
Date: Fri, 25 Jan 2008 08:31:28 -0000
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
Message-ID: 

Johan,

There is currently no Bioperl-run wrapper for this program, but you
might want to have a look at Codon Align 2.0 as well:
http://homepage.mac.com/barryghall/CodonAlign.html

Derek

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Johan Nilsson
Sent: 24 January 2008 22:34
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Quickest Codon Based MSA?

Hello,

I have a question which might not necessarily be related to Bioperl, 
although I do believe the expertise is available here. I have a couple 
of thousand FASTA files, each containing 20 CDS sequence orthologues of 
rather high sequence similarity. I would like to create a codon-based 
multiple sequence alignment for each of these FASTA files (i.e. a 
nucleotide sequence alignment inferred from alignment of the translated 
peptide sequences, to assure that no frame shifts will occur). I first 
tried running Dialign2, which can perform the 
translation/back-translation in one go, but this turned out to be far 
too slow. I next tried to build protein alignments using ClustalW and 
subsequently built the coding region alignment using EMBOSS 'tranalign',
but this also was too slow.

Is there any method available which significantly speeds up the 
codon-preserving alignment??? As I mentioned, the sequences to be 
aligned are in general very conserved, so any heuristic taking advantage
of the low divergence would be very helpful! Also, is there any 
adjustable parameter in dialign2/dialign-T that might speed up the 
program when looking at highly similar sequences?

Best regards
/Johan Nilsson



From ewijaya at gmail.com  Fri Jan 25 09:26:05 2008
From: ewijaya at gmail.com (Edward Wijaya)
Date: Fri, 25 Jan 2008 17:26:05 +0800
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
Message-ID: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>

Dear Experts,

Suppose I have the following list of gene names and Ensembl IDs.

RBL1	ENSG00000080839
RB1	ENSG00000139687
CDC2	ENSG00000170312
CDC25A	ENSG00000164045
CCNA2	ENSG00000145386
E2F3	ENSG00000112242
E2F2	ENSG00000007968
CDK2	ENSG00000123374
...etc...

Is there a way to extract the gene sequences for that list
and then output them in FASTA format?

- Edward


From bix at sendu.me.uk  Fri Jan 25 10:55:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 25 Jan 2008 10:55:50 +0000
Subject: [Bioperl-l] Quickest Codon Based MSA?
In-Reply-To: <47991246.6010106@sh.se>
References: <47991246.6010106@sh.se>
Message-ID: <4799C036.5060404@sendu.me.uk>

Johan Nilsson wrote:
> Hello,
> 
> I have a question which might not necessarily be related to Bioperl, 
> although I do believe the expertise is available here. I have a couple 
> of thousand FASTA files, each containing 20 CDS sequence orthologues of 
> rather high sequence similarity. I would like to create a codon-based 
> multiple sequence alignment for each of these FASTA files (i.e. a 
> nucleotide sequence alignment inferred from alignment of the translated 
> peptide sequences, to assure that no frame shifts will occur). I first 
> tried running Dialign2, which can perform the 
> translation/back-translation in one go, but this turned out to be far 
> too slow. I next tried to build protein alignments using ClustalW and 
> subsequently built the coding region alignment using EMBOSS 'tranalign', 
> but this also was too slow.
> 
> Is there any method available which significantly speeds up the 
> codon-preserving alignment??? As I mentioned, the sequences to be 
> aligned are in general very conserved, so any heuristic taking advantage 
> of the low divergence would be very helpful! Also, is there any 
> adjustable parameter in dialign2/dialign-T that might speed up the 
> program when looking at highly similar sequences?

Do you know which is the slow part? For example, when using ClustalW, 
are the alignments slower than creating the codon alignment from the 
protein?

If ClustalW is the problem, you can try using other alignment programs 
famous for their speed, such as Muscle. If it's the protein->codon bit 
that's slow, try using other programs to do that, like Pal2Nal or the 
BioPerl method.
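If it turns out to be the protein->codon step, it may help to see how little work that step involves: every residue of an aligned protein is replaced by the next codon of its own CDS, and every gap by '---'. BioPerl packages this as aa_to_dna_aln in Bio::Align::Utilities; the plain-Perl sketch below only illustrates the idea. It assumes the protein alignment and the CDS sequences are already held in hashes keyed by the same IDs, that gaps are written as '-', and that each protein is an exact translation of its CDS. The names thread_codons and codon_alignment are hypothetical.

```perl
use strict;
use warnings;

# Replace each residue of an aligned protein with the next codon of its
# (unaligned) CDS, and each gap '-' with a gap codon '---'.
# Any codons left over (e.g. a trailing stop codon) are ignored.
sub thread_codons {
    my ($aligned_protein, $cds) = @_;
    my @codons = unpack '(a3)*', $cds;    # split the CDS into triplets
    my $codon_aln = '';
    for my $residue (split //, $aligned_protein) {
        if ($residue eq '-') {
            $codon_aln .= '---';
            next;
        }
        my $codon = shift @codons;
        die "CDS is shorter than its protein\n" unless defined $codon;
        $codon_aln .= $codon;
    }
    return $codon_aln;
}

# Apply thread_codons to every sequence; both hashes are keyed by sequence ID.
sub codon_alignment {
    my ($protein_aln, $cds_by_id) = @_;
    my %codon_aln;
    for my $id (keys %$protein_aln) {
        die "No CDS for '$id'\n" unless exists $cds_by_id->{$id};
        $codon_aln{$id} = thread_codons($protein_aln->{$id}, $cds_by_id->{$id});
    }
    return \%codon_aln;
}

my %protein_aln = (seq1 => 'MK-L', seq2 => 'MKAL');
my %cds         = (seq1 => 'ATGAAACTG', seq2 => 'ATGAAAGCTCTG');
my $codon_aln   = codon_alignment(\%protein_aln, \%cds);
print "$_\t$codon_aln->{$_}\n" for sort keys %$codon_aln;
# prints:
# seq1	ATGAAA---CTG
# seq2	ATGAAAGCTCTG
```

Because this is a single linear pass over each sequence, it should cost far less than computing the protein alignment itself, which fits Sendu's suggestion to time the alignment step first.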


From David.Messina at sbc.su.se  Fri Jan 25 11:35:16 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 25 Jan 2008 12:35:16 +0100
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
Message-ID: <628aabb70801250335l2a2754efn3e73e44a9dae6a35@mail.gmail.com>

Hi Edward,

I don't think there's a direct BioPerl interface to Ensembl, but BioMart at
Ensembl itself will get you sequences (and lots of other things if you want)
given a list of Ensembl IDs.

http://www.ensembl.org/biomart/martview

Note that as of this writing, the Ensembl BioMart server appears to be down
temporarily.

If you want to be able to get Ensembl sequences from a program, there's the
Ensembl API:

http://www.ensembl.org/info/using/api/core/core_tutorial.html
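Whichever route retrieves the sequences (a BioMart export or the Ensembl Perl API), the bookkeeping around it is plain Perl. A minimal sketch, assuming a whitespace-separated two-column list like Edward's: it parses the name/ID pairs and writes FASTA records wrapped at 60 columns. The %seq_for hash is a hypothetical stand-in for the retrieved sequences; the retrieval itself is deliberately not shown, and Bio::SeqIO with -format => 'fasta' could equally handle the writing.

```perl
use strict;
use warnings;

# Parse lines like "RBL1<TAB>ENSG00000080839" into [name, id] pairs,
# tolerating blank lines, extra spaces and Windows line endings.
sub parse_gene_list {
    my ($text) = @_;
    my @pairs;
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;
        next unless length $line;
        my ($name, $id) = split /\s+/, $line;
        push @pairs, [$name, $id] if defined $id;
    }
    return @pairs;
}

# Format one FASTA record, wrapping the sequence at 60 columns.
sub to_fasta {
    my ($header, $seq) = @_;
    return ">$header\n" . join('', map { "$_\n" } unpack '(a60)*', $seq);
}

my $gene_list = "RBL1\tENSG00000080839\nRB1\tENSG00000139687\n";

# Hypothetical placeholder: in practice these sequences would come from
# a BioMart export or the Ensembl Perl API.
my %seq_for = (ENSG00000080839 => 'ACGT' x 20);

for my $pair (parse_gene_list($gene_list)) {
    my ($name, $id) = @$pair;
    unless (exists $seq_for{$id}) {
        warn "No sequence retrieved for $id ($name)\n";
        next;
    }
    print to_fasta("$id $name", $seq_for{$id});
}
```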



Dave


From bix at sendu.me.uk  Fri Jan 25 11:07:42 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 25 Jan 2008 11:07:42 +0000
Subject: [Bioperl-l] Bio::DB::Taxonomy, Bio::Tree,
 and how to combine trees
In-Reply-To: <200801242207.52991.tristan.lefebure@gmail.com>
References: <200801242207.52991.tristan.lefebure@gmail.com>
Message-ID: <4799C2FE.8080700@sendu.me.uk>

Tristan Lefebure wrote:
> Hi,
> 
> I'm just starting to play with Bio::DB::Taxonomy and Bio::Tree, and I would 
> like to merge several "one leaf taxonomic trees" into a taxonomic tree with 
> several leafs.
[...]
> or even better:
> 
> ((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda;

The BioPerl script taxonomy2tree.pl generates:

(((Decapoda,Copepoda)Crustacea,(Heteroptera,Coleoptera)Neoptera)Pancrustacea)"cellular organisms";

I think you can modify it along the lines of your own script so that it only
outputs the classes you're interested in.



http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/taxa/taxonomy2tree.PLS
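For the narrower case Tristan describes, once each lineage has been obtained as a list of names from root to leaf (for instance through Bio::DB::Taxonomy), merging them is a small tree-building exercise. The sketch below is plain Perl and is not how taxonomy2tree.pl is implemented. lineages_to_newick and its helpers are hypothetical names, and the sketch assumes every lineage starts at the same root and that taxon names are unique.

```perl
use strict;
use warnings;

# Quote a Newick label if it contains spaces or Newick punctuation.
sub newick_label {
    my ($name) = @_;
    return $name unless $name =~ /[\s(),:;\[\]']/;
    (my $quoted = $name) =~ s/'/''/g;    # Newick escapes ' by doubling it
    return "'$quoted'";
}

# Recursively render a node and its children, e.g. "(A,B)Parent".
sub render_node {
    my ($name, $children) = @_;
    my $kids = $children->{$name} or return newick_label($name);
    my $inner = join ',', map { render_node($_, $children) } @$kids;
    return "($inner)" . newick_label($name);
}

# Merge root-to-leaf lineages (array refs of names) into a single tree.
# Nodes are keyed by name, so names are assumed to be unique across ranks;
# children keep the order in which they were first seen.
sub lineages_to_newick {
    my @lineages = @_;
    my (%children, %seen);
    my $root = $lineages[0][0];
    for my $lineage (@lineages) {
        die "Lineages do not share the root '$root'\n"
            unless $lineage->[0] eq $root;
        for my $i (1 .. $#$lineage) {
            my ($parent, $child) = @{$lineage}[$i - 1, $i];
            push @{ $children{$parent} }, $child
                unless $seen{"$parent\0$child"}++;
        }
    }
    return render_node($root, \%children) . ';';
}

print lineages_to_newick(
    [qw(Arthropoda Neoptera Hemiptera)],
    [qw(Arthropoda Neoptera Coleoptera)],
    [qw(Arthropoda Crustacea Decapoda)],
    [qw(Arthropoda Crustacea Copepoda)],
), "\n";
# prints: ((Hemiptera,Coleoptera)Neoptera,(Decapoda,Copepoda)Crustacea)Arthropoda;
```

Run on four example lineages built from Tristan's taxa, it reproduces the form he asked for; labels such as "cellular organisms" come out single-quoted.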


From bosborne11 at verizon.net  Fri Jan 25 13:53:36 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 25 Jan 2008 08:53:36 -0500
Subject: [Bioperl-l] BioPerl module to extract sequence from gene names
In-Reply-To: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
References: <3521d3670801250126s1f6f7b70h49f093fd060f10cf@mail.gmail.com>
Message-ID: <9CE20DF3-ED5F-4432-A191-4123896E5815@verizon.net>

Edward,

Various approaches are discussed here:

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

Since you have Ensembl IDs, I'd think that would be the way to go.


Brian O.




From snoze.pa at gmail.com  Fri Jan 25 23:30:56 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Fri, 25 Jan 2008 17:30:56 -0600
Subject: [Bioperl-l] bioperl DB error
Message-ID: <10f848910801251530j6eacfcb0x81780ae312cf19c5@mail.gmail.com>

Dear Users,
 I am using BioPerl/BioSQL and trying to load the NCBI taxonomy, but I am
getting the following error message.
Any help? Thanks in advance.

perl load_ncbi_taxonomy.pl -download -driver mysql -dbname bioseqdb -dbuser root
Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568.


From snoze.pa at gmail.com  Fri Jan 25 23:49:28 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Fri, 25 Jan 2008 17:49:28 -0600
Subject: [Bioperl-l] bioseqDB error
Message-ID: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>

Hi, does anyone know why I am getting this error message?

Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry '10090' for key 2 at load_ncbi_taxonomy.pl line 568


From wkath83 at vbi.vt.edu  Thu Jan 24 18:19:06 2008
From: wkath83 at vbi.vt.edu (Katherine Wendelsdorf)
Date: Thu, 24 Jan 2008 13:19:06 -0500 (EST)
Subject: [Bioperl-l] bioperl on mac
Message-ID: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>

Dear one who knows,

I have a MacBook with Leopard (OS X) and I am having trouble running scripts
that use BioPerl modules.

Here is my history: Using Fink I installed bioperl-pm586 version 1.5.2-4
and bioperl-run-pm586 version 1.5.2-1. When I type 'which bioperl-pm586' into
the command line I get nothing. Spotlight says that the path is
/sw/share/bioperl-pm586. The same goes for bioperl-run-pm586.

1. I tried to run the test2.pl script, literally copied and pasted
from the HOWTO manual, but it wouldn't run. The two attached docs are the
script I tried to run and the output (which is nonexistent). I read
something that said to "go in to" Bioperl to execute a command. I could
not enter the bioperl directory when it was in the /sw/share directory, so
I copied the bioperl folder to the Desktop just so I could try executing
the script inside bioperl. Where am I going wrong here?

Should I place these folders (bioperl-pm586 and bioperl-run-pm586)
somewhere else on my computer? Should they be in the same directory as
perl (/usr/bin/perl)?

2. How do I know what modules are included in the bioperl-pm586 I
downloaded? Specifically I want to use Bio::SeqIO.

3. What is the best way to download/install new modules as I need them?


Any answers you could give me for any of these questions would be greatly
appreciated!

Thank you so much, kind volunteer!
-Kate
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test2.pl
URL: 

From bosborne11 at verizon.net  Sat Jan 26 16:14:13 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 26 Jan 2008 11:14:13 -0500
Subject: [Bioperl-l] bioperl on mac
In-Reply-To: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
Message-ID: 

Katherine,

Perl keeps the paths of all the module directories in its @INC
array. What do you see when you do:

perl -e 'print join("\n", @INC), "\n"'

?

If '/sw/share/bioperl-pm586' is not in @INC then you need to put it  
there, perhaps by adding something like:

setenv PERL5LIB ${PERL5LIB}:/sw/share/bioperl-pm586

to the .tcshrc file in your home directory (if you use tcsh, that is;
most people use bash these days, in which case put the equivalent
'export' line in .bashrc).

You asked some other questions, the general answer is that all the  
modules you'll need are in the 2 packages you've installed, and you  
don't need to move them from /sw.
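A slightly more talkative version of that check: print @INC one directory per line, then report which file (if any) a given module loads from. This is a plain-Perl sketch; module_location is a hypothetical helper and the module names are only examples. For a single module, perl -MBio::SeqIO -e 1 gives a quick yes/no: it exits silently if the module loads and dies with a "Can't locate" error otherwise.

```perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it cannot be found
# in @INC (or fails to compile).
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

print "Perl searches these directories, in order:\n";
print "  $_\n" for grep { !ref } @INC;

for my $module (qw(Bio::SeqIO Bio::Tools::Run::RemoteBlast)) {
    my $where = module_location($module);
    printf "%-30s %s\n", $module, defined $where ? $where : 'NOT FOUND';
}
```

If Bio::SeqIO reports NOT FOUND, the directory holding the Bio/ tree still needs to be added to PERL5LIB as described above.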


Brian O.





From jason at bioperl.org  Sat Jan 26 20:30:11 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 26 Jan 2008 12:30:11 -0800
Subject: [Bioperl-l] bioperl on mac
In-Reply-To: 
References: <49708.198.82.30.57.1201198746.squirrel@webmail.vbi.vt.edu>
	
Message-ID: 

Usually this is done by fink by adding a line to your .tcshrc (if you  
are running that shell) or .bash_profile or .bashrc.

On my machine I have this at the top of my .bash_profile file:
test -r /sw/bin/init.sh && . /sw/bin/init.sh

If that is not there, you need to add it to ensure that all the Fink
tools are set up properly.




From jason at bioperl.org  Sun Jan 27 00:14:45 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 26 Jan 2008 16:14:45 -0800
Subject: [Bioperl-l] a question on "move_id_to_bootstrap" usage
In-Reply-To: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com>
References: <67386e470801231357k11938664wcf0d6c9d9bed8e7b@mail.gmail.com>
Message-ID: <8273f6c20801261614p312886d5x562593aa0cde60da@mail.gmail.com>

I'm not sure why you still have the __DATA__ block if you are reading data
in from a file. Or are you trying to send an example of the code but forgot
to specify a different input point?

If you are reading from a file that looks like the tree in the __DATA__
block, you will notice that the bootstrap info is encoded as the branch_length,
NOT the id; move_id_to_bootstrap only moves the ID to the BOOTSTRAP.
You'll have to write a custom routine, or just run a simple loop over your tree,
to move the data to the bootstrap. It would look just like
move_id_to_bootstrap, except you'd use branch_length instead of id to get the
data that you want to set in the bootstrap. I leave it as an exercise for
the reader, but if you can't figure it out let us know.
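One possible answer to the exercise, as a hedged sketch rather than BioPerl's own code: walk the nodes, skip leaves, and move each internal node's branch length into its bootstrap. It calls only get_nodes(), is_Leaf(), branch_length() and bootstrap(), which BioPerl's tree and node interfaces provide. The function name is hypothetical, and clearing the old branch length afterwards is an assumption you may not want.

```perl
use strict;
use warnings;

# Hypothetical helper: move consense-style support values, which sit where
# branch lengths normally go, into each internal node's bootstrap slot.
# It relies only on get_nodes() on the tree and is_Leaf(), branch_length()
# and bootstrap() on the nodes (the Bio::Tree::TreeI/NodeI methods).
sub move_branch_length_to_bootstrap {
    my ($tree) = @_;
    for my $node ($tree->get_nodes) {
        next if $node->is_Leaf;                 # leaves carry no support value
        my $support = $node->branch_length;
        next unless defined $support;           # e.g. the root
        $node->bootstrap($support);
        $node->branch_length(undef);            # assumption: drop the fake length
    }
    return $tree;
}

# Usage with BioPerl (not run here):
#   use Bio::TreeIO;
#   my $in = Bio::TreeIO->new(-format => 'newick', -file => $infile);
#   while (my $tree = $in->next_tree) {
#       move_branch_length_to_bootstrap($tree);
#       print $_->bootstrap, "\n"
#           for grep { defined $_->bootstrap } $tree->get_nodes;
#   }
```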


In the future please ask your questions on the mailing list as I don't have
much time to answer questions individually when someone else can help.

-jason

On Jan 23, 2008 1:57 PM, Anand  wrote:

> HI Jason,
>
> Thanks a lot. I followed your suggestion and updated both the modules.
>
> I followed the code example on http://www.bioperl.org/wiki/HOWTO:Trees and
> tried to extract bootstrap values for my tree (which is output after
> seqboot, protdist, fitch and consense)
>
> When I try running my script, I am not able to print the bootstrap
> values...and it doesn't throw any error messages. Am I missing something?
>
> ====START of Code====
> #!/usr/bin/perl -w
> use strict;
> use lib "/home/anand/myperlmodules/lib/perl5/";
> use Bio::TreeIO;
> # $usage: $0 
>
> my $infile = shift;
>
> my $treeio = Bio::TreeIO->new(-format => 'newick',
>                          -file => $infile,
>                          -internal_node_id => 'bootstrap',
>                          );
>
> while( my $tree = $treeio->next_tree ) {
>    for my $node ( $tree->get_nodes ) {
>        printf "id: %s bootstrap: %s\n", $node->id || '', $node->bootstrap || '';
>    }
> }
> __END__
> ((5815_1:100.0,(((5815_5:100.0,5815_7:100.0):100.0,5815_6:100.0):97.0,5815_8:100.0):98.0,5815_4:100.0,5815_2:100.0):100.0,5815_3:100.0);
> ====END of Code====
>
> Thanks in advance for your time and help,
>
> Anand
>
> PS: Just to preserve formatting, I have attached the consense_output_file
>
> On Jan 22, 2008 8:02 AM, Jason Stajich  wrote:
>
> > I suspect you may want to update everything in Bio/TreeIO and Bio/Tree
> > to be safe. I'm not exactly sure what was changed; you can look
> > at the commit logs to see what else changed at the time at
> > http://code.open-bio.org/. You can also use that same server to grab a
> > fresh checkout of what is the current state of the code base.
> >
> > -jason
> > On Jan 22, 2008, at 12:59 AM, Anand wrote:
> >
> > > Hi Jason
> > >
> > > I have a question on the method "move_id_to_bootstrap". From this
> > > post:
> > > http://portal.open-bio.org/pipermail/bioperl-guts-l/2007-May/025718.html
> > >
> > > it looks like it has been added very recently. As luck would have it,
> > > the TreeFunctionsI.pm in my bioperl installation is missing that method.
> > >
> > > My question: What is the best method to update TreeFunctionsI.pm so that
> > > it can have the "move_id_to_bootstrap" method? Does it have other update
> > > dependencies?
> > >
> > > Thanks in advance for your help and time,
> > >
> > > Anand
> >
> >
>



-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From hlapp at duke.edu  Mon Jan 28 05:27:34 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 28 Jan 2008 00:27:34 -0500
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
References: <4795292E.4030401@sdsc.edu>
Message-ID: 

Some folks may remember that CIPRES (http://www.phylo.org) released  
their portal with access to remote execution of several phylogenetic  
tree reconstruction programs in spring last year.

It took a while but they have now also built a really nice REST-based
API that makes the service fully programmable instead of screen-scraping
5 pages:

http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API)

It should be relatively straightforward to build the equivalent of  
RemoteBlast on top of this. Would anyone be keen to take this on?

	-hilmar

P.S. Sorry for the cross-posting - I thought this is relevant to both  
communities. When responding in a project-specific way please make  
sure you remove the list that is no longer pertinent.


Begin forwarded message:

> From: Lucie Chan 
> Date: January 21, 2008 6:22:22 PM EST
> To: Hilmar Lapp 
> Cc: Mark Miller , Rutger Vos ,  
> Terri Liebowitz , Paul Hoover ,  
> mtholder at ku.edu
> Subject: Re: REST APIs for Cipres Web Portal
> Reply-To: lcchan at sdsc.edu
>
> Hilmar, et al.,
>
> I just released the first version of our REST Web Services API for
> job submission, job status query, and job result file retrieval.
> I'd like to get some feedback (issues, problems, improvements,
> suggestions, etc.) from you. For documentation on how to access the
> services, check it out at:
>
> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST  
> API" below the "CIPRES PORTAL" banner.
>
> Lucie
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================





From cjfields at uiuc.edu  Mon Jan 28 06:04:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 00:04:46 -0600
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: 
References: <4795292E.4030401@sdsc.edu>
	
Message-ID: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>

We can certainly add it to the to-do list; just need to sort out the  
details (how often to allow posts, etc).  I guess we would want this  
in the Bio::Tools::Run namespace, same as RemoteBlast?

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From hlapp at duke.edu  Mon Jan 28 13:42:39 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 28 Jan 2008 08:42:39 -0500
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: <7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
References: <4795292E.4030401@sdsc.edu>
	
	<7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
Message-ID: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>

Yep that's what I was thinking.

BTW the API needs multipart/form-data encoding for input (due to file
upload); I'm assuming that's well supported in LWP, but if anyone
knows where to start digging for that, a pointer would be appreciated.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================





From cjfields at uiuc.edu  Mon Jan 28 13:50:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 07:50:08 -0600
Subject: [Bioperl-l] Fwd: REST APIs for Cipres Web Portal
In-Reply-To: <2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
References: <4795292E.4030401@sdsc.edu>
	
	<7055915B-4080-4D71-AEAD-306668339C26@uiuc.edu>
	<2D5B60AB-6B0D-47AB-9916-703D36163A9C@duke.edu>
Message-ID: 

Googled it.

 From http://www.issociate.de/board/post/258535/LWP_-_multipart/form-data_file_upload_from_scalar_rather_than_local_file.html:

use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

my $ua = LWP::UserAgent->new;
my $response = $ua->request(POST $URL,
    Content_Type => 'multipart/form-data',
    Content => [ $PARAM => [ undef, $FILENAME, Content => $CONTENTS ] ]);

Where $PARAM is the name of the parameter, $FILENAME is what you want
to call the file, and $CONTENTS is a scalar holding the contents of the
file.

Could probably use HTTP::Request in there, but whatever works.
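The snippet above, expanded into a self-contained sketch that only builds the request so it can be inspected before anything is sent. POST comes from HTTP::Request::Common (part of the libwww-perl family, as LWP::UserAgent is); an undef file name together with a Content entry tells it to take the part's body from the string. The URL, field name and file name are placeholders, not the parameters the CIPRES REST API actually expects, and build_upload_request is a hypothetical helper.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Build (without sending) a multipart/form-data POST whose file part is
# taken from an in-memory string rather than a file on disk.
sub build_upload_request {
    my ($url, $param, $filename, $contents) = @_;
    return POST $url,
        Content_Type => 'form-data',
        Content      => [ $param => [ undef, $filename, Content => $contents ] ];
}

# Placeholder URL, field name and file; not the CIPRES API's real ones.
my $req = build_upload_request('http://example.org/submit', 'infile',
                               'input.phy', "2 4\nA ACGT\nB ACGA\n");
print $req->as_string;

# Sending it is one more call with LWP::UserAgent:
#   use LWP::UserAgent;
#   my $response = LWP::UserAgent->new->request($req);
#   die $response->status_line, "\n" unless $response->is_success;
```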

chris

On Jan 28, 2008, at 7:42 AM, Hilmar Lapp wrote:

> Yep that's what I was thinking.
>
> BTW the API needs multipart/form-data encoding for input (due to  
> file upload); I'm assuming that that's supported well in LWP but if  
> anyone knows where to start digging for that the pointer would be  
> appreciated.
>
> 	-hilmar
>
> On Jan 28, 2008, at 1:04 AM, Chris Fields wrote:
>
>> We can certainly add it to the to-do list; just need to sort out  
>> the details (how often to allow posts, etc).  I guess we would want  
>> this in the Bio::Tools::Run namespace, same as RemoteBlast?
>>
>> chris
>>
>> On Jan 27, 2008, at 11:27 PM, Hilmar Lapp wrote:
>>
>>> Some folks may remember that CIPRES (http://www.phylo.org)  
>>> released their portal with access to remote execution of several  
>>> phylogenetic tree reconstruction programs in spring last year.
>>>
>>> It took a while but they have now also built a really nice REST- 
>>> based API that makes the service fully programmable instead of  
>>> screen-scraping 5 pages:
>>>
>>> http://8ball.sdsc.edu:8888/cipres-web/Home.do (click on REST API)
>>>
>>> It should be relatively straightforward to build the equivalent of  
>>> RemoteBlast on top of this. Would anyone be keen to take this on?
>>>
>>> 	-hilmar
>>>
>>> P.S. Sorry for the cross-posting - I thought this is relevant to  
>>> both communities. When responding in a project-specific way please  
>>> make sure you remove the list that is no longer pertinent.
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: Lucie Chan 
>>>> Date: January 21, 2008 6:22:22 PM EST
>>>> To: Hilmar Lapp 
>>>> Cc: Mark Miller , Rutger Vos ,  
>>>> Terri Liebowitz , Paul Hoover , mtholder at ku.edu
>>>> Subject: Re: REST APIs for Cipres Web Portal
>>>> Reply-To: lcchan at sdsc.edu
>>>>
>>>> Hilmar, et al.,
>>>>
>>>> I just released the first version of our REST Web Services API  
>>>> for job submission, job status queries, and
>>>> job result file retrieval. I'd like to get some feedback  
>>>> (issues, problems, improvements, suggestions, etc.) from you. For  
>>>> documentation on how to access the services, see:
>>>>
>>>> http://8ball.sdsc.edu:8888/cipres-web/Home.do and click on "REST  
>>>> API" below the "CIPRES PORTAL" banner.
>>>>
>>>> Lucie
>>>>
>>>
>>> -- 
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
>>> ===========================================================
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From shandar at nibio.go.jp  Sun Jan 27 06:50:40 2008
From: shandar at nibio.go.jp (Shandar Ahmad)
Date: Sun, 27 Jan 2008 15:50:40 +0900
Subject: [Bioperl-l] PRIB 2008
Message-ID: <1201416640.31793.7.camel@boe>

******* Our apologies if you received multiple copies ***********
If you do not wish to receive PRIB 2008-related emails, please write to
Madhu Chetty 
and CC me at shandar at nibio.go.jp
******************************************************************



PRELIMINARY CALL FOR PAPERS AND INVITED SESSIONS

********************************************************************************************
Third IAPR International Conference on Pattern Recognition in 
Bioinformatics (PRIB 2008)
October 15-17, 2008
Melbourne, Australia

http://www.infotech.monash.edu.au/prib08
********************************************************************************************

PRIB 2008 is aimed at bringing together top researchers, practitioners, 
and students from around the world to discuss the applications of 
pattern recognition methods in the field of bioinformatics to solve 
problems in life sciences. Pattern recognition techniques of interest 
include: statistical, syntactic, and structural approaches, Bayesian, 
hidden Markov and graphical models, neural networks, fuzzy and genetic 
algorithms, data mining, and their hybrids. Papers in areas including (but not 
limited to) bio-sequence analysis, gene and protein expression analysis, 
structure prediction, protein folding, docking, metabolic pathway 
analysis and regulatory networks, systems biology, drug design, and 
bioimaging are solicited for presentation at the conference.

All papers will be peer reviewed and accepted papers will be published 
in the conference proceedings as an edited volume in Lecture Notes in 
Bioinformatics by Springer. Submission of papers will be electronic and 
through the conference website. Proposals for special sessions and 
tutorials at the conference are also invited in all related areas of 
research. Authors of selected papers presented at the conference will 
also be invited for publication in Special Issues of reputed journals.

Location:
Melbourne is a sophisticated city in the south-east corner of mainland 
Australia. It is known for its attractive sightseeing spots, great 
events, passion for food and wine, and fabulous scenery. A style-setter, 
Melbourne is home to a continuous program of festivals, art 
exhibitions and musical extravaganzas. Warning: you might never want to 
go home.

For latest information on PRIB 2008, visit the conference web site:
http://www.infotech.monash.edu.au/prib08

or email the secretariat at prib2008.melb at infotech.monash.edu.au

Important Deadlines
Paper submission: 15 April 2008
Proposals for Special Sessions/Tutorials: 15 March 2008
Author notification: 15 May 2008
Camera-ready papers: 15 June 2008


Organising Committee, PRIB 2008



From snoze.pa at gmail.com  Mon Jan 28 21:07:37 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Mon, 28 Jan 2008 15:07:37 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
Message-ID: <10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>

I am still getting the same error message.

My question is:

Do I need to install bioperl-db for BioSQL?

When I use BioSQL and load the NCBI taxonomy, it works
fine. But when I install bioperl-db, I get the
following error message when loading the NCBI taxonomy.

Any help?



Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
failed to insert node (10090;10090;10088;species;1;2): Duplicate entry
'10090' for key 2 at load_ncbi_taxonomy.pl line 568


From susantoroy at gmail.com  Mon Jan 28 21:05:49 2008
From: susantoroy at gmail.com (Susanta Roy)
Date: Tue, 29 Jan 2008 02:35:49 +0530
Subject: [Bioperl-l] Please remove my letter from your site
Message-ID: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>

Dear Sir,
Please remove my letter from the following URLs:
http://bioperl.org/pipermail/bioperl-l/2007-December/027004.html
http://bioperl.org/pipermail/bioperl-l/2007-December.txt
http://www.nabble.com/Enquiry-about-bioperl-project-td14522622.html


It is not supposed to appear online.
Thanks in advance.

Regards
Suisanta


From cjfields at uiuc.edu  Mon Jan 28 21:53:33 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 28 Jan 2008 15:53:33 -0600
Subject: [Bioperl-l] Please remove my letter from your site
In-Reply-To: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
References: <236a58340801281305oa318771p6753715c6cebdb74@mail.gmail.com>
Message-ID: 

Um, you posted to a public mailing list (its archives are open to the  
public for searching, indexing via Google, etc.).  Terms of usage are  
here:

http://lists.open-bio.org/mailman/listinfo/bioperl-l

with more info here:

http://www.bioperl.org/wiki/Mailing_lists

BTW, this post will also appear.  C'est la vie!

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From snoze.pa at gmail.com  Tue Jan 29 17:15:41 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 29 Jan 2008 11:15:41 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
Message-ID: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>

Dear Users,
I tried refreshing the installation, and it seems to be working. But when I
load sequences, I get the following warning messages. Am I doing this right,
or am I missing a huge chunk of sequences? Thanks in advance,
s

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values
were ("","1") FKs (27,3,4)
Duplicate entry '27-3-4-1' for key 2
---------------------------------------------------
...
...
and so on


From tristan.lefebure at gmail.com  Tue Jan 29 17:19:23 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 29 Jan 2008 12:19:23 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
Message-ID: <200801291219.23172.tristan.lefebure@gmail.com>

Hello,

I would like to download a large number of sequences from GenBank (122,146 to be exact) following a list of accession numbers.
I first looked into Bio::DB::EUtilities, but got lost and finally used Bio::DB::GenBank. 
My script works well for short requests, but it gives the following error with long ones:

 ------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
500 short write
Content-Type: text/plain
Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
Client-Warning: Internal response

500 short write

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:58
---------------------------------------------------------

Does that mean that we can only fetch 500 sequences at a time?
Should I split my list into 500-ID fragments and submit them one after the other?

Any suggestions very welcomed...
Thanks,
-Tristan


Here is the script:

##################################
use strict;
use warnings;
use Bio::DB::GenBank;
# use Bio::DB::EUtilities;
use Bio::SeqIO;
use Getopt::Long;

# 2008-01-22 T Lefebure
# I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
# The following procedure is not ideal, as the stream is first copied to a temporary file
# and then re-used by BioPerl to generate the final file.

my $db = 'nucleotide';
my $format = 'genbank';
my $help= '';
my $dformat = 'gb';

GetOptions(
	'help|?' => \$help,
	'format=s'  => \$format,
	'database=s'	=> \$db,
);


my $printhelp = "\nUsage: $0 [options]  

Will download the corresponding data from GenBank. BioPerl is required.

Options:
	-h
		print this help
	-format: genbank|fasta|...
		give output format (default=genbank)
	-database: nucleotide|genome|protein|...
		define the database to search in (default=nucleotide)

The full description of the options can be found at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";

if ($help || @ARGV < 2) {
	print $printhelp;
	exit;
}

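open my $list_fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
chomp(my @list = <$list_fh>);
@list = grep { /\S/ } @list;    # skip blank lines
close $list_fh;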

if ($format eq 'fasta') { $dformat = 'fasta' }

my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
				-format => $dformat,
				-db => $db,
			);
my $seqio = $gb->get_Stream_by_acc(\@list);

my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
				-format => $format,
			);
while (my $seqo = $seqio->next_seq ) {
	print $seqo->id, "\n";
	$seqout->write_seq($seqo);
}


From cjfields at uiuc.edu  Tue Jan 29 18:06:08 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Jan 2008 12:06:08 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>

Yes, you can only retrieve ~500 sequences at a time using either  
Bio::DB::GenBank or Bio::DB::EUtilities.  Both modules  
interact with NCBI's EUtilities (the former returns Bio::Seq/ 
Bio::SeqIO objects, the latter returns raw data from the URL to be 
processed later).

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
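A minimal sketch of the batching approach, assuming Bio::DB::GenBank and Bio::SeqIO as used in the script above: the accession list is cut into chunks of about 500 and each chunk gets its own get_Stream_by_acc call. The helper names `batches` and `fetch_in_batches`, the output format and the pause between requests are illustrative choices, not NCBI-mandated values. Note that the "500" in "500 short write" is most likely an HTTP status code generated by LWP itself (see the "Client-Warning: Internal response" header), not a sequence count; it is the oversized single request that fails, which batching avoids.

```perl
use strict;
use warnings;

# Split a list of accessions into array refs of at most $size IDs each.
sub batches {
    my ($size, @ids) = @_;
    my @out;
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Fetch each batch with its own request and append all records to one file.
# BioPerl is loaded only when this sub is actually called.
sub fetch_in_batches {
    my ($ids, $outfile, $size) = @_;
    require Bio::DB::GenBank;
    require Bio::SeqIO;
    my $gb  = Bio::DB::GenBank->new(-format => 'gb', -db => 'nucleotide');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');
    for my $batch (batches($size || 500, @$ids)) {
        my $stream = $gb->get_Stream_by_acc($batch);
        while (my $seq = $stream->next_seq) {
            $out->write_seq($seq);
        }
        sleep 3;    # pause between requests; 3 s is an arbitrary choice
    }
    return;
}
```

For example, `fetch_in_batches(\@list, 'out.gb')` would write every record to out.gb. A batch that still fails will die, so wrapping each batch in `eval` and retrying it is a natural extension.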

You can usually post more IDs using epost and then fetch the sequences by referring  
to the WebEnv/key combo (batch posting).  I try to make this a bit  
easier with EUtilities. It is woefully lacking in documentation (my
fault), but there is some code up on the wiki which should work.
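The epost/efetch route can be sketched with plain LWP, without BioPerl: post all IDs once, read WebEnv and QueryKey from the XML reply, then page through the stored set with efetch using retstart/retmax. This is an illustrative sketch, not the Bio::DB::EUtilities API; it assumes the current E-utilities base URL, the helper names `parse_epost` and `epost_efetch` are hypothetical, and NCBI's usage guidelines (request rate, `tool`/`email` parameters) still apply.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Assumed base URL of NCBI's E-utilities.
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Pull WebEnv and QueryKey out of an epost XML reply.
sub parse_epost {
    my ($xml) = @_;
    my ($key) = $xml =~ m{<QueryKey>(\d+)</QueryKey>};
    my ($env) = $xml =~ m{<WebEnv>([^<]+)</WebEnv>};
    die "epost reply lacks WebEnv/QueryKey\n" unless defined $key && defined $env;
    return ($env, $key);
}

# Post all IDs once, then fetch the stored set $step records at a time.
sub epost_efetch {
    my ($db, $ids, $outfile, $step) = @_;
    $step ||= 500;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->request(POST "$EUTILS/epost.fcgi",
        [ db => $db, id => join(',', @$ids) ]);
    die $res->status_line, "\n" unless $res->is_success;
    my ($env, $key) = parse_epost($res->decoded_content);

    open my $out, '>', $outfile or die "Cannot write $outfile: $!\n";
    for (my $start = 0; $start < @$ids; $start += $step) {
        my $r = $ua->request(POST "$EUTILS/efetch.fcgi",
            [ db => $db, WebEnv => $env, query_key => $key,
              rettype => 'gb', retmode => 'text',
              retstart => $start, retmax => $step ]);
        die $r->status_line, "\n" unless $r->is_success;
        print {$out} $r->decoded_content;
        sleep 1;    # stay well under NCBI's request-rate limit
    }
    close $out;
    return;
}
```

Only one large POST carries the ID list; every efetch call after that is small, which is the point of posting first and referring to the WebEnv/key afterwards.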

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From snoze.pa at gmail.com  Tue Jan 29 18:22:56 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 29 Jan 2008 12:22:56 -0600
Subject: [Bioperl-l] loading sequence error bioseq
Message-ID: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>

Dear User,

After creating the database bioseqdb and loading the NCBI taxonomy
successfully, I get the following error message while loading sequences
into the database.

load_seqdatabase.pl -host localhost -dbname bioseqdb .....etc

MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values
were ("","31") FKs
MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed, values
were
MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were

Column 'dbname' cannot be null

STACK: /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl:620
-----------------------------------------------------------

 at /usr/local/bioperl-db-1.5.2_100/scripts/biosql/load_seqdatabase.pl line 633

Any Idea?

Thanks in advance
s


From cjfields at uiuc.edu  Tue Jan 29 18:44:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 29 Jan 2008 12:44:16 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <479F7149.1010203@atgc.org>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
	<479F7149.1010203@atgc.org>
Message-ID: 

Forgot about that one; it's definitely a better way to do it if you  
have the GI/accessions.

chris

On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:

> you don't need bioperl to accomplish this task (downloading  
> several thousand sequences from a list of accession IDs).
>
> NCBI batch Entrez can do that:
> http://www.ncbi.nlm.nih.gov/sites/batchentrez
>
> just submit a large list of IDs, select database, and download.
>
> you can submit ~50,000 IDs in one file usually without problems.
> it may not return results if a list is larger than ~100,000 IDs
>
> --
> Alexander Kozik
> Bioinformatics Specialist
> Genome and Biomedical Sciences Facility
> 451 Health Sciences Drive
> Genome Center, 4-th floor, room 4302
> University of California
> Davis, CA 95616-8816
> Phone: (530) 754-9127
> email#1: akozik at atgc.org
> email#2: akozik at gmail.com
> web: http://www.atgc.org/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From akozik at atgc.org  Tue Jan 29 18:32:41 2008
From: akozik at atgc.org (Alexander Kozik)
Date: Tue, 29 Jan 2008 10:32:41 -0800
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<16790738-AA52-4608-9501-D6A6BC7D4C73@uiuc.edu>
Message-ID: <479F7149.1010203@atgc.org>

you don't need bioperl to accomplish this task (downloading 
several thousand sequences from a list of accession IDs).

NCBI batch Entrez can do that:
http://www.ncbi.nlm.nih.gov/sites/batchentrez

just submit a large list of IDs, select database, and download.

you can submit ~50,000 IDs in one file usually without problems.
it may not return results if a list is larger than ~100,000 IDs

--
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 Health Sciences Drive
Genome Center, 4-th floor, room 4302
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/





From hlapp at gmx.net  Tue Jan 29 21:31:47 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 29 Jan 2008 16:31:47 -0500
Subject: [Bioperl-l] loading sequence error bioseq
In-Reply-To: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>
References: <10f848910801291022v2d44487ao74b05c1a9fb10852@mail.gmail.com>
Message-ID: 

This looks suspiciously like a data error. Can you please post the  
full command line? That should also show which format your sequences  
are in.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From hlapp at gmx.net  Tue Jan 29 21:40:21 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 29 Jan 2008 16:40:21 -0500
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
	<10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
Message-ID: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net>

This would mean that two or more seqfeatures with the same type for  
the same sequence exist in the input data, each with rank 1.

Normally the rank will be incremented for each seqfeature of a  
sequence, so I'm not sure how this is happening here w/o seeing the  
data.
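To make the constraint concrete: in the BioSQL schema the seqfeature table has a unique key on (bioentry_id, type_term_id, source_term_id, rank), which matches the four values in "Duplicate entry '27-3-4-1'". The sketch below is a hypothetical helper, not part of bioperl-db; it works on one sequence's features given as [type, source, rank] triples (term names instead of database IDs, an assumption for readability), reports colliding keys, and shows that a per-sequence running rank removes them.

```perl
use strict;
use warnings;

# Return the type-source-rank keys that occur more than once.
sub duplicate_feature_keys {
    my (@features) = @_;
    my %seen;
    $seen{ join '-', @$_ }++ for @features;
    return sort grep { $seen{$_} > 1 } keys %seen;
}

# Give every feature of one sequence its own rank: 1, 2, 3, ...
sub assign_ranks {
    my (@features) = @_;
    my $rank = 0;
    return map { [ @$_[0, 1], ++$rank ] } @features;
}

# Two features of the same type and source, both with rank 1, collide.
my @bad = (['gene', 'EMBL', 1], ['gene', 'EMBL', 1], ['CDS', 'EMBL', 2]);
print "collision: $_\n" for duplicate_feature_keys(@bad);
```

With the running rank from `assign_ranks`, the same three features get ranks 1, 2 and 3, so no key repeats.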

	-hilmar
On Jan 29, 2008, at 12:15 PM, snoze pa wrote:

> Dear Users,
> I tried refreshing the installation, and it seems to be working. But  
> when I load sequences, I get the following warning messages. Am I  
> doing this right, or am I missing a huge chunk of sequences? Thanks in  
> advance,
> s
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqFeatureAdaptor (driver) failed,  
> values
> were ("","1") FKs (27,3,4)
> Duplicate entry '27-3-4-1' for key 2
> ---------------------------------------------------
> ...
> ...
> and so on

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From avilella at gmail.com  Wed Jan 30 09:28:34 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 30 Jan 2008 09:28:34 +0000
Subject: [Bioperl-l] fetch dna seqs from genbank protein ids
Message-ID: <358f4d650801300128q44cf95a0va11799908c4f26a0@mail.gmail.com>

Hi bioperlers,

Got a question here:

>I have a bunch of protein sequences in multi-FastA with their
>accession numbers in the header and I want to retrieve their
>corresponding nucleotide sequences and nucleotide accession numbers.
>I can't seem to find a way to do it. I am looking at eUtils on the
>NCBI site, but they only do really simple stuff.

I had a look at the fetch example scripts, and I could fetch proteins
from GenBank, but I don't see a clear connection between the protein
sequence and the DNA sequence. Is this a DBlink? Which type?

Cheers,

    Albert.


From tristan.lefebure at gmail.com  Wed Jan 30 14:56:07 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 30 Jan 2008 09:56:07 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: 
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
Message-ID: <200801300956.07849.tristan.lefebure@gmail.com>

Thank you both!

Just in case it might be useful for someone else, here are my ramblings:

1. I first tried to adapt my script to fetch 500 sequences at a time. It works, except that ~40% of the time NCBI returns the following error and my script crashes:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: WebDBSeqI Request Error:
[...]
    The proxy server received an invalid
    response from an upstream server.
[...]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: ./fetch_from_genbank.pl:68
-----------------------------------------------------------

I tried to modify the script so that when the retrieval of a 500-sequence block crashes, it continues with the other blocks, but I was unsuccessful. It probably needs a better understanding of BioPerl errors...
Here is the section of the script that was modified:
#########
my $n_seq = scalar @list;
my @aborted;

for (my $i=1; $i<=$n_seq; $i += 500) {
	print "Fetching sequences $i to ", $i+499, ": ";
	my $start = $i -1;
	my $end = $i + 500 -1;
	my @red_list = @list[$start .. $end]; 
	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
					-format => $dformat,
					-db => $db,
				);

	my $seqio;
	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
		print "Aborted, resubmit latter\n";
		push @aborted, @red_list;
		next;
	}
	
	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
					-format => $format,
				);
	while (my $seqo = $seqio->next_seq ) {
# 		print $seqo->id, "\n";
		$seqout->write_seq($seqo);
	}
	print "Done\n";
}

if (@aborted) {
	open OUT, ">aborted_fetching.AN";
	foreach (@aborted) { print OUT $_ };
}
##########


2. So I moved to the second solution and tried Batch Entrez. I cut my 120,000-long AN list into 10,000-line pieces using split:
split -l 10000 full_list.AN splitted_list_

and then submitted the 13 lists one by one. I must say that I don't really like using a web interface to fetch data, and here the most annoying part is that you end up with a regular Entrez/GenBank web page: select your format, export to file, choose a file name... and you have to do it many times.
It is too prone to human and web-browser errors for my taste, but it worked.
Nevertheless, there are some caveats:
- some downloaded files were incomplete (~10%) and had to be downloaded again
- you can't submit several lists at the same time (otherwise the same cookie will be used and you'll end up with several identical files)

-Tristan
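
An aside on the loop in item 1. There, `$start` is `$i - 1` but `$end` is `$i + 499`, so each slice covers 501 indices and shares one accession with the next batch. The final slice can also run past the end of `@list`, which pads the request with `undef`. The following is a minimal sketch of non-overlapping batching with `splice`. It uses a made-up accession list in place of the real file, and it is not the original script.

```perl
use strict;
use warnings;

# Split a list of accessions into consecutive, non-overlapping batches.
# splice() removes up to $size elements from the front of the copy, so the
# last batch is just shorter instead of being padded with undef.
sub make_batches {
    my ($size, @ids) = @_;
    my @batches;
    push @batches, [ splice(@ids, 0, $size) ] while @ids;
    return @batches;
}

my @list    = map { "ACC$_" } 1 .. 1203;    # hypothetical accession list
my @batches = make_batches(500, @list);

my $first = 1;
for my $batch (@batches) {
    my $last = $first + @$batch - 1;
    print "Fetching sequences $first to $last\n";
    # In the real script, $gb->get_Stream_by_acc($batch) would go here.
    $first = $last + 1;
}
```

Each `$batch` is already an array reference, so it can be passed straight to `get_Stream_by_acc`, just as the script passes `\@red_list`.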

On Tuesday 29 January 2008 13:44:16 you wrote:
> Forgot about that one; it's definitely a better way to do it if you
> have the GI/accessions.
>
> chris
>
> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
> > you don't need to use bioperl to accomplish this task, to download
> > several thousand sequences based on accession ID list.
> >
> > NCBI batch Entrez can do that:
> > http://www.ncbi.nlm.nih.gov/sites/batchentrez
> >
> > just submit a large list of IDs, select database, and download.
> >
> > you can submit ~50,000 IDs in one file usually without problems.
> > it may not return results if a list is larger than ~100,000 IDs
> >
> > --
> > Alexander Kozik
> > Bioinformatics Specialist
> > Genome and Biomedical Sciences Facility
> > 451 Health Sciences Drive
> > Genome Center, 4-th floor, room 4302
> > University of California
> > Davis, CA 95616-8816
> > Phone: (530) 754-9127
> > email#1: akozik at atgc.org
> > email#2: akozik at gmail.com
> > web: http://www.atgc.org/
> >
> > Chris Fields wrote:
> >> Yes, you can only retrieve ~500 sequences at a time using
> >> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
> >> interact with NCBI's EUtilities (Bio::DB::EUtilities returns raw data
> >> from the URL to be processed later, while Bio::DB::GenBank returns
> >> Bio::Seq/Bio::SeqIO objects).
> >> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
> >> You can usually post more IDs using epost and fetch the sequences
> >> by referring to the WebEnv/key combo (batch posting).  I try to make
> >> this a bit easier with EUtilities, but it is woefully lacking in
> >> documentation (my fault); there is some code up on the wiki
> >> which should work.
> >> chris
> >>
> >> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
> >>> Hello,
> >>>
> >>> I would like to download a large number of sequences from GenBank
> >>> (122,146 to be exact) following a list of accession numbers.
> >>> I first investigated around Bio::DB::EUtilities, but got lost and
> >>> finally used Bio::DB::GenBank.
> >>> My script works well for short request, but it gives the following
> >>> error with the long request:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: WebDBSeqI Request Error:
> >>> 500 short write
> >>> Content-Type: text/plain
> >>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> >>> Client-Warning: Internal response
> >>>
> >>> 500 short write
> >>>
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/
> >>> Root.pm:359
> >>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/
> >>> Bio/DB/WebDBSeqI.pm:685
> >>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/
> >>> 5.8.8/Bio/DB/WebDBSeqI.pm:472
> >>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/
> >>> perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> >>> STACK: ./fetch_from_genbank.pl:58
> >>> ---------------------------------------------------------
> >>>
> >>> Does that mean that we can only fetch 500 sequences at a time?
> >>> Should I split my list into 500-ID fragments and submit them one
> >>> after the other?
> >>>
> >>> Any suggestions very welcomed...
> >>> Thanks,
> >>> -Tristan
> >>>
> >>>
> >>> Here is the script:
> >>>
> >>> ##################################
> >>> use strict;
> >>> use warnings;
> >>> use Bio::DB::GenBank;
> >>> # use Bio::DB::EUtilities;
> >>> use Bio::SeqIO;
> >>> use Getopt::Long;
> >>>
> >>> # 2008-01-22 T Lefebure
> >>> # I tried to use Bio::DB::EUtilities without much success and went
> >>> # back to Bio::DB::GenBank.
> >>> # The following procedure is not really good, as the stream is
> >>> # first copied to a temporary file
> >>> # and then re-used by BioPerl to generate the final file.
> >>>
> >>> my $db = 'nucleotide';
> >>> my $format = 'genbank';
> >>> my $help= '';
> >>> my $dformat = 'gb';
> >>>
> >>> GetOptions(
> >>>    'help|?' => \$help,
> >>>    'format=s'  => \$format,
> >>>    'database=s'    => \$db,
> >>> );
> >>>
> >>>
> >>> my $printhelp = "\nUsage: $0 [options]  
> >>>
> >>> Will download the corresponding data from GenBank. BioPerl is
> >>> required.
> >>>
> >>> Options:
> >>>    -h
> >>>        print this help
> >>>    -format: genbank|fasta|...
> >>>        give output format (default=genbank)
> >>>    -database: nucleotide|genome|protein|...
> >>>        define the database to search in (default=nucleotide)
> >>>
> >>> The full description of the options can be found at
> >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
> >>> \n";
> >>>
> >>> if ($#ARGV<1) {
> >>>    print $printhelp;
> >>>    exit;
> >>> }
> >>>
> >>> open LIST, $ARGV[0];
> >>> my @list = <LIST>;
> >>>
> >>> if ($format eq 'fasta') { $dformat = 'fasta' }
> >>>
> >>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
> >>>                -format => $dformat,
> >>>                -db => $db,
> >>>            );
> >>> my $seqio = $gb->get_Stream_by_acc(\@list);
> >>>
> >>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> >>>                -format => $format,
> >>>            );
> >>> while (my $seqo = $seqio->next_seq ) {
> >>>    print $seqo->id, "\n";
> >>>    $seqout->write_seq($seqo);
> >>> }
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
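
Chris's batch-posting suggestion quoted above works like this: post the IDs with epost, then fetch against the returned WebEnv/query_key pair. It can be sketched as follows. This minimal Perl example only builds the paged efetch URLs, and nothing is actually sent to NCBI. Several values are assumptions. The WebEnv value is a placeholder for what epost would return. The base URL is assumed to be the current E-utilities endpoint. The 500-record page size simply mirrors the limit discussed in this thread.

```perl
use strict;
use warnings;

# Assumed current base URL of NCBI's efetch E-utility.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Minimal percent-encoding of anything outside the unreserved URL characters.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build one efetch URL per page of records from a posted history set.
sub efetch_pages {
    my (%args) = @_;
    my @urls;
    for (my $start = 0; $start < $args{count}; $start += $args{page_size}) {
        my %q = (
            db        => $args{db},
            WebEnv    => $args{webenv},
            query_key => $args{query_key},
            rettype   => $args{rettype},
            retstart  => $start,               # offset of the first record in this page
            retmax    => $args{page_size},     # records per page
        );
        push @urls, $EFETCH . '?'
            . join('&', map { "$_=" . uri_escape_simple($q{$_}) } sort keys %q);
    }
    return @urls;
}

my @urls = efetch_pages(
    db => 'nucleotide', webenv => 'NCID_PLACEHOLDER', query_key => 1,
    rettype => 'gb', count => 1203, page_size => 500,
);
print "$_\n" for @urls;
```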




From cjfields at uiuc.edu  Wed Jan 30 15:10:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 09:10:14 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: <7143A650-AA84-4331-B55A-A66C3F5BBAB0@uiuc.edu>

You can use an eval {} block to catch the error, then redo the loop  
(so you don't iterate to the next block) or use next and skip the  
current block if an error occurs.  If you use redo then you should use  
a counter to exit the loop after several tries.
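
A minimal sketch of the eval/redo approach described above. The stack traces show the failure coming from Bio::Root::Root::throw, so the failed request dies instead of returning false, and `eval` can trap it. That is also why the `unless ($seqio = ...)` check in the modified script never fires. `fake_fetch` is a hypothetical stand-in for `$gb->get_Stream_by_acc(\@red_list)` that fails on purpose. In the real script you would call the BioPerl method there instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $gb->get_Stream_by_acc(\@red_list).
# Batch 2 fails twice and then succeeds; batch 3 always fails.
my %calls;
sub fake_fetch {
    my ($batch_no) = @_;
    $calls{$batch_no}++;
    die "WebDBSeqI Request Error\n"
        if $batch_no == 3 || ($batch_no == 2 && $calls{$batch_no} < 3);
    return "stream for batch $batch_no";
}

my $max_tries = 3;    # attempts per batch before giving up on it
my $tries     = 0;    # declared outside the loop so redo does not reset it
my (@fetched, @aborted);

BATCH: for my $batch_no (1 .. 3) {
    my $stream;
    my $ok = eval { $stream = fake_fetch($batch_no); 1 };
    unless ($ok) {
        my $err = $@ || "unknown error\n";
        if (++$tries < $max_tries) {
            warn "batch $batch_no, attempt $tries failed: $err";
            redo BATCH;    # retry the same batch; $batch_no is not advanced
        }
        push @aborted, $batch_no;    # give up on this batch and move on
        $tries = 0;
        next BATCH;
    }
    $tries = 0;    # success: reset the counter for the next batch
    push @fetched, $stream;
}

print 'fetched: ', scalar(@fetched), ', aborted: ', join(',', @aborted), "\n";
```

`redo` re-runs the loop body with the same `$batch_no`, and the counter stops it from retrying forever. Dropping the `redo` branch gives the simpler "use next and skip the block" variant.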

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From snoze.pa at gmail.com  Wed Jan 30 17:34:24 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 11:34:24 -0600
Subject: [Bioperl-l] bioseqDB error
In-Reply-To: <31534016-91B3-45C0-995D-CE5A82466303@gmx.net>
References: <10f848910801251549h546bea04p4c71cbb7a48aab5c@mail.gmail.com>
	<10f848910801281307i125ce285k5b73fd4ae5d4af28@mail.gmail.com>
	<557D19AE-CD1D-4558-8804-830B24F294AE@gmx.net>
	<10f848910801290915x7248944ds3a12d84b0508c280@mail.gmail.com>
	<31534016-91B3-45C0-995D-CE5A82466303@gmx.net>
Message-ID: <10f848910801300934q57e5d45cpbf0e17b45640e3f9@mail.gmail.com>

Hilmar,

The command I am using is the following:

load_seqdatabase.pl -host localhost -namespace bioperl -dbname bioseqdb -dbuser root -format genbank sequences.txt

I have no idea why I am getting that error.

Thanks in advance




From snoze.pa at gmail.com  Wed Jan 30 18:01:46 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 12:01:46 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801291219.23172.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: <10f848910801301001k681e1291we0ce468e96d88f57@mail.gmail.com>

You can use a one-line LWP script to grab the sequences.



From snoze.pa at gmail.com  Wed Jan 30 18:38:12 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 12:38:12 -0600
Subject: [Bioperl-l] load_seqdatabase help
Message-ID: <10f848910801301038t1ae296c2o2453728b68dc81f8@mail.gmail.com>

Dear User,
Is there any alternative way to load the following sequence into the
BioSQL schema? I am trying to use load_seqdatabase.pl, but it is not working
in my case and shows a number of warning/error messages. I have tried
everything, but I have not been able to load it yet.

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb

Any help loading the above sequence into my bioseqdb database would be
appreciated.

Thanks in advance
s


From snoze.pa at gmail.com  Wed Jan 30 19:30:22 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Wed, 30 Jan 2008 13:30:22 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
Message-ID: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>

Hi Hilmar,

After spending a lot of time, I figured out the error. I am able to load
sequences if they do not have the following entry:

xrefs (non-sequence databases):

If the GenBank sequence has this entry, the load_seqdatabase.pl script
crashes. I tried it on a couple of sequences and found that this line of the
GenBank format is the culprit. But this line is important, as it contains a
lot of information... so I am wondering how to solve this problem.

Any help?

Thanks in advance
s


From Russell.Smithies at agresearch.co.nz  Wed Jan 30 19:34:44 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 31 Jan 2008 08:34:44 +1300
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com><479F7149.1010203@atgc.org>
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: 

Take a look at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
Ebot is an interactive tool that generates a Perl script that implements
an E-utility pipeline.
You can probably hack the resulting script to introduce the required
BioPerly bits.

Russell Smithies 

Bioinformatics Software Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-
> bio.org] On Behalf Of Tristan Lefebure
> Sent: Thursday, 31 January 2008 3:56 a.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::GenBank and large number of requests
> 
> Thank you both!
> 
> Just in case it might be usefull for someone else, here are my
ramblings:
> 
> 1. I first tried to adapt my script and fetch 500 sequences at a time.
It works,
> except that ~40% of the time NCBI gives the following error and my
script crashed:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> [...]
>     The proxy server received an invalid
>     response from an upstream server.
> [...]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request
> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:68
> -----------------------------------------------------------
> 
> I tried to modify the script so that when the retrieval of a 500
sequence block
> crashes, it continues with the other blocks, but I was unsuccessfull.
It probably
> needs some better understanding of BioPerl errors...
> Here is the section of the script that was modified:
> #########
> my $n_seq = scalar @list;
> my @aborted;
> 
> for (my $i=1; $i<=$n_seq; $i += 500) {
> 	print "Fetching sequences $i to ", $i+499, ": ";
> 	my $start = $i -1;
> 	my $end = $i + 500 -1;
> 	my @red_list = @list[$start .. $end];
> 	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 					-format => $dformat,
> 					-db => $db,
> 				);
> 
> 	my $seqio;
> 	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
> 		print "Aborted, resubmit latter\n";
> 		push @aborted, @red_list;
> 		next;
> 	}
> 
> 	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
> 					-format => $format,
> 				);
> 	while (my $seqo = $seqio->next_seq ) {
> # 		print $seqo->id, "\n";
> 		$seqout->write_seq($seqo);
> 	}
> 	print "Done\n";
> }
> 
> if (@aborted) {
> 	open OUT, ">aborted_fetching.AN";
> 	foreach (@aborted) { print OUT $_ };
> }
> ##########
> 
> 
> 2. So I moved to the second solution and tried batchentrez. I cut my
120,000 long
> AN list into 10,000 long pieces using split:
> split -l 10000 full_list.AN splitted_list_
> 
> and then submitted the 13 lists one by one. I must say that I don't
really like using
> a web-interface to fetch data, and here the most ennoying part is that
you end up
> with a regular Entrez/GenBank webpage: select your format, export to
file, chosse
> file name... and have to do it many times.
> It is too much prone to human and web-browser errors for my taste, but
it worked.
> Nevertheless there is some caveats:
> - some downloaded files were incomplete (~10%) and you have to restart
it
> - you can't submit several lists in the same time (otherwise the same
cookie will be
> used and you'll end up with several identical files)
> 
> -Tristan
> 
> On Tuesday 29 January 2008 13:44:16 you wrote:
> > Forgot about that one; it's definitely a better way to do it if you
> > have the GI/accessions.
> >
> > chris
> >
> > On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
> > > you don't need to use bioperl to accomplish this task, to download
> > > several thousand sequences based on accession ID list.
> > >
> > > NCBI batch Entrez can do that:
> > > http://www.ncbi.nlm.nih.gov/sites/batchentrez
> > >
> > > just submit a large list of IDs, select database, and download.
> > >
> > > you can submit ~50,000 IDs in one file usually without problems.
> > > it may not return results if a list is larger than ~100,000 IDs
> > >
> > > --
> > > Alexander Kozik
> > > Bioinformatics Specialist
> > > Genome and Biomedical Sciences Facility
> > > 451 Health Sciences Drive
> > > Genome Center, 4-th floor, room 4302
> > > University of California
> > > Davis, CA 95616-8816
> > > Phone: (530) 754-9127
> > > email#1: akozik at atgc.org
> > > email#2: akozik at gmail.com
> > > web: http://www.atgc.org/
> > >
> > > Chris Fields wrote:
> > >> Yes, you can only retrieve ~500 sequences at a time using
> > >> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
> > >> interact with NCBI's EUtilities (the former module returns
> > >> Bio::Seq/Bio::SeqIO objects, the latter module returns raw data
> > >> from the URL to be processed later).
> > >>
> > >> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
> > >> You can usually post more IDs using epost and fetch sequence
> > >> referring to the WebEnv/key combo (batch posting).  I try to make
> > >> this a bit easier with EUtilities but it is woefully lacking in
> > >> documentation (my fault), but there is some code up on the wiki
> > >> which should work.
> > >> chris
> > >>
> > >> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
> > >>> Hello,
> > >>>
> > >>> I would like to download a large number of sequences from GenBank
> > >>> (122,146 to be exact) following a list of accession numbers.
> > >>> I first investigated Bio::DB::EUtilities, but got lost and
> > >>> finally used Bio::DB::GenBank.
> > >>> My script works well for short requests, but it gives the following
> > >>> error with long requests:
> > >>>
> > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>> MSG: WebDBSeqI Request Error:
> > >>> 500 short write
> > >>> Content-Type: text/plain
> > >>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> > >>> Client-Warning: Internal response
> > >>>
> > >>> 500 short write
> > >>>
> > >>> STACK: Error::throw
> > >>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> > >>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> > >>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> > >>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> > >>> STACK: ./fetch_from_genbank.pl:58
> > >>> ---------------------------------------------------------
> > >>>
> > >>> Does that mean that we can only fetch 500 sequences at a time?
> > >>> Should I split my list into 500-ID fragments and submit them one
> > >>> after the other?
> > >>>
> > >>> Any suggestions are very welcome...
> > >>> Thanks,
> > >>> -Tristan
> > >>>
> > >>>
> > >>> Here is the script:
> > >>>
> > >>> ##################################
> > >>> use strict;
> > >>> use warnings;
> > >>> use Bio::DB::GenBank;
> > >>> # use Bio::DB::EUtilities;
> > >>> use Bio::SeqIO;
> > >>> use Getopt::Long;
> > >>>
> > >>> # 2008-01-22 T Lefebure
> > >>> # I tried to use Bio::DB::EUtilities without much success and went
> > >>> # back to Bio::DB::GenBank.
> > >>> # The following procedure is not really good as the stream is
> > >>> # first copied to a temporary file,
> > >>> # and then re-used by BioPerl to generate the final file.
> > >>>
> > >>> my $db = 'nucleotide';
> > >>> my $format = 'genbank';
> > >>> my $help= '';
> > >>> my $dformat = 'gb';
> > >>>
> > >>> GetOptions(
> > >>>    'help|?' => \$help,
> > >>>    'format=s'  => \$format,
> > >>>    'database=s'    => \$db,
> > >>> );
> > >>>
> > >>>
> > >>> my $printhelp = "\nUsage: $0 [options] 

> > >>>
> > >>> Will download the corresponding data from GenBank. BioPerl is
> > >>> required.
> > >>>
> > >>> Options:
> > >>>    -h
> > >>>        print this help
> > >>>    -format: genbank|fasta|...
> > >>>        give output format (default=genbank)
> > >>>    -database: nucleotide|genome|protein|...
> > >>>        define the database to search in (default=nucleotide)
> > >>>
> > >>> The full description of the options can be found at
> > >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
> > >>>
> > >>> if ($#ARGV<1) {
> > >>>    print $printhelp;
> > >>>    exit;
> > >>> }
> > >>>
> > >>> open LIST, $ARGV[0];
> > >>> my @list = <LIST>;
> > >>>
> > >>> if ($format eq 'fasta') { $dformat = 'fasta' }
> > >>>
> > >>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
> > >>>                -format => $dformat,
> > >>>                -db => $db,
> > >>>            );
> > >>> my $seqio = $gb->get_Stream_by_acc(\@list);
> > >>>
> > >>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> > >>>                -format => $format,
> > >>>            );
> > >>> while (my $seqo = $seqio->next_seq ) {
> > >>>    print $seqo->id, "\n";
> > >>>    $seqout->write_seq($seqo);
> > >>> }
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>
> > >> Christopher Fields
> > >> Postdoctoral Researcher
> > >> Lab of Dr. Robert Switzer
> > >> Dept of Biochemistry
> > >> University of Illinois Urbana-Champaign
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> 
> 
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================



From cjfields at uiuc.edu  Wed Jan 30 20:04:18 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:04:18 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: <0BA39C27-1871-441B-B2DE-F7FECF8570D7@uiuc.edu>

Sounds like a bug in the GenBank parser.  Could you post a bug report  
with an example sequence record and your script?

http://bugzilla.open-bio.org/

chris

On Jan 30, 2008, at 1:30 PM, snoze pa wrote:

> Hi Hilmar,
>
> After spending a lot of time I figured out the error. I am able to load
> sequences if the sequences do not have the following entry
>
> xrefs (non-sequence databases):
>
> If the GenBank sequence has this entry then the script
> load_seqdatabase.pl crashes. I tried it on a couple of sequences and
> found that this is the culprit line in the GenBank format. But this
> line is important as it contains lots of information... so I am
> wondering how to solve this problem.
>
> Any help?
>
> Thanks in advance
> s

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From cjfields at uiuc.edu  Wed Jan 30 20:42:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 30 Jan 2008 14:42:14 -0600
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
In-Reply-To: <200801300956.07849.tristan.lefebure@gmail.com>
References: <200801291219.23172.tristan.lefebure@gmail.com>
	<479F7149.1010203@atgc.org>
	
	<200801300956.07849.tristan.lefebure@gmail.com>
Message-ID: <29768205-F511-4EDB-84D2-BCC36DBA92C7@uiuc.edu>

When using Bio::DB::EUtilities (from bioperl-live) this works for me:

use strict;
use warnings;
use Bio::DB::EUtilities;

# get array of IDs somehow, in @ids
my @ids;

my ($start, $chunk, $last) = (0, 100, $#ids);

my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                      -db => 'protein',
                      -rettype => 'genbank');

my $ct = 1; # used to denote separate files
my $tries = 0; # server attempts

# '<=' so that a final batch holding a single ID is not skipped
while ($start <= $last) {
     # want seqs in chunk size of 100 (set above)
     my $end = ($start + $chunk - 1) < $last ? ($start + $chunk - 1) : $last;
     # grab slice of IDs
     my @sub = @ids[$start..$end];

     # pass to agent
     $factory->set_parameters(-id => \@sub );

     eval {
         # check server response, if good send to file
         $factory->get_Response(-file => ">seqs_$ct.gb");
     };

     # ERROR!
     if ($@) {
         $tries++;
         if ($tries <= 10) {
             warn("Server problem on attempt $tries: $@\nTrying again...");
             redo;
         } else {
             die("Repeated server issues after $tries attempts.");
             # could warn and just skip this batch of accs using 'next'
         }
     }

     $start = $end+1;
     $ct++;
     $tries = 0;
}
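The loop above combines two separable ideas: slicing the ID list into fixed-size batches, and trapping server errors with `eval`. The second point also explains why the `Bio::DB::GenBank` loop quoted below keeps crashing: the stack traces in this thread show that failures arrive as a thrown `Bio::Root::Exception`, so testing the return value with `unless (...)` never sees them. The following is a minimal offline sketch under those assumptions; `make_batches` and `with_retries` are hypothetical helper names, not BioPerl functions.

```perl
use strict;
use warnings;

# Split a list of IDs into consecutive batches of at most $size elements.
# The last batch may be shorter, and no batch runs past the end of the list.
sub make_batches {
    my ($size, @ids) = @_;
    my @batches;
    for (my $start = 0; $start <= $#ids; $start += $size) {
        my $end = $start + $size - 1;
        $end = $#ids if $end > $#ids;
        push @batches, [ @ids[$start .. $end] ];
    }
    return @batches;
}

# Run $code up to $max_tries times. Each attempt is wrapped in eval,
# because BioPerl signals server failures by dying (throwing an
# exception) rather than by returning false. Returns the code's result,
# or undef if every attempt died.
sub with_retries {
    my ($max_tries, $code) = @_;
    for my $try (1 .. $max_tries) {
        my $result = eval { $code->() };
        return $result unless $@;
        warn "Attempt $try failed: $@";
    }
    return;
}
```

In the GenBank loop, a call such as `with_retries(3, sub { $gb->get_Stream_by_acc($batch) })` would return the Bio::SeqIO stream on success and undef after three failed attempts, so the batch can be pushed onto `@aborted` and the loop can move on. The retry limit of 3 is an arbitrary choice.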



chris

On Jan 30, 2008, at 8:56 AM, Tristan Lefebure wrote:

> Thank you both!
>
> Just in case it might be useful for someone else, here are my
> ramblings:
>
> 1. I first tried to adapt my script and fetch 500 sequences at a  
> time. It works, except that ~40% of the time NCBI gives the  
> following error and my script crashed:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> [...]
>    The proxy server received an invalid
>    response from an upstream server.
> [...]
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:68
> -----------------------------------------------------------
>
> I tried to modify the script so that when the retrieval of a 500
> sequence block crashes, it continues with the other blocks, but I
> was unsuccessful. It probably needs some better understanding of
> BioPerl errors...
> Here is the section of the script that was modified:
> #########
> my $n_seq = scalar @list;
> my @aborted;
>
> for (my $i=1; $i<=$n_seq; $i += 500) {
> 	print "Fetching sequences $i to ", $i+499, ": ";
> 	my $start = $i -1;
> 	my $end = $start + 500 - 1;
> 	$end = $#list if $end > $#list;
> 	my @red_list = @list[$start .. $end];
> 	my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 					-format => $dformat,
> 					-db => $db,
> 				);
>
> 	my $seqio;
> 	unless(	$seqio = $gb->get_Stream_by_acc(\@red_list)) {
> 		print "Aborted, resubmit later\n";
> 		push @aborted, @red_list;
> 		next;
> 	}
> 	
> 	my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1].$i",
> 					-format => $format,
> 				);
> 	while (my $seqo = $seqio->next_seq ) {
> # 		print $seqo->id, "\n";
> 		$seqout->write_seq($seqo);
> 	}
> 	print "Done\n";
> }
>
> if (@aborted) {
> 	open OUT, ">aborted_fetching.AN";
> 	foreach (@aborted) { print OUT $_ };
> }
> ##########
>
>
> 2. So I moved to the second solution and tried batchentrez. I cut my  
> 120,000 long AN list into 10,000 long pieces using split:
> split -l 10000 full_list.AN splitted_list_
>
> and then submitted the 13 lists one by one. I must say that I don't
> really like using a web-interface to fetch data, and here the most
> annoying part is that you end up with a regular Entrez/GenBank
> webpage: select your format, export to file, choose file name... and
> have to do it many times.
> It is too prone to human and web-browser errors for my taste,
> but it worked.
> Nevertheless there are some caveats:
> - some downloaded files were incomplete (~10%) and you have to
> restart them
> - you can't submit several lists at the same time (otherwise the
> same cookie will be used and you'll end up with several identical
> files)
>
> -Tristan
>
> On Tuesday 29 January 2008 13:44:16 you wrote:
>> Forgot about that one; it's definitely a better way to do it if you
>> have the GI/accessions.
>>
>> chris
>>
>> On Jan 29, 2008, at 12:32 PM, Alexander Kozik wrote:
>>> you don't need to use bioperl to accomplish this task, to download
>>> several thousand sequences based on accession ID list.
>>>
>>> NCBI batch Entrez can do that:
>>> http://www.ncbi.nlm.nih.gov/sites/batchentrez
>>>
>>> just submit a large list of IDs, select database, and download.
>>>
>>> you can submit ~50,000 IDs in one file usually without problems.
>>> it may not return results if a list is larger than ~100,000 IDs
>>>
>>> --
>>> Alexander Kozik
>>> Bioinformatics Specialist
>>> Genome and Biomedical Sciences Facility
>>> 451 Health Sciences Drive
>>> Genome Center, 4-th floor, room 4302
>>> University of California
>>> Davis, CA 95616-8816
>>> Phone: (530) 754-9127
>>> email#1: akozik at atgc.org
>>> email#2: akozik at gmail.com
>>> web: http://www.atgc.org/
>>>
>>> Chris Fields wrote:
>>>> Yes, you can only retrieve ~500 sequences at a time using
>>>> Bio::DB::GenBank.  Both Bio::DB::GenBank and Bio::DB::EUtilities
>>>> interact with NCBI's EUtilities (the former module returns
>>>> Bio::Seq/Bio::SeqIO objects, the latter module returns raw data
>>>> from the URL to be processed later).
>>>> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.large-datasets
>>>> You can usually post more IDs using epost and fetch sequence
>>>> referring to the WebEnv/key combo (batch posting).  I try to make
>>>> this a bit easier with EUtilities but it is woefully lacking in
>>>> documentation (my fault), but there is some code up on the wiki
>>>> which should work.
>>>> chris
>>>>
>>>> On Jan 29, 2008, at 11:19 AM, Tristan Lefebure wrote:
>>>>> Hello,
>>>>>
>>>>> I would like to download a large number of sequences from GenBank
>>>>> (122,146 to be exact) following a list of accession numbers.
>>>>> I first investigated Bio::DB::EUtilities, but got lost and
>>>>> finally used Bio::DB::GenBank.
>>>>> My script works well for short requests, but it gives the following
>>>>> error with long requests:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: WebDBSeqI Request Error:
>>>>> 500 short write
>>>>> Content-Type: text/plain
>>>>> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
>>>>> Client-Warning: Internal response
>>>>>
>>>>> 500 short write
>>>>>
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
>>>>> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
>>>>> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
>>>>> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
>>>>> STACK: ./fetch_from_genbank.pl:58
>>>>> ---------------------------------------------------------
>>>>>
>>>>> Does that mean that we can only fetch 500 sequences at a time?
>>>>> Should I split my list into 500-ID fragments and submit them one
>>>>> after the other?
>>>>>
>>>>> Any suggestions are very welcome...
>>>>> Thanks,
>>>>> -Tristan
>>>>>
>>>>>
>>>>> Here is the script:
>>>>>
>>>>> ##################################
>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::DB::GenBank;
>>>>> # use Bio::DB::EUtilities;
>>>>> use Bio::SeqIO;
>>>>> use Getopt::Long;
>>>>>
>>>>> # 2008-01-22 T Lefebure
>>>>> # I tried to use Bio::DB::EUtilities without much success and went
>>>>> # back to Bio::DB::GenBank.
>>>>> # The following procedure is not really good as the stream is
>>>>> # first copied to a temporary file,
>>>>> # and then re-used by BioPerl to generate the final file.
>>>>>
>>>>> my $db = 'nucleotide';
>>>>> my $format = 'genbank';
>>>>> my $help= '';
>>>>> my $dformat = 'gb';
>>>>>
>>>>> GetOptions(
>>>>>   'help|?' => \$help,
>>>>>   'format=s'  => \$format,
>>>>>   'database=s'    => \$db,
>>>>> );
>>>>>
>>>>>
>>>>> my $printhelp = "\nUsage: $0 [options]   
>>>>> 
>>>>>
>>>>> Will download the corresponding data from GenBank. BioPerl is
>>>>> required.
>>>>>
>>>>> Options:
>>>>>   -h
>>>>>       print this help
>>>>>   -format: genbank|fasta|...
>>>>>       give output format (default=genbank)
>>>>>   -database: nucleotide|genome|protein|...
>>>>>       define the database to search in (default=nucleotide)
>>>>>
>>>>> The full description of the options can be found at
>>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
>>>>>
>>>>> if ($#ARGV<1) {
>>>>>   print $printhelp;
>>>>>   exit;
>>>>> }
>>>>>
>>>>> open LIST, $ARGV[0];
>>>>> my @list = <LIST>;
>>>>>
>>>>> if ($format eq 'fasta') { $dformat = 'fasta' }
>>>>>
>>>>> my $gb = new Bio::DB::GenBank(    -retrievaltype => 'tempfile',
>>>>>               -format => $dformat,
>>>>>               -db => $db,
>>>>>           );
>>>>> my $seqio = $gb->get_Stream_by_acc(\@list);
>>>>>
>>>>> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
>>>>>               -format => $format,
>>>>>           );
>>>>> while (my $seqo = $seqio->next_seq ) {
>>>>>   print $seqo->id, "\n";
>>>>>   $seqout->write_seq($seqo);
>>>>> }
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From georg.otto at tuebingen.mpg.de  Thu Jan 31 09:34:31 2008
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Thu, 31 Jan 2008 10:34:31 +0100
Subject: [Bioperl-l] Bio::DB::GenBank and large number of requests
References: <200801291219.23172.tristan.lefebure@gmail.com>
Message-ID: 

Hi,

I succeeded with a similar task using the seqhound database. I had a
list of > 200,000 gid numbers, but I guess it can work in a similar
fashion using accession numbers. Here is the script:

#!/usr/bin/perl

use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::SeqHound;

my $sh = Bio::DB::SeqHound->new();

my $USAGE = "$0 id_file\n\n";

unless (@ARGV) {
	print $USAGE;
	exit;
}

my $id_file = $ARGV[0];

open ID_FILE, "<$id_file" or die "error: $!";

# a single FASTA writer on STDOUT, reused for every sequence
my $out = Bio::SeqIO->new(-format => 'fasta');

while (<ID_FILE>) {
  chomp;
  my $id = $_;
  # skip IDs for which SeqHound returns nothing
  if (defined(my $seq_obj = $sh->get_Seq_by_gi($id))) {
    $out->write_seq($seq_obj);
  }
}
close ID_FILE;


Best,

Georg


Tristan Lefebure  writes:

> Hello,
>
> I would like to download a large number of sequences from GenBank (122,146 to be exact) following a list of accession numbers.
> I first investigated Bio::DB::EUtilities, but got lost and finally used Bio::DB::GenBank.
> My script works well for short requests, but it gives the following error with long requests:
>
>  ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> 500 short write
> Content-Type: text/plain
> Client-Date: Tue, 29 Jan 2008 17:22:46 GMT
> Client-Warning: Internal response
>
> 500 short write
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::_request /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:685
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:472
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/local/share/perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: ./fetch_from_genbank.pl:58
> ---------------------------------------------------------
>
> Does that mean that we can only fetch 500 sequences at a time?
> Should I split my list into 500-ID fragments and submit them one after the other?
>
> Any suggestions are very welcome...
> Thanks,
> -Tristan
>
>
> Here is the script:
>
> ##################################
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> # use Bio::DB::EUtilities;
> use Bio::SeqIO;
> use Getopt::Long;
>
> # 2008-01-22 T Lefebure
> # I tried to use Bio::DB::EUtilities without much success and went back to Bio::DB::GenBank.
> # The following procedure is not really good as the stream is first copied to a temporary file,
> # and then re-used by BioPerl to generate the final file.
>
> my $db = 'nucleotide';
> my $format = 'genbank';
> my $help= '';
> my $dformat = 'gb';
>
> GetOptions(
> 	'help|?' => \$help,
> 	'format=s'  => \$format,
> 	'database=s'	=> \$db,
> );
>
>
> my $printhelp = "\nUsage: $0 [options]  
>
> Will download the corresponding data from GenBank. BioPerl is required.
>
> Options:
> 	-h
> 		print this help
> 	-format: genbank|fasta|...
> 		give output format (default=genbank)
> 	-database: nucleotide|genome|protein|...
> 		define the database to search in (default=nucleotide)
>
> The full description of the options can be found at http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html\n";
>
> if ($#ARGV<1) {
> 	print $printhelp;
> 	exit;
> }
>
> open LIST, $ARGV[0];
> my @list = <LIST>;
>
> if ($format eq 'fasta') { $dformat = 'fasta' }
>
> my $gb = new Bio::DB::GenBank(	-retrievaltype => 'tempfile',
> 				-format => $dformat,
> 				-db => $db,
> 			);
> my $seqio = $gb->get_Stream_by_acc(\@list);
>
> my $seqout = Bio::SeqIO->new( -file => ">$ARGV[1]",
> 				-format => $format,
> 			);
> while (my $seqo = $seqio->next_seq ) {
> 	print $seqo->id, "\n";
> 	$seqout->write_seq($seqo);
> }



From bernd.web at gmail.com  Thu Jan 31 10:48:15 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 31 Jan 2008 11:48:15 +0100
Subject: [Bioperl-l] searchio/blast
Message-ID: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>

Hi,

I noticed that the HTMLWriter output for a BLAST report may not be
correct if more than one sequence was "blasted".

After the BLAST report of the first sequence the report is ended with:
Search Parameters
Parameter	Value

Search Statistics
Statistic	Value

Produced by Bioperl module Bio::SearchIO::Writer::HTMLResultWriter on
Thu Jan 31 11:35:51 2008
Revision: $Id: HTMLResultWriter.pm,v 1.41 2006/10/02 04:45:37 tseemann Exp $

Then the second HTML blast report follows.
Although users who want HTML output usually BLAST a single sequence,
this may be worth fixing.
Also, in the HTML Writer output for FastA reports the statistics section is empty.

An additional issue with HTMLWriter output containing more than one BLAST
report is the following:
when a sequence ID occurs more than once, the link (on the E-value)
points to the first occurrence, since the anchor is not report-specific.

If the above is regarded as unwanted behavior, I'd be happy to make a
concise example with code.


Best regards,
Bernd


From cjfields at uiuc.edu  Thu Jan 31 12:39:46 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 31 Jan 2008 06:39:46 -0600
Subject: [Bioperl-l] searchio/blast
In-Reply-To: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
References: <716af09c0801310248x4386cc1ate37939fab0ad2339@mail.gmail.com>
Message-ID: 

The easiest way to take care of these (so we don't forget about them  
and can track changes) is to add them as BioPerl bugs/enhancement  
requests to bugzilla, along with example reports and code.

chris

On Jan 31, 2008, at 4:48 AM, Bernd Web wrote:

> Hi,
>
> I noticed that the HTMLWriter output for a BLAST report may not be
> correct if more than one sequence was "blasted".
>
> After the BLAST report of the first sequence the report is ended with:
> Search Parameters
> Parameter	Value
>
> Search Statistics
> Statistic	Value
>
> Produced by Bioperl module Bio::SearchIO::Writer::HTMLResultWriter on
> Thu Jan 31 11:35:51 2008
> Revision: $Id: HTMLResultWriter.pm,v 1.41 2006/10/02 04:45:37 tseemann Exp $
>
> Then the second HTML blast report follows.
> Although users who want HTML output usually BLAST a single sequence,
> this may be worth fixing.
> Also, in the HTML Writer output for FastA reports the statistics section
> is empty.
>
> An additional issue with HTMLWriter output containing more than one BLAST
> report is the following:
> when a sequence ID occurs more than once, the link (on the E-value)
> points to the first occurrence, since the anchor is not report-specific.
>
> If the above is regarded as unwanted behavior, I'd be happy to make a
> concise example with code.
>
>
> Best regards,
> Bernd

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From hlapp at gmx.net  Thu Jan 31 13:12:25 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 31 Jan 2008 08:12:25 -0500
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
Message-ID: 


On Jan 30, 2008, at 2:30 PM, snoze pa wrote:

> Hi Hilmar,
>
> After spending a lot of time I figured out the error. I am able to load
> sequences if the sequences do not have the following entry
>
> xrefs (non-sequence databases):

Is this the literal value? I am asking because I can't find this in  
the file at

http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb

which you said was giving you grief. So does the genbank file above  
now load, or how can I identify the critical line in there?

	-hilmar
>
> If the GenBank sequence has this entry then the script
> load_seqdatabase.pl crashes. I tried it on a couple of sequences and
> found that this is the culprit line in the GenBank format. But this
> line is important as it contains lots of information... so I am
> wondering how to solve this problem.
>
> Any help?
>
> Thanks in advance
> s

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From snoze.pa at gmail.com  Thu Jan 31 18:46:24 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Thu, 31 Jan 2008 12:46:24 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: 
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
Message-ID: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>

The link I sent was related to my tutorial; I was following that website.
A typical example is the following record, which has an
"xrefs (non-sequence databases):" line.
thanks
s
LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
            protein); prM; Peptide pr; Small envelope protein M (Matrix
            protein); Envelope protein E; Non-structural protein 1 (NS1)].
ACCESSION   P27912
VERSION     P27912.1  GI:130422
DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
            class: standard.
            created: Aug 1, 1992.
            sequence updated: Aug 1, 1992.
            annotation updated: Jan 15, 2008.
            xrefs: D00502.1, BAA00394.1, B32401
            xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
            GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
            InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
            InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
            Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
            Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
            reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
            Transmembrane; Viral nucleoprotein; Virion.
SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
  ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
            Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
            Flavivirus; Dengue virus group.
REFERENCE   1  (residues 1 to 792)
  AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
  TITLE     Genetic relatedness among structural protein genes of dengue 1
            virus strains
  JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
   PUBMED   2738579
  REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
            [FUNCTION] Protein C packages viral RNA to form a viral
            nucleocapsid, and promotes virion budding (By similarity).
            [FUNCTION] prM acts as a chaperone for envelope protein E during
            intracellular virion assembly by masking and inactivating envelope
            protein E fusion peptide. prM is matured in the last step of virion
            assembly, presumably to avoid catastrophic activation of the viral
            fusion peptide induced by the acidic pH of the trans-Golgi network.
            After cleavage by host furin, the pr peptide is released in the
            extracellular medium and small envelope protein M and envelope
            protein E homodimers are dissociated (By similarity).
            [FUNCTION] Envelope protein E binds cell surface receptor and is
            involved in membrane fusion between virion and target cell.
            Synthesized as an homodimer with prM which acts as a chaperone for
            envelope protein E. After cleavage of prM, envelope protein E
            dissociate from small envelope protein M and homodimerizes (By
            similarity).
            [FUNCTION] Non-structural protein 1 is slowly secreted from
            mammalian cells, but not from mosquito cells. Secreted form elicits
            protective immune response and plays an essential role in RNA
            replication. Soluble and membrane-associated NS1 may activate human
            complement and induce host vascular leakage. This effect might
            explain the clinical manifestations of dengue hemorrhagic fever and
            dengue shock syndrome (By similarity).
            [SUBUNIT] prM and envelope protein E form heterodimers in the
            endoplasmic reticulum and Golgi. Envelope protein E forms
            homodimers. NS1 forms homodimers as well as homohexamers when
            secreted. NS1 may interact with NS4A (By similarity).
            [SUBCELLULAR LOCATION] Note=The virion is assembled in the
            endoplasmic reticulum lumen, transported by vesicles to the Golgi,
            then transported again to the cell membrane where it is released
            outside the cell.
            [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
            [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
            [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
            Endoplasmic reticulum membrane; Peripheral membrane protein;
            Lumenal side (By similarity).
            [DOMAIN] Transmembrane domains of the small envelope protein M and
            envelope protein E contains an endoplasmic reticulum retention
            signals (By similarity).
            [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
            The nascent protein C contains a C-terminal hydrophobic domain that
            act as a signal sequence for translocation of prM into the lumen of
            the ER. Mature protein C is cleaved at a site upstream of this
            hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
            a host furin, releasing the mature small envelope protein M, and
            peptide pr (By similarity).
            [PTM] Envelope protein E and non-structural protein 1 are
            N-glycosylated (By similarity).
FEATURES             Location/Qualifiers
     source          1..792
                     /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
                     /specific_host="Aedes aegypti (Yellowfever mosquito)"
                     /specific_host="Homo sapiens (Human)"
                     /db_xref="taxon:11057"
     Protein         1..>792
                     /product="Genome polyprotein [Contains: Protein C"
     Region          1..101
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          1..100
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Protein C. /FTId=PRO_0000037884."
     Region          5..114
                     /region_name="Flavi_capsid"
                     /note="Flavivirus capsid protein C. Flaviviruses are small
                     enveloped viruses with virions comprised of 3 proteins
                     called C, M and E. Multiple copies of the C protein form
                     the nucleocapsid, which contains the ssRNA molecule;
                     pfam01003"
                     /db_xref="CDD:85176"
     Site            100..101
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by serine protease NS3 (By similarity)."
     Region          101..114
                     /region_name="Propeptide"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="ER anchor for the protein C, removed in mature form
                     by serine protease NS3. /FTId=PRO_0000037885."
     Region          102..122
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            114..115
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          115..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="prM. /FTId=PRO_0000264649."
     Region          115..205
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Peptide pr. /FTId=PRO_0000264650."
     Region          119..204
                     /region_name="Flavi_propep"
                     /note="Flavivirus polyprotein propeptide. The flaviviruses
                     are small enveloped animal viruses containing a single
                     positive strand genomic RNA. The genome encodes one large
                     ORF a polyprotein which undergos proteolytic processing
                     into mature viral peptide chains; pfam01570"
                     /db_xref="CDD:65376"
     Region          123..238
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            183
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Site            205..206
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host furin (By similarity)."
     Region          206..280
                     /region_name="Flavi_M"
                     /note="Flavivirus envelope glycoprotein M. Flaviviruses
                     are small enveloped viruses with virions comprised of 3
                     proteins called C, M and E. The envelope glycoprotein M is
                     made as a precursor, called prM; pfam01004"
                     /db_xref="CDD:85177"
     Region          206..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Small envelope protein M. /FTId=PRO_0000037886."
     Region          239..259
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          260..265
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          266..286
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            280..281
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          281..775
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Envelope protein E. /FTId=PRO_0000037887."
     Region          281..576
                     /region_name="Flavi_glycoprot"
                     /note="Flavivirus glycoprotein, central and dimerisation
                     domains; pfam00869"
                     /db_xref="CDD:85082"
     Bond            bond(283,310)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          287..725
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Bond            bond(340,401)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            347
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(354,385)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Bond            bond(372,396)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            433
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(465,565)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          578..673
                     /region_name="Flavi_glycop_C"
                     /note="Flavivirus glycoprotein, immunoglobulin-like
                     domain; pfam02832"
                     /db_xref="CDD:66513"
     Bond            bond(582,613)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          726..746
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          747..752
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          753..773
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          774..>792
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            775..776
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          776..>792
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Non-structural protein 1. /FTId=PRO_0000037888."
ORIGIN
        1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
       61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
      121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
      181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
      241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
      301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
      361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
      421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
      481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
      541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
      601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
      661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
      721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
      781 inwkgkelkc gs
//


On Jan 31, 2008 7:12 AM, Hilmar Lapp  wrote:

>
> On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
>
> > Hi Hilmar,
> >
> >  After spending a lot of time I figured out the error. I am able to load
> > sequences if the sequences do not have the following entry
> >
> > xrefs (non-sequence databases):
>
> Is this the literal value? I am asking because I can't find this in
> the file at
>
> http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
>
> which you said was giving you grief. So does the genbank file above
> now load, or how can I identify the critical line in there?
>
>        -hilmar
> >
> > If a GenBank sequence has this entry, the script load_seqdatabase.pl
> > crashes. I tried it on a couple of sequences and found that this is
> > the culprit line in the GenBank format. But this line is important as
> > it contains lots of information, so I am wondering how to solve this
> > problem.
> >
> > Any help?
> >
> > Thanks in advance
> > s
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


From hlapp at gmx.net  Thu Jan 31 20:10:35 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 31 Jan 2008 15:10:35 -0500
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
	<10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
Message-ID: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>

I see. Note that this sequence is really a UniProt sequence that
has been reformatted into GenBank format, and hence isn't in your
typical GenBank sequence format (which usually lacks DBSOURCE,
for example). (The joys of data integration.)

If you load the same sequence from UniProt, does it still fail to  
parse or to load?

Also, does this mean that the sequences at the link you sent
load without error? I.e., can I close that issue report, or is
there a bug in bioperl-db?

	-hilmar

On Jan 31, 2008, at 1:46 PM, snoze pa wrote:

> The link I sent was from the tutorial I was following on that
> website. A typical example is record P27912, which has an
> "xrefs (non-sequence databases):" line.
> thanks
> s
>
> LOCUS       P27912                   792 aa            linear   VRL  
> 15-JAN-2008
> DEFINITION  Genome polyprotein [Contains: Protein C (Core protein)  
> (Capsid
>             protein); prM; Peptide pr; Small envelope protein M  
> (Matrix
>             protein); Envelope protein E; Non-structural protein 1  
> (NS1)].
> ACCESSION   P27912
> VERSION     P27912.1  GI:130422
> DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
>             class: standard.
>             created: Aug 1, 1992.
>             sequence updated: Aug 1, 1992.
>             annotation updated: Jan 15, 2008.
>             xrefs: D00502.1, BAA00394.1, B32401
>             xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
>             GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
>             InterPro:IPR001122, InterPro:IPR000069,  
> InterPro:IPR001157,
>             InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA: 
> 2.60.98.10,
>             Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832,  
> Pfam:PF00869,
>             Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
> KEYWORDS    Capsid protein; Cleavage on pair of basic residues;  
> Endoplasmic
>             reticulum; Envelope protein; Glycoprotein; Membrane;  
> Secreted;
>             Transmembrane; Viral nucleoprotein; Virion.
> SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
>   ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
>             Viruses; ssRNA positive-strand viruses, no DNA stage;  
> Flaviviridae;
>             Flavivirus; Dengue virus group.
> REFERENCE   1  (residues 1 to 792)
>   AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
>   TITLE     Genetic relatedness among structural protein genes of  
> dengue 1
>             virus strains
>   JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
>    PUBMED   2738579
>   REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
> COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
>             [FUNCTION] Protein C packages viral RNA to form a viral
>             nucleocapsid, and promotes virion budding (By similarity).
>             [FUNCTION] prM acts as a chaperone for envelope protein  
> E during
>             intracellular virion assembly by masking and  
> inactivating envelope
>             protein E fusion peptide. prM is matured in the last  
> step of virion
>             assembly, presumably to avoid catastrophic activation  
> of the viral
>             fusion peptide induced by the acidic pH of the trans- 
> Golgi network.
>             After cleavage by host furin, the pr peptide is  
> released in the
>             extracellular medium and small envelope protein M and  
> envelope
>             protein E homodimers are dissociated (By similarity).
>             [FUNCTION] Envelope protein E binds cell surface  
> receptor and is
>             involved in membrane fusion between virion and target  
> cell.
>             Synthesized as an homodimer with prM which acts as a  
> chaperone for
>             envelope protein E. After cleavage of prM, envelope  
> protein E
>             dissociate from small envelope protein M and  
> homodimerizes (By
>             similarity).
>             [FUNCTION] Non-structural protein 1 is slowly secreted  
> from
>             mammalian cells, but not from mosquito cells. Secreted  
> form elicits
>             protective immune response and plays an essential role  
> in RNA
>             replication. Soluble and membrane-associated NS1 may  
> activate human
>             complement and induce host vascular leakage. This  
> effect might
>             explain the clinical manifestations of dengue  
> hemorrhagic fever and
>             dengue shock syndrome (By similarity).
>             [SUBUNIT] prM and envelope protein E form heterodimers  
> in the
>             endoplasmic reticulum and Golgi. Envelope protein E forms
>             homodimers. NS1 forms homodimers as well as  
> homohexamers when
>             secreted. NS1 may interact with NS4A (By similarity).
>             [SUBCELLULAR LOCATION] Note=The virion is assembled in the
>             endoplasmic reticulum lumen, transported by vesicles to  
> the Golgi,
>             then transported again to the cell membrane where it is  
> released
>             outside the cell.
>             [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
>             [SUBCELLULAR LOCATION] Peptide pr: Secreted (By  
> similarity).
>             [SUBCELLULAR LOCATION] Small envelope protein M: Virion  
> membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Envelope protein E: Virion  
> membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
>             Endoplasmic reticulum membrane; Peripheral membrane  
> protein;
>             Lumenal side (By similarity).
>             [DOMAIN] Transmembrane domains of the small envelope  
> protein M and
>             envelope protein E contains an endoplasmic reticulum  
> retention
>             signals (By similarity).
>             [PTM] Specific enzymatic cleavages in vivo yield mature  
> proteins.
>             The nascent protein C contains a C-terminal hydrophobic  
> domain that
>             act as a signal sequence for translocation of prM into  
> the lumen of
>             the ER. Mature protein C is cleaved at a site upstream  
> of this
>             hydrophobic domain by NS3. prM is cleaved in post-Golgi  
> vesicles by
>             a host furin, releasing the mature small envelope  
> protein M, and
>             peptide pr (By similarity).
>             [PTM] Envelope protein E and non-structural protein 1 are
>             N-glycosylated (By similarity).
> FEATURES             Location/Qualifiers
>      source          1..792
>                      /organism="Dengue virus 1 Thailand/AHF  
> 82-80/1980"
>                      /specific_host="Aedes aegypti (Yellowfever  
> mosquito)"
>                      /specific_host="Homo sapiens (Human)"
>                      /db_xref="taxon:11057"
>      Protein         1..>792
>                      /product="Genome polyprotein [Contains:  
> Protein C"
>      Region          1..101
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          1..100
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="Protein C. /FTId=PRO_0000037884."
>      Region          5..114
>                      /region_name="Flavi_capsid"
>                      /note="Flavivirus capsid protein C.  
> Flaviviruses are small
>                      enveloped viruses with virions comprised of 3  
> proteins
>                      called C, M and E. Multiple copies of the C  
> protein form
>                      the nucleocapsid, which contains the ssRNA  
> molecule;
>                      pfam01003"
>                      /db_xref="CDD:85176"
>      Site            100..101
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cleavage; by serine protease NS3 (By  
> similarity)."
>      Region          101..114
>                      /region_name="Propeptide"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="ER anchor for the protein C, removed in  
> mature form
>                      by serine protease NS3. /FTId=PRO_0000037885."
>      Region          102..122
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Potential."
>      Site            114..115
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          115..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="prM. /FTId=PRO_0000264649."
>      Region          115..205
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="Peptide pr. /FTId=PRO_0000264650."
>      Region          119..204
>                      /region_name="Flavi_propep"
>                      /note="Flavivirus polyprotein propeptide. The  
> flaviviruses
>                      are small enveloped animal viruses containing  
> a single
>                      positive strand genomic RNA. The genome  
> encodes one large
>                      ORF a polyprotein which undergos proteolytic  
> processing
>                      into mature viral peptide chains; pfam01570"
>                      /db_xref="CDD:65376"
>      Region          123..238
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            183
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Site            205..206
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cleavage; by host furin (By similarity)."
>      Region          206..280
>                      /region_name="Flavi_M"
>                      /note="Flavivirus envelope glycoprotein M.  
> Flaviviruses
>                      are small enveloped viruses with virions  
> comprised of 3
>                      proteins called C, M and E. The envelope  
> glycoprotein M is
>                      made as a precursor, called prM; pfam01004"
>                      /db_xref="CDD:85177"
>      Region          206..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="Small envelope protein M. / 
> FTId=PRO_0000037886."
>      Region          239..259
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Potential."
>      Region          260..265
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          266..286
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Potential."
>      Site            280..281
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          281..775
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no  
> additional details
>                      recorded"
>                      /note="Envelope protein E. /FTId=PRO_0000037887."
>      Region          281..576
>                      /region_name="Flavi_glycoprot"
>                      /note="Flavivirus glycoprotein, central and  
> dimerisation
>                      domains; pfam00869"
>                      /db_xref="CDD:85082"
>      Bond            bond(283,310)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="By similarity."
>      Region          287..725
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Bond            bond(340,401)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="By similarity."
>      Site            347
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(354,385)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="By similarity."
>      Bond            bond(372,396)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="By similarity."
>      Site            433
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(465,565)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="By similarity."
>      Region          578..673
>                      /region_name="Flavi_glycop_C"
>                      /note="Flavivirus glycoprotein, immunoglobulin- 
> like
>                      domain; pfam02832"
>                      /db_xref="CDD:66513"
>      Bond            bond(582,613)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="By similarity."
>      Region          726..746
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Potential."
>      Region          747..752
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          753..773
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Potential."
>      Region          774..>792
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no  
> additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            775..776
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          776..>792
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional
>                      details recorded"
>                      /note="Non-structural protein 1. / 
> FTId=PRO_0000037888."
> ORIGIN
>         1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
>        61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
>       121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
>       181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
>       241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
>       301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
>       361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
>       421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
>       481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
>       541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
>       601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
>       661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
>       721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
>       781 inwkgkelkc gs
> //
>
>
> On Jan 31, 2008 7:12 AM, Hilmar Lapp  wrote:
>
> On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
>
> > Hi Hilmar,
> >
> > After spending lots of time I figured out the error. I am able to load
> > sequences if the sequences do not have the following entry
> >
> > xrefs (non-sequence databases):
>
> Is this the literal value? I am asking because I can't find this in
> the file at
>
> http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
>
> which you said was giving you grief. So does the genbank file above
> now load, or how can I identify the critical line in there?
>
>        -hilmar
> >
> > If the GenBank sequence has this entry, then the load_seqdatabase.pl
> > script crashes. I tried it on a couple of sequences and found that this
> > is the culprit line in GenBank format. But this line is important as it
> > contains lots of information... so I am wondering how to solve this
> > problem.
> >
> > Any help?
> >
> > Thanks in advance
> > s
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







From snoze.pa at gmail.com  Thu Jan 31 20:21:18 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Thu, 31 Jan 2008 14:21:18 -0600
Subject: [Bioperl-l] found the error tarp in load_seqdatabase.pl
In-Reply-To: <3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>
References: <10f848910801301130t51ba7e98w4c52ec4ceb61cc93@mail.gmail.com>
	
	<10f848910801311046y34e3f6ebl3300f5adebe27587@mail.gmail.com>
	<3B75A2FD-5EBE-4830-AD00-8FC7F669DB97@gmx.net>
Message-ID: <10f848910801311221q2a9f0d02x6c4600048f05adab@mail.gmail.com>

Thanks Hilmar,

I also thought that they had been translated into GenBank format. My problem
is that I have downloaded tons of sequences from NCBI in GenBank format, and
many of the sequences in my flat file are in this format, so I am unable to
load them into my local database using the load_seqdatabase.pl script. So far
I get nothing but warnings and errors. Is there any solution to this problem?
Otherwise I will try to write some code to load all the sequences into the
local database, although it seems it would be easy to modify the parsing code
so that we can load these sequences.
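A minimal sketch in Perl, assuming the records look like the P27912 example in this thread: because records whose DBSOURCE block contains the "xrefs (non-sequence databases):" entry reportedly make load_seqdatabase.pl fail, a plain scan of the flat file can at least list which records in a large download are affected, so they can be inspected or set aside while the parser issue is sorted out. The script name flag_nonseq_xrefs.pl and the function flag_records are hypothetical; the code neither uses nor modifies BioPerl's GenBank parser.

```perl
#!/usr/bin/perl
# flag_nonseq_xrefs.pl: hypothetical helper, not part of BioPerl or
# bioperl-db. Splits GenBank flat-file text into records on the "//"
# terminator line and lists the LOCUS names of records that contain
# the "xrefs (non-sequence databases):" entry.
use strict;
use warnings;

# Return the LOCUS names of all records that contain the entry.
sub flag_records {
    my ($text) = @_;
    my @flagged;
    # Each GenBank record ends with a line consisting of "//".
    for my $record (split /^\/\/[ \t\r]*(?:\n|\z)/m, $text) {
        my ($locus) = $record =~ /^LOCUS\s+(\S+)/m;
        next unless defined $locus;    # skip text that is not a record
        push @flagged, $locus
            if $record =~ /xrefs \(non-sequence databases\):/;
    }
    return @flagged;
}

# When run as a script, scan each file named on the command line.
if (!caller() && @ARGV) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        my $text = do { local $/; <$fh> };
        close $fh;
        $text = '' unless defined $text;    # empty file
        print "$file\t$_\n" for flag_records($text);
    }
}
```

For example, running "perl flag_nonseq_xrefs.pl mysequences.gb" (a hypothetical file name) would print one tab-separated line (file name, LOCUS name) per affected record. Matching on the literal entry text instead of parsing the record keeps the sketch independent of whichever parsing step is failing.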


>format (which usually lacks DBSOURCE, for example

I think DBSOURCE is common in the NCBI GenBank format when the
three-dimensional structure of the protein is known. I agree with you: the
joys of integration.

The link was related to the tutorial I was using, so you can close that issue.

Thanks for looking into the matter.
s

On Jan 31, 2008 2:10 PM, Hilmar Lapp  wrote:

> I see. Note that the sequence below is really a UniProt sequence, that has
> been reformatted into GenBank format, and hence aren't in your typical
> genbank sequence format (which usually lacks DBSOURCE, for example). (The
> joys of data integration.)
> If you load the same sequence from UniProt, does it still fail to parse or
> to load?
>
> Also, does it or does this not mean that sequences at the link you sent
> load w/o error? I.e., can I close that issue report, or is there a bug in
> bioperl-db?
>
> -hilmar