From hlapp at gmx.net  Mon Feb  6 13:29:38 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 6 Feb 2006 10:29:38 -0800
Subject: [BioSQL-l] list resumes
Message-ID: <25f1da01ac9f022027fe9127892934aa@gmx.net>

Hi all, if you receive this email it means that the BioSQL list is
working again at its previous address. It somehow got lost in the
migration to a new server, and Chris Dagdigian has now returned from
vacation and brought it back online.

Please repost any emails you may have sent to the list over the past 
week or two (they should have bounced), and I apologize for the 
inconvenience.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From Doris.Siegl at fh-hagenberg.at  Wed Feb  8 08:57:03 2006
From: Doris.Siegl at fh-hagenberg.at (Siegl Doris)
Date: Wed, 8 Feb 2006 14:57:03 +0100
Subject: [BioSQL-l] OBDA abbreviation
Message-ID: <532D7AB7D5A2A34EAD3A39A7B650E8773DE48C@postfux.fhs-hagenberg.ac.at>

Dear ng,

I was wondering whether the abbreviation OBDA stands for "Open
Bioinformatics Database Access" or "Open Bioinformatics Data Access". I
could not find any information about this issue on the OBDA homepage.
Could anybody clear this up for me?


Thanks,
Doris


From torsten.seemann at infotech.monash.edu.au  Wed Feb  8 12:52:52 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 09 Feb 2006 04:52:52 +1100
Subject: [BioSQL-l] OBDA abbreviation
In-Reply-To: <532D7AB7D5A2A34EAD3A39A7B650E8773DE48C@postfux.fhs-hagenberg.ac.at>
References: <532D7AB7D5A2A34EAD3A39A7B650E8773DE48C@postfux.fhs-hagenberg.ac.at>
Message-ID: <43EA2FF4.4030302@infotech.monash.edu.au>

> I was wondering whether the abbreviation OBDA stands for "Open
> Bioinformatics Database Access" or "Open Bioinformatics Data Access". I
> could not find any information about this issue on the OBDA homepage.
> Could anybody clear this up for me?

You are right in that it is not defined on the OBDA homepage (that I 
could find), but according to the BioPerl "OBDA" HOWTO at 	 
http://bioperl.open-bio.org/wiki/HOWTO:OBDA
it is "Open Biological Database Access" which I have now made clear in 
the Bioperl WIKI entry for OBDA at http://bioperl.open-bio.org/wiki/OBDA

Some acronyms have a life of their own! :-)

--Torsten Seemann

From hlapp at gnf.org  Thu Feb  9 19:13:31 2006
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu, 09 Feb 2006 16:13:31 -0800
Subject: [BioSQL-l] OBDA abbreviation
In-Reply-To: <532D7AB7D5A2A34EAD3A39A7B650E8773DE48C@postfux.fhs-hagenberg.ac.at>
Message-ID: <C0111AAB.6D35%hlapp@gnf.org>

I believe somebody answered this already, but just in case, it stands for
Open Bioinformatics Database Access.

    -hilmar


On 2/8/06 5:57 AM, "Siegl Doris" <Doris.Siegl at fh-hagenberg.at> wrote:

> Dear ng,
> 
> I was wondering whether the abbreviation OBDA stands for "Open
> Bioinformatics Database Access" or "Open Bioinformatics Data Access". I
> could not find any information about this issue on the OBDA homepage.
> Could anybody clear this up for me?
> 
> 
> Thanks,
> Doris
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From cjfields at uiuc.edu  Tue Feb 14 15:32:42 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Feb 2006 14:32:42 -0600
Subject: [BioSQL-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
Message-ID: <001201c631a5$ce7496f0$15327e82@pyrimidine>

Hilmar, 

Good News: I've added a section to the bioperl wiki on installing bioperl-db
in Windows:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db

Bad News:  There's a new problem now. I updated from CVS yesterday; I walked
through the steps and ran 'nmake test', with everything passing fine.
However, load_seqdatabase.pl is extremely slow; it's loading a sequence
every 5 minutes or so.  I noticed (when using '-debug') that it is hanging
up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a database,
load the biosql schema, and load sequences w/o loading taxonomy, the problem
goes away.

Here's the debugging output (I cut it off at the point it hangs up):
----------------------------------------------------------------------------
C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -driver mysql -namespace test -dbname biosql -dbuser root -dbpass ********** -format genbank -debug NP_252217.gpt
Loading NP_252217.gpt ...
attempting to load adaptor class for Bio::Seq::RichSeq
        attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
        attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
        attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Annotation::Collection
        attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::Root::Root
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
        attempting to load module Bio::DB::BioSQL::RootIAdaptor
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
        attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor
        attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
        attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
        attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
        attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
        attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
        attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
        attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
        attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
        attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
        attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
        attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
        attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
        attempting to load module Bio::DB::BioSQL::LocationIAdaptor
        attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
        attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "test" (namespace)
preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "test" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
----------------------------------------------------------------------------

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hlapp at gmx.net  Wed Feb 15 20:54:01 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 17:54:01 -0800
Subject: [BioSQL-l] [Bioperl-l] Added 'Installing bioperl-db in Windows'
	to wiki, problems with bioperl-db
In-Reply-To: <001201c631a5$ce7496f0$15327e82@pyrimidine>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
Message-ID: <c0a15fab7d1cbd8b6bc0554e7d9dd45b@gmx.net>


On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:

> Hilmar,
>
> Good News: I've added a section to the bioperl wiki on installing  
> bioperl-db
> in Windows:
>
> http://www.bioperl.org/wiki/ 
> Installing_Bioperl_on_Windows#Installing_bioperl
> -db
>
> Bad News:  There's a new problem now. I updated from CVS yesterday; I  
> walked
> through the steps and ran 'nmake test', with everything passing fine.
> However, load_seqdatabase.pl is extremely slow; it's loading a sequence
> every 5 minutes or so.  I noticed (when using '-debug') that it is  
> hanging
> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a  
> database,
> load the biosql schema, and load sequences w/o loading taxonomy, the  
> problem
> goes away.
>
> Here's the debugging output (I cut it off at the point it hangs up):
> [...]

> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND  
> ncbi_taxon_id =
> ?
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)

I'm a bit surprised if this is the query where it hangs. Are the  
indexes all there? There should be a primary key index on  
taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name  
over (taxon_id,name,name_class). Also, there should be separate indexes  
on taxon_name.taxon_id and taxon_name.name. Are they all there? If you  
reinstantiated the schema from the DDL then it seems unlikely that  
somehow the indexes have vanished except if you messed with the schema  
or the DDL.

Putting an index on taxon_name.name_class really can't make sense, so  
let's assume it can't be that.

So really I suspect this has something to do with the state of the
database and the version of MySQL. In particular, starting with some 4.x
version of MySQL, under certain circumstances you have to analyze the
table statistics in order to get the optimizer to pick up the indexes
properly. Are you on MySQL 4.x, and if so, have you done that?

There's the ANALYZE TABLE command:
http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html

Note the comment: "This statement works with MyISAM, BDB, and (as of  
MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?

Also, you can check the execution plan for the query using EXPLAIN.
http://dev.mysql.com/doc/refman/4.1/en/explain.html

This should show you whether the index would be picked up for the query  
or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to  
the db using the mysql shell (mysql).
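
For anyone who would rather script these two checks than type them into
the mysql shell, here is a minimal sketch in Perl. It assumes $dbh is an
already connected DBI handle for the BioSQL database; the helper name
check_taxon_query_plan is made up for illustration, and the SELECT is the
one from the debug output with EXPLAIN put in front of it.

```perl
use strict;
use warnings;

# Refresh the optimizer statistics for the two taxonomy tables, then return
# the execution plan for the look-up query shown in the debug output.
# Assumption: $dbh is a connected DBI handle (e.g. obtained via DBD::mysql);
# only its selectall_arrayref() method is used here.
sub check_taxon_query_plan {
    my ($dbh, $name_class, $ncbi_taxon_id) = @_;

    # ANALYZE TABLE reports one or more status rows per table.
    my $status = $dbh->selectall_arrayref('ANALYZE TABLE taxon, taxon_name');

    # Same SELECT as in the log, prefixed with EXPLAIN.
    my $sql = 'EXPLAIN SELECT taxon_name.taxon_id, NULL, NULL, '
            . 'taxon.ncbi_taxon_id, taxon_name.name, NULL '
            . 'FROM taxon, taxon_name '
            . 'WHERE taxon.taxon_id = taxon_name.taxon_id '
            . 'AND name_class = ? AND ncbi_taxon_id = ?';

    # Slice => {} asks DBI for one hash reference per plan row.
    my $plan = $dbh->selectall_arrayref($sql, { Slice => {} },
                                        $name_class, $ncbi_taxon_id);
    return ($status, $plan);
}
```

With a live handle this would be called as
check_taxon_query_plan($dbh, 'scientific name', 208964). In the plan rows,
a NULL "key" column or a "type" of ALL for taxon or taxon_name would
suggest that no index is being used for that table.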

I believe something similarly strange was encountered by someone using  
DB::GFF (or Chado) under MySQL, and if I recall correctly the solution  
was to optimize (analyze) the tables. Maybe someone who was in that  
thread reads this and can comment?

	-hilmar


>
> ----------------------------------------------------------------------- 
> -----
> -------------------------
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


From cjfields at uiuc.edu  Wed Feb 15 22:56:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 21:56:14 -0600
Subject: [BioSQL-l] [Bioperl-l] Added 'Installing bioperl-db in Windows'
	to wiki, problems with bioperl-db
In-Reply-To: <c0a15fab7d1cbd8b6bc0554e7d9dd45b@gmx.net>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	<c0a15fab7d1cbd8b6bc0554e7d9dd45b@gmx.net>
Message-ID: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>


On Feb 15, 2006, at 7:54 PM, Hilmar Lapp wrote:

>
> On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:
>
>> Hilmar,
>>
>> Good News: I've added a section to the bioperl wiki on installing
>> bioperl-db
>> in Windows:
>>
>> http://www.bioperl.org/wiki/
>> Installing_Bioperl_on_Windows#Installing_bioperl
>> -db
>>
>> Bad News:  There's a new problem now. I updated from CVS yesterday; I
>> walked
>> through the steps and ran 'nmake test', with everything passing fine.
>> However, load_seqdatabase.pl is extremely slow; it's loading a  
>> sequence
>> every 5 minutes or so.  I noticed (when using '-debug') that it is
>> hanging
>> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a
>> database,
>> load the biosql schema, and load sequences w/o loading taxonomy, the
>> problem
>> goes away.
>>
>> Here's the debugging output (I cut it off at the point it hangs up):
>> [...]
>
>> preparing UK select statement: SELECT taxon_name.taxon_id, NULL,  
>> NULL,
>> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name  
>> WHERE
>> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND
>> ncbi_taxon_id =
>> ?
>> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
>> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
>
> I'm a bit surprised if this is the query where it hangs. Are the
> indexes all there? There should be a primary key index on
> taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on  
> taxon_name
> over (taxon_id,name,name_class). Also, there should be separate  
> indexes
> on taxon_name.taxon_id and taxon_name.name. Are they all there? If you
> reinstantiated the schema from the DDL then it seems unlikely that
> somehow the indexes have vanished except if you messed with the schema
> or the DDL.

I looked in the mailing list archives and Barry mentions something here:

http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html

He rebuilt the database from scratch and got it working; no reason  
was given.  I wouldn't be surprised if it is something MySQL-related
that pops up.  The strange thing is that only a few months ago  
everything ran well with this version of MySQL (v.5); this was with  
the first test database I installed on it.  Another strange thing (I  
think I mentioned it) is that NOT loading the taxonomy with  
load_ncbi_taxonomy.pl worked (everything was entered).  I'll try  
rebuilding the database from scratch to see what happens.  I am  
running this on Windows, so this is new territory...

> Putting an index on taxon_name.name_class really can't make sense, so
> let's assume it can't be that.
>
> So really I suspect this has something to do with the state of the
> database and the version of MySQL. In particular, from some 4.x  
> version
> of MySQL under certain circumstances you have to analyze the  
> statistics
> of the tables in order to get the optimizer pick up the indexes
> properly. Are you on MySQL 4.x and if so, have you done that?
>
> There's the ANALYZE TABLE command:
> http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html
>
> Note the comment: "This statement works with MyISAM, BDB, and (as of
> MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?
>
> Also, you can check the execution plan for the query using EXPLAIN.
> http://dev.mysql.com/doc/refman/4.1/en/explain.html
>
> This should show you whether the index would be picked up for the  
> query
> or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to
> the db using the mysql shell (mysql).

I'll give these a shot and post what I find in the next few days.

> I believe something similarly strange was encountered by someone using
> DB::GFF (or Chado) under MySQL, and if I recall correctly the solution
> was to optimize (analyze) the tables. Maybe someone who was in that
> thread reads this and can comment?
>
> 	-hilmar

I also wanted to mention that we shouldn't check in the modifications
to Bio::Root::Root until I confirm something (I'm at home and
currently can't).  I tried running a script on an unrelated module
using the modified Bio::Root::Root (with the commas added after the
'throw $class' statements).  Everything worked for $self->throw(),
except that the thrown message wasn't displayed.  I'll dig into it a bit
more to see what happens.

>
>
>>
>> --------------------------------------------------------------------- 
>> --
>> -----
>> -------------------------
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> -- 
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Thu Feb 16 01:31:54 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 22:31:54 -0800
Subject: [BioSQL-l] [Bioperl-l] Added 'Installing bioperl-db in Windows'
	to wiki, problems with bioperl-db
In-Reply-To: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	<c0a15fab7d1cbd8b6bc0554e7d9dd45b@gmx.net>
	<12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
Message-ID: <cdf04e5fcb1471e4de168a73cc24ae88@gmx.net>


On Feb 15, 2006, at 7:56 PM, Chris Fields wrote:

> [...]
> I looked in the mailing list archives and Barry mentions something 
> here:
>
> http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html
>
> He rebuilt the database from scratch and got it working; no reason
> was given.  I wouldn't be surprised if it is something Mysql-related
> that pops up.

Note though that he was using PostgreSQL. With Pg you definitely need 
to 'vacuum,' which is their name for analyzing/optimizing the table(s).

>   The strange thing is that only a few months ago
> everything ran well with this version of MySQL (v.5); this was with
> the first test database I installed on it.  Another strange thing (I
> think I mentioned it) is that NOT loading the taxonomy with
> load_ncbi_taxonomy.pl worked (everything was entered).

That's not really strange, it is in fact consistent with the query you 
report as taking a long time. If you don't pre-load the taxonomy then 
the taxon and taxon_name tables are empty or almost empty and look-ups 
and joins of empty tables are amazingly fast :-J

[...]
> I wanted to also mention that we shouldn't check in the modifications
> to Bio::Root::Root until I confirm something (I'm at home and
> currently can't).

OK we'll hold off.

	-hilmar
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


From cjfields at uiuc.edu  Wed Feb 22 00:13:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 23:13:18 -0600
Subject: [BioSQL-l] removing sequences from a database?
Message-ID: <000001c6376e$b113c170$15327e82@pyrimidine>

I think this has been posed once but I couldn't find a straight answer on
the mailing list; is there a way to remove sequences in a BioSQL database
using bioperl-db?  This is the last I heard about it:

http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hlapp at gmx.net  Wed Feb 22 00:20:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 21:20:05 -0800
Subject: [BioSQL-l] [Bioperl-l] removing sequences from a database?
In-Reply-To: <000001c6376e$b113c170$15327e82@pyrimidine>
References: <000001c6376e$b113c170$15327e82@pyrimidine>
Message-ID: <aea845b90602212120w73c7740en7a199b2bd435cab3@mail.gmail.com>

This is a pretty old posting :-) Sure you can remove sequences. In
fact you can remove any persistent object by calling $pobj->remove().
I.e., for a persistent sequence (which is what you get from the
adaptors): $pseq->remove()

Do not forget to call commit() on the persistence adaptor or on the
persistent object itself; otherwise the operation is rolled back
when you disconnect.

BTW there are examples for objects other than the sequence object
itself (say you want to remove only the features) in the
scripts/biosql directory; some of the --mergeobjs closure examples do
this.
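
A minimal sketch of the remove-then-commit sequence, wrapped in a helper
so the two calls cannot get separated. The helper name remove_and_commit
is hypothetical; $pobj is assumed to be a persistent object obtained from
one of the adaptors, and only the remove() and commit() calls named above
are used.

```perl
use strict;
use warnings;

# Remove one persistent object and make the removal permanent.
# Assumption: $pobj is a persistent object as returned by a bioperl-db
# adaptor (for example a persistent sequence).
sub remove_and_commit {
    my ($pobj) = @_;
    $pobj->remove();   # delete the object from the database
    $pobj->commit();   # without this the removal is rolled back on disconnect
    return 1;
}
```

For a persistent sequence this would be called as remove_and_commit($pseq).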

    -hilmar

On 2/21/06, Chris Fields <cjfields at uiuc.edu> wrote:
> I think this has been posed once but I couldn't find a straight answer on
> the mailing list; is there a way to remove sequences in a BioSQL database
> using bioperl-db?  This is the last I heard about it:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


From mjcipriano at lbl.gov  Wed Feb 22 19:58:45 2006
From: mjcipriano at lbl.gov (Michael Cipriano)
Date: Wed, 22 Feb 2006 16:58:45 -0800
Subject: [BioSQL-l] Load seqfeature from biosql database with perl
Message-ID: <1140656325.2888.13.camel@alien>

Hello BioSQLers,

I have a simple question (I hope): can I easily load a seqfeature from a
biosql database into a Perl Bio::SeqFeatureI object?  I have the
database value for seqfeature.seqfeature_id and would like to load
the feature using this alone.

I do not want to have to load the whole bioentry object and then search
for the feature. I just want the feature object, since the bioentry is a
whole genome and loading that will take more time than necessary.

I have searched the documentation and have even tried looking through
the code for the modules, but could not find an easy fast method.

Please reply directly to me as well as the list as I am not a list
member.

Thanks for your help,


Michael Cipriano


From mjcipriano at lbl.gov  Thu Feb 23 20:29:21 2006
From: mjcipriano at lbl.gov (Michael Cipriano)
Date: Thu, 23 Feb 2006 17:29:21 -0800
Subject: [BioSQL-l] Load seqfeature from biosql database with perl
In-Reply-To: <1140656325.2888.13.camel@alien>
References: <1140656325.2888.13.camel@alien>
Message-ID: <1140744561.2888.19.camel@alien>

Ah, I think I figured it out.

use Bio::DB::Query::BioQuery;

# $seqdb_obj is the existing database adaptor object (its creation is not
# shown here).
my $seqfeature_id = '401138';
my $adaptor = $seqdb_obj->get_object_adaptor("Bio::SeqFeatureI");

# Query for the feature matching the bound seqfeature_id.
my $query = Bio::DB::Query::BioQuery->new(
        -datacollections => ["Bio::SeqFeatureI t1"],
        -where           => ["t1.Bio::SeqFeatureI = ?"]);

my $qres = $adaptor->find_by_query($query,
        -name   => 'FIND FEATURE BY SEQ',
        -values => [$seqfeature_id]);

while (my $obj = $qres->next_object())
{
        print $obj->primary_key() . "\n";
        print 'location:' . $obj->location->to_FTstring() . "\n";
        $obj->add_tag_value("test", "moretest");
        foreach my $tag ($obj->get_all_tags())
        {
                print " Values for tag $tag: ";
                print join(' ', $obj->get_tag_values($tag));
                print "\n";
        }
        print "------------------\n";
}


This seems to work.

On Wed, 2006-02-22 at 16:58 -0800, Michael Cipriano wrote:
> Hello BioSQLers,
> 
> I have a simple question (I hope), Can I easily load a seqfeature from a
> biosql database into a perl Bio::SeqFeatureI object?  I have the
> database value for the  seqfeature.seqfeature_id and would like to load
> it using this alone.
> 
> I do not want to have to load the whole bioentry object then search for
> the feature, I just want the feature object since the bioentry is a
> whole genome and loading that will take more time then necessary. 
> 
> I have searched the documentation and have even tried looking through
> the code for the modules, but could not find an easy fast method.
> 
> Please reply directly to me as well as the list as I am not a list
> member.
> 
> Thanks for your help,
> 
> 
> Michael Cipriano
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l


From hlapp at gnf.org  Thu Feb 23 21:10:13 2006
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu, 23 Feb 2006 18:10:13 -0800
Subject: [BioSQL-l] Load seqfeature from biosql database with perl
In-Reply-To: <1140744561.2888.19.camel@alien>
Message-ID: <C023AB05.71A2%hlapp@gnf.org>

Yes, kudos to you for figuring this out yourself, and you actually figured
out the more difficult way. I apologize for my delay in responding, I was
tied up this morning and last night.

You got the first key step right, namely obtaining the right persistence
adaptor. This step determines which object you get back.

Your query will work, and in fact will be equally fast as the simple
solution (which is simple only because it is simpler to code, not because
the internally executed query is simpler). The simple solution is that every
Bio::DB::PersistenceAdaptorI implementing object (i.e., any object you get
back from $db->get_object_adaptor(..)) has a method
$adp->find_by_primary_key(). So, using that method:

    $feature = $adaptor->find_by_primary_key($seqfeature_id);

You can also control the type of object to be created (so long as it is a
Bio::SeqFeatureI) by passing in an object factory in addition.

BTW as an aside, the finder method will also check the object cache
first, if the cache is enabled. That doesn't matter for seq features,
though: because of the potentially large number of objects, the cache is
not enabled by default for this adaptor.
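
Putting the two steps together (get the adaptor, then look up by primary
key), a minimal sketch might look like the following. The helper name
fetch_feature is hypothetical, $db stands for the database adaptor object
that get_object_adaptor() is called on, and the undef check rests on the
assumption that a failed look-up returns undef rather than throwing.

```perl
use strict;
use warnings;

# Fetch a single feature by its seqfeature.seqfeature_id value.
# Assumption: $db is the database adaptor object on which
# get_object_adaptor() is called in the messages above.
sub fetch_feature {
    my ($db, $seqfeature_id) = @_;

    # The adaptor class determines which kind of object comes back.
    my $adaptor = $db->get_object_adaptor('Bio::SeqFeatureI');
    my $feature = $adaptor->find_by_primary_key($seqfeature_id);

    # Assumption: a failed look-up yields undef rather than an exception.
    die "no seqfeature with primary key $seqfeature_id\n"
        unless defined $feature;
    return $feature;
}
```

Usage would then be my $feature = fetch_feature($db, 401138); followed by
the same location and tag calls as in the query-based version.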

    -hilmar  

On 2/23/06 5:29 PM, "Michael Cipriano" <mjcipriano at lbl.gov> wrote:

> Ah, I think I figured it out.
> 
> my $seqfeature_id = '401138';
> my $adaptor = $seqdb_obj->get_object_adaptor("Bio::SeqFeatureI");
> 
> my $query = Bio::DB::Query::BioQuery->new(
> 
> -datacollections=>["Bio::SeqFeatureI t1"],
>                                         -where => ["t1.Bio::SeqFeatureI
> = ?"]);
> 
> my $qres = $adaptor->find_by_query($query,      -name=>'FIND FEATURE BY
> SEQ',
> 
> -values=>[$seqfeature_id]);
> 
> while(my $loc = $qres->next_object())
> {
>         my $obj = $loc;
> 
>         print $obj->primary_key() . "\n";
>         print 'location:' . $obj->location->to_FTstring() . "\n";
>         $obj->add_tag_value("test", "moretest");
>         foreach my $tag ($obj->get_all_tags())
>         {
>                 print " Values for tag $tag: ";
>                 print join(' ',$obj->get_tag_values($tag));
>                 print "\n";
>         }
>         print "------------------\n";
> 
> }
> 
> 
> 
> This seems to work
> On Wed, 2006-02-22 at 16:58 -0800, Michael Cipriano wrote:
>> Hello BioSQLers,
>> 
>> I have a simple question (I hope), Can I easily load a seqfeature from a
>> biosql database into a perl Bio::SeqFeatureI object?  I have the
>> database value for the  seqfeature.seqfeature_id and would like to load
>> it using this alone.
>> 
>> I do not want to have to load the whole bioentry object then search for
>> the feature, I just want the feature object since the bioentry is a
>> whole genome and loading that will take more time then necessary.
>> 
>> I have searched the documentation and have even tried looking through
>> the code for the modules, but could not find an easy fast method.
>> 
>> Please reply directly to me as well as the list as I am not a list
>> member.
>> 
>> Thanks for your help,
>> 
>> 
>> Michael Cipriano
>> 
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Mon Feb  6 18:29:38 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 6 Feb 2006 10:29:38 -0800
Subject: [BioSQL-l] list resumes
Message-ID: <25f1da01ac9f022027fe9127892934aa@gmx.net>

Hi all, if you receive this email it means that the Biosql list is 
working again at its previous address. It somehow got lost in the 
migration to a new server and finally Chris Dagdigian came back from 
vacation and brought it back on-line.

Please repost any emails you may have sent to the list over the past 
week or two (they should have bounced), and I apologize for the 
inconvenience.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From Doris.Siegl at fh-hagenberg.at  Wed Feb  8 13:57:03 2006
From: Doris.Siegl at fh-hagenberg.at (Siegl Doris)
Date: Wed, 8 Feb 2006 14:57:03 +0100
Subject: [BioSQL-l] OBDA abbreviation
Message-ID: <532D7AB7D5A2A34EAD3A39A7B650E8773DE48C@postfux.fhs-hagenberg.ac.at>

Dear ng,

I was wondering whether the abbreviation OBDA stands for "Open
Bioinformatics Database Access" or "Open Bioinformatics Data Access". I
could not find any information about this issue on the OBDA homepage.
Could anybody clear this up for me?


Thanks,
Doris


From torsten.seemann at infotech.monash.edu.au  Wed Feb  8 17:52:52 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 09 Feb 2006 04:52:52 +1100
Subject: [BioSQL-l] OBDA abbreviation
In-Reply-To: <532D7AB7D5A2A34EAD3A39A7B650E8773DE48C@postfux.fhs-hagenberg.ac.at>
References: <532D7AB7D5A2A34EAD3A39A7B650E8773DE48C@postfux.fhs-hagenberg.ac.at>
Message-ID: <43EA2FF4.4030302@infotech.monash.edu.au>

> I was wondering whether the abbreviation OBDA stands for "Open
> Bioinformatics Database Access" or "Open Bioinformatics Data Access". I
> could not find any information about this issue on the OBDA homepage.
> Could anybody clear this up for me?

You are right in that it is not defined on the OBDA homepage (that I 
could find), but according to the BioPerl "OBDA" HOWTO at
http://bioperl.open-bio.org/wiki/HOWTO:OBDA
it is "Open Biological Database Access", which I have now made clear in 
the BioPerl wiki entry for OBDA at http://bioperl.open-bio.org/wiki/OBDA

Some acronyms have a life of their own! :-)

--Torsten Seemann


From hlapp at gnf.org  Fri Feb 10 00:13:31 2006
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu, 09 Feb 2006 16:13:31 -0800
Subject: [BioSQL-l] OBDA abbreviation
In-Reply-To: <532D7AB7D5A2A34EAD3A39A7B650E8773DE48C@postfux.fhs-hagenberg.ac.at>
Message-ID: <C0111AAB.6D35%hlapp@gnf.org>

I believe somebody answered this already, but just in case, it stands for
Open Bioinformatics Database Access.

    -hilmar


On 2/8/06 5:57 AM, "Siegl Doris" <Doris.Siegl at fh-hagenberg.at> wrote:

> Dear ng,
> 
> I was wondering whether the abbreviation OBDA stands for "Open
> Bioinformatics Database Access" or "Open Bioinformatics Data Access". I
> could not find any information about this issue on the OBDA homepage.
> Could anybody clear this up for me?
> 
> 
> Thanks,
> Doris
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From cjfields at uiuc.edu  Tue Feb 14 20:32:42 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Feb 2006 14:32:42 -0600
Subject: [BioSQL-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
Message-ID: <001201c631a5$ce7496f0$15327e82@pyrimidine>

Hilmar, 

Good News: I've added a section to the bioperl wiki on installing bioperl-db
in Windows:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db

Bad News:  There's a new problem now. I updated from CVS yesterday; I walked
through the steps and ran 'nmake test', with everything passing fine.
However, load_seqdatabase.pl is extremely slow; it's loading a sequence
every 5 minutes or so.  I noticed (when using '-debug') that it is hanging
up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a database,
load the biosql schema, and load sequences w/o loading taxonomy, the problem
goes away.

Here's the debugging output (I cut it off at the point it hangs up):
----------------------------------------------------------------------------
-------------------------
C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -driver
mysql -namespace test -dbname biosql -dbuser root -dbpass ********** -format
genbank  -debug NP_252217.gpt
Loading NP_252217.gpt ...
attempting to load adaptor class for Bio::Seq::RichSeq
        attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
        attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
        attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Annotation::Collection
        attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::Root::Root
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
        attempting to load module Bio::DB::BioSQL::RootIAdaptor
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
        attempting to load module
Bio::DB::BioSQL::AnnotationCollectionIAdaptor
        attempting to load module
Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
        attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
        attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
        attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
        attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
        attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
        attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
        attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
        attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
        attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
        attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
        attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
        attempting to load module Bio::DB::BioSQL::LocationIAdaptor
        attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
        attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class
Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class
Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer
for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement: SELECT biodatabase.biodatabase_id,
biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "test" (namespace)
preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES
(?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "test" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for
Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id =
?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)  
----------------------------------------------------------------------------
-------------------------

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hlapp at gmx.net  Thu Feb 16 01:54:01 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 17:54:01 -0800
Subject: [BioSQL-l] [Bioperl-l] Added 'Installing bioperl-db in Windows'
	to wiki, problems with bioperl-db
In-Reply-To: <001201c631a5$ce7496f0$15327e82@pyrimidine>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
Message-ID: <c0a15fab7d1cbd8b6bc0554e7d9dd45b@gmx.net>


On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:

> Hilmar,
>
> Good News: I've added a section to the bioperl wiki on installing  
> bioperl-db
> in Windows:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>
> Bad News:  There's a new problem now. I updated from CVS yesterday; I  
> walked
> through the steps and ran 'nmake test', with everything passing fine.
> However, load_seqdatabase.pl is extremely slow; it's loading a sequence
> every 5 minutes or so.  I noticed (when using '-debug') that it is  
> hanging
> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a  
> database,
> load the biosql schema, and load sequences w/o loading taxonomy, the  
> problem
> goes away.
>
> Here's the debugging output (I cut it off at the point it hangs up):
> [...]

> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND  
> ncbi_taxon_id =
> ?
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)

I'm a bit surprised if this is the query where it hangs. Are the  
indexes all there? There should be a primary key index on  
taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name  
over (taxon_id,name,name_class). Also, there should be separate indexes  
on taxon_name.taxon_id and taxon_name.name. Are they all there? If you  
reinstantiated the schema from the DDL then it seems unlikely that  
somehow the indexes have vanished except if you messed with the schema  
or the DDL.

Putting an index on taxon_name.name_class really can't make sense, so  
let's assume it can't be that.

So really I suspect this has something to do with the state of the  
database and the version of MySQL. In particular, from some 4.x version  
of MySQL onward, under certain circumstances you have to analyze the  
statistics of the tables in order to get the optimizer to pick up the  
indexes properly. Are you on MySQL 4.x and if so, have you done that?

There's the ANALYZE TABLE command:
http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html

Note the comment: "This statement works with MyISAM, BDB, and (as of  
MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?

Also, you can check the execution plan for the query using EXPLAIN.
http://dev.mysql.com/doc/refman/4.1/en/explain.html

This should show you whether the index would be picked up for the query  
or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to  
the db using the mysql shell (mysql).
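
A minimal sketch of the checks described above, written as a small Perl script that only prints the statements so they can be pasted or piped into the mysql shell. The table names and bind values are taken from the debug output earlier in this thread; the helper name diagnostic_statements, the extra SHOW INDEX step, and the script name in the usage note are illustrative assumptions, not part of bioperl-db.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tables touched by the slow species look-up (names from the debug output).
my @tables = qw(taxon taxon_name);

# The look-up as logged by SpeciesAdaptor, with the two bind values
# ("scientific name", 208964) written in as literals.
my $lookup = join ' ',
    'SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id,',
    'taxon_name.name, NULL FROM taxon, taxon_name',
    'WHERE taxon.taxon_id = taxon_name.taxon_id',
    "AND name_class = 'scientific name' AND ncbi_taxon_id = 208964";

# Hypothetical helper: returns the statements to run, in order.
sub diagnostic_statements {
    my ($query, @tbls) = @_;
    return (
        (map { "SHOW INDEX FROM $_" } @tbls),    # are the indexes there?
        'ANALYZE TABLE ' . join(', ', @tbls),     # refresh optimizer statistics
        "EXPLAIN $query",                         # which indexes get used?
    );
}

print "$_;\n" for diagnostic_statements($lookup, @tables);
```

Saved under a hypothetical name such as check_taxon_indexes.pl, the output could be piped into the shell with "perl check_taxon_indexes.pl | mysql -u root -p biosql" (user and database name as in the load_seqdatabase.pl call above). In the EXPLAIN output, a NULL in the "key" column for a table means no index was chosen for it.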

I believe something similarly strange was encountered by someone using  
DB::GFF (or Chado) under MySQL, and if I recall correctly the solution  
was to optimize (analyze) the tables. Maybe someone who was in that  
thread reads this and can comment?

	-hilmar


>
> ----------------------------------------------------------------------- 
> -----
> -------------------------
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


From cjfields at uiuc.edu  Thu Feb 16 03:56:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 21:56:14 -0600
Subject: [BioSQL-l] [Bioperl-l] Added 'Installing bioperl-db in Windows'
	to wiki, problems with bioperl-db
In-Reply-To: <c0a15fab7d1cbd8b6bc0554e7d9dd45b@gmx.net>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	<c0a15fab7d1cbd8b6bc0554e7d9dd45b@gmx.net>
Message-ID: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>


On Feb 15, 2006, at 7:54 PM, Hilmar Lapp wrote:

>
> On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:
>
>> Hilmar,
>>
>> Good News: I've added a section to the bioperl wiki on installing
>> bioperl-db
>> in Windows:
>>
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>>
>> Bad News:  There's a new problem now. I updated from CVS yesterday; I
>> walked
>> through the steps and ran 'nmake test', with everything passing fine.
>> However, load_seqdatabase.pl is extremely slow; it's loading a  
>> sequence
>> every 5 minutes or so.  I noticed (when using '-debug') that it is
>> hanging
>> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a
>> database,
>> load the biosql schema, and load sequences w/o loading taxonomy, the
>> problem
>> goes away.
>>
>> Here's the debugging output (I cut it off at the point it hangs up):
>> [...]
>
>> preparing UK select statement: SELECT taxon_name.taxon_id, NULL,  
>> NULL,
>> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name  
>> WHERE
>> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND
>> ncbi_taxon_id =
>> ?
>> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
>> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
>
> I'm a bit surprised if this is the query where it hangs. Are the
> indexes all there? There should be a primary key index on
> taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on  
> taxon_name
> over (taxon_id,name,name_class). Also, there should be separate  
> indexes
> on taxon_name.taxon_id and taxon_name.name. Are they all there? If you
> reinstantiated the schema from the DDL then it seems unlikely that
> somehow the indexes have vanished except if you messed with the schema
> or the DDL.

I looked in the mailing list archives and Barry mentions something here:

http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html

He rebuilt the database from scratch and got it working; no reason  
was given.  I wouldn't be surprised if it is something MySQL-related  
that pops up.  The strange thing is that only a few months ago  
everything ran well with this version of MySQL (v.5); this was with  
the first test database I installed on it.  Another strange thing (I  
think I mentioned it) is that NOT loading the taxonomy with  
load_ncbi_taxonomy.pl worked (everything was entered).  I'll try  
rebuilding the database from scratch to see what happens.  I am  
running this on Windows, so this is new territory...

> Putting an index on taxon_name.name_class really can't make sense, so
> let's assume it can't be that.
>
> So really I suspect this has something to do with the state of the
> database and the version of MySQL. In particular, from some 4.x  
> version
> of MySQL under certain circumstances you have to analyze the  
> statistics
> of the tables in order to get the optimizer pick up the indexes
> properly. Are you on MySQL 4.x and if so, have you done that?
>
> There's the ANALYZE TABLE command:
> http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html
>
> Note the comment: "This statement works with MyISAM, BDB, and (as of
> MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?
>
> Also, you can check the execution plan for the query using EXPLAIN.
> http://dev.mysql.com/doc/refman/4.1/en/explain.html
>
> This should show you whether the index would be picked up for the  
> query
> or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to
> the db using the mysql shell (mysql).

I'll give these a shot and post what I find in the next few days.

> I believe something similarly strange was encountered by someone using
> DB::GFF (or Chado) under MySQL, and if I recall correctly the solution
> was to optimize (analyze) the tables. Maybe someone who was in that
> thread reads this and can comment?
>
> 	-hilmar

I also wanted to mention that we shouldn't check in the modifications  
to Bio::Root::Root until I confirm something (I'm at home and  
currently can't).  I tried running a script on an unrelated module  
using the modified Bio::Root::Root (with the commas added after the  
'throw $class' statements).  Everything worked for $self->throw(),  
except the thrown message wasn't displayed.  I'll dig into it a bit  
more to see what happens.

>
>
>>
>> --------------------------------------------------------------------- 
>> --
>> -----
>> -------------------------
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> -- 
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Thu Feb 16 06:31:54 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 22:31:54 -0800
Subject: [BioSQL-l] [Bioperl-l] Added 'Installing bioperl-db in Windows'
	to wiki, problems with bioperl-db
In-Reply-To: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	<c0a15fab7d1cbd8b6bc0554e7d9dd45b@gmx.net>
	<12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
Message-ID: <cdf04e5fcb1471e4de168a73cc24ae88@gmx.net>


On Feb 15, 2006, at 7:56 PM, Chris Fields wrote:

> [...]
> I looked in the mailing list archives and Barry mentions something 
> here:
>
> http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html
>
> He rebuilt the database from scratch and got it working; no reason
> was given.  I wouldn't be surprised if it is something Mysql-related
> that pops up.

Note though that he was using PostgreSQL. With Pg you definitely need 
to 'vacuum,' which is their name for analyzing/optimizing the table(s).

>   The strange thing is that only a few months ago
> everything ran well with this version of MySQL (v.5); this was with
> the first test database I installed on it.  Another strange thing (I
> think I mentioned it) is that NOT loading the taxonomy with
> load_ncbi_taxonomy.pl worked (everything was entered).

That's not really strange, it is in fact consistent with the query you 
report as taking a long time. If you don't pre-load the taxonomy then 
the taxon and taxon_name tables are empty or almost empty and look-ups 
and joins of empty tables are amazingly fast :-J

[...]
> I wanted to also mention that we shouldn't check in the modifications
> to Bio::Root::Root until I confirm something (I'm at home and
> currently can't).

OK we'll hold off.

	-hilmar
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


From cjfields at uiuc.edu  Wed Feb 22 05:13:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 23:13:18 -0600
Subject: [BioSQL-l] removing sequences from a database?
Message-ID: <000001c6376e$b113c170$15327e82@pyrimidine>

I think this has been posed once but I couldn't find a straight answer on
the mailing list; is there a way to remove sequences in a BioSQL database
using bioperl-db?  This is the last I heard about it:

http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hlapp at gmx.net  Wed Feb 22 05:20:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 21:20:05 -0800
Subject: [BioSQL-l] [Bioperl-l] removing sequences from a database?
In-Reply-To: <000001c6376e$b113c170$15327e82@pyrimidine>
References: <000001c6376e$b113c170$15327e82@pyrimidine>
Message-ID: <aea845b90602212120w73c7740en7a199b2bd435cab3@mail.gmail.com>

This is a pretty old posting :-) Sure you can remove sequences. In
fact you can remove any persistent object by calling $pobj->remove().
I.e., for a persistent sequence (which is what you get from the
adaptors): $pseq->remove()

Do not forget to call commit() on the persistence adaptor or on the
persistent object itself; otherwise the operation is rolled back
when you disconnect.

BTW there are examples for objects other than the sequence object
itself (say you want to remove only the features) in the
scripts/biosql directory; some of the --mergeobjs closure examples do
this.
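
A minimal sketch of the removal described above, assuming the sequence is looked up by unique key through the sequence adaptor; only remove() and commit() come from this message. The helper name remove_sequence, the default namespace, the BIOSQL_PASS environment variable, and the connection parameters are illustrative assumptions. Depending on the data, the template object may also need a version to match. The database part only runs when an accession is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: look up the persistent object matching $template,
# remove it, and commit. Returns 1 if something was removed, 0 otherwise.
sub remove_sequence {
    my ($adaptor, $template) = @_;
    my $pseq = $adaptor->find_by_unique_key($template) or return 0;
    $pseq->remove();
    $pseq->commit();    # without this the delete is rolled back on disconnect
    return 1;
}

# Only touch a database when called as: script.pl ACCESSION [NAMESPACE]
if (@ARGV) {
    require Bio::DB::BioDB;
    require Bio::Seq;

    my $acc = shift @ARGV;
    my $ns  = @ARGV ? shift @ARGV : 'test';    # assumed default namespace

    # Connection values mirror the load_seqdatabase.pl call in this archive;
    # adjust them for your own installation.
    my $db = Bio::DB::BioDB->new(
        -database => 'biosql',
        -driver   => 'mysql',
        -dbname   => 'biosql',
        -user     => 'root',
        -pass     => $ENV{BIOSQL_PASS},        # assumption: password via environment
    );
    my $adaptor  = $db->get_object_adaptor('Bio::SeqI');
    my $template = Bio::Seq->new(-accession_number => $acc,
                                 -namespace        => $ns);

    print remove_sequence($adaptor, $template)
        ? "removed $acc\n"
        : "$acc not found in namespace $ns\n";
}
```

Keeping the remove-and-commit step in its own sub makes it easy to reuse with any persistent object, for instance a feature instead of a whole sequence.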

    -hilmar

On 2/21/06, Chris Fields <cjfields at uiuc.edu> wrote:
> I think this has been posed once but I couldn't find a straight answer on
> the mailing list; is there a way to remove sequences in a BioSQL database
> using bioperl-db?  This is the last I heard about it:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


From mjcipriano at lbl.gov  Thu Feb 23 00:58:45 2006
From: mjcipriano at lbl.gov (Michael Cipriano)
Date: Wed, 22 Feb 2006 16:58:45 -0800
Subject: [BioSQL-l] Load seqfeature from biosql database with perl
Message-ID: <1140656325.2888.13.camel@alien>

Hello BioSQLers,

I have a simple question (I hope), Can I easily load a seqfeature from a
biosql database into a perl Bio::SeqFeatureI object?  I have the
database value for the  seqfeature.seqfeature_id and would like to load
it using this alone.

I do not want to have to load the whole bioentry object then search for
the feature, I just want the feature object since the bioentry is a
whole genome and loading that will take more time than necessary.

I have searched the documentation and have even tried looking through
the code for the modules, but could not find an easy fast method.

Please reply directly to me as well as the list as I am not a list
member.

Thanks for your help,


Michael Cipriano


From mjcipriano at lbl.gov  Fri Feb 24 01:29:21 2006
From: mjcipriano at lbl.gov (Michael Cipriano)
Date: Thu, 23 Feb 2006 17:29:21 -0800
Subject: [BioSQL-l] Load seqfeature from biosql database with perl
In-Reply-To: <1140656325.2888.13.camel@alien>
References: <1140656325.2888.13.camel@alien>
Message-ID: <1140744561.2888.19.camel@alien>

Ah, I think I figured it out.

my $seqfeature_id = '401138';
my $adaptor = $seqdb_obj->get_object_adaptor("Bio::SeqFeatureI");

my $query = Bio::DB::Query::BioQuery->new(
        -datacollections => ["Bio::SeqFeatureI t1"],
        -where           => ["t1.Bio::SeqFeatureI = ?"]);

my $qres = $adaptor->find_by_query($query,
        -name   => 'FIND FEATURE BY SEQ',
        -values => [$seqfeature_id]);

while(my $loc = $qres->next_object())
{
        my $obj = $loc;

        print $obj->primary_key() . "\n";
        print 'location:' . $obj->location->to_FTstring() . "\n";
        $obj->add_tag_value("test", "moretest");
        foreach my $tag ($obj->get_all_tags())
        {
                print " Values for tag $tag: ";
                print join(' ',$obj->get_tag_values($tag));
                print "\n";
        }
        print "------------------\n";

}


This seems to work.

On Wed, 2006-02-22 at 16:58 -0800, Michael Cipriano wrote:
> Hello BioSQLers,
> 
> I have a simple question (I hope), Can I easily load a seqfeature from a
> biosql database into a perl Bio::SeqFeatureI object?  I have the
> database value for the  seqfeature.seqfeature_id and would like to load
> it using this alone.
> 
> I do not want to have to load the whole bioentry object then search for
> the feature, I just want the feature object since the bioentry is a
> whole genome and loading that will take more time than necessary.
> 
> I have searched the documentation and have even tried looking through
> the code for the modules, but could not find an easy fast method.
> 
> Please reply directly to me as well as the list as I am not a list
> member.
> 
> Thanks for your help,
> 
> 
> Michael Cipriano
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l


From hlapp at gnf.org  Fri Feb 24 02:10:13 2006
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu, 23 Feb 2006 18:10:13 -0800
Subject: [BioSQL-l] Load seqfeature from biosql database with perl
In-Reply-To: <1140744561.2888.19.camel@alien>
Message-ID: <C023AB05.71A2%hlapp@gnf.org>

Yes, kudos to you for figuring this out yourself, and you actually figured
out the more difficult way. I apologize for my delay in responding, I was
tied up this morning and last night.

You got the first key step right, namely obtaining the right persistence
adaptor. This step determines which object you get back.

Your query will work, and in fact will be just as fast as the simple
solution (which is simple only because it is simpler to code, not because
the internally executed query is simpler). The simple solution is that every
object implementing Bio::DB::PersistenceAdaptorI (i.e., any object you get
back from $db->get_object_adaptor(..)) has a method
$adp->find_by_primary_key(). So, using that method:

    $feature = $adaptor->find_by_primary_key($seqfeature_id);

You can also control the type of object to be created (so long as it is a
Bio::SeqFeatureI) by passing in an object factory in addition.

BTW as an aside, the finder method will also consult the object cache
first if the cache is enabled. This doesn't matter for seq features
because, due to the potentially large number of objects, the cache is
not enabled by default for this adaptor.
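
A minimal sketch of the simpler route, combining find_by_primary_key() with the reporting loop from the original post. The helper name describe_feature, the BIOSQL_PASS environment variable, and the connection parameters are illustrative assumptions. The database part only runs when a seqfeature_id is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fetch one feature by seqfeature_id and return a
# short text report, or undef if no such feature exists.
sub describe_feature {
    my ($adaptor, $seqfeature_id) = @_;
    my $feature = $adaptor->find_by_primary_key($seqfeature_id) or return;
    my @lines = ('location: ' . $feature->location->to_FTstring());
    foreach my $tag ($feature->get_all_tags()) {
        push @lines, "$tag: " . join(' ', $feature->get_tag_values($tag));
    }
    return join("\n", @lines) . "\n";
}

# Only touch a database when called as: script.pl SEQFEATURE_ID
if (@ARGV) {
    require Bio::DB::BioDB;

    # Connection values are placeholders; adjust them for your installation.
    my $db = Bio::DB::BioDB->new(
        -database => 'biosql',
        -driver   => 'mysql',
        -dbname   => 'biosql',
        -user     => 'root',
        -pass     => $ENV{BIOSQL_PASS},    # assumption: password via environment
    );
    my $adaptor = $db->get_object_adaptor('Bio::SeqFeatureI');
    my $report  = describe_feature($adaptor, $ARGV[0]);
    print defined $report ? $report : "no feature with id $ARGV[0]\n";
}
```

Because the adaptor is requested for Bio::SeqFeatureI, only the feature is built and the parent bioentry is not loaded.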

    -hilmar  

On 2/23/06 5:29 PM, "Michael Cipriano" <mjcipriano at lbl.gov> wrote:

> Ah, I think I figured it out.
> 
> my $seqfeature_id = '401138';
> my $adaptor = $seqdb_obj->get_object_adaptor("Bio::SeqFeatureI");
> 
> my $query = Bio::DB::Query::BioQuery->new(
>         -datacollections => ["Bio::SeqFeatureI t1"],
>         -where           => ["t1.Bio::SeqFeatureI = ?"]);
> 
> my $qres = $adaptor->find_by_query($query,
>         -name   => 'FIND FEATURE BY SEQ',
>         -values => [$seqfeature_id]);
> 
> while(my $loc = $qres->next_object())
> {
>         my $obj = $loc;
> 
>         print $obj->primary_key() . "\n";
>         print 'location:' . $obj->location->to_FTstring() . "\n";
>         $obj->add_tag_value("test", "moretest");
>         foreach my $tag ($obj->get_all_tags())
>         {
>                 print " Values for tag $tag: ";
>                 print join(' ',$obj->get_tag_values($tag));
>                 print "\n";
>         }
>         print "------------------\n";
> 
> }
> 
> 
> 
> This seems to work
> On Wed, 2006-02-22 at 16:58 -0800, Michael Cipriano wrote:
>> Hello BioSQLers,
>> 
>> I have a simple question (I hope), Can I easily load a seqfeature from a
>> biosql database into a perl Bio::SeqFeatureI object?  I have the
>> database value for the  seqfeature.seqfeature_id and would like to load
>> it using this alone.
>> 
>> I do not want to have to load the whole bioentry object then search for
>> the feature, I just want the feature object since the bioentry is a
>> whole genome and loading that will take more time than necessary.
>> 
>> I have searched the documentation and have even tried looking through
>> the code for the modules, but could not find an easy fast method.
>> 
>> Please reply directly to me as well as the list as I am not a list
>> member.
>> 
>> Thanks for your help,
>> 
>> 
>> Michael Cipriano
>> 
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------