From indapa at gmail.com  Mon Oct  3 13:59:27 2005
From: indapa at gmail.com (Amit Indap)
Date: Mon Oct  3 18:53:45 2005
Subject: [BioSQL-l] removing features from a sequence
Message-ID: <3cfaa4040510031059l79780e7bl49686d47be9108c0@mail.gmail.com>

Hi,

I was trying to remove features for sequences stored in my BioSQL
database. After running the code snippet below to remove sequence
features, I tested whether the features really had been removed by
running a script that retrieves seq features from bioentries.
Unfortunately, the features are still there. I'm still learning my
way around the Bio::DB API.

Here is my code that attempts to remove sequence features:

foreach my $acc (@accs) {

    my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

    my $seq = Bio::Seq->new(-accession_number => $acc,
                            -namespace        => $namespace);

    my $dbseq = $adp->find_by_unique_key($seq);
    unless ($dbseq) {
        # skip accessions that are not in the database
        warn "$acc not found in database $namespace\n";
        next;
    }

    $dbseq->remove_SeqFeatures(); # remove seqfeatures

    $dbseq->commit();
    print LOG "removed all seq features for $acc\n";
}


--
Amit Indap
http://www.bscb.cornell.edu/Homepages/Amit_Indap/

From hlapp at gnf.org  Mon Oct  3 19:59:17 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Oct  3 20:39:48 2005
Subject: [BioSQL-l] removing features from a sequence
In-Reply-To: <3cfaa4040510031059l79780e7bl49686d47be9108c0@mail.gmail.com>
References: <3cfaa4040510031059l79780e7bl49686d47be9108c0@mail.gmail.com>
Message-ID: <fa0620baa88036e4880444d4c4a89298@gnf.org>

Yeah, I guess this is one of the gotchas that deserve better
documentation. The bioperl-db adaptors will not automatically 'sync'
the database with an object. The reason is that there are too many
flavors of what you could possibly want as a user, so instead of making
decisions for you, the API requires you to make explicit what you want;
however, doing so should be reasonably simple.

So here's what's going on and how you can fix it.

First, $dbseq->remove_SeqFeatures() is a Bio::SeqI method present for 
all SeqI objects, not just persistent ones. Except in a few cases where 
lazy loading is already implemented, methods from the native Bioperl 
API are not overridden for persistent objects; i.e., you can manipulate 
your persistent object to your heart's content and nothing will happen 
to the respective row(s) in the database. You need to say 
$dbseq->store() to let your changes take effect. But see below!

Second, $dbseq->store() will only store it; i.e., it will update the 
object and either update or insert all attached objects (like features, 
annotations, etc). If you want to delete attached objects then you need 
to do so explicitly by calling $pobj->remove().

For example, in your case:

	# ... find $dbseq ...
	# delete all features from the database
	# Note: I could use $dbseq->get_SeqFeatures() if I
	# wanted to keep the features on the in-memory object
	foreach my $pfeat ($dbseq->remove_SeqFeatures()) {
		$pfeat->remove();
	}
	# now $dbseq and the object in the db don't have features

Same thing for annotation. You can check out some of the sample closure
implementations for merging objects provided in the scripts/biosql
directory of bioperl-db; for instance, freshen-annot.pl deletes all
annotation (in the db) from the existing object.
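
The difference between detaching features in memory and deleting them from
the database can be sketched with a toy model. Nothing below is bioperl-db
code: %db_rows, ToyFeature and ToySeq are hypothetical stand-ins, used only
so that the example runs without a database.

```perl
use strict;
use warnings;

# Toy model only: %db_rows stands in for the feature rows in the database;
# ToyFeature and ToySeq are hypothetical stand-ins for persistent objects.
my %db_rows = (1 => 'gene', 2 => 'CDS');

package ToyFeature;
sub new    { my ($class, $id) = @_; return bless { id => $id }, $class }
# remove() is the only call here that deletes the backing row
sub remove { my ($self) = @_; delete $db_rows{ $self->{id} }; return 1 }

package ToySeq;
sub new { my ($class, @feats) = @_; return bless { feats => [@feats] }, $class }
# Mimics the in-memory behaviour described above: empties the feature list
# and returns the detached features, without touching %db_rows
sub remove_SeqFeatures {
    my ($self) = @_;
    my @removed = @{ $self->{feats} };
    $self->{feats} = [];
    return @removed;
}

package main;

my $seq = ToySeq->new(map { ToyFeature->new($_) } sort keys %db_rows);

my @detached          = $seq->remove_SeqFeatures();
my $rows_after_detach = scalar keys %db_rows;   # rows are still there

$_->remove() for @detached;
my $rows_after_remove = scalar keys %db_rows;   # rows are gone now

print "after remove_SeqFeatures: $rows_after_detach row(s)\n";
print "after remove():           $rows_after_remove row(s)\n";
```

In the toy model the row count only drops once remove() has been called on
each detached feature, which is the behaviour the loop above relies on.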

Hth,

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From indapa at gmail.com  Tue Oct  4 14:02:24 2005
From: indapa at gmail.com (Amit Indap)
Date: Tue Oct  4 14:02:10 2005
Subject: [BioSQL-l] removing features from a sequence
In-Reply-To: <fa0620baa88036e4880444d4c4a89298@gnf.org>
References: <3cfaa4040510031059l79780e7bl49686d47be9108c0@mail.gmail.com>
	<fa0620baa88036e4880444d4c4a89298@gnf.org>
Message-ID: <3cfaa4040510041102l77fbc6d6v28968a13a9a4a515@mail.gmail.com>

Thanks Hilmar, this snippet worked:

foreach my $pfeat( $dbseq->remove_SeqFeatures() ) {

    $pfeat->remove();

}
$dbseq->store();
$dbseq->commit();

> Second, $dbseq->store() will only store it; i.e., it will update the
> object and either update or insert all attached objects (like features,
> annotations, etc). If you want to delete attached objects then you need
> to do so explicitly by calling $pobj->remove().

Thanks for clarifying this point. I was a bit confused about whether I
needed to call store or remove. But you are right, I need to
explicitly call remove.



--
Amit Indap
http://www.bscb.cornell.edu/Homepages/Amit_Indap/

From hlapp at gnf.org  Tue Oct  4 14:19:28 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Oct  4 14:21:24 2005
Subject: [BioSQL-l] removing features from a sequence
In-Reply-To: <3cfaa4040510041102l77fbc6d6v28968a13a9a4a515@mail.gmail.com>
References: <3cfaa4040510031059l79780e7bl49686d47be9108c0@mail.gmail.com>
	<fa0620baa88036e4880444d4c4a89298@gnf.org>
	<3cfaa4040510041102l77fbc6d6v28968a13a9a4a515@mail.gmail.com>
Message-ID: <f00bc06fa66352077d2d75b21b8dc5ca@gnf.org>


On Oct 4, 2005, at 11:02 AM, Amit Indap wrote:

> Thanks Hilmar, this snippet worked:
>
> foreach my $pfeat( $dbseq->remove_SeqFeatures() ) {
>
>     $pfeat->remove();
>
> }
> $dbseq->store();

Just as a small technical note: if the only changes you made to the
object were to delete attached objects, then you don't need to call
$dbseq->store().

It doesn't hurt either, though, because the persistence adaptor will
check the is_dirty() property of a persistent object before issuing the
update command, and so if you only removed attached objects then
is_dirty() should still be false.
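
As a rough illustration of the dirty-flag check, here is a toy model. It is
not the bioperl-db implementation: ToyPersistent and $updates_issued are
hypothetical, and the sketch ignores attached objects entirely.

```perl
use strict;
use warnings;

# Toy model only: ToyPersistent is a hypothetical stand-in for a persistent
# object; $updates_issued counts the updates a real adaptor would send.
my $updates_issued = 0;

package ToyPersistent;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, dirty => 0 }, $class;
}
sub is_dirty { my ($self) = @_; return $self->{dirty} }
# Changing a property of the object itself marks it dirty
sub name {
    my ($self, $new) = @_;
    if (defined $new) { $self->{name} = $new; $self->{dirty} = 1 }
    return $self->{name};
}
# store() only issues an update when the object itself has changed
sub store {
    my ($self) = @_;
    return 0 unless $self->is_dirty();
    $updates_issued++;
    $self->{dirty} = 0;
    return 1;
}

package main;

my $obj = ToyPersistent->new(name => 'seq1');
$obj->store();                       # clean object: nothing to update
my $updates_clean = $updates_issued;

$obj->name('seq1-renamed');          # modifies the object itself
$obj->store();                       # dirty object: one update
my $updates_dirty = $updates_issued;

print "updates after storing a clean object: $updates_clean\n";
print "updates after storing a dirty object: $updates_dirty\n";
```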

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From trissl at informatik.hu-berlin.de  Fri Oct  7 11:53:19 2005
From: trissl at informatik.hu-berlin.de (Silke Trissl)
Date: Thu Oct 13 06:49:42 2005
Subject: [BioSQL-l] Pubmed-ID's from SwissPort
In-Reply-To: <1739447c18c60ffeaec14c7fcdc54259@gnf.org>
References: <431423E3.6080504@informatik.hu-berlin.de>
	<1739447c18c60ffeaec14c7fcdc54259@gnf.org>
Message-ID: <434699EF.50903@informatik.hu-berlin.de>

Hi Hilmar,

thank you for your very fast answer, but I still have problems with
PubMed-IDs. I simply don't get them at all.

Hilmar Lapp wrote:
> The annotation is taken from what's in the source record, so I'm
> assuming you're referring to those references that have a PubMed as well
> as a MEDLINE ID annotated in the SwissProt record.

Yes, these are the ones I am looking for, but there are some with only
the PubMed-ID or Medline-ID alone. But in any case, I am interested in
PubMed-IDs.
> 
> If only one ID is provided, that ID will be stored in the database
> (using a foreign key in the Reference table to Dbxref), so if the
> MEDLINE ID is absent the PubMed ID will substitute for it if it was
> present in the source entry. 

I hoped so, but I don't get even one PubMed-ID, although I produced a
swissprot file where some entries only have a PubMed-ID. In my
database there is no entry in the dbxref table that has 'PUBMED' as dbname.

I attached a small swissprot file from which I deleted some references to
Medline. If it is working for you, i.e. you get the PubMed-IDs, please
tell me which versions of BioPerl and BioSQL you use. I work with Perl
and fill a PostgreSQL database.

Regards,

	Silke Trissl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.dat
Type: video/mpeg
Size: 101381 bytes
Desc: not available
Url : http://open-bio.org/pipermail/biosql-l/attachments/20051007/0fede429/test-0001.mpeg
From trissl at informatik.hu-berlin.de  Thu Oct 13 07:15:58 2005
From: trissl at informatik.hu-berlin.de (Silke Trissl)
Date: Thu Oct 13 07:15:31 2005
Subject: [BioSQL-l] Pubmed-ID's from SwissPort
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5602428EF8@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5602428EF8@BIONIC.biopolis.one-north.com>
Message-ID: <434E41EE.90105@informatik.hu-berlin.de>

Richard HOLLAND wrote:
> This may sound a bit obvious, but I think BioPerl stores the PUBMED
> references with a dbname of 'PubMed' not 'PUBMED'. Have you tried
> searching with the correct use of case, or is your PostgreSQL
> case-insensitive?

Well, I did not get any PubMed-ID, regardless of spelling or case.

By now I have found out what the problem with the PubMed-IDs was. Every
time only a PubMed-ID was given in a SwissProt file, this ID was not picked
up and parsed into the database. I thought it was a problem with BioSQL, but
it was a problem with BioPerl: the bug is in the swissprot parser.
I reported the bug and it should be fixed in a forthcoming release.

So there is no problem with BioSQL, only with BioPerl; sorry to have
bothered the list.

Regards,

      Silke Trißl




-- 
________________________________________________________________________

Silke Trißl, MSc
Wissensmanagement in der Bioinformatik          Raum IV.104
                                                Rudower Chaussee 25
Institut für Informatik                         12489 Berlin, GERMANY
Humboldt-Universität zu Berlin
                                                Tel: ++49 30 2093 3904
www.informatik.hu-berlin.de/~trissl             Fax: ++49 30 2093 5484


From hollandr at gis.a-star.edu.sg  Thu Oct 13 07:07:49 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Thu Oct 13 07:19:48 2005
Subject: [BioSQL-l] Pubmed-ID's from SwissPort
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602428EF8@BIONIC.biopolis.one-north.com>

This may sound a bit obvious, but I think BioPerl stores the PUBMED
references with a dbname of 'PubMed' not 'PUBMED'. Have you tried
searching with the correct use of case, or is your PostgreSQL
case-insensitive?

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------



From hlapp at gnf.org  Thu Oct 13 12:35:34 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Oct 13 13:40:04 2005
Subject: [BioSQL-l] Pubmed-ID's from SwissPort
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5602428EF8@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5602428EF8@BIONIC.biopolis.one-north.com>
Message-ID: <a5ceae8178375d8f61007cd8812db51c@gnf.org>


On Oct 13, 2005, at 4:07 AM, Richard HOLLAND wrote:

> This may sound a bit obvious, but I think BioPerl stores the PUBMED
> references with a dbname of 'PubMed' not 'PUBMED'. Have you tried
> searching with the correct use of case, or is your PostgreSQL
> case-insensitive?

As an aside, this will actually only matter if you query the database 
directly, or in fact also if you use Bioperl for loading and Biojava 
for retrieval :-)

If you stay within Bioperl/Bioperl-db the dbname obviously should be 
recognized upon retrieval (and I believe it is).
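
For anyone who does query the database directly, the case issue can be
sketched as follows. The list of dbname values is made up for illustration,
and the SQL string is only an assumption about how such a query could be
written against the dbxref table; it is printed, not executed.

```perl
use strict;
use warnings;

# Hypothetical dbname values as they might appear in the dbxref table
my @dbnames = ('PubMed', 'MEDLINE', 'EMBL', 'PubMed');

# Exact match: finds nothing when the case differs from what the loader wrote
my $exact = grep { $_ eq 'PUBMED' } @dbnames;

# Case-insensitive match: normalise both sides before comparing
my $folded = grep { lc($_) eq lc('PUBMED') } @dbnames;

print "exact: $exact, case-insensitive: $folded\n";

# The same idea as a SQL statement with one bind parameter (not run here)
my $sql = q{SELECT accession FROM dbxref WHERE UPPER(dbname) = UPPER(?)};
print "$sql\n";
```

Folding the case on both sides keeps the query independent of which toolkit
loaded the data.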

BTW Silke, even if the cause of this hadn't resided in the Bioperl 
swissprot parser, it would have still been a Bioperl issue because then 
the bug would have been in Bioperl-db. The BioSQL list scope is more or 
less the relational model and schema and right now one script that 
comes with BioSQL (for loading the taxonomy database).

Cheers, nine time zones to the left and right,

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From daniel.lang at biologie.uni-freiburg.de  Tue Oct 18 05:31:41 2005
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Tue Oct 18 05:47:42 2005
Subject: [BioSQL-l] Problem with Bio::DB::DBI::Pg after upgrading to
 core-1.5.1 and latest cvs
Message-ID: <4354C0FD.1070904@biologie.uni-freiburg.de>

Hi,

I've just upgraded my bioperl CVS version (December 2004, the one right
before 1.5 and the trouble with how annotations were written out using
Bio::Seq) to bioperl-1.5.1 in order to see if the features are now
written out correctly. (See also "[BioSQL-l] strange error after
changing to RC1.5", 09.03.2005.)

Seems like the core code is now working like it used to. But now the
bioperl-db code for Pg has problems:(

When I try to retrieve a sequence from a Pg-biosql db I receive the
following error:

Can't locate object method "dsn" via package "Bio::DB::SimpleDBContext"
(perhaps you forgot to load "Bio::DB::SimpleDBContext"?) at
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/DBI/Pg.pm line 221.!

I'm using the cvs-version (today) of bioperl-db with the 1.5.1 of core
and run.

I tried to use SimpleDBContext in Pg.pm - no effect.

I don't really get what is happening there...

Thanks in advance.

Regards Daniel:)

From daniel.lang at biologie.uni-freiburg.de  Wed Oct 19 11:18:22 2005
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed Oct 19 11:17:25 2005
Subject: [BioSQL-l] Re: Problem with Bio::DB::DBI::Pg after upgrading to
 core-1.5.1 and latest cvs
In-Reply-To: <504d32ec41926e5f89b8c454535a6e37@gnf.org>
References: <4354C0FD.1070904@biologie.uni-freiburg.de>
	<504d32ec41926e5f89b8c454535a6e37@gnf.org>
Message-ID: <435663BE.40109@biologie.uni-freiburg.de>

Hi Hilmar,
Yes, I have installed the latest bioperl-db:
lang@frontend-0-0:~/bioperl> perldoc -m Bio::DB::SimpleDBContext.pm |
grep '$Id'
# $Id: SimpleDBContext.pm,v 1.5 2005/08/26 19:34:14 lapp Exp $
lang@frontend-0-0:~/bioperl> perldoc -m Bio::DB::DBI::Pg.pm | grep '$Id'
# $Id: Pg.pm,v 1.2 2005/08/26 19:34:14 lapp Exp $

But your guess was right, I had a stone-age version in my @INC :(

Now everything works fine...
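
A stale copy like this can be spotted by listing every copy of a module along
@INC. The following is a minimal sketch; find_module_copies is a helper
written for this example, not part of Bioperl, and File::Spec is looked up
only because it ships with Perl.

```perl
use strict;
use warnings;
use File::Spec;

# Return every copy of a module found along @INC, in search order.
# The first hit is the one that "use" or "require" would normally load.
sub find_module_copies {
    my ($module) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;      # Foo::Bar -> Foo/Bar.pm
    my @hits;
    for my $dir (@INC) {
        next if ref $dir;                       # skip code-ref @INC hooks
        my $path = File::Spec->catfile($dir, $rel);
        push @hits, $path if -f $path;
    }
    return @hits;
}

# Substitute e.g. Bio::DB::SimpleDBContext to look for old bioperl-db copies
my @copies = find_module_copies('File::Spec');
print "$_\n" for @copies;
print "more than one copy on \@INC\n" if @copies > 1;
```

If more than one path is printed, the first one shadows the others, so an old
installation earlier in @INC hides a newer one further down.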

Thanks!

Regards,
Daniel:)


Hilmar Lapp wrote:
> Bioperl-db required some changes to work fine with the 1.5.x releases,
> so it is critical that you upgrade bioperl-db as well if you upgrade
> Bioperl to 1.5. I believe the reason you're getting the error is a
> version incompatibility. SimpleDBContext does implement dsn(), and Pg.pm
> doesn't use SimpleDBContext as a literal anywhere.
> 
> However, you're saying that you are using the latest cvs update of
> bioperl-db, so maybe you haven't installed your upgraded bioperl-db
> version but did install the previous one?
> 
> You can check for individual versions of files by grep'ing for the $Id
> tag. Here's what you should see:
> 
> reigen: 9:49 19>perldoc -m Bio::DB::SimpleDBContext.pm | grep '$Id'
> # $Id: SimpleDBContext.pm,v 1.5 2005/08/26 19:34:14 lapp Exp $
> reigen: 9:51 20>perldoc -m Bio::DB::DBI::Pg.pm | grep '$Id'
> # $Id: Pg.pm,v 1.2 2005/08/26 19:34:14 lapp Exp $
> reigen: 9:52 21>
> 
> Did you run the tests? Was there a problem? If the tests run fine (they
> should) then it is almost certainly older modules installed somewhere
> else in your @INC that interfere with the new ones.
> 
>     -hilmar
> 
From crackeur at comcast.net  Wed Oct 26 02:25:31 2005
From: crackeur at comcast.net (Jimmy zhang)
Date: Wed Oct 26 02:34:15 2005
Subject: [BioSQL-l] [ANN]: VTD-XML 1.0 released
Message-ID: <002701c5d9f6$18f94a20$0202a8c0@ximplewa9u1u8q>

I am pleased to announce that version 1.0 of VTD-XML -- an open-source,
high-performance and non-extractive XML processing API -- is freely
available in both Java and C on sourceforge.net.
For source code, documentation, detailed description of API
and code examples, please visit

  http://vtd-xml.sourceforge.net

New in VTD-XML 1.0 is the integrated support of XPath that also
features an easy-to-use interface that further enhances VTD-XML's
inherent benefits, such as CPU/memory efficiency, random access,
and incremental update. A demo of the XPath capability is available
at

  http://vtd-xml.sourceforge.net/demo.html

For further reading, please refer to the following articles about
VTD-XML

* Process SOAP with VTD-XML
  http://xml.sys-con.com/read/48764.htm

* Better, faster XML processing with VTD-XML
  http://www.devx.com/xml/Article/22219?trk=DXRSS_XML

* XML on a chip
  http://www.xml.com/pub/a/2005/03/09/chip.html

* Improve XML processing with VTD-XML
  http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/xeon/multicore/211657.htm


From maria.mirto at unile.it  Tue Oct 25 11:55:07 2005
From: maria.mirto at unile.it (Maria Mirto)
Date: Sat Oct 29 00:05:00 2005
Subject: [BioSQL-l] CFP: Special Issue FGCS on Life Science Grids for
 Biomedicine and Bioinformatics
Message-ID: <3458.193.204.74.230.1130255707.squirrel@webmail2.unile.it>

*************************************************************************
* Call for papers for the Special issue                                 *
*                                                                       *
* Life Science Grids for Biomedicine and Bioinformatics                 *
*                                                                       *
* Future Generation Computer System                                     *
* Elsevier                                                              *
* http://www.elsevier.com/inca/publications/misc/lifesciencegrid05.doc  *
*                                                                       *
*************************************************************************


IMPORTANT DATES
Submission of manuscripts:               December 15, 2005
Acceptance notification:                 January 28, 2006
Due date of revised manuscripts:         February 28, 2006
Approximate date of publication:         Spring, 2006


Purpose of the Special Issue
----------------------------

Omics technologies (genomics - DNA, transcriptomics - RNA, proteomics -
protein, metabolomics - metabolite and phenomics - phenotype, etc.) and
medical informatics have changed the arena of life sciences research
forever. They allow generation of data at a large-scale, which started
with the whole-genome followed by micro-array gene-expression analysis,
mass spectrometry of proteins and metabolites, biomedical imaging
processing and health care.
Omics technologies require substantial paradigm shifts for the way life
sciences research is carried out. Biological experiments are relatively
expensive, which forces scientists to focus on advanced design for
experimentation as part of a whole-chain research approach. Furthermore,
the data generally contains information outside the scope of the original
experiment. Hence, to maximize scope of experiments, biological data needs
to be reusable, shareable and suitable for in-silico experiments. All of
this poses high demands on annotation of data and standardization of data
formats. Furthermore, the conversion of data into information and
knowledge to support scientists answering biological questions, requires
advanced analysis methods and tools that enable mining and integrating
these complex datasets. The bottlenecks for life sciences have shifted
from data generation to data storage, pre-processing, analysis, and
interpretation. The current challenge is to remove these bottlenecks by a
combination of life sciences and information technology (IT).

The effective and efficient management and use of stored data, and in
particular the transformation of these data into information and
knowledge, is thus a key requirement for success in Life Sciences, as
has already been recognized in many other sectors such as industry,
science and government.

Life Science Grids are based on the integration of emerging technologies
such as Grids, Bioinformatics, Web/Grid Services, Workflow, Semantic Web,
to support applications and research in different fields of Life Sciences,
such as Health Care, Biomedicine, Computational Chemistry. They promise to
provide reliable and secure computing infrastructures facilitating the
seamless use of distributed datasets, bioinformatics tools and systems,
data mining applications, and knowledge, building a so-called Grid Problem
Solving Environment (G-PSE), for solving complex problems in Biomedicine
and Health Care.

The scope of this special issue is to focus on challenges, applications and
services in modern Life Science Grid computing environments.
Topics of interest include, but are not limited to, the following:

        - Grid solutions for Life Science applications
        - Grid infrastructures for bio data analysis
        - Parallel bio data-intensive applications
        - Grid infrastructures, middleware and tools for Life Science Grids
        - Web Services for Life Science Grids
        - Workflow for Life Science Grids
        - Semantic Grid for Life Science applications
        - Bio data analysis and management
        - Databases and the grid in biomedical field
        - Data grids for biomedicine and bioinformatics
        - Data mining of truly large and high-dimensional bio data sets
        - Security in bio data grids
        - Biology, Biochemistry and Biomedicine for Grid Environments
              - Drug Design
              - Protein Folding
              - Systems Biology
              - Genome informatics and phylogeny


Guest Editors
-------------

Giovanni Aloisio
University of Lecce, Italy
giovanni.aloisio@unile.it

Vincent Breton
CNRS/IN2P3, LPC Clermont-Ferrand, France
breton@clermont.in2p3.fr

Maria Mirto
University of Lecce, Italy
maria.mirto@unile.it

Almerico Murli
University of Naples, Italy
almerico.murli@dma.unina.it

Tony Solomonides
University of the West of England, UK
Tony.Solomonides@uwe.ac.uk


Important Dates
---------------

Paper submission deadline       December 15, 2005
Notification of acceptance      January 28, 2006
Camera-ready papers             February 28, 2006
Desired publication             End of 2006


-- 
============================================================

Maria Mirto

PhD student, Center for Advanced Computational Technologies
via per Monteroni, 73100 Lecce (Le), ITALY

S.P.A.C.I. srl
ph:  +39 0832 297304
fax: +39 0832 297279
============================================================

From hollandr at gis.a-star.edu.sg  Mon Oct 31 04:28:09 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Tue Nov  1 16:47:07 2005
Subject: [BioSQL-l] BioJavaX ready for testing
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560265652E@BIONIC.biopolis.one-north.com>

Hello people!

Mark is away so I'm taking the liberty of sneaking this one out... :)

I've cross-posted this to both BioJava and BioSQL as much of what is new in BioJavaX will probably be of interest to BioSQL users too.

We've been doing a lot of work recently on creating some extensions to BioJava called BioJavaX. Primarily the purpose of these extensions is to provide better interaction with BioSQL databases, which has been achieved using Hibernate (www.hibernate.org). You can now fully interact with every column of every table in BioSQL, using Hibernate's own HQL language to construct queries that result in sets of BioJavaX objects. Selects, inserts, updates, primary key assignment, foreign key relations, and deletes are all handled transparently by Hibernate, removing the need for any SQL at all to be included in BioJavaX.

As a side effect of constructing a Hibernate-compatible extension to the BioJava object model, we were required to define objects that hold much more detailed information about themselves. For instance, a Sequence object cannot tell you which namespace it belongs to in the BioSQL database, but our extension to it, RichSequence, can. As RichSequence extends Sequence and doesn't replace it, you can use the new objects with your existing code without the hassle of casting them.
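
To illustrate the "extends, doesn't replace" point, here is a minimal, self-contained sketch. The interfaces and classes in it (Sequence, RichSequence, SimpleRichSequence, ExtensionSketch) are simplified, hypothetical stand-ins written for this example only. They are not the real BioJava or BioJavaX types, which carry many more methods. The sketch only shows why code written against the base type keeps working when handed the richer object.

```java
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT the real org.biojava / org.biojavax interfaces.

class ExtensionSketch {
    // "Existing" code, written against the base type only.
    static String describe(Sequence seq) {
        return seq.getName() + " (" + seq.seqString().length() + " residues)";
    }

    public static void main(String[] args) {
        RichSequence rich =
            new SimpleRichSequence("myNamespace", "AB000001", "acgtacgt");

        // The rich object is accepted wherever a Sequence is expected,
        // with no cast needed.
        System.out.println(describe(rich));

        // New code can additionally ask for the extra detail.
        System.out.println(rich.getNamespace());
    }
}

// Base type: knows its name and residues, but nothing about namespaces.
interface Sequence {
    String getName();
    String seqString();
}

// Extension: adds detail while remaining usable as a Sequence.
interface RichSequence extends Sequence {
    String getNamespace();
}

// Minimal immutable implementation of the extended type.
final class SimpleRichSequence implements RichSequence {
    private final String namespace;
    private final String name;
    private final String residues;

    SimpleRichSequence(String namespace, String name, String residues) {
        this.namespace = namespace;
        this.name = name;
        this.residues = residues;
    }

    @Override public String getNamespace() { return namespace; }
    @Override public String getName() { return name; }
    @Override public String seqString() { return residues; }
}
```

Going the other way (from a plain Sequence to the richer type) is where the extra information would have to come from somewhere, which is why converting old-style objects is lossy.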

To be able to load information from files into these new RichSequence objects in a meaningful way, we had to create a more detailed SeqIOListener, called RichSeqIOListener. Then, we had to create new file parsers for the common file formats which were able to extract more detailed information than before in order to satisfy the RichSeqIOListener. 

It's pretty safe to say that the file parsers in BioJavaX are leagues ahead of the existing ones in BioJava, even if I do say so myself. :P The downside of this extra detail though is that the parsers are much more sensitive and will not play well at all with incomplete or incorrectly formed files. If someone can edit them to be less sensitive whilst still retaining the level of detail required, that'd be great.

We've included parsers for FASTA, GenBank, EMBL, UniProt, INSDseq, EMBLxml, UniProtXML, and an extra one for parsing NCBI Taxonomy data.

Do note that BioJavaX cannot fully convert sequences created using the old BioJava model into the new BioJavaX model. It'll do its best, but the RichSequence object you'll end up with will have lots of properties set to null and a tonne of annotations instead, pretty much the same as the original Sequence object I suppose. So it's best to try to avoid conversions and deal with RichSequence objects from the ground up. This is particularly important to consider when converting a BioSQL database previously used with BioJava into one for use with BioJavaX. You'll also find that if you pass a converted old-style Sequence object to one of the new file parsers for writing, it may fail or produce output with lots of missing fields, as it will not find the information it is looking for in the places it expects.

The whole lot is specifically designed to mimic and be compatible with BioSQL, but you don't need to have a BioSQL database to use it. Everything is standalone and will work just fine without a backing data source. Also there is no reason why you couldn't create a new set of Hibernate mappings that map the BioJavaX object model to some other relational database schema of your choice.

The upshot of it all is the org.biojavax package, which you can find in the biojava-live branch on CVS. Development is pretty much complete, and it now needs some serious testing.

We need volunteers to:

	a) test the BioSQL interaction via Hibernate with the various database flavours supported (HSQL, Oracle, MySQL, PostgreSQL)
	b) test the various file formats, particularly looking for special-case exceptions which the parsers may not be aware of yet
	c) do some load-testing and help us find ways to improve it if it turns out to be too slow when under pressure

Documentation of the new features can be found in DocBook XML format in docs/docbook/BioJavaX.xml in the biojava-live branch of CVS. It's as detailed as I could make it without getting bored to death writing it. I've never been the world's best documentation writer, so if anyone would like to help improve it you're more than welcome.

Our plan is to make all this an official part of BioJava come the 1.5 release, whenever that may be. For now though it is very very much a testing-stage thing, not even an alpha release.

Questions on a postcard to either Mark or myself. Feedback most welcome.

cheers,
Richard


Richard Holland
Bioinformatics Specialist
Genome Institute of Singapore
60 Biopolis Street, #02-01 Genome, Singapore 138672
Tel: (65) 6478 8000   DID: (65) 6478 8199
Email: hollandr@gis.a-star.edu.sg