From andreas at sdsc.edu  Fri Mar  5 11:56:40 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 5 Mar 2010 08:56:40 -0800
Subject: [Biojava-dev] Google summer of code
Message-ID: <59a41c431003050856v17c83b80sf1fb59f2587c9cd1@mail.gmail.com>

Hi,

The Open Bioinformatics Foundation (BioJava's parent organisation) is
preparing an application for the Google Summer of Code. If you are
interested in becoming a mentor for a BioJava related project, you can join
us in the application. If you are a student and are interested in a project,
please take a look at these pages:

http://www.open-bio.org/wiki/Google_Summer_of_Code

http://biojava.org/wiki/Google_Summer_of_Code

Andreas

From yogeshp08 at gmail.com  Sat Mar  6 14:38:13 2010
From: yogeshp08 at gmail.com (Yogesh)
Date: Sat, 6 Mar 2010 14:38:13 -0500
Subject: [Biojava-dev] Modules + GSoC2010
Message-ID: <193861401003061138gbd0fa77t785eaa15a25a971c@mail.gmail.com>

Hello,

I am a graduate student in bioinformatics, and I am thrilled to learn that the
OBF is participating in GSoC 2010. I would also like to take part in GSoC for
the first time this year, with a project related to BioJava.

I am very comfortable with Java, and I use BioJava regularly.

One of the projects from BioJava::Modules that I like, and that I think I can
do, is:
            Support for SCOP file parsing.

Could I get some guidance on how to approach this project?
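A SCOP parser would most likely start with the classification file. As a minimal sketch, assuming the tab-separated layout of SCOP's dir.cla.scop.txt file (sid, PDB id, residue range, sccs, sunid, then comma-separated ancestor sunids such as cl=, cf=, sf=), each data line can be turned into a small record. ScopClaParser and ScopDomain are hypothetical names, not existing BioJava classes, and the sample line is illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a parser for SCOP "dir.cla" classification lines.
// ASSUMPTION: each data line has six tab-separated columns: sid, PDB id,
// residue range, sccs, sunid, and comma-separated key=value ancestor sunids.
class ScopClaParser {

    static final class ScopDomain {
        final String sid;
        final String pdbId;
        final String residues;
        final String sccs;
        final int sunid;
        final Map<String, Integer> hierarchy; // e.g. cl, cf, sf, fa, dm, sp, px

        ScopDomain(String sid, String pdbId, String residues, String sccs,
                   int sunid, Map<String, Integer> hierarchy) {
            this.sid = sid;
            this.pdbId = pdbId;
            this.residues = residues;
            this.sccs = sccs;
            this.sunid = sunid;
            this.hierarchy = hierarchy;
        }
    }

    /** Parses one line; returns null for '#' comment lines and blank lines. */
    static ScopDomain parseLine(String line) {
        if (line.startsWith("#") || line.trim().isEmpty()) {
            return null;
        }
        String[] cols = line.split("\t");
        if (cols.length != 6) {
            throw new IllegalArgumentException("Expected 6 tab-separated columns: " + line);
        }
        Map<String, Integer> hierarchy = new LinkedHashMap<>();
        for (String pair : cols[5].split(",")) {
            String[] kv = pair.split("=", 2);
            hierarchy.put(kv[0], Integer.parseInt(kv[1]));
        }
        return new ScopDomain(cols[0], cols[1], cols[2], cols[3],
                Integer.parseInt(cols[4]), hierarchy);
    }

    /** Parses every data line, skipping comments. */
    static List<ScopDomain> parseAll(List<String> lines) {
        List<ScopDomain> domains = new ArrayList<>();
        for (String line : lines) {
            ScopDomain d = parseLine(line);
            if (d != null) {
                domains.add(d);
            }
        }
        return domains;
    }

    public static void main(String[] args) {
        // Illustrative line in the assumed format.
        String line = "d1dlwa_\t1dlw\tA:\ta.1.1.1\t14982\t"
                + "cl=46456,cf=46457,sf=46458,fa=46459,dm=46460,sp=46461,px=14982";
        ScopDomain d = parseLine(line);
        System.out.println(d.sid + " " + d.sccs + " superfamily sunid=" + d.hierarchy.get("sf"));
    }
}
```

Keeping the ancestor sunids in an ordered map leaves room for building the full class/fold/superfamily tree later without changing the line parser.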

Another project that I would like to contribute to is:
            Develop a multiple sequence alignment algorithm written entirely in Java.

More information on this would also help me decide which project to apply for
in GSoC 2010.

Thank you.

Regards,

-Yogesh

From holland at eaglegenomics.com  Mon Mar 15 06:34:14 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 15 Mar 2010 10:34:14 +0000
Subject: [Biojava-dev] Hackathon in Boston, July 2010
Message-ID: <5FC2D8EC-5408-4126-9A7D-CB6B3500B61C@eaglegenomics.com>

Hi all,

Following the successful hackathon in Cambridge earlier this year, we originally planned to hold a second one in Boston alongside BOSC, to give those who couldn't make it to the UK a chance to get involved.

However, OBF have beaten us to it by organising a cross-project CodeFest!

 http://www.open-bio.org/wiki/Codefest_2010

It would be great for BioJava people to get involved with this cross-project hackathon effort, and it saves organising one of our own! :)

All the relevant information is on the page linked above; if you have any questions, contact Brad as described there.

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andreas at sdsc.edu  Tue Mar 16 11:57:38 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 16 Mar 2010 08:57:38 -0700
Subject: [Biojava-dev] biojava 3 progress
Message-ID: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>

Hi,

ISMB/BOSC is coming up rapidly, and we should start preparing for the annual
BioJava release. This is a good moment to discuss the current status of the
various new BioJava 3 modules.

The biojava-structure and biojava-structure-gui modules are essentially ready
for release, and I have started updating the Cookbook with the latest features:
http://biojava.org/wiki/BioJava:CookBook:PDB:align

Some of the refactored modules based on BioJava 1.7 could be released soon as
well. The documentation just needs to be updated to explain where the
functionality can now be found (e.g. the alignment module).

What about the new code that has been under development since the hackathon?
Is it slowly getting release-ready? Are there any plans for documentation?
What is missing before we can make the first BioJava 3 release?

Andreas

From ayates at ebi.ac.uk  Tue Mar 16 13:21:48 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 16 Mar 2010 17:21:48 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
Message-ID: <81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>

It's getting ready very slowly. Currently we need:

* Locations correctly implemented
** There's no way of requesting subsequences from them at the moment
* Support for features on sequences
* Extra attributes which do not fit into top-level attributes
* Mapping between sequences/assemblies
* Circular location support
** i.e. start must not be required to be less than end
* Documentation

I think that's it, off the top of my head.
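To make the circular-location point concrete: on a circular sequence, a location whose end lies before its start simply wraps around the origin, so length, containment and subsequence extraction all need a wrap-around branch instead of a start-less-than-end check. The class below is a hypothetical sketch assuming 1-based, inclusive coordinates; it is not the BioJava 3 Location API.

```java
// Hypothetical sketch of a 1-based, inclusive location on a circular sequence.
// ASSUMPTION: end < start means the location wraps around the origin.
class CircularLocation {
    private final int start;
    private final int end;
    private final int sequenceLength;

    CircularLocation(int start, int end, int sequenceLength) {
        if (start < 1 || end < 1 || start > sequenceLength || end > sequenceLength) {
            throw new IllegalArgumentException("Coordinates must lie in 1.." + sequenceLength);
        }
        this.start = start;
        this.end = end;
        this.sequenceLength = sequenceLength;
    }

    /** True when the location runs past the last base back to position 1. */
    boolean wrapsOrigin() {
        return end < start;
    }

    int length() {
        return wrapsOrigin() ? (sequenceLength - start + 1) + end : end - start + 1;
    }

    boolean contains(int position) {
        if (position < 1 || position > sequenceLength) {
            return false;
        }
        return wrapsOrigin() ? position >= start || position <= end
                             : position >= start && position <= end;
    }

    /** Extracts the covered bases from the full circular sequence. */
    String subSequence(String sequence) {
        if (sequence.length() != sequenceLength) {
            throw new IllegalArgumentException("Sequence length does not match location");
        }
        if (!wrapsOrigin()) {
            return sequence.substring(start - 1, end);
        }
        // Tail from start to the origin, followed by the head up to end.
        return sequence.substring(start - 1) + sequence.substring(0, end);
    }

    public static void main(String[] args) {
        String plasmid = "ACGTACGTAA"; // 10 bp toy sequence
        CircularLocation loc = new CircularLocation(8, 3, plasmid.length());
        System.out.println(loc.length() + " " + loc.subSequence(plasmid)); // prints "6 TAAACG"
    }
}
```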

Andy

> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From HWillis at scripps.edu  Tue Mar 16 14:51:04 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 16 Mar 2010 14:51:04 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
Message-ID: <EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>

I am adding further features to the core module to round things out, and I will be able to write docs/wiki examples. I will work on Features with the new sequence model, for example the ability to pull features from UniProt based on a UniProt ID. I will use UniProt XML as a guide when designing the feature data model, so that the classes have biological relevance instead of being completely abstract.
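As a rough illustration of letting UniProt XML shape the feature model, the sketch below reads <feature> elements into a small, biologically named class using the JDK's DOM parser. It assumes the UniProt layout of a <feature type="..." description="..."> element whose <location> holds either <begin>/<end> or a single <position>; UniProtFeature is a hypothetical class, and the XML in main is made-up toy data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch: UniProt-XML-style <feature> elements -> simple objects.
class UniProtFeatureSketch {

    static final class UniProtFeature {
        final String type;
        final String description;
        final int begin;
        final int end;

        UniProtFeature(String type, String description, int begin, int end) {
            this.type = type;
            this.description = description;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + " [" + begin + ".." + end + "] " + description;
        }
    }

    static List<UniProtFeature> parseFeatures(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList nodes = doc.getElementsByTagName("feature");
        List<UniProtFeature> features = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element feature = (Element) nodes.item(i);
            int begin;
            int end;
            if (feature.getElementsByTagName("position").getLength() > 0) {
                begin = end = positionOf(feature, "position"); // single-residue feature
            } else {
                begin = positionOf(feature, "begin");
                end = positionOf(feature, "end");
            }
            features.add(new UniProtFeature(feature.getAttribute("type"),
                    feature.getAttribute("description"), begin, end));
        }
        return features;
    }

    /** Reads the position attribute of the first descendant with the given tag. */
    private static int positionOf(Element feature, String tag) {
        Element el = (Element) feature.getElementsByTagName(tag).item(0);
        // Unknown positions (no position attribute) are not handled in this sketch.
        return Integer.parseInt(el.getAttribute("position"));
    }

    public static void main(String[] args) {
        String xml = "<entry xmlns=\"http://uniprot.org/uniprot\">"
                + "<feature type=\"signal peptide\" description=\"\"><location>"
                + "<begin position=\"1\"/><end position=\"22\"/></location></feature>"
                + "<feature type=\"glycosylation site\" description=\"N-linked\"><location>"
                + "<position position=\"57\"/></location></feature></entry>";
        for (UniProtFeature f : parseFeatures(xml)) {
            System.out.println(f);
        }
    }
}
```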

I will also see whether I can do something with NCBI genome sequence data, so that instead of downloading the entire sequence you can use GFF annotations to pull the DNA sequences of the exons belonging to a particular gene.
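A minimal sketch of the GFF-driven step, assuming standard 9-column, tab-separated GFF3 with 1-based inclusive coordinates and a Parent attribute linking exons to transcripts: group the exon lines by transcript, then slice each exon out of a sequence that is already in memory. Fetching only the needed region from NCBI is not shown, strand handling is left out (slices are forward-strand), and GffExonGrouper, GffExon and exonsByParent are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group GFF3 exon records by their Parent transcript.
class GffExonGrouper {

    static final class GffExon {
        final String seqId;
        final int start; // 1-based, inclusive
        final int end;   // 1-based, inclusive
        final char strand;

        GffExon(String seqId, int start, int end, char strand) {
            this.seqId = seqId;
            this.start = start;
            this.end = end;
            this.strand = strand;
        }
    }

    /** Splits column 9 ("key=value;key=value") into a map; URL escapes are not decoded. */
    static Map<String, String> parseAttributes(String column9) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String pair : column9.split(";")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length == 2) {
                attributes.put(kv[0], kv[1]);
            }
        }
        return attributes;
    }

    static Map<String, List<GffExon>> exonsByParent(List<String> lines) {
        Map<String, List<GffExon>> result = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.startsWith("#") || line.trim().isEmpty()) {
                continue;
            }
            String[] cols = line.split("\t");
            if (cols.length != 9 || !cols[2].equals("exon")) {
                continue; // only exon rows are of interest here
            }
            String parent = parseAttributes(cols[8]).get("Parent");
            if (parent == null) {
                continue;
            }
            GffExon exon = new GffExon(cols[0], Integer.parseInt(cols[3]),
                    Integer.parseInt(cols[4]), cols[6].charAt(0));
            // An exon may be shared by several transcripts (comma-separated Parent).
            for (String p : parent.split(",")) {
                result.computeIfAbsent(p, k -> new ArrayList<>()).add(exon);
            }
        }
        return result;
    }

    /** Forward-strand slice: 1-based inclusive -> Java's 0-based half-open substring. */
    static String exonDna(String genome, GffExon exon) {
        return genome.substring(exon.start - 1, exon.end);
    }

    public static void main(String[] args) {
        List<String> gff = Arrays.asList(
                "##gff-version 3",
                "chr1\ttoy\tmRNA\t1\t20\t.\t+\t.\tID=tx1",
                "chr1\ttoy\texon\t1\t5\t.\t+\t.\tID=ex1;Parent=tx1",
                "chr1\ttoy\texon\t11\t15\t.\t+\t.\tID=ex2;Parent=tx1");
        String genome = "ATGAAACCCCGGGTTTTTAA"; // 20 bp toy sequence
        for (GffExon e : exonsByParent(gff).get("tx1")) {
            System.out.println(exonDna(genome, e));
        }
    }
}
```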

I also plan to migrate the sequence alignment code.

I think the focus for this release should be on modularizing the code and on Maven integration. We also need to provide a repository for those who are not going to use Maven and just need the JAR files. We can then highlight the newer modules as a benefit of the modularization.

I am planning on attending ISMB/BOSC.

Do we want to put some deadlines in place with a mini-project plan?

Thanks

Scooter



From andreas at sdsc.edu  Tue Mar 16 16:58:02 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 16 Mar 2010 13:58:02 -0700
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
Message-ID: <59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>

OK, cool. Thanks for pushing the state of the art here. Which parts do you
think would be feasible to finish if we plan a release for, e.g., early May?
We can follow up with another release once the next round of features has
been added. It probably makes sense to focus on stabilizing and documenting
what is currently there, rather than trying to be feature-complete. Critical
features that are still missing should of course be added...

Andreas


From ayates at ebi.ac.uk  Wed Mar 17 11:28:33 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 17 Mar 2010 15:28:33 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
Message-ID: <4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>

I think features are feasible, and they really are the missing piece of the puzzle for this project. How far along are you with them, Scooter?




From HWillis at scripps.edu  Wed Mar 17 11:52:01 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 11:52:01 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
Message-ID: <E0D0833C-5C65-476D-936E-1A300B27A463@scripps.edu>

Andy

Working on it at the moment. I am starting with some code I have been using from JavaGene that handles GFF parsing and negative strands fairly well. I am migrating it to a new project called biojava3-genes (local only at the moment), intended for code related to GFF parsing and to handling the outputs of various gene prediction programs. I need to create a training file for GlimmerHMM, so the short-term goal is to take XML BLAST output of predicted genes that match UniProt, and then extract the exon features from DNASequences whose exon features were added from a GFF file. I will then use these validated exon features to create the GlimmerHMM training file. Exon features on the negative strand with frame shifts, combined with the ability to splice them together into a coding sequence, are probably the most complicated feature example we will encounter. After I get through that, I will see what can be extended or refactored for other, more generic features.
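The negative-strand splicing step can be sketched as: sort the exons by genomic start, concatenate their slices, and reverse-complement the result when the gene is on the minus strand. This assumes 1-based inclusive exon coordinates on a single strand and ignores GFF phase and frame shifts; ExonSplicer is a hypothetical helper, not BioJava API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: splice exon intervals into one transcript sequence.
// ASSUMPTIONS: 1-based inclusive coordinates, one strand, no phase/frame shifts.
class ExonSplicer {

    static final class Exon {
        final int start;
        final int end;

        Exon(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Reverse complement; any base other than A/C/G/T becomes N. Output is upper case. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'G': sb.append('C'); break;
                case 'C': sb.append('G'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    static String splice(String genome, List<Exon> exons, char strand) {
        List<Exon> sorted = new ArrayList<>(exons);
        sorted.sort(Comparator.comparingInt(e -> e.start));
        StringBuilder joined = new StringBuilder();
        for (Exon e : sorted) {
            joined.append(genome, e.start - 1, e.end); // 0-based half-open slice
        }
        // On the minus strand the transcript is read right-to-left on the
        // complementary strand, so the joined forward-strand exons are flipped.
        return strand == '-' ? reverseComplement(joined.toString()) : joined.toString();
    }

    public static void main(String[] args) {
        String genome = "ATGAAACCCCGGGTTTTTAA"; // 20 bp toy sequence
        List<Exon> exons = Arrays.asList(new Exon(11, 15), new Exon(1, 5));
        System.out.println(splice(genome, exons, '+')); // prints "ATGAAGGGTT"
        System.out.println(splice(genome, exons, '-')); // prints "AACCCTTCAT"
    }
}
```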

I also have some code to gather genome characteristics (GC percentage, average gene length, etc.) that could be included in the biojava3-genes module. Do you know how the average number of introns per gene is calculated when a gene has no introns? Do you add a 0 to the average, or only include genes with at least one intron?
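For the simpler genome characteristics, a sketch of GC percentage and average gene length might look like this. Two assumptions are baked in and would need documenting: GC is computed over unambiguous A/C/G/T bases only (N and other IUPAC codes are skipped), and empty input yields 0.0. GenomeStats is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of two simple genome statistics.
class GenomeStats {

    /** GC percentage over A/C/G/T only; ambiguity codes such as N are ignored. */
    static double gcPercent(String dna) {
        long gc = 0;
        long acgt = 0;
        for (int i = 0; i < dna.length(); i++) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'G':
                case 'C':
                    gc++;
                    acgt++;
                    break;
                case 'A':
                case 'T':
                    acgt++;
                    break;
                default:
                    break; // skipped: not counted in numerator or denominator
            }
        }
        return acgt == 0 ? 0.0 : 100.0 * gc / acgt;
    }

    /** Mean of the given gene lengths in bp; 0.0 for an empty list. */
    static double averageGeneLength(List<Integer> geneLengths) {
        if (geneLengths.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int len : geneLengths) {
            total += len;
        }
        return (double) total / geneLengths.size();
    }

    public static void main(String[] args) {
        System.out.println(gcPercent("ATGCGCNNAT"));                         // prints 50.0
        System.out.println(averageGeneLength(Arrays.asList(900, 1200, 1500))); // prints 1200.0
    }
}
```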

Can you think of a better name for a package that deals with GFF/GFF3 parsing and with utilities for working with the inputs and outputs of various gene prediction programs?

Scooter




From ayates at ebi.ac.uk  Wed Mar 17 12:04:50 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 17 Mar 2010 16:04:50 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <E0D0833C-5C65-476D-936E-1A300B27A463@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<E0D0833C-5C65-476D-936E-1A300B27A463@scripps.edu>
Message-ID: <4A0FFCFF-9EAA-4B27-BF11-AFD6D4CFEAE0@ebi.ac.uk>

Hey mate,

Sounds good; anything with good GFF support is hard to come by :). So, if I'm reading you correctly, you're going to get it working for the non-generic structures and then push it out into the core modules?

Add 0 to the average, and make sure the docs describe what it's doing. Even if a gene has no introns, it still affects the average number of introns in a genome :).
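Following that suggestion, intron-less genes contribute a 0 and the divisor is the total number of genes. The sketch below contrasts this with the alternative of averaging only over intron-containing genes; it assumes each gene is summarized by its exon count and that a gene with n exons has n - 1 introns. IntronStats is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; ASSUMPTION: a gene with n exons has n - 1 introns.
class IntronStats {

    /** Convention suggested in the thread: single-exon genes add 0, divide by ALL genes. */
    static double averageIntronsPerGene(List<Integer> exonCountsPerGene) {
        if (exonCountsPerGene.isEmpty()) {
            return 0.0;
        }
        long introns = 0;
        for (int exons : exonCountsPerGene) {
            introns += Math.max(0, exons - 1);
        }
        return (double) introns / exonCountsPerGene.size();
    }

    /** The alternative from the question: only genes with at least one intron count. */
    static double averageIntronsPerIntronContainingGene(List<Integer> exonCountsPerGene) {
        long introns = 0;
        long genes = 0;
        for (int exons : exonCountsPerGene) {
            if (exons > 1) {
                introns += exons - 1;
                genes++;
            }
        }
        return genes == 0 ? 0.0 : (double) introns / genes;
    }

    public static void main(String[] args) {
        List<Integer> exonCounts = Arrays.asList(1, 1, 3, 5); // toy data
        System.out.println(averageIntronsPerGene(exonCounts));                 // prints 1.5
        System.out.println(averageIntronsPerIntronContainingGene(exonCounts)); // prints 3.0
    }
}
```

The two numbers differ by a factor of two even on this tiny example, which is why the docs need to state which convention is used.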

All I can think of is "biojava3-features"; I'm not sure what "biojava3-genes" conveys. Maybe it belongs in an "io" package, perhaps one that sits alongside an EMBL/GenBank/Chado formatter. Naming is a horrible thing to have to do.

Andy

On 17 Mar 2010, at 15:52, Scooter Willis wrote:

> Andy
> 
> Working on it at the moment. I am starting with some code I have been using from JavaGene that has a fairly good handle of gff parsing and handling negative strands. I am migrating to a new project called biojava3-genes(local only at the moment) where code related to gff parsing and dealing with various gene prediction program outputs can be used. I need to create a training file for GlimmerHMM so the short term goal is to take a XML blast output of predicted genes that match uniprot and then extract the exon features from DNASequences with exon features added from a gff file. I will then use these validated exon features to create the GlimmerHMM training file. The complexity of exon features with negative strand and frame shifts with the ability to splice together a coding sequence is probably the most complicated feature example we will encounter. After I get through that I will see what can be extended/refactored etc for other more generic features.
> 
> I also have some code to gather genome characteristics GC percent, avg gene length, etc. that can be included in the biojava3-genes module. I wanted to see if you know how Average Number of Introns per gene is calculated when a gene has no introns. Do you add a 0 to the average or only include genes with at least one intron in the average?
> 
> Can you think of a better name for a package that deals with gff,gff3 parsing and utilities to work with various gene prediction inputs/outputs?
> 
> Scooter
> 
> 
> 
> 
> 
> On Mar 17, 2010, at 11:28 AM, Andy Yates wrote:
> 
>> I think features are possible & this is really the missing piece of the puzzle with this project. How far on are you with them Scooter?
>> 
>> On 16 Mar 2010, at 20:58, Andreas Prlic wrote:
>> 
>>> Ok, cool. Thanks for all this state-of-the-art pushing there... Which parts do you think would be feasible to finish,  if we would say we are planning a release  e.g. early May ? We can have a follow-up to this release once the next round of features have been added. Probably it  makes sense to focus on stabilizing what is currently there and documenting it, rather than trying to be feature-complete. Critical features that are still missing should be added of course... 
>>> 
>>> Andreas
>>> 
>>> On Tue, Mar 16, 2010 at 11:51 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>> I am working on adding in additional features to the core module to round things out and will be able to do docs/wiki examples. I will be working on Features with the new sequence model and the ability to pull features from uniprot based on uniprot id as an example. I will use uniprot XML as the data model when figuring out the feature data model such that classes have biology relevance instead of being completely abstract.
>>> 
>>> I will also see if I can do something with NCBI for genome sequence data where you don't need to download the entire sequence but based on gff annotations you can pull dna sequences for exons belonging to a particular gene.
>>> 
>>> I will also plan on migrating the sequence alignment code as well.
>>> 
>>> I think the focus for this release should be on the modularization of the modules and the maven integration. We also need to provide a repository for those who are not going to use maven and need just the jar files. We can then highlight the newer modules as a benefit of the modularization.
>>> 
>>> I am planning on attending ISMB/BOSC.
>>> 
>>> Do we want to put some deadlines in place with a mini-project plan?
>>> 
>>> Thanks
>>> 
>>> Scooter
>>> 
>>> 
>>> On Mar 16, 2010, at 1:21 PM, Andy Yates wrote:
>>> 
>>>> It's getting ready very slowly. Currently we need:
>>>> 
>>>> * Locations correctly implemented
>>>> ** There's no way of requesting subseqs from them atmo
>>>> * Feature on sequences support
>>>> * Extra attributes which do not fit into top-level attributes
>>>> * Mapping between sequences/assemblies
>>>> * circular location support
>>>> ** so no checks on start being less than end
>>>> * Documentation
>>>> 
>>>> Think that's it off the top of my head
>>>> 
>>>> Andy
>>>> 
>>>> On 16 Mar 2010, at 15:57, Andreas Prlic wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> ISMB/BOSC is coming up rapidly and we should start to prepare for the annual
>>>>> BioJava release. As such it would be a good moment to discuss the current
>>>>> status of the various new BioJava 3 modules.
>>>>> 
>>>>> The biojava-structure, biojava-structure-gui modules are essentially ready
>>>>> for release and I started to update the Cookbook with the latest features
>>>>> http://biojava.org/wiki/BioJava:CookBook:PDB:align
>>>>> 
>>>>> Some of the re-factored modules based on biojava 1.7 could be released
>>>>> anytime soon as well. The documentation just needs to be updated to explain
>>>>> where the functionality can be found now (e.g. alignment module)
>>>>> 
>>>>> What about the new code that has been under development since the hackathon?
>>>>> Is it getting release ready slowly? Any plans for documentation? What is
>>>>> missing before we can make the first Biojava 3 release?
>>>>> 
>>>>> Andreas
>>>>> _______________________________________________
>>>>> biojava-dev mailing list
>>>>> biojava-dev at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>> 
>>>> --
>>>> Andrew Yates                   Ensembl Genomes Engineer
>>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>> 
>>> 
>> 
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From HWillis at scripps.edu  Wed Mar 17 12:09:29 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 12:09:29 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
Message-ID: <D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>

Andy

Let me know if you have any major code changes to the core sequence handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code, but I want to avoid merging code if you have made any significant changes.

I should have code to check in today and if we can't come up with a better name I will ask Andreas to create a biojava3-genes module and I can then check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a particular amino acid position.
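A minimal sketch of the protein-to-DNA mapping described above, assuming a forward-strand transcript whose coding part is given as an ordered list of exon intervals (1-based, inclusive). The names CodonMapper, Exon and codonPositions are hypothetical and are not BioJava 3 API; minus-strand handling and phase/frame offsets are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not BioJava API: maps a protein position back to the
// three genomic positions of its codon across spliced exons (forward strand only).
class CodonMapper {

    /** Coding part of one exon on the parent DNA, 1-based inclusive coordinates. */
    static final class Exon {
        final int start;
        final int end;

        Exon(int start, int end) {
            if (end < start) throw new IllegalArgumentException("end < start");
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start + 1;
        }
    }

    private final List<Exon> exons; // ordered 5' -> 3' along the transcript

    CodonMapper(List<Exon> exons) {
        this.exons = new ArrayList<>(exons);
    }

    /** Returns the genomic positions of the codon for 1-based amino acid aaPos. */
    int[] codonPositions(int aaPos) {
        if (aaPos < 1) throw new IllegalArgumentException("aaPos must be >= 1");
        int[] positions = new int[3];
        for (int i = 0; i < 3; i++) {
            // 0-based offset of this codon base within the spliced coding sequence
            positions[i] = toGenomic((aaPos - 1) * 3 + i);
        }
        return positions;
    }

    /** Walks the exons until the offset falls inside one; this is what handles split codons. */
    private int toGenomic(int cdsOffset) {
        int remaining = cdsOffset;
        for (Exon e : exons) {
            if (remaining < e.length()) {
                return e.start + remaining;
            }
            remaining -= e.length();
        }
        throw new IllegalArgumentException("offset past end of coding sequence: " + cdsOffset);
    }
}
```

With coding exons at 101-110 and 201-250, amino acid 4 maps to positions 110, 201 and 202, i.e. the codon that straddles the splice junction.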

We also need to add the ability to have tracks as a way to group features. This way you can export the features in a particular track as a GFF/GFF3 file for import into various genome browsers. For example, you might have one genome with genes added from three different gene prediction algorithms, each organized as a track. You should then be able to determine the overlaps among genes that were predicted and validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together make this easy, then I think we will have a solid design.
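A rough sketch of tracks as named groups of features, with a simple interval-overlap query and GFF3 export per track. Everything here is hypothetical (TrackSketch, Feature, overlapping, toGff3) and not BioJava API; writing the track name into the GFF3 source column and leaving score and phase as "." are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not BioJava API: features grouped into named tracks.
class TrackSketch {

    static final class Feature {
        final String seqId;
        final String type;
        final int start; // 1-based, inclusive
        final int end;   // 1-based, inclusive
        final char strand;
        final String id;

        Feature(String seqId, String type, int start, int end, char strand, String id) {
            this.seqId = seqId;
            this.type = type;
            this.start = start;
            this.end = end;
            this.strand = strand;
            this.id = id;
        }

        /** Closed intervals on the same sequence overlap if each starts before the other ends. */
        boolean overlaps(Feature other) {
            return seqId.equals(other.seqId) && start <= other.end && other.start <= end;
        }
    }

    private final Map<String, List<Feature>> tracks = new LinkedHashMap<>();

    void add(String track, Feature f) {
        tracks.computeIfAbsent(track, k -> new ArrayList<>()).add(f);
    }

    List<Feature> track(String name) {
        return tracks.getOrDefault(name, Collections.<Feature>emptyList());
    }

    /** Features in track a that overlap at least one feature in track b (naive O(n*m) scan). */
    List<Feature> overlapping(String a, String b) {
        List<Feature> hits = new ArrayList<>();
        for (Feature f : track(a)) {
            for (Feature g : track(b)) {
                if (f.overlaps(g)) {
                    hits.add(f);
                    break;
                }
            }
        }
        return hits;
    }

    /** One GFF3 line per feature; the track name goes in the source column (an assumption). */
    String toGff3(String name) {
        StringBuilder sb = new StringBuilder("##gff-version 3\n");
        for (Feature f : track(name)) {
            sb.append(String.join("\t",
                    f.seqId, name, f.type,
                    Integer.toString(f.start), Integer.toString(f.end),
                    ".", String.valueOf(f.strand), ".", "ID=" + f.id))
              .append('\n');
        }
        return sb.toString();
    }
}
```

A summary track of validated genes could then be built by adding the result of, say, overlapping("glimmer", "blast") to a new track and putting the remaining predictions in a non-validated track (the track names are only examples).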
 

Scooter


From HWillis at scripps.edu  Wed Mar 17 12:14:02 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 12:14:02 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <4A0FFCFF-9EAA-4B27-BF11-AFD6D4CFEAE0@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<E0D0833C-5C65-476D-936E-1A300B27A463@scripps.edu>
	<4A0FFCFF-9EAA-4B27-BF11-AFD6D4CFEAE0@ebi.ac.uk>
Message-ID: <D8873897-5329-4FFF-A783-AB875FA995A9@scripps.edu>

Andy

I have two methods that calculate the average number of introns per gene, one for each convention. I just wasn't sure which is the standard for reporting.
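The two conventions for the average number of introns per gene can be sketched as follows, assuming each gene is reduced to the exon count of a single representative transcript, so that introns = exons - 1. The class and method names are illustrative only and do not come from the code Scooter mentions.

```java
// Hypothetical sketch of the two averaging conventions, not the actual biojava3-genes code.
class IntronStats {

    /** Averages over every gene, so single-exon genes contribute 0 introns. */
    static double meanIntronsAllGenes(int[] exonCounts) {
        if (exonCounts.length == 0) return 0.0;
        long introns = 0;
        for (int exons : exonCounts) {
            introns += Math.max(0, exons - 1);
        }
        return (double) introns / exonCounts.length;
    }

    /** Averages only over genes that have at least one intron. */
    static double meanIntronsIntronContainingGenes(int[] exonCounts) {
        long introns = 0;
        int genes = 0;
        for (int exons : exonCounts) {
            if (exons > 1) {
                introns += exons - 1;
                genes++;
            }
        }
        return genes == 0 ? 0.0 : (double) introns / genes;
    }
}
```

For exon counts {1, 1, 3, 5} the first method gives 1.5 and the second 3.0, which is why the documentation needs to state which convention is used.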

I think features should be part of the core because they are abstract, regardless of the source that generated them. The code related to gene prediction work should probably be in a different package because it is not general. Calling it biojava-geneprediction also doesn't work, because that implies the module itself performs gene prediction.

Scooter

On Mar 17, 2010, at 12:04 PM, Andy Yates wrote:

> Hey mate,
> 
> Sounds good; anything with good GFF support is hard to come by :). So you're going to get it working for the non-generic structures & then push it out into the core modules, if I'm reading what you said correctly?
> 
> Add 0 to the percentage & make sure the docs describe what it's doing. Even if a gene has no introns it still affects the average of introns in a genome :).
> 
> All I can think of is "biojava3-features". Not sure what "biojava3-genes" says. Maybe it goes into an "io" package ... say one which goes with an EMBL/Genbank/CHADO formatter maybe. Naming is a horrible thing to have to do. 
> 
> Andy
> 
> On 17 Mar 2010, at 15:52, Scooter Willis wrote:
> 
>> Andy
>> 
>> Working on it at the moment. I am starting with some code I have been using from JavaGene that has a fairly good handle on GFF parsing and on handling negative strands. I am migrating it to a new project called biojava3-genes (local only at the moment), where code related to GFF parsing and to dealing with the outputs of various gene prediction programs can live. I need to create a training file for GlimmerHMM, so the short-term goal is to take an XML BLAST output of predicted genes that match UniProt and then extract the exon features from DNASequences whose exon features were added from a GFF file. I will then use these validated exon features to create the GlimmerHMM training file. Exon features on the negative strand with frame shifts, plus the ability to splice them together into a coding sequence, are probably the most complicated feature example we will encounter. After I get through that I will see what can be extended/refactored etc. for other, more generic features.
>> 
>> I also have some code to gather genome characteristics (GC percent, average gene length, etc.) that can be included in the biojava3-genes module. I wanted to see if you know how the average number of introns per gene is calculated when a gene has no introns. Do you add a 0 to the average, or only include genes with at least one intron in the average?
>> 
>> Can you think of a better name for a package that deals with GFF/GFF3 parsing and utilities to work with various gene prediction inputs/outputs?
>> 
>> Scooter
>> 
>> 
>> 
>> 
>> 
>> On Mar 17, 2010, at 11:28 AM, Andy Yates wrote:
>> 
>>> I think features are possible & this is really the missing piece of the puzzle with this project. How far on are you with them Scooter?
>>> 
>>> On 16 Mar 2010, at 20:58, Andreas Prlic wrote:
>>> 
>>>> Ok, cool. Thanks for all this state-of-the-art pushing there... Which parts do you think would be feasible to finish if we were to plan a release for, e.g., early May? We can have a follow-up to this release once the next round of features has been added. It probably makes sense to focus on stabilizing and documenting what is currently there, rather than trying to be feature-complete. Critical features that are still missing should be added, of course...
>>>> 
>>>> Andreas
>>>> 
>>>> On Tue, Mar 16, 2010 at 11:51 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>> I am working on adding additional features to the core module to round things out, and will be able to do docs/wiki examples. I will be working on features with the new sequence model and, as an example, the ability to pull features from UniProt based on a UniProt ID. I will use UniProt XML as the data model when figuring out the feature data model, so that classes have biological relevance instead of being completely abstract.
>>>> 
>>>> I will also see if I can do something with NCBI genome sequence data, where you don't need to download the entire sequence but can pull the DNA sequences for the exons belonging to a particular gene based on GFF annotations.
>>>> 
>>>> I also plan on migrating the sequence alignment code.
>>>> 
>>>> I think the focus for this release should be on the modularization of the code base and the Maven integration. We also need to provide a repository for those who are not going to use Maven and just need the jar files. We can then highlight the newer modules as a benefit of the modularization.
>>>> 
>>>> I am planning on attending ISMB/BOSC.
>>>> 
>>>> Do we want to put some deadlines in place with a mini-project plan?
>>>> 
>>>> Thanks
>>>> 
>>>> Scooter
>>>> 
>>>> 
>>>> On Mar 16, 2010, at 1:21 PM, Andy Yates wrote:
>>>> 
>>>>> It's getting ready very slowly. Currently we need:
>>>>> 
>>>>> * Locations correctly implemented
>>>>> ** There's no way of requesting subseqs from them atmo
>>>>> * Feature on sequences support
>>>>> * Extra attributes which do not fit into top-level attributes
>>>>> * Mapping between sequences/assemblies
>>>>> * circular location support
>>>>> ** so no checks on start being less than end
>>>>> * Documentation
>>>>> 
>>>>> Think that's it off the top of my head
>>>>> 
>>>>> Andy
>>>>> 
>>>>> On 16 Mar 2010, at 15:57, Andreas Prlic wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> ISMB/BOSC is coming up rapidly and we should start to prepare for the annual
>>>>>> BioJava release. As such it would be a good moment to discuss the current
>>>>>> status of the various new BioJava 3 modules.
>>>>>> 
>>>>>> The biojava-structure, biojava-structure-gui modules are essentially ready
>>>>>> for release and I started to update the Cookbook with the latest features
>>>>>> http://biojava.org/wiki/BioJava:CookBook:PDB:align
>>>>>> 
>>>>>> Some of the re-factored modules based on biojava 1.7 could be released
>>>>>> anytime soon as well. The documentation just needs to be updated to explain
>>>>>> where the functionality can be found now (e.g. alignment module)
>>>>>> 
>>>>>> What about the new code that has been under development since the hackathon?
>>>>>> Is it getting release ready slowly? Any plans for documentation? What is
>>>>>> missing before we can make the first Biojava 3 release?
>>>>>> 
>>>>>> Andreas
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 


From andreas at sdsc.edu  Wed Mar 17 13:46:19 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 17 Mar 2010 10:46:19 -0700
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
Message-ID: <59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>

I like biojava-feature as a module name  for the GFF and features related
code. (should we try to keep the module names singular?) Let me know if you
want me to create the module for this...
A

On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:

> Andy
>
> Let me know if you have any major code changes to the core sequence
> handling that have been or could be checked in. So far I haven't needed to
> touch any of the core sequence code, but I want to avoid merging code if
> you have made any significant changes.
>
> I should have code to check in today and if we can't come up with a better
> name I will ask Andreas to create a biojava3-genes module and I can then
> check that code in for your review. The current problem is that we have
> ExonSequence extending DNASequence when it could also be described as a
> feature. One way to look at this is that a TranscriptSequence is also a
> feature of a DNA sequence, and only when you want a standalone class with
> internal links back to the parent sequence do you return a
> TranscriptSequence. The TranscriptFeature would have ExonFeature and
> IntronFeature as children. You can ask for an ExonSequence based on the
> ExonFeature. Once you get a ProteinSequence you should be able to reverse
> the process and get back the TranscriptSequence, the corresponding
> ExonFeatures, and some sort of mapping from a protein sequence position
> back to the three DNA sequence positions that coded for it. This would
> need to handle the case where the end of one exon and the start of the
> next exon together code for a particular amino acid position.
>
> We also need to add the ability to have tracks as a way to group
> features. This way you can export the features in a particular track as a
> GFF/GFF3 file for import into various genome browsers. For example, you
> might have one genome with genes added from three different gene
> prediction algorithms, each organized as a track. You should then be able
> to determine the overlaps among genes that were predicted and validated
> via BLAST against UniProt, and create another summary track of validated
> and non-validated genes. If the feature classes we put together make this
> easy, then I think we will have a solid design.
>
>
> Scooter
>
>

From HWillis at scripps.edu  Wed Mar 17 14:17:59 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 14:17:59 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
Message-ID: <5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>

Andreas

The problem with putting feature classes in a separate module is that biojava-core sequences would then have a dependency on biojava-feature. A sequence needs to hold a collection of features, so feature classes need to go in core. Even if features are created from GFF, the core module doesn't care where they come from.
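A sketch of the dependency direction being argued for: the feature abstraction and the sequence that holds features live in core, while a GFF reader lives in a separate module that depends on core, never the reverse. All names below are hypothetical and collapsed into one file for illustration; the loader only reads the type, start and end columns of a GFF line.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of module layering, not BioJava API.
class ModuleLayout {

    // ---- would live in core: no reference to GFF or any file format ----
    interface Feature {
        String type();
        int start(); // 1-based, inclusive
        int end();   // 1-based, inclusive
    }

    static class Sequence {
        private final String residues;
        private final List<Feature> features = new ArrayList<>();

        Sequence(String residues) {
            this.residues = residues;
        }

        void addFeature(Feature f) {
            features.add(f);
        }

        List<Feature> getFeatures() {
            return Collections.unmodifiableList(features);
        }

        /** Residues covered by a feature, converting 1-based inclusive to 0-based half-open. */
        String subSequence(Feature f) {
            return residues.substring(f.start() - 1, f.end());
        }
    }

    // ---- would live in a separate module that depends on core ----
    static final class GffLoader {
        /** Reads columns 3-5 (type, start, end) of one GFF line and attaches a feature. */
        static void load(String gffLine, Sequence target) {
            String[] cols = gffLine.split("\t");
            final String type = cols[2];
            final int start = Integer.parseInt(cols[3]);
            final int end = Integer.parseInt(cols[4]);
            target.addFeature(new Feature() {
                public String type() { return type; }
                public int start() { return start; }
                public int end() { return end; }
            });
        }
    }
}
```

Because Sequence only knows the Feature interface, the same core class could be populated from GFF, UniProt XML or any other source without core depending on those parsers.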

We could go with biojava-genomes and code related to dealing with genomes goes in that module. If you like biojava-genome or biojava-genomes go ahead and create it and email me so I can check it out.

Thanks

Scooter


On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:

I like biojava-feature as a module name  for the GFF and features related code. (should we try to keep the module names singular?) Let me know if you want me to create the module for this...
A

On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
Andy

Let me know if you have any major code changes to the core sequence handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code, but I want to avoid merging code if you have made any significant changes.

I should have code to check in today and if we can't come up with a better name I will ask Andreas to create a biojava3-genes module and I can then check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a particular amino acid position.

We also need to add the ability to have tracks as a way to group features. This way you can export the features in a particular track as a GFF/GFF3 file for import into various genome browsers. For example, you might have one genome with genes added from three different gene prediction algorithms, each organized as a track. You should then be able to determine the overlaps among genes that were predicted and validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together make this easy, then I think we will have a solid design.


Scooter


From ayates at ebi.ac.uk  Wed Mar 17 15:24:13 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 17 Mar 2010 19:24:13 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
	<5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
Message-ID: <1077DC26-42AB-4E41-BFA3-DEFD769F4C61@ebi.ac.uk>

biojava-genomes sounds good.

I've done nothing since my last check-in of code which was all to do with locations so there should be no problem there :)

On 17 Mar 2010, at 18:17, Scooter Willis wrote:

> Andreas
> 
> The problem with putting feature classes in a separate module is that biojava-core sequences would then have a dependency on biojava-feature. A sequence needs to hold a collection of features, so feature classes need to go in core. Even if features are created from GFF, the core module doesn't care where they come from.
> 
> We could go with biojava-genomes and code related to dealing with genomes goes in that module. If you like biojava-genome or biojava-genomes go ahead and create it and email me so I can check it out.
> 
> Thanks
> 
> Scooter
>  
> 
> 
> On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:
> 
>> I like biojava-feature as a module name  for the GFF and features related code. (should we try to keep the module names singular?) Let me know if you want me to create the module for this...
>> A
>> 
>> On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>> Andy
>> 
>> Let me know if you have any major code changes to the core sequence handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code, but I want to avoid merging code if you have made any significant changes.
>> 
>> I should have code to check in today and if we can't come up with a better name I will ask Andreas to create a biojava3-genes module and I can then check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a particular amino acid position.
>> 
>> We also need to add the ability to have tracks as a way to group features. This way you can export the features in a particular track as a GFF/GFF3 file for import into various genome browsers. For example, you might have one genome with genes added from three different gene prediction algorithms, each organized as a track. You should then be able to determine the overlaps among genes that were predicted and validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together make this easy, then I think we will have a solid design.
>> 
>> 
>> Scooter
>> 
>> 
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From HWillis at scripps.edu  Wed Mar 17 15:58:42 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 15:58:42 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <1077DC26-42AB-4E41-BFA3-DEFD769F4C61@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
	<5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
	<1077DC26-42AB-4E41-BFA3-DEFD769F4C61@ebi.ac.uk>
Message-ID: <9F8616DE-710D-4971-8C63-52C5EB7789C2@scripps.edu>

Andy

Should we use http://www.sequenceontology.org/gff3.shtml as our test case for a complex example of transcription?
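A minimal GFF3 sketch for working with a test case like the one on that page: split each line into its nine tab-separated columns, split column 9 into key=value attributes, and index children by their Parent ID so that a gene -> mRNA -> exon/CDS hierarchy can be walked. It is hypothetical, not BioJava code; it skips comment and ## directive lines, does no percent-decoding of attribute values, and assumes there is no embedded ##FASTA section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical GFF3 sketch, not BioJava API.
class Gff3Sketch {

    static final class Record {
        final String seqId;
        final String type;
        final int start; // 1-based, inclusive
        final int end;   // 1-based, inclusive
        final String strand;
        final Map<String, String> attributes;

        Record(String[] cols, Map<String, String> attributes) {
            this.seqId = cols[0];
            this.type = cols[2];
            this.start = Integer.parseInt(cols[3]);
            this.end = Integer.parseInt(cols[4]);
            this.strand = cols[6];
            this.attributes = attributes;
        }
    }

    /** Parses one feature line; column 9 is split on ';' and then on the first '='. */
    static Record parseLine(String line) {
        String[] cols = line.split("\t");
        if (cols.length != 9) {
            throw new IllegalArgumentException("expected 9 tab-separated columns: " + line);
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String pair : cols[8].split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                attrs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return new Record(cols, attrs);
    }

    /** Maps each parent ID to its direct children; Parent may list several comma-separated IDs. */
    static Map<String, List<Record>> childrenByParent(List<String> lines) {
        Map<String, List<Record>> children = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) continue; // comments and directives
            Record r = parseLine(line);
            String parents = r.attributes.get("Parent");
            if (parents == null) continue; // top-level feature such as a gene
            for (String parentId : parents.split(",")) {
                children.computeIfAbsent(parentId, k -> new ArrayList<>()).add(r);
            }
        }
        return children;
    }
}
```

Splitting the Parent value matters for the transcription case: an exon shared by two alternative mRNAs is listed once but must appear under both transcripts.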

Scooter

On Mar 17, 2010, at 3:24 PM, Andy Yates wrote:

> biojava-genomes sounds good.
> 
> I've done nothing since my last check-in of code which was all to do with locations so there should be no problem there :)
> 
> On 17 Mar 2010, at 18:17, Scooter Willis wrote:
> 
>> Andreas
>> 
>> The problem with putting feature classes in a separate module is that biojava-core sequences would then have a dependency on biojava-feature. A sequence needs to hold a collection of features, so feature classes need to go in core. Even if features are created from GFF, the core module doesn't care where they come from.
>> 
>> We could go with biojava-genomes and code related to dealing with genomes goes in that module. If you like biojava-genome or biojava-genomes go ahead and create it and email me so I can check it out.
>> 
>> Thanks
>> 
>> Scooter
>> 
>> 
>> 
>> On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:
>> 
>>> I like biojava-feature as a module name  for the GFF and features related code. (should we try to keep the module names singular?) Let me know if you want me to create the module for this...
>>> A
>>> 
>>> On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>> Andy
>>> 
>>> Let me know if you have any major code changes to the core sequence handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code, but I want to avoid merging code if you have made any significant changes.
>>> 
>>> I should have code to check in today and if we can't come up with a better name I will ask Andreas to create a biojava3-genes module and I can then check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a particular amino acid position.
>>> 
>>> We also need to add the ability to have tracks as a way to group features. This way you can export the features in a particular track as a GFF/GFF3 file for import into various genome browsers. For example, you might have one genome with genes added from three different gene prediction algorithms, each organized as a track. You should then be able to determine the overlaps among genes that were predicted and validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together make this easy, then I think we will have a solid design.
>>> 
>>> 
>>> Scooter
>>> 
>>> 
>> 
> 


From ayates at ebi.ac.uk  Wed Mar 17 16:01:04 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 17 Mar 2010 20:01:04 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <9F8616DE-710D-4971-8C63-52C5EB7789C2@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
	<5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
	<1077DC26-42AB-4E41-BFA3-DEFD769F4C61@ebi.ac.uk>
	<9F8616DE-710D-4971-8C63-52C5EB7789C2@scripps.edu>
Message-ID: <2A33D045-0AD9-4948-90D3-48636D074514@ebi.ac.uk>

Perfect :). Nothing like using someone else's test case as ours

Andy

On 17 Mar 2010, at 19:58, Scooter Willis wrote:

> Andy
> 
> Should we use http://www.sequenceontology.org/gff3.shtml as our test case for a complex example of transcription?
> 
> Scooter
> 
> On Mar 17, 2010, at 3:24 PM, Andy Yates wrote:
> 
>> biojava-genomes sounds good.
>> 
>> I've done nothing since my last check-in of code which was all to do with locations so there should be no problem there :)
>> 
>> On 17 Mar 2010, at 18:17, Scooter Willis wrote:
>> 
>>> Andreas
>>> 
>>> The problem with putting feature classes in a separate module is that biojava-core sequences would then have a dependency on biojava-feature. A sequence needs to hold a collection of features, so feature classes need to go in core. Even if features are created from GFF, the core module doesn't care where they come from.
>>> 
>>> We could go with biojava-genomes and code related to dealing with genomes goes in that module. If you like biojava-genome or biojava-genomes go ahead and create it and email me so I can check it out.
>>> 
>>> Thanks
>>> 
>>> Scooter
>>> 
>>> 
>>> 
>>> On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:
>>> 
>>>> I like biojava-feature as a module name  for the GFF and features related code. (should we try to keep the module names singular?) Let me know if you want me to create the module for this...
>>>> A
>>>> 
>>>> On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>> Andy
>>>> 
>>>> Let me know if you have any major code changes to the core sequence handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code, but I want to avoid merging code if you have made any significant changes.
>>>> 
>>>> I should have code to check in today and if we can't come up with a better name I will ask Andreas to create a biojava3-genes module and I can then check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a particular amino acid position.
>>>> 
>>>> We also need to add the ability to have tracks as a way to group features. This way you can export the features in a particular track as a GFF/GFF3 file for import into various genome browsers. For example, you might have one genome with genes added from three different gene prediction algorithms, each organized as a track. You should then be able to determine the overlaps among genes that were predicted and validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together make this easy, then I think we will have a solid design.
>>>> 
>>>> 
>>>> Scooter
>>>> 
>>>> 
>>> 
>> 
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From andreas at sdsc.edu  Wed Mar 17 18:14:40 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 17 Mar 2010 15:14:40 -0700
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
	<5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
Message-ID: <59a41c431003171514u1357ecf1ndab75fa4d461124e@mail.gmail.com>

ok, a new module biojava3-genome is now in SVN...
A

On Wed, Mar 17, 2010 at 11:17 AM, Scooter Willis <HWillis at scripps.edu> wrote:

> Andreas
>
> The problem with putting feature classes in a separate module is that
> biojava-core sequences would then have a dependency on biojava-feature. A
> sequence needs to hold a collection of features so feature classes need to
> go in core. If features are created from gff the core module doesn't care
> where features come from.
>
> We could go with biojava-genomes and code related to dealing with genomes
> goes in that module. If you like biojava-genome or biojava-genomes go ahead
> and create it and email me so I can check it out.
>
> Thanks
>
> Scooter
>
>
>
> On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:
>
> I like biojava-feature as a module name  for the GFF and features related
> code. (should we try to keep the module names singular?) Let me know if you
> want me to create the module for this...
> A
>
> On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>
>> Andy
>>
>> Let me know if you have any major code changes for the core sequence
>> handling that have been or could be checked in. So far I haven't needed to
>> touch any of the core sequence code but want to avoid merging code if you
>> have made any significant changes.
>>
>> I should have code to check in today and if we can't come up with a better
>> name I will ask Andreas to create a biojava3-genes module and I can then
>> check that code in for your review. The current problem is that we have
>> ExonSequence extending DNASequence when it could also be described as a
>> feature. One way to look at this is that a TranscriptSequence is also a feature
>> of a DNA sequence and only when you want to have a stand alone class with
>> internal links back to parent sequence do you return a TranscriptSequence.
>> The TranscriptFeature would have ExonFeature and IntronFeature as children.
>> You can ask for an ExonSequence based on the ExonFeature. Once you get a
>> ProteinSequence you should be able to reverse the process and get back the
>> TranscriptSequence and the corresponding ExonFeatures and some sort of
>> mapping from a protein sequence position back to the three DNA sequence
>> positions that coded for it. This would need to handle the case where the end
>> of one exon and the start of the next exon code for a particular amino acid
>> sequence position.
>>
>> We also need to add in the ability to have tracks as a way to group
>> features. This way you can export the features of a particular track as a
>> GFF/GFF3 file for importing into various genome browsers. Say you have one
>> genome you are working on, with genes added from three different gene
>> prediction algorithms, each organized as a track. You should then be able to
>> determine overlaps of genes that were predicted and validated via BLAST
>> against UniProt and create another summary track of validated genes and
>> non-validated genes. If the feature classes we put together can make this
>> easy then I think we will have a solid design.
>>
>>
>> Scooter
>>
>>
>
>
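
The track idea in the quoted message above (group features by track, export one track as GFF3, and derive a summary of predicted genes that are or are not supported by BLAST hits against UniProt) might look roughly like this. Everything here is a hypothetical sketch rather than an existing BioJava API: `TrackSketch`, `Feature` and the track names are illustrative, "validated" is approximated as "overlaps at least one hit on the same sequence", and the GFF3 score and phase columns are simply left as ".".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch, not an existing BioJava API: features tagged with a
 * track name, per-track GFF3 export, and overlap-based validation.
 */
class TrackSketch {
    /** A feature with GFF3-style 1-based inclusive coordinates. */
    record Feature(String track, String seqId, String type, int start, int end, char strand, String id) {
        boolean overlaps(Feature other) {
            return seqId.equals(other.seqId) && start <= other.end && other.start <= end;
        }
    }

    /** Group features by track name (sorted by track for stable output). */
    static Map<String, List<Feature>> byTrack(List<Feature> features) {
        return features.stream()
                .collect(Collectors.groupingBy(Feature::track, TreeMap::new, Collectors.toList()));
    }

    /** Render one track as GFF3: track name in the source column, score and phase as ".". */
    static List<String> toGff3(List<Feature> track) {
        List<String> lines = new ArrayList<>();
        lines.add("##gff-version 3");
        for (Feature f : track) {
            lines.add(String.join("\t", f.seqId(), f.track(), f.type(),
                    Integer.toString(f.start()), Integer.toString(f.end()),
                    ".", String.valueOf(f.strand()), ".", "ID=" + f.id()));
        }
        return lines;
    }

    /** true -> predicted genes overlapping at least one hit; false -> the rest. */
    static Map<Boolean, List<Feature>> partitionByOverlap(List<Feature> predicted, List<Feature> hits) {
        return predicted.stream()
                .collect(Collectors.partitioningBy(g -> hits.stream().anyMatch(g::overlaps)));
    }

    public static void main(String[] args) {
        List<Feature> all = List.of(
                new Feature("predictorA", "chr1", "gene", 100, 500, '+', "g1"),
                new Feature("predictorA", "chr1", "gene", 900, 1200, '-', "g2"),
                new Feature("blast-uniprot", "chr1", "match", 450, 700, '+', "h1"));
        Map<String, List<Feature>> tracks = byTrack(all);
        Map<Boolean, List<Feature>> split =
                partitionByOverlap(tracks.get("predictorA"), tracks.get("blast-uniprot"));
        toGff3(split.get(true)).forEach(System.out::println); // validated predictions only
    }
}
```

Writing the track name into the GFF3 source column is one possible convention; a real implementation might prefer a separate attribute so that the source column can still record the prediction program.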

From heuermh at acm.org  Wed Mar 17 23:28:23 2010
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 17 Mar 2010 22:28:23 -0500 (EST)
Subject: [Biojava-dev] Hackathon in Boston, July 2010
In-Reply-To: <5FC2D8EC-5408-4126-9A7D-CB6B3500B61C@eaglegenomics.com>
Message-ID: <Pine.GSO.4.44.1003172227210.25986-100000@shell3.shore.net>

On Mon, 15 Mar 2010, Richard Holland wrote:

> Hi all,
>
> Following the successful hackathon in Cambridge earlier this year, it was originally planned to hold a second one in Boston in conjunction with BOSC in order to give those who couldn't make it to the UK a chance to get involved.
>
> However, OBF have beaten us to it by organising a cross-project CodeFest!
>
>  http://www.open-bio.org/wiki/Codefest_2010
>
> It would be great for BioJava people to get involved with this cross-project hackathon effort, and it saves organising one of our own! :)

Yep, I'm already signed up.  Look forward to seeing some of you there.

   michael


From andreas at sdsc.edu  Thu Mar 18 16:36:38 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 18 Mar 2010 13:36:38 -0700
Subject: [Biojava-dev] Google summer of code
Message-ID: <59a41c431003181336i33d388aak4b5a26e11ee4161b@mail.gmail.com>

Hi,

It seems our (the Open Bioinformatics Foundation's) Google Summer of Code
application has been accepted.
http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010

As such we are now looking for an interested and skilled student to work on
the BioJava multiple sequence alignment project. Take a look at the project
description, and if you think you are up for the challenge, send me an email
with your application.

http://biojava.org/wiki/Google_Summer_of_Code

Andreas

From andreas at sdsc.edu  Tue Mar 23 20:33:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 23 Mar 2010 17:33:09 -0700
Subject: [Biojava-dev] GSoC update
Message-ID: <59a41c431003231733t1e259753k55fbe0a8bfb801a3@mail.gmail.com>

Hi,

A quick update regarding the current status of our Google Summer of Code
project: several students have already expressed their interest. In fact the
response was so good that I believe BioJava should try to run more than just
one project. In the meantime we have added another "mentor proposed" project to
our GSoC page: http://biojava.org/wiki/Google_Summer_of_Code . Identification
and Classification of Posttranslational Modification of Proteins: Develop a
Posttranslational Modification package for the BioJava project.

In general Google strongly encourages student-proposed projects,
since historically those are often the most successful GSoC projects. It is
recommended that students contact us / possible mentors prior to their
application so we can match up students with suitable mentors and projects
and we can help in solidifying your project ideas. In principle any BioJava
contributor is suitable as a mentor. Students can apply between March 22nd
and April 9th via the Google web site. http://socghop.appspot.com/

Andreas

From biopython at maubp.freeserve.co.uk  Wed Mar 24 10:51:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Mar 2010 14:51:46 +0000
Subject: [Biojava-dev] [Bioperl-l] Fwd: [Utilities-announce] NCBI
	Revised E-utility Usage Policy
In-Reply-To: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov>
	<320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
	<38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
Message-ID: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>

On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> On Mar 24, 2010, at 9:08 AM, Peter wrote:
>
>> Hi,
>>
>> This is probably of interest to all the Bio* projects offering access
>> to the NCBI Entrez utilities. See forwarded message below.
>>
>> I *think* the new guidelines basically say that the email & tool parameters are
>> optional BUT if your IP address ever gets banned for excessive use you then
>> have to register an email & tool combination.
>>
>> Regarding the email address, the NCBI say to use the email of the developer
>> (not the end user). However, they do not distinguish between the developers
>> of a library (like us), and the developers of an application or script using a
>> library (who may also be the end user).
>>
>> Currently we (Biopython) and I think BioPerl ask developers using our libraries
>> to populate the email address themselves. I *think* this is still the
>> right action.
>>
>> Peter
>
>
> Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I
> think with the SOAP-based ones as well).  We're providing a specific set of
> tools for user to write up their own applications end applications.  I can try
> contacting them regarding this to get an official response to clarify this
> somewhat.

Please give the NCBI an email - you can CC me too if you like.

> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a
> default, but always leave the email blank and issue a warning if it isn't
> set.  We could just as easily leave both blank and issue warnings for both.

We currently leave out the email and set the tool parameter to "Biopython"
by default but this can be overridden. Currently leaving out the email does
cause Biopython to give a warning.

Peter
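
For comparison with the BioPerl and Biopython behaviour described above (a default `tool` value that can be overridden, and a warning rather than a default when `email` is missing), a Java equivalent might look like the sketch below. `EutilsUrlBuilder` and the "BioJava" default are hypothetical, and the base URL is an assumption (NCBI's current HTTPS host); only the `db`, `term`, `tool` and `email` query parameters come from the E-utilities interface itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of BioJava: builds an ESearch URL with tool/email handling. */
class EutilsUrlBuilder {
    private static final Logger LOG = Logger.getLogger(EutilsUrlBuilder.class.getName());
    // Assumed base URL; check NCBI's current documentation before relying on it.
    static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    private String tool = "BioJava"; // hypothetical library default, overridable by the application
    private String email;            // no default: it should identify the developer, not the library

    EutilsUrlBuilder tool(String tool) { this.tool = tool; return this; }

    EutilsUrlBuilder email(String email) { this.email = email; return this; }

    String esearch(String db, String term) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("db", db);
        params.put("term", term);
        if (tool != null && !tool.isEmpty()) {
            params.put("tool", tool);
        }
        if (email == null || email.isEmpty()) {
            // Warn rather than fail, mirroring the BioPerl/Biopython behaviour described above.
            LOG.warning("No email set for NCBI E-utilities requests");
        } else {
            params.put("email", email);
        }
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return BASE + "esearch.fcgi?" + query;
    }
}
```

For example, `new EutilsUrlBuilder().email("dev@example.org").esearch("pubmed", "biojava")` yields a URL ending in `&tool=BioJava&email=dev%40example.org`.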


From cjfields at illinois.edu  Wed Mar 24 10:37:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Mar 2010 09:37:13 -0500
Subject: [Biojava-dev] [Bioperl-l] Fwd: [Utilities-announce] NCBI
	Revised E-utility Usage Policy
In-Reply-To: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov>
	<320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
Message-ID: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>

On Mar 24, 2010, at 9:08 AM, Peter wrote:

> Hi,
> 
> This is probably of interest to all the Bio* projects offering access
> to the NCBI Entrez utilities. See forwarded message below.
> 
> I *think* the new guidelines basically say that the email & tool parameters are
> optional BUT if your IP address ever gets banned for excessive use you then
> have to register an email & tool combination.
> 
> Regarding the email address, the NCBI say to use the email of the developer
> (not the end user). However, they do not distinguish between the developers
> of a library (like us), and the developers of an application or script using a
> library (who may also be the end user).
> 
> Currently we (Biopython) and I think BioPerl ask developers using our libraries
> to populate the email address themselves. I *think* this is still the
> right action.
> 
> Peter


Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I think with the SOAP-based ones as well).  We're providing a specific set of tools for user to write up their own applications end applications.  I can try contacting them regarding this to get an official response to clarify this somewhat.

Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a default, but always leave the email blank and issue a warning if it isn't set.  We could just as easily leave both blank and issue warnings for both.

chris


> ---------- Forwarded message ----------
> From:  <utilities-announce at ncbi.nlm.nih.gov>
> Date: Wed, Mar 24, 2010 at 1:53 PM
> Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy
> To: NLM/NCBI List utilities-announce <utilities-announce at ncbi.nlm.nih.gov>
> 
> 
> New E-utility documentation now on the NCBI Bookshelf
> 
> The Entrez Programming Utilities (E-Utilities) Help documentation has
> been added to the NCBI Bookshelf, and so is now fully integrated with
> the Entrez search and retrieval system as a part of the Bookshelf
> database. This help document has been divided into chapters for better
> organization and includes several new sample Perl scripts. At present
> this book covers the standard URL interface for the E-utilities;
> material about the SOAP interface will be added soon and is still
> available at the same URL:
> http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html.
> 
> 
> 
> Revised E-utility usage policy
> 
> In December, 2009 NCBI announced a change to the usage policy for the
> E-utilities that would require all requests to contain non-null values
> for both the &email and &tool parameters. After several consultations
> with our users and developers, we have decided to revise this policy
> change, and the revised policy is described in detail at the following
> link:
> 
> http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen
> 
> Please let us know if you have any questions or concerns about this
> policy change.
> 
> 
> 
> Thank you,
> 
> The E-Utilities Team
> 
> NIH/NLM/NCBI
> 
> eutilities at ncbi.nlm.nih.gov.
> 
> 
> 
> _______________________________________________
> Utilities-announce mailing list
> http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at drycafe.net  Wed Mar 24 11:27:37 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 24 Mar 2010 11:27:37 -0400
Subject: [Biojava-dev] [Open-bio-l] [Bioperl-l] Fwd:
	[Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov>
	<320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
	<38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
	<320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
Message-ID: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net>


On Mar 24, 2010, at 10:51 AM, Peter wrote:

> Please give the NCBI an email - you can CC me too if you like.


Can't this be the developers' mailing list (or lists, the appropriate  
one for each toolkit)? We can even whitelist all NCBI sender addresses  
so they can easily email us if there are issues.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From cjfields at illinois.edu  Wed Mar 24 11:44:21 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Mar 2010 10:44:21 -0500
Subject: [Biojava-dev] [Bioperl-l] Fwd: [Utilities-announce] NCBI
	Revised E-utility Usage Policy
In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov>
	<320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
	<38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
	<320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
Message-ID: <338BDDD8-2A66-4086-BFB7-35EC8F8F0D66@illinois.edu>


On Mar 24, 2010, at 9:51 AM, Peter wrote:

> On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> 
>> On Mar 24, 2010, at 9:08 AM, Peter wrote:
>> 
>>> Hi,
>>> 
>>> This is probably of interest to all the Bio* projects offering access
>>> to the NCBI Entrez utilities. See forwarded message below.
>>> 
>>> I *think* the new guidelines basically say that the email & tool parameters are
>>> optional BUT if your IP address ever gets banned for excessive use you then
>>> have to register an email & tool combination.
>>> 
>>> Regarding the email address, the NCBI say to use the email of the developer
>>> (not the end user). However, they do not distinguish between the developers
>>> of a library (like us), and the developers of an application or script using a
>>> library (who may also be the end user).
>>> 
>>> Currently we (Biopython) and I think BioPerl ask developers using our libraries
>>> to populate the email address themselves. I *think* this is still the
>>> right action.
>>> 
>>> Peter
>> 
>> 
>> Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I
>> think with the SOAP-based ones as well).  We're providing a specific set of
>> tools for user to write up their own applications end applications.  I can try
>> contacting them regarding this to get an official response to clarify this
>> somewhat.
> 
> Please give the NCBI an email - you can CC me too if you like.

Sent, have cc'd the open-bio list.  Don't want to cross-post this too much, so I think we should move the discussion there.

>> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a
>> default, but always leave the email blank and issue a warning if it isn't
>> set.  We could just as easily leave both blank and issue warnings for both.
> 
> We currently leave out the email and set the tool parameter to "Biopython"
> by default but this can be overridden. Currently leaving out the email does
> cause Biopython to give a warning.
> 
> Peter

We follow the same, then (down to the warning).  This is mentioned in my post to them, I'll wait to see what they say.  

My concern is the wording of the new rules.  Each tool and email must be registered with them if an IP is blocked.  Does this mean each tool is assigned one specific email?  And an IP that is blocked can register it to be allowed back into the fold?  With that in mind, should we register each of our toolkits with them?  Probably not a bad thing (it might help us as devs to get an idea of use), but then if one user abuses the rules will their actions affect all toolkit users?  Is this all done on a per-IP basis, per-toolkit basis, etc?  

Unfortunately, at least to me, none of this is made very clear, so I'm hoping there is some clarification from their end.

chris

From maj at fortinbras.us  Wed Mar 24 12:37:56 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 24 Mar 2010 12:37:56 -0400
Subject: [Biojava-dev] [Bioperl-l] [Open-bio-l] Fwd:
	[Utilities-announce] NCBI RevisedE-utility Usage Policy
In-Reply-To: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov><320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
	<5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net>
Message-ID: <B6692F38693D41B3BE76FF47F227D257@NewLife>

I think this is a great idea--- MAJ
----- Original Message ----- 
From: "Hilmar Lapp" <hlapp at drycafe.net>
To: "Peter" <biopython at maubp.freeserve.co.uk>
Cc: <bioruby at lists.open-bio.org>; "Biopython-Dev Mailing List" 
<biopython-dev at biopython.org>; <biojava-dev at lists.open-bio.org>; "bioperl-l 
list" <bioperl-l at lists.open-bio.org>; "Chris Fields" <cjfields at illinois.edu>; 
<open-bio-l at lists.open-bio.org>
Sent: Wednesday, March 24, 2010 11:27 AM
Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI 
RevisedE-utility Usage Policy


>
> On Mar 24, 2010, at 10:51 AM, Peter wrote:
>
>> Please give the NCBI an email - you can CC me too if you like.
>
>
> Can't this be the developers' mailing list (or lists, the appropriate  one for 
> each toolkit)? We can even whitelist all NCBI sender addresses  so they can 
> easily email us if there are issues.
>
> -hilmar
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>


From sheoran143 at gmail.com  Wed Mar 24 21:19:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Wed, 24 Mar 2010 20:19:29 -0500
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with subject
 :( Hibernate Exception and suggestion for change in BioSqlSchema)
Message-ID: <4BAABA21.4000301@gmail.com>

I am writing this email again, since I didn't get any response on whether
these bugs have been patched or whether my message got lost somewhere on the
mailing list. I don't know how to apply these patches, so I am counting on
you guys to apply them and reply back so I know they're fixed.


Thanks
Deepak Sheoran


Hi
In response to the bug fix suggested by Richard, I have created some patches.
We need to apply these to stop BioJava from processing references in a
GenBank record incorrectly, which causes further Hibernate exceptions.
After applying the patches, the reference resolution code will first test the
PubMed or MEDLINE ID; if there is no match it will test author/title/location,
and if there is still no match it will create a new reference. I tested it with
GenBank Release 175 and gained almost 3159 more records in my database.
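
The three-step resolution order described above (identifier first, then citation text, then create) can be illustrated with a small in-memory sketch. It is not the attached patch: `ReferenceResolver` and `DocRef` here are hypothetical stand-ins, and plain maps replace the Hibernate queries that the real BioSqlRichObjectBuilder runs against BioSQL.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory sketch of the three-step lookup: identifier, then
 * citation text, then create. Not the actual patch; the real code queries
 * BioSQL through Hibernate rather than using maps.
 */
class ReferenceResolver {
    record DocRef(String pubmedId, String authors, String location, String title) {}

    private final Map<String, DocRef> byPubmedId = new HashMap<>();
    // Arrays.asList permits null elements, so a missing title stays distinct from "".
    private final Map<List<String>, DocRef> byCitation = new HashMap<>();

    DocRef resolve(String pubmedId, String authors, String location, String title) {
        // Step 1: an identifier match wins, whatever the spelling of the citation text.
        if (pubmedId != null) {
            DocRef byId = byPubmedId.get(pubmedId);
            if (byId != null) return byId;
        }
        // Step 2: fall back to an exact authors/location/title match.
        List<String> key = Arrays.asList(authors, location, title);
        DocRef byText = byCitation.get(key);
        if (byText != null) return byText;
        // Step 3: nothing matched, so create and index a new reference.
        DocRef created = new DocRef(pubmedId, authors, location, title);
        if (pubmedId != null) byPubmedId.put(pubmedId, created);
        byCitation.put(key, created);
        return created;
    }
}
```

With this ordering, two GenBank references that share PubMed ID 16790744 but differ only in title capitalization resolve to the same object, which is the situation that triggered the unique-constraint violation on `reference.dbxref_id`.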

Could somebody please have a look at the second issue and fix it:
"

2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).

"

I am also planning on building a bridge between BioSQL databases loaded
using BioPerl and BioJava. Below is some of my investigation so far; can you
guys suggest a direction for it?
Have a look at the attached files:
1) Biojava_BioPerl_Diff.xls ==> shows the tables in which BioPerl and BioJava
store a GenBank record in a BioSQL instance
2) GenbankRecord.doc ==> a Word document with a GenBank record, showing where
its information goes in BioSQL when loaded with BioPerl and with BioJava
3) BioSqlRichobjectBuilder.patch ==> patch needed for the
BioSqlRichObjectBuilder.java class
4) GenBankFormat.patch ==> patch needed for the GenBankFormat.java class


Thanks
Deepak Sheoran


-------- Original Message --------
Subject: 	Re: Hibernate Exception and suggestion for change in BioSqlSchema
Date: 	Tue, 9 Feb 2010 20:34:32 +1300
From: 	Richard Holland <holland at eaglegenomics.com>
To: 	Deepak Sheoran <sheoran143 at gmail.com>
CC: 	biojava-l at biojava.org


Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.

However, in answer to your two questions:

   1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).

   2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
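
Point 2 above suspects that the feature reference is lost when a CompoundRichLocation is built from SimpleRichLocations. One invariant worth checking is that every member location ends up pointing at the same feature as the compound. The sketch below illustrates that invariant with hypothetical classes (`CompoundLocationSketch`, `SimpleLocation`, `CompoundLocation`); it does not reproduce BioJava's RichLocation hierarchy and is not a confirmed diagnosis of the bug.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model, not BioJava's RichLocation classes: every member of a
 * compound (join) location should point at the same feature as the compound
 * itself before anything is persisted.
 */
class CompoundLocationSketch {
    static final class Feature {
        final String type;
        Feature(String type) { this.type = type; }
    }

    static final class SimpleLocation {
        final int min, max;
        Feature feature; // null here is what a not-null "feature" column would reject
        SimpleLocation(int min, int max) { this.min = min; this.max = max; }
    }

    static final class CompoundLocation {
        private final List<SimpleLocation> members;
        private Feature feature;

        CompoundLocation(List<SimpleLocation> members) {
            this.members = new ArrayList<>(members);
        }

        /** Push the owning feature down to every member, not just the compound. */
        void setFeature(Feature f) {
            this.feature = f;
            for (SimpleLocation m : members) {
                m.feature = f;
            }
        }

        Feature getFeature() { return feature; }

        List<SimpleLocation> members() { return Collections.unmodifiableList(members); }

        /** Pre-persist check: false if any member has lost its feature reference. */
        boolean allMembersHaveFeature() {
            return members.stream().allMatch(m -> m.feature != null);
        }
    }
}
```

Running a check like `allMembersHaveFeature()` before persisting would turn the Hibernate `PropertyValueException` into an earlier, more specific failure that points at the offending location.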

cheers,
Richard

On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:

>
>  Hi Richard
>
>  Below is the email I sent to the biojava-l mailing list. It never got posted to the list and I didn't get any response, so please have a look at it and tell me what the solution to the problem described in the message might be.
>
>
>  Thanks
>  Deepak Sheoran
>  -------- Original Message --------
>  Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
>  Date:	Wed, 03 Feb 2010 08:07:35 -0600
>  From:	Deepak Sheoran<sheoran143 at gmail.com>
>  To:	biojava-l at lists.open-bio.org
>
>  Hi guys,
>
>  A couple of days back I had a problem with a Hibernate exception; that exception was resolved, and the reference to that email is: http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
>  On Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another Hibernate error and decided to investigate the matter. Below are some facts and information I found, which I guess are going to affect all of us.
>  	- The "reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one row in the reference table can use a given dbxref_id.
>  This works well, except when there are small variations in the values of the "location", "title" and "authors" columns that all refer to the same PubMed ID. In that case we can't persist or create a RichSequence object.
>   Now when you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up the object in the database: if it finds one it returns that object, otherwise it tries to persist the new object into the database.
>  But the problem is with the following part of that method:
>  ... line 114:
>  else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
>      queryType = "DocRef";
>      // convert List constructor to String representation for query
>      ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
>      if (ourParamsList.size()<3) {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>      } else {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>      }
>  }
>  ... line 123
>  When Hibernate searches the database, it won't find the other record in the "reference" table, because the two records differ under string comparison, so it returns a new object back to "GenbankFormat", to the following piece of code:
>  ... line 447:
>  else {
>      try {
>          CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
>          RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>          rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>      } catch (ChangeVetoException e) {
>          throw new ParseException(e+", accession:"+accession);
>      }
>  }
>  ... line 455
>  That object is then added to rlistener, and parsing moves on to the next part of the GenBank record. When BioJava then searches the database for a new crossref, it tries to persist the old object and gets a Hibernate exception for violating the unique constraint on the "dbxref_id" column.
>
>  The only ways to get these records into the database are:
>  		- The very easy solution, and the way I tested my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many different naming variations, and this change allows us to store that information too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
>  		- The second solution, which is slightly more difficult to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides whether a particular DocRef already exists in the database, i.e. testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the creation of RichSequence objects when RichObjectFactory is linked to an active Hibernate session.
>
>  Example: below is a sample from my local BioSQL schema, which includes the modification I suggested. (For easier reference in this email, I replaced the local dbxref_id values in this table with the PubMed IDs stored in the "dbxref" table, so the dbxref_id column shows PubMed IDs.)
>  Reference_id | Dbxref_id | Location | Title | Authors | crc
>
>  216 | 18554304 | FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | 9E940E01F4BE3CD0
>  230 | 18554304 | FEMS Microbiol. Ecol. 66 (3), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | D3BC0C17F3F786C9
>
>  415 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 60AEDFA0CEEACC38
>  969 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 4B1232999F6E8130
>
>  929 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 3E79B40DD2AAA2B7
>  932 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 094EB3384F8D6DE8
>
>  1426 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M. | 357648D8FD8C6C8A
>  1481 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C. | 115411EB2DEE5654
>
>  1497 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D5D376EECCD186B
>  1501 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D57954EECDED66B
>
>  1556 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | 698688FB6DB95247
>  1559 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | E25E1BA99DB18F3D
>
>  	- The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>  		- This means that, in the RichSequence object, some feature has a location object whose feature is set to null.
>  		- My observations:
>  			- It usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS regions of a GenBank record.
>  			- After catching the Hibernate exception, I went through all the features and found that either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
>  			- Below is a screenshot from one of my tests:
>  				- State before trying to persist the RichSequence object to the database:
>
>  <Mail Attachment.png>
>
>  		- State after trying to persist the RichSequence object to the database, inside the Hibernate exception catch block:
>
>  		<Mail Attachment.png>
>
>  		- So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why it is happening.
>  		- Some extra information to make things clearer:
>  			- Below are the LOCUS lines of some GenBank records where I know the location of the error (i.e. the CDS region causing it), with the feature's index in the richSequence.feature ArrayList:
>  				- LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
>  					- richSequence.feature index: 2540; line number in the GenBank record: 22115
>  				- LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
>  					- richSequence.feature index: 127; line number in the GenBank record: 2137
>  				- LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
>  					- richSequence.feature index: 389; line number in the GenBank record: 3632
>  				- LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
>  					- richSequence.feature index: 47; line number in the GenBank record: 4841
>  				- LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
>  					- richSequence.feature index: 45; line number in the GenBank record: 442
>  		- The complete exception message:
>  org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>          at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
>          at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
>          at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Biojava_BioPerl_diff.xls
Type: application/vnd.ms-excel
Size: 346624 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20100324/7ecffa4a/attachment-0001.xls>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BioSqlRichObjectBuilder.patch
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20100324/7ecffa4a/attachment-0002.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: GenbankFormat.patch
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20100324/7ecffa4a/attachment-0003.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GenbankRecord.doc
Type: application/msword
Size: 59392 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20100324/7ecffa4a/attachment-0001.doc>

From holland at eaglegenomics.com  Thu Mar 25 12:27:17 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 25 Mar 2010 16:27:17 +0000
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with
	subject :( Hibernate Exception and suggestion for change in
	BioSqlSchema)
In-Reply-To: <4BAABA21.4000301@gmail.com>
References: <4BAABA21.4000301@gmail.com>
Message-ID: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>

Patched and in subversion on the head in the new Biojava 3 code. I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.

On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:

> I am writing this email again because I didn't get any response on whether these bugs were patched or whether my email got lost somewhere on the mailing list. I don't know how to apply these patches, so I am counting on you to apply them and reply so that I know they are fixed.
> 
> 
> 
> Thanks
> Deepak Sheoran
> 
> 
> Hi
> In response to the bug fix suggested by Richard, I have created some patches. They are needed to stop BioJava from processing references in GenBank records incorrectly, which causes further Hibernate exceptions. With the patches applied, the reference resolution code first tests the PubMed or MEDLINE ID; if there is no match, it tests author/title/location; and if there is still no match, it creates a new reference. I tested it with GenBank release 175 and gained almost 3159 more records in my database.
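The three-step reference resolution described above can be sketched as follows. This is a minimal in-memory illustration, not the patch that was committed: `RefRecord` and `ReferenceResolver` are hypothetical names, and a plain list stands in for the BioSQL "reference" table that BioSqlRichObjectBuilder actually queries through Hibernate. The demo titles are shortened versions of the two variants of PubMed 16790744.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for one row of the BioSQL "reference" table (not a BioJava class).
final class RefRecord {
    final String pubmedId;  // PubMed/MEDLINE ID; null if the GenBank REFERENCE has none
    final String authors;
    final String title;     // null for references without a TITLE line
    final String location;

    RefRecord(String pubmedId, String authors, String title, String location) {
        this.pubmedId = pubmedId;
        this.authors = authors;
        this.title = title;
        this.location = location;
    }
}

// Hypothetical resolver: ID first, then authors/title/location, then create.
final class ReferenceResolver {
    private final List<RefRecord> store = new ArrayList<>();

    RefRecord resolve(String pubmedId, String authors, String title, String location) {
        // Step 1: reuse an existing reference carrying the same PubMed/MEDLINE ID,
        // even if its author/title/location strings differ slightly.
        if (pubmedId != null) {
            for (RefRecord r : store) {
                if (pubmedId.equals(r.pubmedId)) {
                    return r;
                }
            }
        }
        // Step 2: fall back to exact authors/title/location matching
        // (Objects.equals also treats two null titles as equal).
        for (RefRecord r : store) {
            if (Objects.equals(authors, r.authors)
                    && Objects.equals(title, r.title)
                    && Objects.equals(location, r.location)) {
                return r;
            }
        }
        // Step 3: nothing matched, so create and remember a new reference.
        RefRecord created = new RefRecord(pubmedId, authors, title, location);
        store.add(created);
        return created;
    }

    int size() {
        return store.size();
    }

    public static void main(String[] args) {
        ReferenceResolver resolver = new ReferenceResolver();
        String loc = "Infect. Immun. 74 (7), 3715-3726 (2006)";
        String authors = "Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.";
        RefRecord a = resolver.resolve("16790744", authors, "Intrastrain Heterogeneity of the mgpB Gene", loc);
        RefRecord b = resolver.resolve("16790744", authors, "Intrastrain heterogeneity of the mgpB gene", loc);
        // The case-only title variant resolves to the already stored reference.
        System.out.println(a == b);
        System.out.println(resolver.size());
    }
}
```

Checking the ID before the strings is what lets case-only variants, such as reference rows 415 and 969 in the sample table further down this thread, collapse into a single row.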
> 
> Could somebody please have a look at the second issue and fix it?
> "
> 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
> "
> 
> I am also planning to build a bridge between BioSQL databases loaded using BioPerl and BioJava. Here is some of my investigation; can you suggest some direction?
> Have a look at the attached files:
> 1) Biojava_BioPerl_diff.xls ==> views of the tables in which BioPerl and BioJava store a GenBank record in a BioSQL instance
> 2) GenbankRecord.doc ==> a Word document showing, for a GenBank record, where its information goes in BioSQL when loaded with BioPerl and with BioJava
> 3) BioSqlRichObjectBuilder.patch ==> patch for the BioSqlRichObjectBuilder.java class
> 4) GenbankFormat.patch ==> patch for the GenbankFormat.java class
> 
> 
> Thanks
> Deepak Sheoran
> 
> 
> 
> -------- Original Message --------
> Subject:	Re: Hibernate Exception and suggestion for change in BioSqlSchema
> Date:	Tue, 9 Feb 2010 20:34:32 +1300
> From:	Richard Holland <holland at eaglegenomics.com>
> To:	Deepak Sheoran <sheoran143 at gmail.com>
> CC:	biojava-l at biojava.org
> 
> Hi. It's possible that your original email didn't make it to the list because it is in HTML format, and the list only accepts plain text.
> 
> However, in answer to your two questions:
> 
>   1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).
> 
>   2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
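While the cause of the second issue is unknown, one way to localize it is to walk each feature's location tree before calling Hibernate's `session.save()` and report every location, including each component of a compound (join/complement) location, whose back-reference to its feature is null. The sketch below uses hypothetical minimal types (`Loc`, `LocationCheck`) rather than BioJava's RichLocation/CompoundRichLocation API. Its `setFeatureDeep` method illustrates one possible invariant (a compound location pushes its feature down to every component); it is an assumption for illustration, not the fix that was eventually made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal location model (not BioJava's API). Each location keeps a
// back-reference to its owning feature, mirroring the Location.feature property
// that Hibernate refuses to persist as null.
class Loc {
    final String label;
    Object feature;                             // owning feature; null reproduces the error
    final List<Loc> parts = new ArrayList<>();  // non-empty for a compound location

    Loc(String label) {
        this.label = label;
    }

    // One possible invariant: assigning a feature to a compound location
    // also assigns it to every component, recursively.
    void setFeatureDeep(Object f) {
        feature = f;
        for (Loc p : parts) {
            p.setFeatureDeep(f);
        }
    }
}

class LocationCheck {
    // Returns every location in the tree whose feature back-reference is null.
    static List<Loc> findOrphans(Loc root) {
        List<Loc> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Loc loc, List<Loc> out) {
        if (loc.feature == null) {
            out.add(loc);
        }
        for (Loc p : loc.parts) {
            collect(p, out);
        }
    }

    public static void main(String[] args) {
        // join(1..100,200..300) where only the outer location got its feature set.
        Loc join = new Loc("join");
        join.parts.add(new Loc("1..100"));
        join.parts.add(new Loc("200..300"));
        join.feature = "CDS";
        for (Loc orphan : findOrphans(join)) {
            System.out.println("null feature on " + orphan.label);
        }
        join.setFeatureDeep("CDS");
        System.out.println("orphans after propagation: " + findOrphans(join).size());
    }
}
```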
> 
> cheers,
> Richard
> 
> On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> 
> > 
> > Hi Richard
> > 
> > Below is the email I sent to the biojava-l mailing list. It never got posted to the list and I got no response, so please have a look at it and tell me what the solution to the problem it describes might be.
> > 
> > 
> > Thanks
> > Deepak Sheoran
> > -------- Original Message --------
> > Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
> > Date:	Wed, 03 Feb 2010 08:07:35 -0600
> > From:	Deepak Sheoran <sheoran143 at gmail.com>
> > To:	biojava-l at lists.open-bio.org
> > 
> > Hi guys,
> > 
> > A couple of days back I had a problem with a Hibernate exception, which has since been resolved; the reference to that email is:
> > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> > 
> > Following Richard's suggestion in the link above, I was able to resolve some of the issues, but then I got stuck on another Hibernate error and decided to investigate. Below are some facts and information I found, which I think are going to affect all of us.
> > 	- The "reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
> > This works well, except when there are small variations in the values of the "location", "title" and "authors" columns and all these variations refer to the same PubMed ID. In that case we can't create or persist the RichSequence object.
> > When you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List paramsList)" which is responsible for looking up the object in the database: if it finds one, it returns that object; otherwise it tries to persist the new object to the database.
> > The problem is in this part of that method:
> > ... line 114:
> > else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> >     queryType = "DocRef";
> >     // convert List constructor to String representation for query
> >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> >     if (ourParamsList.size()<3) {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> >     } else {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> >     }
> > }
> > ... line 123
> > When Hibernate searches the database, it won't find a matching record in the "reference" table, because the two records differ under string comparison, so a new object is returned to the following piece of code in "GenbankFormat":
> > ... line 447:
> > else {
> >     try {
> >         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> >         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> >         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> >     } catch (ChangeVetoException e) {
> >         throw new ParseException(e+", accession:"+accession);
> >     }
> > }
> > ... line 455
> > That object is then added to rlistener, and parsing moves on to the next part of the GenBank record. When BioJava then searches the database for a new CrossRef, it tries to persist the old one and gets a Hibernate exception for violating the unique constraint on the "dbxref_id" column.
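The failure chain above ultimately runs into the one-to-one link enforced by `reference_dbxref_id_key UNIQUE (dbxref_id)`. A toy in-memory model shows the difference between that constraint and the many-to-one relation proposed as the first solution. The `ReferenceTable` class is hypothetical (not BioSQL or Hibernate code), and the demo uses shortened versions of the two title variants of PubMed 16790744 from the sample table in this message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the BioSQL "reference" table keyed by dbxref_id (hypothetical, in memory only).
class ReferenceTable {
    private final boolean uniqueDbxref;  // true models UNIQUE (dbxref_id)
    private final Map<String, List<String>> titlesByDbxref = new HashMap<>();

    ReferenceTable(boolean uniqueDbxref) {
        this.uniqueDbxref = uniqueDbxref;
    }

    // Inserts a reference row, identified here only by its title, linked to dbxrefId.
    void insert(String dbxrefId, String title) {
        List<String> titles = titlesByDbxref.computeIfAbsent(dbxrefId, k -> new ArrayList<>());
        if (uniqueDbxref && !titles.isEmpty()) {
            throw new IllegalStateException("unique constraint on dbxref_id violated: " + dbxrefId);
        }
        titles.add(title);
    }

    int countFor(String dbxrefId) {
        List<String> titles = titlesByDbxref.get(dbxrefId);
        return titles == null ? 0 : titles.size();
    }

    public static void main(String[] args) {
        String v1 = "Intrastrain Heterogeneity of the mgpB Gene";
        String v2 = "Intrastrain heterogeneity of the mgpB gene";

        // Current schema: the second variant is rejected.
        ReferenceTable strict = new ReferenceTable(true);
        strict.insert("16790744", v1);
        try {
            strict.insert("16790744", v2);
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }

        // Proposed many-to-one relation: both variants are stored.
        ReferenceTable relaxed = new ReferenceTable(false);
        relaxed.insert("16790744", v1);
        relaxed.insert("16790744", v2);
        System.out.println("many-to-one rows: " + relaxed.countFor("16790744"));
    }
}
```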
> >  
> > The only ways to get these records into the database are:
> > 		- The very easy solution, and the way I tested my theory, is to change the BioSQL schema to allow a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many different naming variations, and this change lets us store that information too. But this is something the BioSQL developers have to decide, and I don't know how to approach them.
> > 		- The second solution, which is slightly harder to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides whether a particular DocRef already exists in the database, i.e. by testing all possible string variations of the authors, location and title of the DocRef being searched for. This has many complications and may slow down the creation of RichSequence objects when RichObjectFactory is linked to an active Hibernate session.
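One cheap variant of the second solution is to compare normalized keys instead of raw strings. The sketch below is an assumption for illustration: the `RefKey` helper is hypothetical and not part of BioJava. It folds case, strips accents, and collapses punctuation and whitespace, and it compares authors and title only, on the assumption that location strings vary too much (compare the two FEMS Microbiol. Ecol. rows in the sample table). As the Tomb,J.-F./Tomb,J. and Munk,A.C./Munk,C.A. pairs in that table show, this catches case-only differences but not genuine differences in initials, which is part of why the approach has complications.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper (not BioJava): builds comparison keys so that trivially
// different spellings of the same DocRef field compare equal.
final class RefKey {
    private RefKey() {}

    static String normalize(String s) {
        if (s == null) {
            return "";
        }
        return Normalizer.normalize(s, Normalizer.Form.NFKD)
                .replaceAll("\\p{M}", "")       // drop combining accent marks
                .toLowerCase(Locale.ROOT)       // ignore case differences
                .replaceAll("[^a-z0-9]+", " ")  // punctuation and whitespace -> one space
                .trim();
    }

    // Compares authors and title only; location strings are left out because they vary too much.
    static boolean sameReference(String authorsA, String titleA, String authorsB, String titleB) {
        return normalize(authorsA).equals(normalize(authorsB))
                && normalize(titleA).equals(normalize(titleB));
    }

    public static void main(String[] args) {
        // Case-only difference (rows 1497/1501): treated as the same reference.
        System.out.println(sameReference(
                "Bellogin,R., del Rosario Espuny,M.", "The effect of FITA mutations",
                "Bellogin,R., Del Rosario Espuny,M.", "The effect of FITA mutations"));
        // Swapped initials (rows 1556/1559): still treated as different.
        System.out.println(sameReference(
                "Munk,A.C.", "Analysis of the Neurotoxin Complex Genes",
                "Munk,C.A.", "Analysis of the neurotoxin complex genes"));
    }
}
```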
> >  
> > Example: below is a sample from my local BioSQL schema, which includes the modification I suggested. (For easy reference in this email, I replaced the local dbxref_id values in this table with the PubMed IDs stored in the "dbxref" table.)
> > Reference_id | Dbxref_id | Location | Title | Authors | crc
> > 216 | 18554304 | FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | 9E940E01F4BE3CD0
> > 230 | 18554304 | FEMS Microbiol. Ecol. 66 (3), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | D3BC0C17F3F786C9
> > 415 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 60AEDFA0CEEACC38
> > 969 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 4B1232999F6E8130
> > 929 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 3E79B40DD2AAA2B7
> > 932 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 094EB3384F8D6DE8
> > 1426 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M. | 357648D8FD8C6C8A
> > 1481 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C. | 115411EB2DEE5654
> > 1497 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D5D376EECCD186B
> > 1501 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D57954EECDED66B
> > 1556 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | 698688FB6DB95247
> > 1559 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | E25E1BA99DB18F3D
> >  
> > 	- The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > 		- This means that, in the RichSequence object, some feature has a location object whose feature is set to null.
> > 		- My observations:
> > 			- It usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS regions of a GenBank record.
> > 			- After catching the Hibernate exception, I went through all the features and found that either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
> > 			- Below is a screenshot from one of my tests:
> > 				- State before trying to persist the RichSequence object to the database:
> > 
> > <Mail Attachment.png>
> > 
> > 		- State after trying to persist the RichSequence object to the database, inside the Hibernate exception catch block:
> > 
> > 		<Mail Attachment.png>
> > 
> > 		- So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why it is happening.
> > 		- Some extra information to make things clearer:
> > 			- Below are the LOCUS lines of some GenBank records where I know the location of the error (i.e. the CDS region causing it), with the feature's index in the richSequence.feature ArrayList:
> > 				- LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
> > 					- richSequence.feature index: 2540; line number in the GenBank record: 22115
> > 				- LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
> > 					- richSequence.feature index: 127; line number in the GenBank record: 2137
> > 				- LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
> > 					- richSequence.feature index: 389; line number in the GenBank record: 3632
> > 				- LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
> > 					- richSequence.feature index: 47; line number in the GenBank record: 4841
> > 				- LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
> > 					- richSequence.feature index: 45; line number in the GenBank record: 442
> > 		- The complete exception message:
> > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> >         at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> >         at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> >         at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> >  
> >  
> 
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> 
> 
> 
> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andreas at sdsc.edu  Thu Mar 25 12:47:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 25 Mar 2010 09:47:45 -0700
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with
	subject :( Hibernate Exception and suggestion for change in
	BioSqlSchema)
In-Reply-To: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
Message-ID: <59a41c431003250947g6ecd11cbw21c5be5858b9aa09@mail.gmail.com>

Excellent, thanks Richard and Deepak!
Andreas

On Thu, Mar 25, 2010 at 9:27 AM, Richard Holland
<holland at eaglegenomics.com> wrote:

> Patched and in subversion on the head in the new Biojava 3 code. I modified
> the code slightly to simplify it. There were also parallel changes required
> over in SimpleDocRef itself to enable it to continue working without being
> connected to BioSQL.
>
> On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
>
> > I am writing this email again because I didn't get any response about whether
> > these bugs have been patched or whether my message got lost somewhere on the
> > mailing list. I don't know how to apply these patches, so I am counting on you
> > to apply them and reply so that I know they are fixed.
> >
> >
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> > Hi
> > In response to the bug fix suggested by Richard, I have created some patches.
> > They need to be applied to stop BioJava from processing the references in a
> > GenBank record incorrectly, which causes further Hibernate exceptions. With the
> > patches applied, the reference resolution code first tests the PubMed or MEDLINE
> > ID; if there is no match it tests author/title/location; and if there is still
> > no match it creates a new reference. I tested this with GenBank release 175 and
> > got almost 3159 more records into my database.
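The three-step lookup described above (PubMed/MEDLINE ID first, then authors/title/location, then a new reference) can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not the code from the patch or from BioSqlRichObjectBuilder: the `ReferenceResolver` and `Ref` classes, the `resolve` method and the "PUBMED:<id>" key format are all hypothetical, and two in-memory maps stand in for the BioSQL "reference" table, with the ID-keyed map playing the role of the UNIQUE (dbxref_id) constraint.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the BioSQL "reference" table; not BioJava's API.
class ReferenceResolver {

    // Minimal citation: an external ID such as "PUBMED:18554304" (may be null) plus the text fields.
    static final class Ref {
        final String dbxrefId;
        final String authors;
        final String title;
        final String location;

        Ref(String dbxrefId, String authors, String title, String location) {
            this.dbxrefId = dbxrefId;
            this.authors = authors;
            this.title = title;
            this.location = location;
        }

        // Exact-match key for the author/title/location fallback (List equality tolerates nulls).
        List<String> textKey() {
            return Arrays.asList(authors, title, location);
        }
    }

    // At most one stored reference per external ID, like UNIQUE (dbxref_id).
    private final Map<String, Ref> byDbxref = new HashMap<>();
    private final Map<List<String>, Ref> byText = new HashMap<>();

    /** Returns an existing matching reference, or stores and returns the candidate. */
    Ref resolve(Ref candidate) {
        // Step 1: look up by PubMed/MEDLINE ID when the candidate has one.
        if (candidate.dbxrefId != null) {
            Ref byId = byDbxref.get(candidate.dbxrefId);
            if (byId != null) {
                return byId;
            }
        }
        // Step 2: fall back to an exact author/title/location match.
        Ref byFields = byText.get(candidate.textKey());
        if (byFields != null) {
            return byFields;
        }
        // Step 3: no match, so store the candidate as a new reference.
        if (candidate.dbxrefId != null) {
            byDbxref.put(candidate.dbxrefId, candidate);
        }
        byText.put(candidate.textKey(), candidate);
        return candidate;
    }

    /** Number of distinct stored references. */
    int size() {
        return byText.size();
    }
}
```

Because the ID is checked first, two citations of PubMed 18554304 whose location strings differ ("66 (3THEMATIC ISSUE: GUT MICROBIOLOGY)" vs "66 (3)") resolve to one stored reference instead of producing a second row with the same dbxref_id.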
> >
> > Can somebody please look at the second issue and fix it?
> > "
> > 2. I think that's a bug (compound locations with null features) but not
> > sure why. Could be that the process of constructing a CompoundRichLocation
> > is somehow losing the feature reference from the original
> > SimpleRichLocation. Again I can't investigate until March - can someone else
> > take a look at the code? (A good starting point would be to look at how a
> > CompoundRichLocation decides to select the feature from the
> > SimpleRichLocations it is made up from).
> > "
> >
> > I am also planning to build a bridge between BioSQL databases loaded with
> > BioPerl and with BioJava. Here is some of my investigation so far; can you
> > suggest a direction for it? Please have a look at the attached files:
> > 1) Biojava_BioPerl_diff.xls ==> a view of the tables where a GenBank record is
> > stored in a BioSQL instance by BioPerl and by BioJava
> > 2) GenbankRecord.doc ==> a Word document showing a GenBank record and where each
> > part of it goes in BioSQL when loaded with BioPerl and with BioJava
> > 3) BioSqlRichObjectBuilder.patch ==> patch for the BioSqlRichObjectBuilder.java class
> > 4) GenbankFormat.patch ==> patch for the GenbankFormat.java class
> >
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> >
> > -------- Original Message --------
> > Subject:      Re: Hibernate Exception and suggestion for change in BioSqlSchema
> > Date: Tue, 9 Feb 2010 20:34:32 +1300
> > From: Richard Holland <holland at eaglegenomics.com>
> > To:   Deepak Sheoran <sheoran143 at gmail.com>
> > CC:   biojava-l at biojava.org
> >
> > Hi. It's possible that your original email didn't make it to the list
> > because it is in HTML format, and the list only accepts plain text.
> >
> > However, in answer to your two questions:
> >
> >   1. The code that does the resolution of references might be better if
> >   it looks up existing IDs rather than using author, title, location to
> >   identify existing records. I would suggest modifying it to a three-step
> >   process - test ID, then if no match then test author/title/location, then if
> >   still no match create a new reference. Could someone do that? (I'm unable to
> >   do anything until late March).
> >
> >   2. I think that's a bug (compound locations with null features) but not
> >   sure why. Could be that the process of constructing a CompoundRichLocation
> >   is somehow losing the feature reference from the original
> >   SimpleRichLocation. Again I can't investigate until March - can someone else
> >   take a look at the code? (A good starting point would be to look at how a
> >   CompoundRichLocation decides to select the feature from the
> >   SimpleRichLocations it is made up from).
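To make the invariant behind item 2 concrete: every location that gets persisted, including each member of a compound location, needs a non-null back-reference to its feature, otherwise Hibernate reports "not-null property references a null or transient value: Location.feature". The sketch below uses hypothetical `Feature`, `Loc` and `LocationCheck` classes, not BioJava's RichFeature or CompoundRichLocation, to show one way to find such orphaned locations before saving and to push the parent feature down to the member locations. Whether BioJava's CompoundRichLocation should do this itself is the open question raised above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; these are not BioJava's RichFeature/RichLocation classes.
class LocationCheck {

    static final class Feature {
        final String name;

        Feature(String name) {
            this.name = name;
        }
    }

    static final class Loc {
        Feature feature;          // back-reference that must be non-null when the location is saved
        final List<Loc> members;  // empty for a simple location, non-empty for a join

        Loc(Loc... members) {
            this.members = new ArrayList<>(Arrays.asList(members));
        }
    }

    /** Collects the location and all nested member locations whose feature is null. */
    static List<Loc> findOrphans(Loc root) {
        List<Loc> orphans = new ArrayList<>();
        collect(root, orphans);
        return orphans;
    }

    private static void collect(Loc loc, List<Loc> out) {
        if (loc.feature == null) {
            out.add(loc);
        }
        for (Loc member : loc.members) {
            collect(member, out);
        }
    }

    /** Sets the feature on a location and, recursively, on every member location. */
    static void assignFeature(Loc loc, Feature feature) {
        loc.feature = feature;
        for (Loc member : loc.members) {
            assignFeature(member, feature);
        }
    }
}
```

Running such a check on each feature's location before calling Hibernate's Session.save(...) would name the affected feature directly, instead of the error surfacing as a PropertyValueException deep inside the cascade.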
> >
> > cheers,
> > Richard
> >
> > On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> >
> > >
> > > Hi Richard
> > >
> > > Below is the email I sent to the biojava-l mailing list. It never got posted
> > > to the list and I did not get any response, so please have a look at it and
> > > tell me what the solution to the problem described in it might be.
> > >
> > >
> > > Thanks
> > > Deepak Sheoran
> > > -------- Original Message --------
> > > Subject:    Hibernate Exception and suggestion for change in BioSqlSchema
> > > Date:       Wed, 03 Feb 2010 08:07:35 -0600
> > > From:       Deepak Sheoran <sheoran143 at gmail.com>
> > > To:         biojava-l at lists.open-bio.org
> > >
> > > Hi guys,
> > >
> > > A couple of days ago I had a problem with a Hibernate exception; that exception
> > > has since been resolved, and the thread about it is:
> > > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> > >
> > > Following Richard's suggestion in that thread I was able to resolve some of the
> > > issues, but then I got stuck on another Hibernate error. I decided to
> > > investigate, and below are some facts I found that I think will affect all of us.
> > > - The "reference" table in the BioSQL schema has a unique constraint on the
> > >   "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)),
> > >   which means only one row in the reference table can use a given dbxref_id.
> > > This works well, except when the values in the "location", "title" and "authors"
> > > columns vary slightly while all of the variants refer to the same PubMed ID. Then
> > > we can't persist or create the RichSequence object.
> > > When you tie RichObjectFactory to an active Hibernate session, the class
> > > "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List paramsList)"
> > > that looks the object up in the database and returns it if found; otherwise it
> > > tries to persist the new object to the database.
> > > The problem is in the following part of that method:
> > > (BioSqlRichObjectBuilder.java, lines 114-123)
> > > else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> > >     queryType = "DocRef";
> > >     // convert List constructor to String representation for query
> > >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> > >     if (ourParamsList.size()<3) {
> > >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> > >     } else {
> > >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> > >     }
> > > }
> > > When Hibernate searches the database it won't find the existing record in the
> > > "reference" table, because the two records differ as strings, so a new object is
> > > returned to the following code in "GenbankFormat":
> > > (GenbankFormat.java, lines 447-455)
> > > else {
> > >     try {
> > >         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class, new Object[]{dbname, raccession, new Integer(0)});
> > >         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> > >         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> > >     } catch (ChangeVetoException e) {
> > >         throw new ParseException(e+", accession:"+accession);
> > >     }
> > > }
> > > That object is then added to rlistener, and parsing moves on to the next part of
> > > the GenBank record. When BioJava next searches the database for a new
> > > cross-reference, it tries to persist the earlier object and gets a Hibernate
> > > exception for violating the unique constraint on the "dbxref_id" column.
> > >
> > > I see two ways to get these records into the database:
> > >   - The easy solution, and the one I used to test my theory, is to change the
> > >     BioSQL schema to allow a many-to-one relation between the "reference" and
> > >     "dbxref" tables. This even makes sense, because one paper can be cited with
> > >     many different naming variations, and the change would let us store that
> > >     information too. But this is something the BioSQL developers have to decide,
> > >     and I don't know how to approach them.
> > >   - The second solution, which is harder to implement, is to change the way
> > >     "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides
> > >     whether a particular DocRef already exists in the database, i.e. by testing
> > >     all possible string variations of the authors, location and title of the
> > >     DocRef being looked up. This has many complications and may slow down the
> > >     creation of RichSequence objects when RichObjectFactory is linked to an
> > >     active Hibernate session.
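The second solution essentially means comparing normalized strings instead of raw ones. As a hedged illustration only (the `CitationNormalizer` class and its rules, lowercasing plus whitespace collapsing, are assumptions rather than anything in BioJava), this is enough to match the two titles stored for PubMed 16790744 in the sample below, which differ only in capitalization. It would not reconcile author variants such as "Read,T.D." vs "Read,T.", and every extra rule added for cases like that raises the risk of merging references that really are different.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava: compares citation fields after a simple normalization.
class CitationNormalizer {

    /** Trims, collapses runs of whitespace to one space, and lowercases locale-independently. */
    static String normalize(String s) {
        if (s == null) {
            return null;
        }
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** True when both fields are null, or equal after normalization. */
    static boolean sameField(String a, String b) {
        String na = normalize(a);
        String nb = normalize(b);
        return na == null ? nb == null : na.equals(nb);
    }

    /** Field-by-field comparison of authors, title and location. */
    static boolean sameCitation(String authorsA, String titleA, String locationA,
                                String authorsB, String titleB, String locationB) {
        return sameField(authorsA, authorsB)
                && sameField(titleA, titleB)
                && sameField(locationA, locationB);
    }
}
```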
> > >
> > > Example: below is a sample from my local BioSQL schema, which has the
> > > modification I suggested. (For easier reference in this email, I replaced the
> > > local dbxref_id values in this table with the PubMed IDs stored in the "dbxref"
> > > table, so the Dbxref_id column shows PubMed IDs.)
> > > Reference_id | Dbxref_id | Location | Title | Authors | crc
> > > 216 | 18554304 | FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | 9E940E01F4BE3CD0
> > > 230 | 18554304 | FEMS Microbiol. Ecol. 66 (3), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | D3BC0C17F3F786C9
> > > 415 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 60AEDFA0CEEACC38
> > > 969 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 4B1232999F6E8130
> > > 929 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 3E79B40DD2AAA2B7
> > > 932 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 094EB3384F8D6DE8
> > > 1426 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M. | 357648D8FD8C6C8A
> > > 1481 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C. | 115411EB2DEE5654
> > > 1497 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D5D376EECCD186B
> > > 1501 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D57954EECDED66B
> > > 1556 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | 698688FB6DB95247
> > > 1559 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | E25E1BA99DB18F3D
> > >
> > > - The second kind of error I got was:
> > >   org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > >   - This means that some feature in the RichSequence object has a Location object
> > >     whose feature is set to null.
> > >   - My observations:
> > >     - It usually occurs when you try to persist a RichSequence object to the
> > >       database, and it affects features that have a CompoundRichLocation, usually
> > >       "join" and "complement" locations in the CDS regions of a GenBank record.
> > >     - After catching the Hibernate exception I went through all the features and
> > >       found that either BioJava or Hibernate had changed the object type of the
> > >       CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
> > >     - Below are screenshots from one of my tests:
> > >       - Settings before trying to persist the RichSequence object to the database:
> > >         <Mail Attachment.png>
> > >       - After trying to persist the RichSequence object to the database, inside the
> > >         Hibernate exception catch block:
> > >         <Mail Attachment.png>
> > >   - So my question is why this is happening, and how to stop it or otherwise get
> > >     these records into the database. I have no clue why it happens.
> > >   - Some extra information to make things clearer:
> > >     - Below are the LOCUS lines of some GenBank records for which I know where the
> > >       location error is (the CDS region causing the error), with its index in the
> > >       richSequence.feature ArrayList:
> > >       - LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
> > >         richSequence.feature index: 2540, line number in the GenBank record: 22115
> > >       - LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
> > >         richSequence.feature index: 127, line number in the GenBank record: 2137
> > >       - LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
> > >         richSequence.feature index: 389, line number in the GenBank record: 3632
> > >       - LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
> > >         richSequence.feature index: 47, line number in the GenBank record: 4841
> > >       - LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
> > >         richSequence.feature index: 45, line number in the GenBank record: 442
> > >   - The complete exception message:
> > > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > >         at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> > >         at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> > >         at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> > >
> > >
> >
> > --
> > Richard Holland, BSc MBCS
> > Operations and Delivery Director, Eagle Genomics Ltd
> > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> >
> >
> >
> >
> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>


From deepak.sheoran at orionbiosciences.com  Thu Mar 25 14:46:57 2010
From: deepak.sheoran at orionbiosciences.com (Deepak Sheoran)
Date: Thu, 25 Mar 2010 13:46:57 -0500
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with
 subject : ( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
Message-ID: <4BABAFA1.6090806@orionbiosciences.com>

That is the reason I was getting an error when I created a RichSequence
object without an active session to BioSQL. I had no clue that I had
created another bug while fixing one; thanks for noticing and fixing it.

I am wondering whether we should propose BioPerl-BioJava-BioSQL
compatibility as one of the Google Summer of Code projects. I have a
vision for this, but I don't know the right way to begin. It could help
people who want to use BioJava but can't because they are afraid of
losing their Perl code, which depends heavily on the Perl way of loading
the schema. Alternatively, we could come up with a hybrid approach that
takes the best from both languages.

Deepak Sheoran

On 3/25/2010 11:27 AM, Richard Holland wrote:
> Patched and in subversion on the head in the new Biojava 3 code. I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.
>
> On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
>
>    
>> I am writing this email again because I didn't get any response about whether these bugs have been patched or whether my message got lost somewhere on the mailing list. I don't know how to apply these patches, so I am counting on you to apply them and reply so that I know they are fixed.
>>
>>
>>
>> Thanks
>> Deepak Sheoran
>>
>>
>> Hi
>> In response to the bug fix suggested by Richard, I have created some patches. They need to be applied to stop BioJava from processing the references in a GenBank record incorrectly, which causes further Hibernate exceptions. With the patches applied, the reference resolution code first tests the PubMed or MEDLINE ID; if there is no match it tests author/title/location; and if there is still no match it creates a new reference. I tested this with GenBank release 175 and got almost 3159 more records into my database.
>>
>> Can somebody please look at the second issue and fix it?
>> "
>> 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
>> "
>>
>> I am also planning to build a bridge between BioSQL databases loaded with BioPerl and with BioJava. Here is some of my investigation so far; can you suggest a direction for it?
>> Please have a look at the attached files:
>> 1) Biojava_BioPerl_diff.xls ==>  a view of the tables where a GenBank record is stored in a BioSQL instance by BioPerl and by BioJava
>> 2) GenbankRecord.doc ==>  a Word document showing a GenBank record and where each part of it goes in BioSQL when loaded with BioPerl and with BioJava
>> 3) BioSqlRichObjectBuilder.patch ==>  patch for the BioSqlRichObjectBuilder.java class
>> 4) GenbankFormat.patch ==>  patch for the GenbankFormat.java class
>>
>>
>> Thanks
>> Deepak Sheoran
>>
>>
>>
>> -------- Original Message --------
>> Subject:	Re: Hibernate Exception and suggestion for change in BioSqlSchema
>> Date:	Tue, 9 Feb 2010 20:34:32 +1300
>> From:	Richard Holland<holland at eaglegenomics.com>
>> To:	Deepak Sheoran<sheoran143 at gmail.com>
>> CC:	biojava-l at biojava.org
>>
>> Hi. It's possible that your original email didn't make it to the list because it is in HTML format, and the list only accepts plain text.
>>
>> However, in answer to your two questions:
>>
>>    1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).
>>
>>    2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
>>
>> cheers,
>> Richard
>>
>> On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
>>
>>      
>>> Hi Richard
>>>
>>> Below is the email I sent to the biojava-l mailing list. It never got posted to the list and I did not get any response, so please have a look at it and tell me what the solution to the problem described in it might be.
>>>
>>>
>>> Thanks
>>> Deepak Sheoran
>>> -------- Original Message --------
>>> Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
>>> Date:	Wed, 03 Feb 2010 08:07:35 -0600
>>> From:	Deepak Sheoran <sheoran143 at gmail.com>
>>> To:	biojava-l at lists.open-bio.org
>>>
>>> Hi guys,
>>>
>>> A couple of days ago I had a problem with a Hibernate exception; that exception has since been resolved, and the thread about it is:
>>> http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
>>>
>>> Following Richard's suggestion in the thread above, I was able to resolve some of the issues, but then I got stuck on another Hibernate error. I decided to investigate the matter, and below are some facts and information I found that I think are going to affect all of us.
>>> 	* The "reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one row in the reference table can use a given dbxref_id.
>>> This works well, except when the values of the "location", "title" and "authors" columns vary slightly while all the variants refer to the same PubMed ID. In that case we can't persist or create the RichSequence object.
>>>   When you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up the object in the database: if it finds one it returns that object, otherwise it tries to persist the new object into the database.
>>> But the problem is with this part of that method:
>>> // BioSqlRichObjectBuilder.buildObject(), source lines 114-123
>>> else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
>>>     queryType = "DocRef";
>>>     // convert List constructor to String representation for query
>>>     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List) ourParamsList.get(0), true));
>>>     if (ourParamsList.size() < 3) {
>>>         // no title supplied: match on authors and location only
>>>         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>>>     } else {
>>>         // exact string match on authors, location and title
>>>         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>>>     }
>>> }
>>> When Hibernate searches the database it won't find the existing record in the "reference" table, because the two records differ in a string comparison, so it returns a new object to the following piece of code in "GenbankFormat":
>>> // GenbankFormat, source lines 447-455
>>> else {
>>>     try {
>>>         CrossRef cr = (CrossRef) RichObjectFactory.getObject(SimpleCrossRef.class, new Object[]{dbname, raccession, new Integer(0)});
>>>         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>>>         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>>>     } catch (ChangeVetoException e) {
>>>         throw new ParseException(e + ", accession:" + accession);
>>>     }
>>> }
>>> We then add that object to rlistener and move on to the next part of the GenBank record. When BioJava next searches the database for a cross-reference, it tries to persist the earlier object and gets a Hibernate exception for violating the unique constraint on the "dbxref_id" column.
>>>
>>> The only ways I can see to get these records into the database are:
>>> 		* The very easy solution, and the one I used to test my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can be cited with many variations in naming, and this change lets us store that information too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
>>> 		* The second solution, which is slightly harder to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides whether a particular DocRef already exists in the database, i.e. by testing all possible string variations of the authors, location and title of the DocRef being searched for. This has many complications and may slow down the creation of RichSequence objects when RichObjectFactory is linked to an active Hibernate session.
>>>
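Deepak's second option (matching despite small string variations) could start from a normalization step like the sketch below. It is an assumption about how such matching might work, not existing BioJava behaviour; `RefKey`, `normalize` and `looselyEqual` are hypothetical names, and the sketch only removes case and whitespace differences.

```java
import java.util.Locale;

/** Hypothetical helper for tolerant comparison of reference fields (not BioJava API). */
public final class RefKey {

    private RefKey() {
    }

    /** Trims, collapses runs of whitespace and lower-cases, so trivially different strings compare equal. */
    public static String normalize(String s) {
        if (s == null) {
            return null;
        }
        // Locale.ROOT keeps lower-casing independent of the JVM default locale.
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** True when both values are equal after normalization; two nulls also count as equal. */
    public static boolean looselyEqual(String a, String b) {
        String na = normalize(a);
        String nb = normalize(b);
        return na == null ? nb == null : na.equals(nb);
    }
}
```

Against the sample data that follows, this would merge the two titles for PubMed 16790744 and the "del"/"Del Rosario Espuny,M." author strings for PubMed 14689165, but not "Tomb,J.-F." versus "Tomb,J." for PubMed 8688087. That gap illustrates why string matching alone is fragile compared with looking up the ID first.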
>>> Example: below is a sample of what I have in my local BioSQL schema, which has the modification I suggested. (For easy reference with the outside world in this email, the dbxref_id column shows the PubMed IDs stored in the "dbxref" table instead of the local dbxref_id values.)
>>> Reference_id: 216   Dbxref_id: 18554304   crc: 9E940E01F4BE3CD0
>>>     Location: FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
>>>     Title:    Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>>>     Authors:  Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>>>
>>> Reference_id: 230   Dbxref_id: 18554304   crc: D3BC0C17F3F786C9
>>>     Location: FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
>>>     Title:    Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>>>     Authors:  Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>>>
>>> Reference_id: 415   Dbxref_id: 16790744   crc: 60AEDFA0CEEACC38
>>>     Location: Infect. Immun. 74 (7), 3715-3726 (2006)
>>>     Title:    Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
>>>     Authors:  Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>>>
>>> Reference_id: 969   Dbxref_id: 16790744   crc: 4B1232999F6E8130
>>>     Location: Infect. Immun. 74 (7), 3715-3726 (2006)
>>>     Title:    Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
>>>     Authors:  Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>>>
>>> Reference_id: 929   Dbxref_id: 8688087   crc: 3E79B40DD2AAA2B7
>>>     Location: Science 273 (5278), 1058-1073 (1996)
>>>     Title:    Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>>>     Authors:  Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>>>
>>> Reference_id: 932   Dbxref_id: 8688087   crc: 094EB3384F8D6DE8
>>>     Location: Science 273 (5278), 1058-1073 (1996)
>>>     Title:    Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>>>     Authors:  Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>>>
>>> Reference_id: 1426   Dbxref_id: 10684935   crc: 357648D8FD8C6C8A
>>>     Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>>>     Title:    Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>>>     Authors:  Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
>>>
>>> Reference_id: 1481   Dbxref_id: 10684935   crc: 115411EB2DEE5654
>>>     Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>>>     Title:    Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>>>     Authors:  Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
>>>
>>> Reference_id: 1497   Dbxref_id: 14689165   crc: 4D5D376EECCD186B
>>>     Location: Arch. Microbiol. 181 (2), 144-154 (2004)
>>>     Title:    The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>>>     Authors:  Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>>>
>>> Reference_id: 1501   Dbxref_id: 14689165   crc: 4D57954EECDED66B
>>>     Location: Arch. Microbiol. 181 (2), 144-154 (2004)
>>>     Title:    The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>>>     Authors:  Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>>>
>>> Reference_id: 1556   Dbxref_id: 18060065   crc: 698688FB6DB95247
>>>     Location: PLoS ONE 2 (12), E1271 (2007)
>>>     Title:    Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
>>>     Authors:  Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>>>
>>> Reference_id: 1559   Dbxref_id: 18060065   crc: E25E1BA99DB18F3D
>>>     Location: PLoS ONE 2 (12), E1271 (2007)
>>>     Title:    Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
>>>     Authors:  Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>>>
>>> 	* The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>>> 		* This means that in the RichSequence object some feature has a Location object whose feature is set to null.
>>> 		* My observations:
>>> 			* It usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS regions of a GenBank record.
>>> 			* After catching the Hibernate exception I went through all the features and found that either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
>>> 			* Below are screenshots from one of my tests:
>>> 				* Settings before trying to persist the RichSequence object to the database
>>>
>>> <Mail Attachment.png>
>>> 		* After trying to persist the RichSequence object to the database, inside the Hibernate exception catch block
>>>
>>> 		<Mail Attachment.png>
>>>
>>> 		* So my question is: why is this happening, and how can I stop it or get these records into the database? I have no clue why it happens.
>>> 		* Some extra information to make things clearer:
>>> 			* Below are the LOCUS lines of some GenBank records for which I know where the location error occurs (the CDS region causing the error), with its index in the richSequence.feature ArrayList.
>>> 				* LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
>>> 					* richSequence.feature index: 2540; line number in the GenBank record: 22115
>>> 				* LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
>>> 					* richSequence.feature index: 127; line number in the GenBank record: 2137
>>> 				* LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
>>> 					* richSequence.feature index: 389; line number in the GenBank record: 3632
>>> 				* LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
>>> 					* richSequence.feature index: 47; line number in the GenBank record: 4841
>>> 				* LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
>>> 					* richSequence.feature index: 45; line number in the GenBank record: 442
>>> 		* The complete exception message:
>>> org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>>>          at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>>>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>>>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>>>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>>>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>>>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>>>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>>>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>>>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>>>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>>>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>>>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>>>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>>>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>>>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>>>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>>>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>>>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>>>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>>>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>>>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>>>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>>>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>>>          at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>>>          at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>>>          at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
>>>          at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
>>>          at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
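To locate the offending features before Hibernate rejects them, one option is to walk each feature's location tree and report any location whose feature reference is null, which is the condition behind "not-null property references a null or transient value: Location.feature". The sketch below uses a deliberately simplified, hypothetical model (`LocationCheck`, `Feat`, `Loc`, `findOrphans`, `propagateFeature`), not the BioJava RichLocation classes; mapping it onto the real API is an assumption to verify against the BioJava sources.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of features and (compound) locations; not the BioJava classes. */
public class LocationCheck {

    /** Stand-in for a feature such as a CDS. */
    public static final class Feat {
        public final String name;

        public Feat(String name) {
            this.name = name;
        }
    }

    /** Stand-in for a location; a compound (join) location has nested blocks. */
    public static final class Loc {
        public Feat feature;                               // null here is what the not-null check rejects
        public final List<Loc> blocks = new ArrayList<>();

        public Loc(Feat feature) {
            this.feature = feature;
        }
    }

    /** Returns every location in the tree (root plus nested blocks) whose feature is null. */
    public static List<Loc> findOrphans(Loc root) {
        List<Loc> orphans = new ArrayList<>();
        collect(root, orphans);
        return orphans;
    }

    private static void collect(Loc loc, List<Loc> out) {
        if (loc.feature == null) {
            out.add(loc);
        }
        for (Loc block : loc.blocks) {
            collect(block, out);
        }
    }

    /** Sets the owning feature on a location and, recursively, on all of its blocks. */
    public static void propagateFeature(Loc loc, Feat owner) {
        loc.feature = owner;
        for (Loc block : loc.blocks) {
            propagateFeature(block, owner);
        }
    }
}
```

Re-attaching the feature, as `propagateFeature` does, would only be a workaround; it does not explain where the reference is lost when the CompoundRichLocation is built, which is the question Richard raises in his reply.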
>>>
>>>
>>>        
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E:
>> holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>>
>> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>
>>      
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>    


From biopython at maubp.freeserve.co.uk  Thu Mar 25 18:16:55 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Mar 2010 22:16:55 +0000
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with
	subject : ( Hibernate Exception and suggestion for change in
	BioSqlSchema)
In-Reply-To: <4BABAFA1.6090806@orionbiosciences.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
	<4BABAFA1.6090806@orionbiosciences.com>
Message-ID: <320fb6e01003251516w2977ab2h9869342f94576287@mail.gmail.com>

On Thu, Mar 25, 2010 at 6:46 PM, Deepak Sheoran
<deepak.sheoran at orionbiosciences.com> wrote:
>
> That is the reason why I was getting an error when I was creating a RichSequence
> object without any active session to BioSQL. I had no clue that I
> had created another bug by fixing one; thanks for noticing and fixing
> it.
>
> I am wondering whether we should propose BioPerl/BioJava/BioSQL compatibility as
> one of the Google Summer of Code projects. I have a vision for this, but don't
> know the right way to begin. It could help people who want to use BioJava
> but can't because they are afraid of losing their Perl code, which is heavily
> dependent on the Perl way of loading the schema. Or we could come up with a hybrid
> approach that takes the best from both languages.
>
> Deepak Sheoran

That is an interesting idea for GSoC, I wonder if we at Biopython
should do the same. I know of a few things where we differ from
BioPerl's BioSQL support (e.g. SwissProt comment lines).

[I take it we agree that bioperl-db is the de facto reference
implementation for mapping GenBank etc into BioSQL?]

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Mar 26 02:14:17 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Mar 2010 02:14:17 -0400
Subject: [Biojava-dev] [Bug 3035] New: ParseException thrown when parsing
	PDB file.
Message-ID: <bug-3035-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=3035

           Summary: ParseException thrown when parsing PDB file.
           Product: BioJava
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: structure
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: nakagawa-hiroyuki at mki.co.jp


When reading a PDB file using org.biojava.bio.structure.io.PDBFileReader on a
non-English platform, java.text.ParseException is thrown.
java.text.ParseException: Unparseable date: "26-DEC-97"
        at java.text.DateFormat.parse(Unknown Source)
        at org.biojava.bio.structure.io.PDBFileParser.pdb_HEADER_Handler(PDBFileParser.java:433)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2067)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:1963)
        at org.biojava.bio.structure.io.PDBFileReader.getStructure(PDBFileReader.java:486)
        at org.biojava.bio.structure.io.PDBFileReader.getStructure(PDBFileReader.java:466)
        at Test.main(Test.java:9)

To reproduce this symptom:
1.      Set your operating system's default locale to a non-English one (e.g.
Japanese).
2.      Then run the test code below.
Or simply run the test code with the option "-Duser.language=ja":
> java -Duser.language=ja Test

----Begin Test.java ----
import org.biojava.bio.structure.io.PDBFileReader;
import org.biojava.bio.structure.Structure;
public class Test {
        public static void main(String[] args) {
                String filename =  "1a2b.pdb" ;
                PDBFileReader pdbreader = new PDBFileReader();
                try{
                        Structure structure = pdbreader.getStructure(filename);
                } catch (Exception e){
                        e.printStackTrace();
                }
        }
}
----End Test.java ----

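The cause is that java.text.SimpleDateFormat, when created without an explicit
locale, uses the default locale, so it can't parse the English month names in
PDB-style "dd-MMM-yy" dates under some non-English locales.
I have attached a patch to correct this problem.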

---- Begin PDBFileParser.java.diff ----
*** .\biojava-1.7.1\src\org\biojava\bio\structure\io\PDBFileParser.java.orig	2010-01-24 22:35:24.000000000 +0900
--- .\biojava-1.7.1\src\org\biojava\bio\structure\io\PDBFileParser.java	2010-03-19 11:34:28.571551900 +0900
***************
*** 271,277 ****
                current_compound = new Compound();
                dbrefs        = new ArrayList<DBRef>();

!               dateFormat = new SimpleDateFormat("dd-MMM-yy");
                atomCount = 0;
                atomOverflow = false;

--- 271,277 ----
                current_compound = new Compound();
                dbrefs        = new ArrayList<DBRef>();

!               dateFormat = new SimpleDateFormat("dd-MMM-yy", java.util.Locale.ENGLISH);
                atomCount = 0;
                atomOverflow = false;

---- End PDBFileParser.java.diff ----
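The patch pins the formatter's locale. The same idea can be shown outside PDBFileParser in a small standalone sketch; `PdbDate` and its `parse` helper are hypothetical names used only for illustration, not part of BioJava.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/** Hypothetical helper illustrating the locale fix from the patch (not BioJava API). */
public class PdbDate {

    /**
     * Parses a PDB HEADER date such as "26-DEC-97". The formatter's locale is pinned to
     * English so the month abbreviation parses whatever the JVM's default locale is.
     */
    public static Date parse(String text) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH);
        fmt.setLenient(false);
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable PDB date: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Reproduce the reporter's environment: a Japanese default locale.
        Locale.setDefault(Locale.JAPANESE);
        System.out.println(parse("26-DEC-97"));
    }
}
```

A new formatter is created per call on purpose, since SimpleDateFormat is not thread-safe. With the locale-less `new SimpleDateFormat("dd-MMM-yy")`, the same input is expected to fail under a Japanese default locale, as described in the report.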


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Mar 26 02:18:26 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Mar 2010 02:18:26 -0400
Subject: [Biojava-dev] [Bug 3035] ParseException thrown when parsing PDB
	file.
In-Reply-To: <bug-3035-485@http.bugzilla.open-bio.org/>
Message-ID: <201003260618.o2Q6IQEV023480@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3035


------- Comment #1 from nakagawa-hiroyuki at mki.co.jp  2010-03-26 02:18 EST -------
Created an attachment (id=1467)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1467&action=view)
A patch to correct this problem


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Mar 26 12:25:14 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Mar 2010 12:25:14 -0400
Subject: [Biojava-dev] [Bug 3035] ParseException thrown when parsing PDB
	file.
In-Reply-To: <bug-3035-485@http.bugzilla.open-bio.org/>
Message-ID: <201003261625.o2QGPEVe012950@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3035


andreas at sdsc.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Mar 26 12:27:56 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Mar 2010 12:27:56 -0400
Subject: [Biojava-dev] [Bug 3035] ParseException thrown when parsing PDB
	file.
In-Reply-To: <bug-3035-485@http.bugzilla.open-bio.org/>
Message-ID: <201003261627.o2QGRu2r013123@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3035


andreas at sdsc.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from andreas at sdsc.edu  2010-03-26 12:27 EST -------
applied user provided patch, problem should be fixed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From andreas at sdsc.edu  Sun Mar 28 22:02:49 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 28 Mar 2010 19:02:49 -0700
Subject: [Biojava-dev] Biojava3 structure
In-Reply-To: <C842AAAA-DF3B-4EB8-B240-8F9E76CFAD20@scripps.edu>
References: <C842AAAA-DF3B-4EB8-B240-8F9E76CFAD20@scripps.edu>
Message-ID: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com>

Hi Scooter,

At present, the structure modules depend on the alignment module and on
the (old) core module.  This is for aligning ATOM and SEQRES residues in the
PDB files, and for the Smith Waterman alignment based 3D structure
superposition. If we target a release of biojava 3 in about a month, I don't
think it will be possible to break this out, mainly because the alignment
module is still based on the biojava 1 code base. Overall I think that the
core module probably should still be part of the BioJava 3 release. Any
opinions on that?

Andreas

On Sun, Mar 28, 2010 at 3:06 PM, Scooter Willis <HWillis at scripps.edu> wrote:

> Andreas
>
> I needed to do some work with a PDB file so started to use the structure
> library. It looks like it depends on all the old biojava code. Mainly the
> structure exceptions that extend BioException are the first thing tripping me
> up. Should the biojava3-structure module have any external dependencies or
> am I working with the wrong package?
>
> Thanks
>
> Scooter

From andreas at sdsc.edu  Tue Mar 16 15:57:38 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 16 Mar 2010 08:57:38 -0700
Subject: [Biojava-dev] biojava 3 progress
Message-ID: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>

Hi,

ISMB/BOSC is coming up rapidly and we should start to prepare for the annual
BioJava release. As such it would be a good moment to discuss the current
status of the various new BioJava 3 modules.

The biojava-structure, biojava-structure-gui modules are essentially ready
for release and I started to update the Cookbook with the latest features
http://biojava.org/wiki/BioJava:CookBook:PDB:align

Some of the re-factored modules based on biojava 1.7 could be released
anytime soon as well. The documentation just needs to be updated to explain
where the functionality can be found now (e.g. alignment module)

What about the new code that has been under development since the hackathon?
Is it getting release ready slowly? Any plans for documentation? What is
missing before we can make the first Biojava 3 release?

Andreas


From ayates at ebi.ac.uk  Tue Mar 16 17:21:48 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 16 Mar 2010 17:21:48 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
Message-ID: <81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>

It's getting ready, very slowly. Currently we need:

* Locations correctly implemented
** There's no way of requesting subsequences from them at the moment
* Support for features on sequences
* Extra attributes which do not fit into the top-level attributes
* Mapping between sequences/assemblies
* Circular location support
** i.e. no check that start is less than end
* Documentation

I think that's it, off the top of my head.
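
As a rough illustration of the circular-location and subsequence points above, here is a minimal plain-Java sketch. It is not BioJava code: the class CircularLocationSketch and its methods are hypothetical, and it assumes 1-based inclusive coordinates (as in GFF and GenBank) where start > end means the location wraps around the origin of a circular sequence.

```java
/** Hypothetical sketch (not BioJava API): a 1-based, inclusive location that may wrap on a circular sequence. */
final class CircularLocationSketch {
    final int start;
    final int end;

    CircularLocationSketch(int start, int end) {
        if (start < 1 || end < 1) {
            throw new IllegalArgumentException("Coordinates are 1-based");
        }
        // Deliberately no start <= end check: start > end means the location wraps.
        this.start = start;
        this.end = end;
    }

    boolean wraps() {
        return start > end;
    }

    /** Length of the location on a sequence with the given total length. */
    int length(int sequenceLength) {
        return wraps() ? (sequenceLength - start + 1) + end : end - start + 1;
    }

    /** Extracts the subsequence, joining the tail and the head of the sequence if the location wraps. */
    String subsequence(String sequence) {
        int n = sequence.length();
        if (start > n || end > n) {
            throw new IndexOutOfBoundsException("Location exceeds sequence length " + n);
        }
        if (!wraps()) {
            return sequence.substring(start - 1, end);
        }
        return sequence.substring(start - 1) + sequence.substring(0, end);
    }
}
```

For example, new CircularLocationSketch(7, 2).subsequence("ACGTACGT") returns "GTAC": the last two bases followed by the first two.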

Andy

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From HWillis at scripps.edu  Tue Mar 16 18:51:04 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 16 Mar 2010 14:51:04 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
Message-ID: <EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>

I am working on adding additional features to the core module to round things out, and I will be able to do docs/wiki examples. I will be working on features with the new sequence model, with the ability to pull features from UniProt based on a UniProt ID as an example. I will use UniProt XML as the data model when designing the feature data model, so that the classes have biological relevance instead of being completely abstract.
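
A minimal sketch of the UniProt-XML-driven idea, using only the JDK's DOM parser. It assumes UniProt entries describe features as feature elements with type and description attributes, whose location child holds either begin/end elements or a single position element, each carrying a position attribute. UniProtFeatureSketch and Feature are hypothetical names, not part of BioJava, and fetching the XML for a given UniProt ID is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: read <feature> elements of a UniProt XML entry into plain records. */
final class UniProtFeatureSketch {

    /** Hypothetical feature record: type, description and 1-based inclusive bounds (-1 if unknown). */
    static final class Feature {
        final String type;
        final String description;
        final int begin;
        final int end;

        Feature(String type, String description, int begin, int end) {
            this.type = type;
            this.description = description;
            this.begin = begin;
            this.end = end;
        }
    }

    static List<Feature> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("feature");
            List<Feature> features = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element feature = (Element) nodes.item(i);
                int begin = firstPosition(feature, "begin");
                int end = firstPosition(feature, "end");
                if (begin < 0 && end < 0) {
                    // Single-residue features use <position position="..."/> instead of begin/end.
                    begin = end = firstPosition(feature, "position");
                }
                features.add(new Feature(feature.getAttribute("type"),
                        feature.getAttribute("description"), begin, end));
            }
            return features;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse UniProt XML", e);
        }
    }

    /** The "position" attribute of the first descendant with the given tag, or -1 if absent. */
    private static int firstPosition(Element feature, String tag) {
        NodeList matches = feature.getElementsByTagName(tag);
        if (matches.getLength() == 0) {
            return -1;
        }
        String value = ((Element) matches.item(0)).getAttribute("position");
        return value.isEmpty() ? -1 : Integer.parseInt(value);
    }
}
```

The parser is not namespace-aware, so tag names are matched as written, which works for UniProt's default-namespace documents.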

I will also see if I can do something with NCBI genome sequence data, where you don't need to download the entire sequence but can use GFF annotations to pull the DNA sequences for the exons belonging to a particular gene.
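
The GFF side of that idea can be sketched as grouping exon rows by their Parent attribute, which yields the coordinates that could then be used to request only those regions. This is a hypothetical helper, not an existing BioJava or NCBI API. It assumes GFF3: nine tab-separated columns, key=value attributes separated by ";", and multiple parents separated by ",". It does not URL-decode attribute values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: collect exon coordinates per parent transcript from GFF3 lines. */
final class GffExonSketch {

    /** One exon: 1-based inclusive coordinates plus strand ('+', '-' or '.'). */
    static final class Exon {
        final String seqId;
        final int start;
        final int end;
        final char strand;

        Exon(String seqId, int start, int end, char strand) {
            this.seqId = seqId;
            this.start = start;
            this.end = end;
            this.strand = strand;
        }
    }

    /** Groups exon rows by the value(s) of their Parent attribute, keeping file order. */
    static Map<String, List<Exon>> exonsByParent(List<String> gffLines) {
        Map<String, List<Exon>> result = new LinkedHashMap<>();
        for (String line : gffLines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip comments and ## directives
            }
            String[] cols = line.split("\t");
            if (cols.length < 9 || !cols[2].equals("exon")) {
                continue;
            }
            Exon exon = new Exon(cols[0], Integer.parseInt(cols[3]),
                    Integer.parseInt(cols[4]), cols[6].charAt(0));
            for (String parent : attribute(cols[8], "Parent").split(",")) {
                if (!parent.isEmpty()) {
                    result.computeIfAbsent(parent, k -> new ArrayList<>()).add(exon);
                }
            }
        }
        return result;
    }

    /** Value of a key=value pair in GFF3 column 9, or "" if the key is absent. */
    private static String attribute(String column9, String key) {
        for (String pair : column9.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).trim().equals(key)) {
                return pair.substring(eq + 1).trim();
            }
        }
        return "";
    }
}
```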

I also plan to migrate the sequence alignment code.

I think the focus for this release should be on the modularization and the Maven integration. We also need to provide a repository for those who are not going to use Maven and just need the jar files. We can then highlight the newer modules as a benefit of the modularization.

I am planning on attending ISMB/BOSC.

Do we want to put some deadlines in place with a mini-project plan?

Thanks

Scooter



From andreas at sdsc.edu  Tue Mar 16 20:58:02 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 16 Mar 2010 13:58:02 -0700
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
Message-ID: <59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>

Ok, cool. Thanks for all this state-of-the-art pushing there... Which parts
do you think would be feasible to finish if we were planning a release in,
e.g., early May? We can have a follow-up to this release once the next
round of features has been added. It probably makes sense to focus on
stabilizing and documenting what is currently there, rather than trying
to be feature-complete. Critical features that are still missing should be
added, of course...

Andreas


From ayates at ebi.ac.uk  Wed Mar 17 15:28:33 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 17 Mar 2010 15:28:33 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
Message-ID: <4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>

I think features are possible, and they are really the missing piece of the puzzle for this project. How far along are you with them, Scooter?

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From HWillis at scripps.edu  Wed Mar 17 15:52:01 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 11:52:01 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
Message-ID: <E0D0833C-5C65-476D-936E-1A300B27A463@scripps.edu>

Andy

Working on it at the moment. I am starting with some code I have been using from JavaGene that handles GFF parsing and negative strands fairly well. I am migrating it to a new project called biojava3-genes (local only at the moment), which will hold code related to GFF parsing and to the outputs of various gene prediction programs. I need to create a training file for GlimmerHMM, so the short-term goal is to take the XML BLAST output of predicted genes that match UniProt and then extract the exon features from DNASequences, with the exon features added from a GFF file. I will then use these validated exon features to create the GlimmerHMM training file. Exon features on the negative strand, with frame shifts and the need to splice them together into a coding sequence, are probably the most complicated feature example we will encounter. After I get through that, I will see what can be extended or refactored for other, more generic features.
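
The negative-strand splicing case can be sketched in a few lines. This is a hypothetical illustration, not JavaGene or BioJava code. It assumes 1-based inclusive exon coordinates on the forward genomic strand and ignores phase and frame shifts, which a real implementation would have to handle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: splice exons (1-based, inclusive, forward-strand coordinates) into a CDS. */
final class SpliceSketch {

    /** Reverse complement of a DNA string; anything other than A/C/G/T becomes N. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    /**
     * Joins exon sequences in ascending genomic order; for the negative strand the joined
     * sequence is reverse-complemented.
     */
    static String splice(String genome, List<int[]> exons, boolean negativeStrand) {
        List<int[]> sorted = new ArrayList<>(exons);
        sorted.sort(Comparator.comparingInt((int[] e) -> e[0]));
        StringBuilder joined = new StringBuilder();
        for (int[] exon : sorted) {
            joined.append(genome, exon[0] - 1, exon[1]); // 1-based inclusive -> 0-based half-open
        }
        return negativeStrand ? reverseComplement(joined.toString()) : joined.toString();
    }
}
```

Reverse-complementing the joined exons gives the same result as walking the exons from the highest coordinate down and reverse-complementing each one, so the exons only need to be sorted once, in ascending order.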

I also have some code to gather genome characteristics (GC percent, average gene length, etc.) that can be included in the biojava3-genes module. Do you know how the average number of introns per gene is calculated when a gene has no introns? Do you add a 0 to the average, or only include genes with at least one intron?

Can you think of a better name for a package that deals with GFF/GFF3 parsing and utilities for working with the inputs and outputs of various gene prediction programs?

Scooter



From ayates at ebi.ac.uk  Wed Mar 17 16:04:50 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 17 Mar 2010 16:04:50 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <E0D0833C-5C65-476D-936E-1A300B27A463@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<E0D0833C-5C65-476D-936E-1A300B27A463@scripps.edu>
Message-ID: <4A0FFCFF-9EAA-4B27-BF11-AFD6D4CFEAE0@ebi.ac.uk>

Hey mate,

Sounds good; anything with good GFF support is hard to come by :). So you're going to get it working for the non-generic structures and then push it out into the core modules, if I'm reading what you said correctly?

Add 0 to the average and make sure the docs describe what it's doing. Even if a gene has no introns, it still affects the average number of introns in a genome :).

All I can think of is "biojava3-features". I'm not sure what "biojava3-genes" says. Maybe it goes into an "io" package ... say, one that goes with an EMBL/GenBank/CHADO formatter. Naming is a horrible thing to have to do.

Andy

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From HWillis at scripps.edu  Wed Mar 17 16:09:29 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 12:09:29 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
Message-ID: <D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>

Andy

Let me know if you have any major code changes to the core sequence handling that have been, or could be, checked in. So far I haven't needed to touch any of the core sequence code, but I want to avoid merging code if you have made any significant changes.

I should have code to check in today. If we can't come up with a better name, I will ask Andreas to create a biojava3-genes module, and I can then check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a stand-alone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you have a ProteinSequence, you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where a codon is split, with the end of one exon and the start of the next exon coding for a single amino acid position.
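
The protein-to-DNA mapping, including a codon split across two exons, can be sketched as follows. CodonMapSketch is hypothetical and is not the proposed TranscriptSequence/ExonFeature API. It assumes exons are listed in transcript order (5' to 3', so descending genomic order on the negative strand), with 1-based inclusive coordinates, and that the CDS starts at the first base of the first exon (no UTRs, no frame shifts).

```java
/** Hypothetical sketch: map a 1-based protein position to the genomic positions of its codon. */
final class CodonMapSketch {

    /**
     * exons: {start, end} pairs, 1-based inclusive, listed in transcript order.
     * Returns the three genomic positions in codon order (first, second, third base).
     */
    static int[] codonToGenomic(int proteinPosition, int[][] exons, boolean negativeStrand) {
        int[] genomic = new int[3];
        for (int k = 0; k < 3; k++) {
            int cdsOffset = 3 * (proteinPosition - 1) + k; // 0-based offset into the spliced CDS
            genomic[k] = cdsToGenomic(cdsOffset, exons, negativeStrand);
        }
        return genomic;
    }

    /** Walks the exons until the CDS offset falls inside one, so split codons map across exons. */
    private static int cdsToGenomic(int cdsOffset, int[][] exons, boolean negativeStrand) {
        int remaining = cdsOffset;
        for (int[] exon : exons) {
            int length = exon[1] - exon[0] + 1;
            if (remaining < length) {
                // Positive strand reads an exon left to right; negative strand reads it right to left.
                return negativeStrand ? exon[1] - remaining : exon[0] + remaining;
            }
            remaining -= length;
        }
        throw new IllegalArgumentException("Protein position lies beyond the annotated exons");
    }
}
```

With exons {1, 4} and {10, 14} on the positive strand, amino acid 2 maps to genomic positions 4, 10 and 11: one base from the end of the first exon and two from the start of the second.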

We also need to add the ability to have tracks as a way to group features. This way you can export the features of a particular track as a GFF/GFF3 file for import into various genome browsers. Say you have one genome you are working on, with genes added from three different gene prediction algorithms, each organized as a track. You should then be able to determine the overlaps of predicted genes that were validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together make this easy, then I think we will have a solid design.
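
A minimal sketch of the track idea: features grouped into named tracks, a naive overlap check between tracks, and GFF3 export of one track. All names here are hypothetical. Using the track name as the GFF3 source column is an assumption, and the pairwise overlap loop is O(n x m), so a genome-scale version would want an interval index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: group features into named tracks, find overlaps, and export GFF3. */
final class TrackSketch {

    /** Minimal feature with 1-based inclusive coordinates on a named sequence. */
    static final class Feature {
        final String seqId, type, id;
        final int start, end;
        final char strand;

        Feature(String seqId, String type, String id, int start, int end, char strand) {
            this.seqId = seqId;
            this.type = type;
            this.id = id;
            this.start = start;
            this.end = end;
            this.strand = strand;
        }

        /** Closed intervals on the same sequence overlap if each starts before the other ends. */
        boolean overlaps(Feature other) {
            return seqId.equals(other.seqId) && start <= other.end && other.start <= end;
        }
    }

    /** A named collection of features, e.g. the output of one gene predictor. */
    static final class Track {
        final String name;
        final List<Feature> features = new ArrayList<>();

        Track(String name) {
            this.name = name;
        }

        /** Features in this track that overlap at least one feature in the other track. */
        List<Feature> overlapping(Track other) {
            List<Feature> hits = new ArrayList<>();
            for (Feature f : features) {
                for (Feature g : other.features) {
                    if (f.overlaps(g)) {
                        hits.add(f);
                        break;
                    }
                }
            }
            return hits;
        }

        /** GFF3 lines for this track; score and phase are written as ".". */
        List<String> toGff3() {
            List<String> lines = new ArrayList<>();
            lines.add("##gff-version 3");
            for (Feature f : features) {
                lines.add(String.join("\t", f.seqId, name, f.type,
                        Integer.toString(f.start), Integer.toString(f.end), ".",
                        String.valueOf(f.strand), ".", "ID=" + f.id));
            }
            return lines;
        }
    }
}
```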
 

Scooter


From HWillis at scripps.edu  Wed Mar 17 16:14:02 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 12:14:02 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <4A0FFCFF-9EAA-4B27-BF11-AFD6D4CFEAE0@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<E0D0833C-5C65-476D-936E-1A300B27A463@scripps.edu>
	<4A0FFCFF-9EAA-4B27-BF11-AFD6D4CFEAE0@ebi.ac.uk>
Message-ID: <D8873897-5329-4FFF-A783-AB875FA995A9@scripps.edu>

Andy

I have two methods that calculate the average number of introns per gene, one for each approach. I just wasn't sure what the standard is for reporting.
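
The two averaging conventions can be written side by side. This is a hypothetical sketch, not Scooter's actual methods, and it takes one intron count per gene as input.

```java
import java.util.List;

/** Hypothetical sketch: the two ways of averaging introns per gene. */
final class IntronStatsSketch {

    /** Average over all genes; intronless genes contribute 0 to the sum but count in the denominator. */
    static double meanIntronsAllGenes(List<Integer> intronCounts) {
        if (intronCounts.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int c : intronCounts) {
            total += c;
        }
        return (double) total / intronCounts.size();
    }

    /** Average over intron-containing genes only; intronless genes are excluded entirely. */
    static double meanIntronsIntronContainingGenes(List<Integer> intronCounts) {
        long total = 0;
        int genes = 0;
        for (int c : intronCounts) {
            if (c > 0) {
                total += c;
                genes++;
            }
        }
        return genes == 0 ? 0.0 : (double) total / genes;
    }
}
```

For intron counts 0, 2 and 4, the first method gives 2.0 and the second 3.0, which is why the docs need to say which convention is used.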

I think features should be part of the core, because the feature model is abstract regardless of the source that generated the feature. The code related to gene prediction work should probably go in a different package, because it is not general. Calling it biojava-geneprediction also doesn't work, because that implies the module itself does gene prediction.

Scooter

On Mar 17, 2010, at 12:04 PM, Andy Yates wrote:

> Hey mate,
> 
> Sounds good; anything with good GFF support is hard to come by :). So you're going to get it working for the non-generic structures & then push it out into the core modules, if I'm reading what you said correctly?
> 
> Add 0 to the average & make sure the docs describe what it's doing. Even if a gene has no introns it still affects the average number of introns per gene in a genome :).
> 
> All I can think of is "biojava3-features". Not sure what "biojava3-genes" conveys. Maybe it goes into an "io" package, say one that goes with an EMBL/GenBank/CHADO formatter. Naming is a horrible thing to have to do. 
> 
> Andy
> 
> On 17 Mar 2010, at 15:52, Scooter Willis wrote:
> 
>> Andy
>> 
>> Working on it at the moment. I am starting with some code I have been using from JavaGene that has a fairly good handle on GFF parsing and handling negative strands. I am migrating it to a new project called biojava3-genes (local only at the moment) where code related to GFF parsing and dealing with various gene prediction program outputs can live. I need to create a training file for GlimmerHMM, so the short-term goal is to take the XML BLAST output of predicted genes that match UniProt and then extract the exon features from DNASequences whose exon features were added from a GFF file. I will then use these validated exon features to create the GlimmerHMM training file. Exon features on the negative strand, with frame shifts and the ability to splice together a coding sequence, are probably the most complicated feature example we will encounter. After I get through that I will see what can be extended/refactored etc. for other, more generic features.
>> 
>> I also have some code to gather genome characteristics (GC percent, avg gene length, etc.) that can be included in the biojava3-genes module. I wanted to see if you know how the Average Number of Introns per gene is calculated when a gene has no introns. Do you add a 0 to the average, or only include genes with at least one intron?
>> 
>> Can you think of a better name for a package that deals with GFF/GFF3 parsing and utilities to work with the inputs/outputs of various gene prediction programs?
>> 
>> Scooter
>> 
>> 
>> 
>> 
>> 
>> On Mar 17, 2010, at 11:28 AM, Andy Yates wrote:
>> 
>>> I think features are possible & this is really the missing piece of the puzzle with this project. How far on are you with them Scooter?
>>> 
>>> On 16 Mar 2010, at 20:58, Andreas Prlic wrote:
>>> 
>>>> Ok, cool. Thanks for all this state-of-the-art pushing there... Which parts do you think would be feasible to finish if we were planning a release in, e.g., early May? We can have a follow-up to this release once the next round of features has been added. It probably makes sense to focus on stabilizing and documenting what is currently there, rather than trying to be feature-complete. Critical features that are still missing should be added, of course... 
>>>> 
>>>> Andreas
>>>> 
>>>> On Tue, Mar 16, 2010 at 11:51 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>> I am working on adding additional features to the core module to round things out and will be able to do docs/wiki examples. I will be working on Features with the new sequence model, with the ability to pull features from UniProt based on a UniProt id as an example. I will use UniProt XML as the data model when designing the feature data model, so that classes have biological relevance instead of being completely abstract.
>>>> 
>>>> I will also see if I can do something with NCBI genome sequence data where you don't need to download the entire sequence, but can instead use GFF annotations to pull the DNA sequences for the exons belonging to a particular gene.
>>>> 
>>>> I will also plan on migrating the sequence alignment code as well.
>>>> 
>>>> I think the focus for this release should be on the modularization of the code and the Maven integration. We also need to provide a repository for those who are not going to use Maven and need just the jar files. We can then highlight the newer modules as a benefit of the modularization.
>>>> 
>>>> I am planning on attending ISMB/BOSC.
>>>> 
>>>> Do we want to put some deadlines in place with a mini-project plan?
>>>> 
>>>> Thanks
>>>> 
>>>> Scooter
>>>> 
>>>> 
>>>> On Mar 16, 2010, at 1:21 PM, Andy Yates wrote:
>>>> 
>>>>> It's getting ready very slowly. Currently we need:
>>>>> 
>>>>> * Locations correctly implemented
>>>>> ** There's no way of requesting subseqs from them at the moment
>>>>> * Feature on sequences support
>>>>> * Extra attributes which do not fit into top-level attributes
>>>>> * Mapping between sequences/assemblies
>>>>> * circular location support
>>>>> ** so no checks on start being less than end
>>>>> * Documentation
>>>>> 
>>>>> Think that's it off the top of my head
>>>>> 
>>>>> Andy
>>>>> 
>>>>> On 16 Mar 2010, at 15:57, Andreas Prlic wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> ISMB/BOSC is coming up rapidly and we should start to prepare for the annual
>>>>>> BioJava release. As such it would be a good moment to discuss the current
>>>>>> status of the various new BioJava 3 modules.
>>>>>> 
>>>>>> The biojava-structure, biojava-structure-gui modules are essentially ready
>>>>>> for release and I started to update the Cookbook with the latest features
>>>>>> http://biojava.org/wiki/BioJava:CookBook:PDB:align
>>>>>> 
>>>>>> Some of the re-factored modules based on biojava 1.7 could be released
>>>>>> anytime soon as well. The documentation just needs to be updated to explain
>>>>>> where the functionality can be found now (e.g. alignment module)
>>>>>> 
>>>>>> What about the new code that has been under development since the hackathon?
>>>>>> Is it getting release ready slowly? Any plans for documentation? What is
>>>>>> missing before we can make the first Biojava 3 release?
>>>>>> 
>>>>>> Andreas
>>>>>> _______________________________________________
>>>>>> biojava-dev mailing list
>>>>>> biojava-dev at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>> 
>>>>> --
>>>>> Andrew Yates                   Ensembl Genomes Engineer
>>>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>>>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>>>> 
>>>> 
>>> 
>> 
> 

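The list of missing pieces quoted above includes circular location support, where start may be greater than end, plus requesting sub-sequences from a location. A minimal sketch of both, assuming 1-based inclusive coordinates and a location that crosses the origin at most once; `CircularLocation` is a hypothetical class, not the BioJava 3 location API.

```java
/** Hypothetical location on a circular sequence; 1-based inclusive, start may exceed end. */
final class CircularLocation {
    final int start;
    final int end;
    final int sequenceLength;

    CircularLocation(int start, int end, int sequenceLength) {
        if (start < 1 || end < 1 || start > sequenceLength || end > sequenceLength) {
            throw new IllegalArgumentException("coordinates must lie in 1.." + sequenceLength);
        }
        this.start = start;
        this.end = end;
        this.sequenceLength = sequenceLength;
    }

    /** True when the location runs across the origin, e.g. 9..2 on a 10 bp circle. */
    boolean wrapsOrigin() {
        return start > end;
    }

    int length() {
        return wrapsOrigin() ? (sequenceLength - start + 1) + end : end - start + 1;
    }

    boolean contains(int position) {
        if (position < 1 || position > sequenceLength) {
            return false;
        }
        return wrapsOrigin()
                ? (position >= start || position <= end)
                : (position >= start && position <= end);
    }

    /** Extracts the sub-sequence, joining the tail and the head of the sequence when wrapping. */
    String subSequence(String sequence) {
        if (sequence.length() != sequenceLength) {
            throw new IllegalArgumentException("sequence length mismatch");
        }
        return wrapsOrigin()
                ? sequence.substring(start - 1) + sequence.substring(0, end)
                : sequence.substring(start - 1, end);
    }
}
```

On the 10 bp sequence "ACGTTGCAAT", the location 9..2 has length 4 and yields "ATAC": the last two bases followed by the first two.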

From andreas at sdsc.edu  Wed Mar 17 17:46:19 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 17 Mar 2010 10:46:19 -0700
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
Message-ID: <59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>

I like biojava-feature as a module name  for the GFF and features related
code. (should we try to keep the module names singular?) Let me know if you
want me to create the module for this...
A

On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:

> Andy
>
> Let me know if you have any major code changes for the core sequencing
> handling that have been or could be checked in. So far I haven't needed to
> touch any of the core sequence code but want to avoid merging code if you
> have made any significant changes.
>
> I should have code to check in today, and if we can't come up with a better
> name I will ask Andreas to create a biojava3-genes module so I can check
> that code in for your review. The current problem is that we have
> ExonSequence extending DNASequence when it could also be described as a
> feature. One way to look at this is that a TranscriptSequence is also a
> feature of a DNA sequence, and only when you want a standalone class with
> internal links back to the parent sequence do you return a
> TranscriptSequence. The TranscriptFeature would have ExonFeature and
> IntronFeature as children. You can ask for an ExonSequence based on the
> ExonFeature. Once you get a ProteinSequence you should be able to reverse
> the process and get back the TranscriptSequence, the corresponding
> ExonFeatures, and some sort of mapping from a protein sequence position back
> to the three DNA sequence positions that coded for it. This would need to
> handle the case where the end of one exon and the start of the next exon
> together code for a particular amino acid position.
>
> We also need to add the ability to have tracks as a way to group
> features. This way you can export the features in a particular track as a
> GFF/GFF3 file for importing into various genome browsers. Suppose you have
> one genome you are working on, with genes added from three different gene
> prediction algorithms, each organized by a track. You should then be able
> to determine which predicted genes overlap genes validated via BLAST
> against UniProt, and create another summary track of validated and
> non-validated genes. If the feature classes we put together can make this
> easy then I think we will have a solid design.
>
>
> Scooter
>
>


From HWillis at scripps.edu  Wed Mar 17 18:17:59 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 14:17:59 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
Message-ID: <5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>

Andreas

The problem with putting feature classes in a separate module is that biojava-core sequences would then have a dependency on biojava-feature. A sequence needs to hold a collection of features so feature classes need to go in core. If features are created from gff the core module doesn't care where features come from.
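
The dependency argument can be sketched in code: if the sequence type holds a collection of features, the feature abstraction has to live in the same module, while a GFF reader in a separate module (the proposed biojava-genomes) depends only on core and produces those features. All type names below are hypothetical, assuming 1-based inclusive coordinates; they are not the actual BioJava 3 classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical core-module feature type; it knows nothing about GFF or any other source. */
interface Feature {
    String getType();
    int getStart(); // 1-based, inclusive
    int getEnd();
    List<Feature> getChildren();
}

final class BasicFeature implements Feature {
    private final String type;
    private final int start;
    private final int end;
    private final List<Feature> children = new ArrayList<>();

    BasicFeature(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }

    public String getType() { return type; }
    public int getStart() { return start; }
    public int getEnd() { return end; }
    public List<Feature> getChildren() { return Collections.unmodifiableList(children); }

    /** E.g. a transcript feature holding its exon and intron features. */
    void addChild(Feature child) { children.add(child); }
}

/** Hypothetical core sequence that owns its features, hence Feature must live in core too. */
final class FeaturedSequence {
    private final String residues;
    private final List<Feature> features = new ArrayList<>();

    FeaturedSequence(String residues) { this.residues = residues; }

    void addFeature(Feature feature) {
        if (feature.getStart() < 1 || feature.getEnd() > residues.length()
                || feature.getStart() > feature.getEnd()) {
            throw new IllegalArgumentException("feature outside sequence bounds");
        }
        features.add(feature);
    }

    List<Feature> getFeaturesByType(String type) {
        List<Feature> result = new ArrayList<>();
        for (Feature f : features) {
            if (f.getType().equals(type)) {
                result.add(f);
            }
        }
        return result;
    }

    String getSequenceForFeature(Feature feature) {
        return residues.substring(feature.getStart() - 1, feature.getEnd());
    }
}
```

Nothing in these types refers to a file format, so a GFF parser, a UniProt XML reader or a gene predictor wrapper can all populate them without core depending on any of those modules.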

We could go with biojava-genomes and code related to dealing with genomes goes in that module. If you like biojava-genome or biojava-genomes go ahead and create it and email me so I can check it out.

Thanks

Scooter


On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:

I like biojava-feature as a module name  for the GFF and features related code. (should we try to keep the module names singular?) Let me know if you want me to create the module for this...
A

On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
Andy

Let me know if you have any major code changes for the core sequencing handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code but want to avoid merging code if you have made any significant changes.

I should have code to check in today, and if we can't come up with a better name I will ask Andreas to create a biojava3-genes module so I can check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a particular amino acid position.

We also need to add the ability to have tracks as a way to group features. This way you can export the features in a particular track as a GFF/GFF3 file for importing into various genome browsers. Suppose you have one genome you are working on, with genes added from three different gene prediction algorithms, each organized by a track. You should then be able to determine which predicted genes overlap genes validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together can make this easy then I think we will have a solid design.


Scooter


From ayates at ebi.ac.uk  Wed Mar 17 19:24:13 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 17 Mar 2010 19:24:13 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
	<5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
Message-ID: <1077DC26-42AB-4E41-BFA3-DEFD769F4C61@ebi.ac.uk>

biojava-genomes sounds good.

I've done nothing since my last check-in of code which was all to do with locations so there should be no problem there :)

On 17 Mar 2010, at 18:17, Scooter Willis wrote:

> Andreas
> 
> The problem with putting feature classes in a separate module is that biojava-core sequences would then have a dependency on biojava-feature. A sequence needs to hold a collection of features so feature classes need to go in core. If features are created from gff the core module doesn't care where features come from.
> 
> We could go with biojava-genomes and code related to dealing with genomes goes in that module. If you like biojava-genome or biojava-genomes go ahead and create it and email me so I can check it out.
> 
> Thanks
> 
> Scooter
>  
> 
> 
> On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:
> 
>> I like biojava-feature as a module name  for the GFF and features related code. (should we try to keep the module names singular?) Let me know if you want me to create the module for this...
>> A
>> 
>> On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>> Andy
>> 
>> Let me know if you have any major code changes for the core sequencing handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code but want to avoid merging code if you have made any significant changes.
>> 
>> I should have code to check in today, and if we can't come up with a better name I will ask Andreas to create a biojava3-genes module so I can check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a particular amino acid position.
>> 
>> We also need to add the ability to have tracks as a way to group features. This way you can export the features in a particular track as a GFF/GFF3 file for importing into various genome browsers. Suppose you have one genome you are working on, with genes added from three different gene prediction algorithms, each organized by a track. You should then be able to determine which predicted genes overlap genes validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together can make this easy then I think we will have a solid design.
>> 
>> 
>> Scooter
>> 
>> 
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From HWillis at scripps.edu  Wed Mar 17 19:58:42 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Mar 2010 15:58:42 -0400
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <1077DC26-42AB-4E41-BFA3-DEFD769F4C61@ebi.ac.uk>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
	<5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
	<1077DC26-42AB-4E41-BFA3-DEFD769F4C61@ebi.ac.uk>
Message-ID: <9F8616DE-710D-4971-8C63-52C5EB7789C2@scripps.edu>

Andy

Should we use this as our test case http://www.sequenceontology.org/gff3.shtml for a complex example of transcription?
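
A test case like that mostly exercises the ID/Parent links in the ninth GFF3 column, including features with more than one parent. A minimal sketch of a parser for one GFF3 feature line, assuming tab-separated columns and skipping the specification's percent-encoding of reserved characters; `Gff3Record` is a hypothetical name, and the sample line below is illustrative, modelled on the shape of the specification's examples rather than copied from the page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parsed GFF3 feature line (the nine tab-separated columns). */
final class Gff3Record {
    final String seqId;
    final String source;
    final String type;
    final int start;
    final int end;
    final char strand;
    final String phase;
    final Map<String, List<String>> attributes;

    private Gff3Record(String seqId, String source, String type, int start, int end,
                       char strand, String phase, Map<String, List<String>> attributes) {
        this.seqId = seqId;
        this.source = source;
        this.type = type;
        this.start = start;
        this.end = end;
        this.strand = strand;
        this.phase = phase;
        this.attributes = attributes;
    }

    /** Parses one feature line; comment and directive lines ("#...") must be filtered out first. */
    static Gff3Record parse(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length != 9) {
            throw new IllegalArgumentException("expected 9 columns but found " + cols.length);
        }
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        if (!cols[8].isEmpty() && !cols[8].equals(".")) {
            for (String pair : cols[8].split(";")) {
                if (pair.isEmpty()) {
                    continue; // tolerate a trailing semicolon
                }
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("attribute without '=': " + pair);
                }
                // Multiple values (e.g. several Parents) are comma-separated.
                attributes.put(pair.substring(0, eq), Arrays.asList(pair.substring(eq + 1).split(",")));
            }
        }
        return new Gff3Record(cols[0], cols[1], cols[2],
                Integer.parseInt(cols[3]), Integer.parseInt(cols[4]),
                cols[6].charAt(0), cols[7], attributes);
    }

    List<String> parents() {
        return attributes.getOrDefault("Parent", Collections.emptyList());
    }
}
```

Parsing `ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mRNA00001,mRNA00002` (tab-separated) gives an exon whose `parents()` lists both mRNA IDs, so the same exon feature can be attached to two transcripts.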

Scooter

On Mar 17, 2010, at 3:24 PM, Andy Yates wrote:

> biojava-genomes sounds good.
> 
> I've done nothing since my last check-in of code which was all to do with locations so there should be no problem there :)
> 
> On 17 Mar 2010, at 18:17, Scooter Willis wrote:
> 
>> Andreas
>> 
>> The problem with putting feature classes in a separate module is that biojava-core sequences would then have a dependency on biojava-feature. A sequence needs to hold a collection of features so feature classes need to go in core. If features are created from gff the core module doesn't care where features come from.
>> 
>> We could go with biojava-genomes and code related to dealing with genomes goes in that module. If you like biojava-genome or biojava-genomes go ahead and create it and email me so I can check it out.
>> 
>> Thanks
>> 
>> Scooter
>> 
>> 
>> 
>> On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:
>> 
>>> I like biojava-feature as a module name  for the GFF and features related code. (should we try to keep the module names singular?) Let me know if you want me to create the module for this...
>>> A
>>> 
>>> On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>> Andy
>>> 
>>> Let me know if you have any major code changes for the core sequencing handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code but want to avoid merging code if you have made any significant changes.
>>> 
>>> I should have code to check in today, and if we can't come up with a better name I will ask Andreas to create a biojava3-genes module so I can check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a particular amino acid position.
>>> 
>>> We also need to add the ability to have tracks as a way to group features. This way you can export the features in a particular track as a GFF/GFF3 file for importing into various genome browsers. Suppose you have one genome you are working on, with genes added from three different gene prediction algorithms, each organized by a track. You should then be able to determine which predicted genes overlap genes validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together can make this easy then I think we will have a solid design.
>>> 
>>> 
>>> Scooter
>>> 
>>> 
>> 
> 


From ayates at ebi.ac.uk  Wed Mar 17 20:01:04 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 17 Mar 2010 20:01:04 +0000
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <9F8616DE-710D-4971-8C63-52C5EB7789C2@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
	<5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
	<1077DC26-42AB-4E41-BFA3-DEFD769F4C61@ebi.ac.uk>
	<9F8616DE-710D-4971-8C63-52C5EB7789C2@scripps.edu>
Message-ID: <2A33D045-0AD9-4948-90D3-48636D074514@ebi.ac.uk>

Perfect :). Nothing like using someone else's test case as ours

Andy

On 17 Mar 2010, at 19:58, Scooter Willis wrote:

> Andy
> 
> Should we use this as our test case http://www.sequenceontology.org/gff3.shtml for a complex example of transcription?
> 
> Scooter
> 
> On Mar 17, 2010, at 3:24 PM, Andy Yates wrote:
> 
>> biojava-genomes sounds good.
>> 
>> I've done nothing since my last check-in of code which was all to do with locations so there should be no problem there :)
>> 
>> On 17 Mar 2010, at 18:17, Scooter Willis wrote:
>> 
>>> Andreas
>>> 
>>> The problem with putting feature classes in a separate module is that biojava-core sequences would then have a dependency on biojava-feature. A sequence needs to hold a collection of features so feature classes need to go in core. If features are created from gff the core module doesn't care where features come from.
>>> 
>>> We could go with biojava-genomes and code related to dealing with genomes goes in that module. If you like biojava-genome or biojava-genomes go ahead and create it and email me so I can check it out.
>>> 
>>> Thanks
>>> 
>>> Scooter
>>> 
>>> 
>>> 
>>> On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:
>>> 
>>>> I like biojava-feature as a module name  for the GFF and features related code. (should we try to keep the module names singular?) Let me know if you want me to create the module for this...
>>>> A
>>>> 
>>>> On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>> Andy
>>>> 
>>>> Let me know if you have any major code changes for the core sequencing handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code but want to avoid merging code if you have made any significant changes.
>>>> 
>>>> I should have code to check in today, and if we can't come up with a better name I will ask Andreas to create a biojava3-genes module so I can check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a particular amino acid position.
>>>> 
>>>> We also need to add the ability to have tracks as a way to group features. This way you can export the features in a particular track as a GFF/GFF3 file for importing into various genome browsers. Suppose you have one genome you are working on, with genes added from three different gene prediction algorithms, each organized by a track. You should then be able to determine which predicted genes overlap genes validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together can make this easy then I think we will have a solid design.
>>>> 
>>>> 
>>>> Scooter
>>>> 
>>>> 
>>> 
>> 
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From andreas at sdsc.edu  Wed Mar 17 22:14:40 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 17 Mar 2010 15:14:40 -0700
Subject: [Biojava-dev] biojava 3 progress
In-Reply-To: <5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
References: <59a41c431003160857s5fb8f4f8i89f410a1adfbca85@mail.gmail.com>
	<81FA76CF-D4F6-44A5-A92F-C92D48BC7F8C@ebi.ac.uk>
	<EB9E5EA8-58D9-439A-B942-79D54444E0FC@scripps.edu>
	<59a41c431003161358h45d55b36w73050c8d5a883c98@mail.gmail.com>
	<4A9A2D02-6E24-468B-9EC3-D58BE335406F@ebi.ac.uk>
	<D5D1D495-0B42-483C-9C67-39CBD43DC9B5@scripps.edu>
	<59a41c431003171046u57ef0d00vd4452074fc922b1@mail.gmail.com>
	<5C3EFA6A-68FF-4FF9-B92F-861E4E88B41C@scripps.edu>
Message-ID: <59a41c431003171514u1357ecf1ndab75fa4d461124e@mail.gmail.com>

ok, a new module biojava3-genome is now in SVN...
A

On Wed, Mar 17, 2010 at 11:17 AM, Scooter Willis <HWillis at scripps.edu> wrote:

> Andreas
>
> The problem with putting feature classes in a separate module is that
> biojava-core sequences would then have a dependency on biojava-feature. A
> sequence needs to hold a collection of features so feature classes need to
> go in core. If features are created from gff the core module doesn't care
> where features come from.
>
> We could go with biojava-genomes and code related to dealing with genomes
> goes in that module. If you like biojava-genome or biojava-genomes go ahead
> and create it and email me so I can check it out.
>
> Thanks
>
> Scooter
>
>
>
> On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:
>
> I like biojava-feature as a module name  for the GFF and features related
> code. (should we try to keep the module names singular?) Let me know if you
> want me to create the module for this...
> A
>
> On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>
>> Andy
>>
>> Let me know if you have any major code changes for the core sequencing
>> handling that have been or could be checked in. So far I haven't needed to
>> touch any of the core sequence code but want to avoid merging code if you
>> have made any significant changes.
>>
>> I should have code to check in today, and if we can't come up with a better
>> name I will ask Andreas to create a biojava3-genes module so I can check
>> that code in for your review. The current problem is that we have
>> ExonSequence extending DNASequence when it could also be described as a
>> feature. One way to look at this is that a TranscriptSequence is also a
>> feature of a DNA sequence, and only when you want a standalone class with
>> internal links back to the parent sequence do you return a
>> TranscriptSequence. The TranscriptFeature would have ExonFeature and
>> IntronFeature as children. You can ask for an ExonSequence based on the
>> ExonFeature. Once you get a ProteinSequence you should be able to reverse
>> the process and get back the TranscriptSequence, the corresponding
>> ExonFeatures, and some sort of mapping from a protein sequence position back
>> to the three DNA sequence positions that coded for it. This would need to
>> handle the case where the end of one exon and the start of the next exon
>> together code for a particular amino acid position.
>>
>> We also need to add the ability to have tracks as a way to group
>> features. This way you can export the features in a particular track as a
>> GFF/GFF3 file for importing into various genome browsers. Suppose you have
>> one genome you are working on, with genes added from three different gene
>> prediction algorithms, each organized by a track. You should then be able
>> to determine which predicted genes overlap genes validated via BLAST
>> against UniProt, and create another summary track of validated and
>> non-validated genes. If the feature classes we put together can make this
>> easy then I think we will have a solid design.
>>
>>
>> Scooter
>>
>>
>
>


From heuermh at acm.org  Thu Mar 18 03:28:23 2010
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 17 Mar 2010 22:28:23 -0500 (EST)
Subject: [Biojava-dev] Hackathon in Boston, July 2010
In-Reply-To: <5FC2D8EC-5408-4126-9A7D-CB6B3500B61C@eaglegenomics.com>
Message-ID: <Pine.GSO.4.44.1003172227210.25986-100000@shell3.shore.net>

On Mon, 15 Mar 2010, Richard Holland wrote:

> Hi all,
>
> Following the successful hackathon in Cambridge earlier this year, it was originally planned to hold a second one in Boston in conjunction with BOSC in order to give those who couldn't make it to the UK a chance to get involved.
>
> However, OBF have beaten us to it by organising a cross-project CodeFest!
>
>  http://www.open-bio.org/wiki/Codefest_2010
>
> It would be great for BioJava people to get involved with this cross-project hackathon effort, and it saves organising one of our own! :)

Yep, I'm already signed up.  Look forward to seeing some of you there.

   michael


From andreas at sdsc.edu  Thu Mar 18 20:36:38 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 18 Mar 2010 13:36:38 -0700
Subject: [Biojava-dev] Google summer of code
Message-ID: <59a41c431003181336i33d388aak4b5a26e11ee4161b@mail.gmail.com>

Hi,

It seems our (the Open Bioinformatics Foundation's) Google Summer of Code
application has been accepted.
http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010

As such we are now looking for an interested and skilled student to work on
the BioJava multiple sequence alignment project. Take a look at the project
description, and if you think you are up for the challenge, send me an email
with your application.

http://biojava.org/wiki/Google_Summer_of_Code

Andreas


From andreas at sdsc.edu  Wed Mar 24 00:33:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 23 Mar 2010 17:33:09 -0700
Subject: [Biojava-dev] GSoC update
Message-ID: <59a41c431003231733t1e259753k55fbe0a8bfb801a3@mail.gmail.com>

Hi,

A quick update regarding the current status of our Google Summer of Code
project: several students have already expressed their interest. In fact, the
response was so good that I believe BioJava should try to run more than just
one project. Meanwhile, we have added another "mentor proposed" project to
our GSoC page (http://biojava.org/wiki/Google_Summer_of_Code): Identification
and Classification of Posttranslational Modification of Proteins, i.e. develop a
post-translational modification package for the BioJava project.

In general, Google strongly encourages student-proposed projects, since
historically those are often the most successful GSoC projects. It is
recommended that students contact us or possible mentors before applying,
so we can match students with suitable mentors and projects and help
solidify your project ideas. In principle, any BioJava contributor can act
as a mentor. Students can apply between March 22nd and April 9th via the
Google website: http://socghop.appspot.com/

Andreas


From biopython at maubp.freeserve.co.uk  Wed Mar 24 14:51:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Mar 2010 14:51:46 +0000
Subject: [Biojava-dev] [Bioperl-l] Fwd: [Utilities-announce] NCBI
	Revised E-utility Usage Policy
In-Reply-To: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov>
	<320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
	<38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
Message-ID: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>

On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> On Mar 24, 2010, at 9:08 AM, Peter wrote:
>
>> Hi,
>>
>> This is probably of interest to all the Bio* projects offering access
>> to the NCBI Entrez utilities. See forwarded message below.
>>
>> I *think* the new guidelines basically say that the email & tool parameters are
>> optional BUT if your IP address ever gets banned for excessive use you then
>> have to register an email & tool combination.
>>
>> Regarding the email address, the NCBI say to use the email of the developer
>> (not the end user). However, they do not distinguish between the developers
>> of a library (like us), and the developers of an application or script using a
>> library (who may also be the end user).
>>
>> Currently we (Biopython) and I think BioPerl ask developers using our libraries
>> to populate the email address themselves. I *think* this is still the
>> right action.
>>
>> Peter
>
>
> Basically, that's the same tactic I'm going with for Bio::DB::EUtilities (and I
> think with the SOAP-based ones as well). We're providing a specific set of
> tools for users to write their own end applications. I can try
> contacting them regarding this to get an official response to clarify this
> somewhat.

Please give the NCBI an email - you can CC me too if you like.

> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a
> default, but always leave the email blank and issue a warning if it isn't
> set. We could just as easily leave both blank and issue warnings for both.

We currently leave out the email and set the tool parameter to "Biopython"
by default but this can be overridden. Currently leaving out the email does
cause Biopython to give a warning.
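Both toolkits follow the same pattern. They default the `tool` parameter to the library name, allow it to be overridden, and leave `email` to the application developer, warning when it is missing. A BioJava equivalent might look like the sketch below. The class name, its methods and the "BioJava" default are hypothetical. Only the esearch endpoint and its `db`, `term`, `tool` and `email` parameters come from NCBI's E-utilities interface.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: 'tool' defaults to the library name but can be
 * overridden; 'email' must be supplied by the application developer, and a
 * warning is logged (not an error thrown) when it is missing.
 */
public class EUtilsUrlBuilder {
    private static final Logger LOG = Logger.getLogger(EUtilsUrlBuilder.class.getName());
    private static final String ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    private String tool = "BioJava"; // library default, overridable
    private String email = null;     // left to the application developer

    public EUtilsUrlBuilder setTool(String tool) { this.tool = tool; return this; }
    public EUtilsUrlBuilder setEmail(String email) { this.email = email; return this; }

    /** Builds an esearch URL; warns, but does not fail, when no email is set. */
    public String esearch(String db, String term) {
        StringBuilder url = new StringBuilder(ESEARCH)
            .append("?db=").append(enc(db))
            .append("&term=").append(enc(term));
        if (tool != null && !tool.isEmpty()) {
            url.append("&tool=").append(enc(tool));
        }
        if (email == null || email.isEmpty()) {
            LOG.warning("No email set for NCBI E-utilities requests; please call setEmail(...)");
        } else {
            url.append("&email=").append(enc(email));
        }
        return url.toString();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new EUtilsUrlBuilder().setEmail("dev@example.org").esearch("pubmed", "biojava"));
    }
}
```

Logging a warning instead of throwing keeps existing scripts working while still nudging developers to identify themselves, which matches what both toolkits describe doing.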

Peter


From cjfields at illinois.edu  Wed Mar 24 14:37:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Mar 2010 09:37:13 -0500
Subject: [Biojava-dev] [Bioperl-l] Fwd: [Utilities-announce] NCBI
	Revised E-utility Usage Policy
In-Reply-To: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov>
	<320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
Message-ID: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>

On Mar 24, 2010, at 9:08 AM, Peter wrote:

> Hi,
> 
> This is probably of interest to all the Bio* projects offering access
> to the NCBI Entrez utilities. See forwarded message below.
> 
> I *think* the new guidelines basically say that the email & tool parameters are
> optional BUT if your IP address ever gets banned for excessive use you then
> have to register an email & tool combination.
> 
> Regarding the email address, the NCBI say to use the email of the developer
> (not the end user). However, they do not distinguish between the developers
> of a library (like us), and the developers of an application or script using a
> library (who may also be the end user).
> 
> Currently we (Biopython) and I think BioPerl ask developers using our libraries
> to populate the email address themselves. I *think* this is still the
> right action.
> 
> Peter


Basically, that's the same tactic I'm going with for Bio::DB::EUtilities (and I think with the SOAP-based ones as well).  We're providing a specific set of tools for users to write their own end applications.  I can try contacting them regarding this to get an official response to clarify this somewhat.

Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a default, but always leave the email blank and issue a warning if it isn't set.  We could just as easily leave both blank and issue warnings for both.

chris


> ---------- Forwarded message ----------
> From:  <utilities-announce at ncbi.nlm.nih.gov>
> Date: Wed, Mar 24, 2010 at 1:53 PM
> Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy
> To: NLM/NCBI List utilities-announce <utilities-announce at ncbi.nlm.nih.gov>
> 
> 
> New E-utility documentation now on the NCBI Bookshelf
> 
> The Entrez Programming Utilities (E-Utilities) Help documentation has
> been added to the NCBI Bookshelf, and so is now fully integrated with
> the Entrez search and retrieval system as a part of the Bookshelf
> database. This help document has been divided into chapters for better
> organization and includes several new sample Perl scripts. At present
> this book covers the standard URL interface for the E-utilities;
> material about the SOAP interface will be added soon and is still
> available at the same URL:
> http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html.
> 
> 
> 
> Revised E-utility usage policy
> 
> In December, 2009 NCBI announced a change to the usage policy for the
> E-utilities that would require all requests to contain non-null values
> for both the &email and &tool parameters. After several consultations
> with our users and developers, we have decided to revise this policy
> change, and the revised policy is described in detail at the following
> link:
> 
> http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen
> 
> Please let us know if you have any questions or concerns about this
> policy change.
> 
> 
> 
> Thank you,
> 
> The E-Utilities Team
> 
> NIH/NLM/NCBI
> 
> eutilities at ncbi.nlm.nih.gov.
> 
> 
> 
> _______________________________________________
> Utilities-announce mailing list
> http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at drycafe.net  Wed Mar 24 15:27:37 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 24 Mar 2010 11:27:37 -0400
Subject: [Biojava-dev] [Open-bio-l] [Bioperl-l] Fwd:
	[Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov>
	<320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
	<38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
	<320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
Message-ID: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net>


On Mar 24, 2010, at 10:51 AM, Peter wrote:

> Please give the NCBI an email - you can CC me too if you like.


Can't this be the developers' mailing list (or lists, the appropriate  
one for each toolkit)? We can even whitelist all NCBI sender addresses  
so they can easily email us if there are issues.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From cjfields at illinois.edu  Wed Mar 24 15:44:21 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Mar 2010 10:44:21 -0500
Subject: [Biojava-dev] [Bioperl-l] Fwd: [Utilities-announce] NCBI
	Revised E-utility Usage Policy
In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov>
	<320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
	<38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
	<320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
Message-ID: <338BDDD8-2A66-4086-BFB7-35EC8F8F0D66@illinois.edu>


On Mar 24, 2010, at 9:51 AM, Peter wrote:

> On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> 
>> On Mar 24, 2010, at 9:08 AM, Peter wrote:
>> 
>>> Hi,
>>> 
>>> This is probably of interest to all the Bio* projects offering access
>>> to the NCBI Entrez utilities. See forwarded message below.
>>> 
>>> I *think* the new guidelines basically say that the email & tool parameters are
>>> optional BUT if your IP address ever gets banned for excessive use you then
>>> have to register an email & tool combination.
>>> 
>>> Regarding the email address, the NCBI say to use the email of the developer
>>> (not the end user). However, they do not distinguish between the developers
>>> of a library (like us), and the developers of an application or script using a
>>> library (who may also be the end user).
>>> 
>>> Currently we (Biopython) and I think BioPerl ask developers using our libraries
>>> to populate the email address themselves. I *think* this is still the
>>> right action.
>>> 
>>> Peter
>> 
>> 
>> Basically, that's the same tactic I'm going with for Bio::DB::EUtilities (and I
>> think with the SOAP-based ones as well).  We're providing a specific set of
>> tools for users to write their own end applications.  I can try
>> contacting them regarding this to get an official response to clarify this
>> somewhat.
> 
> Please give the NCBI an email - you can CC me too if you like.

Sent, have cc'd the open-bio list.  Don't want to cross-post this too much, so I think we should move the discussion there.

>> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a
>> default, but always leave the email blank and issue a warning if it isn't
>> set.  We could just as easily leave both blank and issue warnings for both.
> 
> We currently leave out the email and set the tool parameter to "Biopython"
> by default but this can be overridden. Currently leaving out the email does
> cause Biopython to give a warning.
> 
> Peter

We follow the same, then (down to the warning).  This is mentioned in my post to them, I'll wait to see what they say.  

My concern is the wording of the new rules.  Each tool and email must be registered with them if an IP is blocked.  Does this mean each tool is assigned one specific email?  And an IP that is blocked can register it to be allowed back into the fold?  With that in mind, should we register each of our toolkits with them?  Probably not a bad thing (it might help us as devs to get an idea of use), but then if one user abuses the rules will their actions affect all toolkit users?  Is this all done on a per-IP basis, per-toolkit basis, etc?  

Unfortunately, at least to me, none of this is made very clear, so I'm hoping there is some clarification from their end.

chris


From maj at fortinbras.us  Wed Mar 24 16:37:56 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 24 Mar 2010 12:37:56 -0400
Subject: [Biojava-dev] [Bioperl-l] [Open-bio-l] Fwd:
	[Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A220D32B4@NIHMLBX15.nih.gov><320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
	<5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net>
Message-ID: <B6692F38693D41B3BE76FF47F227D257@NewLife>

I think this is a great idea--- MAJ
----- Original Message ----- 
From: "Hilmar Lapp" <hlapp at drycafe.net>
To: "Peter" <biopython at maubp.freeserve.co.uk>
Cc: <bioruby at lists.open-bio.org>; "Biopython-Dev Mailing List" 
<biopython-dev at biopython.org>; <biojava-dev at lists.open-bio.org>; "bioperl-l 
list" <bioperl-l at lists.open-bio.org>; "Chris Fields" <cjfields at illinois.edu>; 
<open-bio-l at lists.open-bio.org>
Sent: Wednesday, March 24, 2010 11:27 AM
Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI 
Revised E-utility Usage Policy


>
> On Mar 24, 2010, at 10:51 AM, Peter wrote:
>
>> Please give the NCBI an email - you can CC me too if you like.
>
>
> Can't this be the developers' mailing list (or lists, the appropriate  one for 
> each toolkit)? We can even whitelist all NCBI sender addresses  so they can 
> easily email us if there are issues.
>
> -hilmar
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From sheoran143 at gmail.com  Thu Mar 25 01:19:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Wed, 24 Mar 2010 20:19:29 -0500
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with subject
 :( Hibernate Exception and suggestion for change in BioSqlSchema)
Message-ID: <4BAABA21.4000301@gmail.com>

I am writing this email again because I didn't get any response about whether
these bugs have been patched or whether my message got lost somewhere on the
mailing list. I don't know how to apply these patches, so I am counting on you
to apply them and reply so I know they are fixed.


Thanks
Deepak Sheoran


Hi
In response to the bug fix suggested by Richard, I have created some patches.
They need to be applied to stop BioJava from processing the references in a
GenBank record incorrectly, which causes more Hibernate exceptions.
After applying the patches, the reference resolution code tests the PubMed or
MEDLINE ID first; if there is no match it tests author/title/location; and if
there is still no match it creates a new reference. I tested it with
GenBank Release 175 and gained almost 3159 more records in my database.
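The three-step resolution described above can be sketched with in-memory maps standing in for the BioSQL "reference" table. This illustrates the logic only. `ReferenceResolver` and its records are hypothetical, not the patched `BioSqlRichObjectBuilder` code, which works through Hibernate queries.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of three-step reference resolution:
 * (1) look up by PubMed/MEDLINE ID, (2) fall back to an exact
 * authors/title/location match, (3) otherwise create a new reference.
 */
public class ReferenceResolver {

    /** Minimal reference row; pubmedId may be null. */
    record Ref(int referenceId, String pubmedId, String authors, String title, String location) {}

    /** Exact-match key; record equality tolerates a null title. */
    record CitationKey(String authors, String title, String location) {}

    private final Map<String, Ref> byPubmedId = new HashMap<>();
    private final Map<CitationKey, Ref> byCitation = new HashMap<>();
    private int nextId = 1;

    /** Returns an existing reference if one matches, otherwise stores and returns a new one. */
    public Ref resolve(String pubmedId, String authors, String title, String location) {
        // Step 1: the PubMed/MEDLINE ID is the most reliable key, so try it first.
        if (pubmedId != null) {
            Ref hit = byPubmedId.get(pubmedId);
            if (hit != null) {
                return hit;
            }
        }
        // Step 2: exact match on authors, title and location (the previous behaviour).
        CitationKey key = new CitationKey(authors, title, location);
        Ref hit = byCitation.get(key);
        if (hit != null) {
            return hit;
        }
        // Step 3: nothing matched, so create and index a new reference.
        Ref created = new Ref(nextId++, pubmedId, authors, title, location);
        if (pubmedId != null) {
            byPubmedId.put(pubmedId, created);
        }
        byCitation.put(key, created);
        return created;
    }

    public static void main(String[] args) {
        ReferenceResolver r = new ReferenceResolver();
        String authors = "Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.";
        String title = "Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model";
        Ref a = r.resolve("18554304", authors, title, "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)");
        Ref b = r.resolve("18554304", authors, title, "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)");
        // Same PubMed ID, different location string: step 1 returns the existing row.
        System.out.println(a == b); // prints true
    }
}
```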

Could somebody please look at the second issue Richard raised and fix it:
"

2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).

"

I am also planning to build a bridge between BioSQL databases loaded using
BioPerl and those loaded using BioJava. Here is some of my investigation; can
you suggest some direction?
Have a look at the attached files:
1) Biojava_BioPerl_Diff.xls ==> shows the tables where a GenBank
record is stored in a BioSQL instance by BioPerl and by BioJava
2) GenbankRecord.doc ==> a Word document showing a GenBank record and
where its information goes in BioSQL using BioPerl and BioJava
3) BioSqlRichobjectBuilder.patch ==> patch needed for the
BioSqlRichObjectBuilder.java class
4) GenBankFormat.patch ==> patch needed for the GenBankFormat.java class


Thanks
Deepak Sheoran


-------- Original Message --------
Subject: 	Re: Hibernate Exception and suggestion for change in BioSqlSchema
Date: 	Tue, 9 Feb 2010 20:34:32 +1300
From: 	Richard Holland <holland at eaglegenomics.com>
To: 	Deepak Sheoran <sheoran143 at gmail.com>
CC: 	biojava-l at biojava.org


Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.

However, in answer to your two questions:

   1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).

   2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
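Richard's suggested starting point can be explored with a simplified model of the condition Hibernate is enforcing here: every member location of a compound location must point back to its feature before it is saved. The classes below are hypothetical stand-ins, not BioJava's `SimpleRichLocation`/`CompoundRichLocation`. Whether the real `CompoundRichLocation` propagates its feature to its members this way is exactly what needs checking in the BioJava source.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical model: a compound location whose member locations
 * must each hold a back-reference to the feature. A member left with a null
 * feature is what a not-null check on Location.feature would reject.
 */
public class LocationFeatureCheck {

    static class Feature {
        final String name;
        Feature(String name) { this.name = name; }
    }

    static class SimpleLoc {
        final int min, max;
        Feature feature; // must be non-null before persisting
        SimpleLoc(int min, int max) { this.min = min; this.max = max; }
        void setFeature(Feature f) { this.feature = f; }
    }

    static class CompoundLoc {
        final List<SimpleLoc> members = new ArrayList<>();
        Feature feature;
        CompoundLoc(List<SimpleLoc> members) { this.members.addAll(members); }

        /** Sets the feature on the compound AND on every member, so no member is left null. */
        void setFeature(Feature f) {
            this.feature = f;
            for (SimpleLoc m : members) {
                m.setFeature(f);
            }
        }
    }

    /** Returns the members that would violate a not-null constraint on their feature. */
    static List<SimpleLoc> membersWithoutFeature(CompoundLoc loc) {
        List<SimpleLoc> bad = new ArrayList<>();
        for (SimpleLoc m : loc.members) {
            if (m.feature == null) {
                bad.add(m);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        CompoundLoc join = new CompoundLoc(List.of(new SimpleLoc(100, 200), new SimpleLoc(300, 400)));
        join.feature = new Feature("CDS"); // setting only the compound's field reproduces the problem
        System.out.println(membersWithoutFeature(join).size()); // prints 2
        join.setFeature(join.feature);     // propagating to the members fixes it
        System.out.println(membersWithoutFeature(join).size()); // prints 0
    }
}
```

Running a check like `membersWithoutFeature` over a parsed RichSequence's compound locations, before calling Hibernate, would show whether members lose their feature during parsing or only during the save.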

cheers,
Richard

On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:

>
>  Hi Richard
>
>  Below is the email I sent to the biojava-l mailing list. It never got posted on the mailing list server and I never got any response, so please have a look at it and tell me what the solution to the problem described in the message might be.
>
>
>  Thanks
>  Deepak Sheoran
>  -------- Original Message --------
>  Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
>  Date:	Wed, 03 Feb 2010 08:07:35 -0600
>  From:	Deepak Sheoran<sheoran143 at gmail.com>
>  To:	biojava-l at lists.open-bio.org
>
>  Hi guys,
>
>  A couple of days ago I was having a problem with a Hibernate exception, but that exception got resolved; the reference to that email is: http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
>  On Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another Hibernate error and decided to investigate the matter. Below are some facts and information I found, which I guess are going to affect all of us.
>  	- The "reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
>  This works well, except when there are small variations in the values of the "location", "title" and "authors" columns and all of these variants refer to the same PUBMED_ID. Then we can't persist or create a RichSequence object.
>   Now, when you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)", which is responsible for looking up the object's details in the database: if it finds one it returns that object, otherwise it tries to persist the new object into the database.
>  The problem is with this part of that method (lines 114-123):
>  else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
>      queryType = "DocRef";
>      // convert List constructor to String representation for query
>      ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
>      if (ourParamsList.size()<3) {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>      } else {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>      }
>  }
>  When Hibernate searches the database, it won't find a matching record in the "reference" table, because the two records differ by string comparison, so it returns a new object to the following piece of code in "GenbankFormat" (lines 447-455):
>  else {
>      try {
>          CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
>          RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>          rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>      } catch (ChangeVetoException e) {
>          throw new ParseException(e+", accession:"+accession);
>      }
>  }
>  We then add that object to rlistener and move on to the next part of the GenBank record. When BioJava later searches for a new cross-reference in the database and tries to persist the old one, it gets a Hibernate exception for violating the unique constraint on the "dbxref_id" column.
>
>  As far as I can see, there are only two ways to get these records into the database:
>  		- The very easy solution, and the way I tested my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many different naming variations, and this change lets us store that information too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
>  		- The second solution, which is somewhat harder to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides whether a particular DocRef already exists in the database, i.e. by testing all possible string variations of the authors, location and title of the DocRef being searched for. This has many complications and may slow down the creation of RichSequence objects when RichObjectFactory is linked to an active Hibernate session.
>
>  Example: below is a sample of what I have in my local BioSQL schema, which has the modification I suggested. (The dbxref_id column shows the PubMed ID: for easy reference to the outside world in this email, I replaced the local dbxref_id values in this table with the PubMed IDs stored in the "dbxref" table.)
>
>  reference_id: 216   dbxref_id: 18554304   crc: 9E940E01F4BE3CD0
>    location: FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
>    title:    Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>    authors:  Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>  reference_id: 230   dbxref_id: 18554304   crc: D3BC0C17F3F786C9
>    location: FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
>    title:    Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>    authors:  Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>
>  reference_id: 415   dbxref_id: 16790744   crc: 60AEDFA0CEEACC38
>    location: Infect. Immun. 74 (7), 3715-3726 (2006)
>    title:    Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
>    authors:  Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>  reference_id: 969   dbxref_id: 16790744   crc: 4B1232999F6E8130
>    location: Infect. Immun. 74 (7), 3715-3726 (2006)
>    title:    Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
>    authors:  Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>
>  reference_id: 929   dbxref_id: 8688087   crc: 3E79B40DD2AAA2B7
>    location: Science 273 (5278), 1058-1073 (1996)
>    title:    Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>    authors:  Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>  reference_id: 932   dbxref_id: 8688087   crc: 094EB3384F8D6DE8
>    location: Science 273 (5278), 1058-1073 (1996)
>    title:    Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>    authors:  Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>
>  reference_id: 1426   dbxref_id: 10684935   crc: 357648D8FD8C6C8A
>    location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>    title:    Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>    authors:  Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
>  reference_id: 1481   dbxref_id: 10684935   crc: 115411EB2DEE5654
>    location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>    title:    Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>    authors:  Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
>
>  reference_id: 1497   dbxref_id: 14689165   crc: 4D5D376EECCD186B
>    location: Arch. Microbiol. 181 (2), 144-154 (2004)
>    title:    The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>    authors:  Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>  reference_id: 1501   dbxref_id: 14689165   crc: 4D57954EECDED66B
>    location: Arch. Microbiol. 181 (2), 144-154 (2004)
>    title:    The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>    authors:  Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>
>  reference_id: 1556   dbxref_id: 18060065   crc: 698688FB6DB95247
>    location: PLoS ONE 2 (12), E1271 (2007)
>    title:    Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
>    authors:  Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>  reference_id: 1559   dbxref_id: 18060065   crc: E25E1BA99DB18F3D
>    location: PLoS ONE 2 (12), E1271 (2007)
>    title:    Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
>    authors:  Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>
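The second solution above (tolerant matching of DocRefs) could start from a normalized comparison key. The sketch below ignores letter case, extra whitespace and author initials. `DocRefKey` is a hypothetical helper, not part of BioJava, and the normalization rules are assumptions. Applied to the sample rows, it merges 1497/1501 ("del" vs "Del"), 415/969 and 1556/1559 (title case), but not 216/230 (different location strings).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not part of BioJava): builds a comparison key for a
 * reference that ignores letter case, extra whitespace and author initials.
 */
public class DocRefKey {

    /** Normalised fields used for comparison; records compare component-wise. */
    record Key(List<String> surnames, String title, String location) {}

    static Key of(String authors, String title, String location) {
        return new Key(surnames(authors), norm(title), norm(location));
    }

    /** Lower-cases and collapses runs of whitespace; null stays null (e.g. a missing title). */
    static String norm(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    /** "Read,T.D., Shen,C. and Fraser,C.M." becomes [read, shen, fraser]. */
    static List<String> surnames(String authors) {
        List<String> out = new ArrayList<>();
        // Authors are separated by ", " or " and "; surname and initials are joined by a bare ",".
        for (String author : authors.split(",\\s+|\\s+and\\s+")) {
            int comma = author.indexOf(',');
            out.add(norm(comma >= 0 ? author.substring(0, comma) : author));
        }
        return out;
    }

    public static void main(String[] args) {
        // reference_id 1497 vs 1501: author lists differ only in "del" vs "Del".
        String title = "The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner";
        String loc = "Arch. Microbiol. 181 (2), 144-154 (2004)";
        String a1497 = "Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.";
        String a1501 = a1497.replace("del Rosario", "Del Rosario");
        System.out.println(of(a1497, title, loc).equals(of(a1501, title, loc))); // prints true

        // reference_id 216 vs 230: the location strings differ, which this normalisation does not fix.
        String sato = "Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.";
        String t216 = "Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model";
        System.out.println(of(sato, t216, "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)")
                .equals(of(sato, t216, "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)"))); // prints false
    }
}
```

Dropping initials trades missed matches for possible false merges. Pairs such as 929/932 (one author list omits Presley,E.A.) and 1426/1481 (one omits Umayam,L.A.) would still not match. This is one reason why looking up the PubMed ID first, as Richard suggested, is more robust than string matching alone.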
>  	- The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>  		- This means that in the RichSequence object, some feature has a location object whose feature is set to null.
>  		- My observations:
>  			- It usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS regions of a GenBank record.
>  			- After catching the Hibernate exception, I went through all the features and found that either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
>  			- Below are screenshots from one of my tests:
>  				- State before trying to persist the RichSequence object to the database:
>
>  <Mail Attachment.png>
>
>  		- State after trying to persist the RichSequence object to the database, inside the Hibernate exception catch block:
>
>  		<Mail Attachment.png>
>
>  		- So my question is: why is this happening, and how can I stop it or get these records into the database? I have no clue why it is happening.
>  		- Some extra information to make things clearer:
>  			- Below are some LOCUS lines from GenBank records for which I know the failing location (i.e. the CDS region causing the error), with its index in the richSequence.feature ArrayList.
>  				- LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
>  					- richSequence.feature index: 2540; line number in the GenBank record: 22115
>  				- LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
>  					- richSequence.feature index: 127; line number in the GenBank record: 2137
>  				- LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
>  					- richSequence.feature index: 389; line number in the GenBank record: 3632
>  				- LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
>  					- richSequence.feature index: 47; line number in the GenBank record: 4841
>  				- LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
>  					- richSequence.feature index: 45; line number in the GenBank record: 442
>  		- The complete exception message:
>  org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>          at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
>          at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
>          at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com
http://www.eaglegenomics.com/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Biojava_BioPerl_diff.xls
Type: application/vnd.ms-excel
Size: 346624 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20100324/7ecffa4a/attachment-0002.xls>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BioSqlRichObjectBuilder.patch
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20100324/7ecffa4a/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: GenbankFormat.patch
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20100324/7ecffa4a/attachment-0001.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GenbankRecord.doc
Type: application/msword
Size: 59392 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20100324/7ecffa4a/attachment-0002.doc>

From holland at eaglegenomics.com  Thu Mar 25 16:27:17 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 25 Mar 2010 16:27:17 +0000
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with
	subject :( Hibernate Exception and suggestion for change in
	BioSqlSchema)
In-Reply-To: <4BAABA21.4000301@gmail.com>
References: <4BAABA21.4000301@gmail.com>
Message-ID: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>

Patched and in subversion on the head in the new Biojava 3 code. I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.

On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:

> I am writing this email again because I didn't get any response on whether these bugs have been patched or whether the message got lost somewhere on the mailing list. I don't know how to apply these patches myself, so I am counting on you to apply them and reply so I know they're fixed.
> 
> 
> 
> Thanks
> Deepak Sheoran
> 
> 
> Hi
> In response to the bug fix suggested by Richard, I have created some patches. They are needed to stop BioJava from processing the references in a GenBank record incorrectly, which causes further Hibernate exceptions. After applying the patches, the reference resolution code first tests the PubMed or MEDLINE ID; if there is no match it tests author/title/location; and if there is still no match it creates a new reference. I tested it with GenBank Release 175 and gained almost 3159 more records in my database.
> 
> Can somebody please look at the second issue and fix it?
> "
> 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
> "
> 
> Also, I am planning to build a bridge between BioSQL databases loaded using BioPerl and BioJava. Here is some of my investigation; can you suggest some direction on it?
> Have a look at the attached files:
> 1) Biojava_BioPerl_Diff.xls  ==> a view of the tables where a GenBank record is stored in a BioSQL instance by BioPerl and by BioJava
> 2) GenbankRecord.doc  ==> a Word document with a GenBank record showing where its information goes in BioSQL when loaded by BioPerl and by BioJava
> 3) BioSqlRichObjectBuilder.patch ==> patch for the BioSqlRichObjectBuilder.java class
> 4) GenbankFormat.patch ==> patch for the GenbankFormat.java class
> 
> 
> Thanks
> Deepak Sheoran
> 
> 
> 
> -------- Original Message --------
> Subject:	Re: Hibernate Exception and suggestion for change in BioSqlSchema
> Date:	Tue, 9 Feb 2010 20:34:32 +1300
> From:	Richard Holland <holland at eaglegenomics.com>
> To:	Deepak Sheoran <sheoran143 at gmail.com>
> CC:	biojava-l at biojava.org
> 
> Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.
> 
> However, in answer to your two questions:
> 
>   1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).
> 
>   2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
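Richard's three-step lookup (ID first, then authors/title/location, then create) can be sketched in Java as below. This is a minimal in-memory illustration under stated assumptions, not the BioSqlRichObjectBuilder patch: `ReferenceResolverSketch`, its nested `Ref` type and the `"PUBMED:<id>"` key format are hypothetical names chosen for the example, and a real implementation would run queries against the Hibernate session rather than scan a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of a three-step reference lookup:
// (1) match on the database cross-reference (e.g. a PubMed ID),
// (2) otherwise match on authors/title/location,
// (3) otherwise treat the reference as new.
// None of these names are BioJava or BioSQL APIs.
class ReferenceResolverSketch {

    static final class Ref {
        final String dbxref;   // assumed key format, e.g. "PUBMED:18554304"; may be null
        final String authors;
        final String title;    // may be null
        final String location;

        Ref(String dbxref, String authors, String title, String location) {
            this.dbxref = dbxref;
            this.authors = authors;
            this.title = title;
            this.location = location;
        }
    }

    // Stands in for the rows already stored in the "reference" table.
    private final List<Ref> stored = new ArrayList<>();

    Ref resolve(Ref candidate) {
        // Step 1: an identical cross-reference wins, whatever the strings say.
        if (candidate.dbxref != null) {
            for (Ref r : stored) {
                if (candidate.dbxref.equals(r.dbxref)) {
                    return r;
                }
            }
        }
        // Step 2: fall back to exact authors/title/location (null-safe).
        for (Ref r : stored) {
            if (Objects.equals(r.authors, candidate.authors)
                    && Objects.equals(r.title, candidate.title)
                    && Objects.equals(r.location, candidate.location)) {
                return r;
            }
        }
        // Step 3: nothing matched, so store and return the new reference.
        stored.add(candidate);
        return candidate;
    }

    int size() {
        return stored.size();
    }

    public static void main(String[] args) {
        String authors = "Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., "
                + "Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.";
        String title = "Isolation of lactate-utilizing butyrate-producing bacteria from human feces"
                + " and in vivo administration of Anaerostipes caccae strain L2 and"
                + " galacto-oligosaccharides in a rat model";
        ReferenceResolverSketch resolver = new ReferenceResolverSketch();
        Ref first = resolver.resolve(new Ref("PUBMED:18554304", authors, title,
                "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)"));
        Ref second = resolver.resolve(new Ref("PUBMED:18554304", authors, title,
                "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)"));
        System.out.println(first == second); // true: resolved by ID in step 1
        System.out.println(resolver.size()); // 1
    }
}
```

The `main` method uses references 216 and 230 from Deepak's sample table further down in this thread: they share PubMed ID 18554304 but have different location strings, so step 1 returns the stored object and only one row is kept.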
> 
> cheers,
> Richard
> 
> On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> 
> > 
> > Hi Richard
> > 
> > Below is the email which I sent to the biojava-l mailing list, but it never got posted on the list server, nor did I get any response. Could you please have a look at it and tell me what the solution to the problem described might be?
> > 
> > 
> > Thanks
> > Deepak Sheoran
> > -------- Original Message --------
> > Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
> > Date:	Wed, 03 Feb 2010 08:07:35 -0600
> > From:	Deepak Sheoran <sheoran143 at gmail.com>
> > To:	biojava-l at lists.open-bio.org
> > 
> > Hi guys,
> > 
> > A couple of days ago I had a problem with a Hibernate exception; it has since been resolved, and the reference to that email is:
> > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> > 
> > Following Richard's suggestion in the link above, I was able to resolve some of the issues, but then I got stuck on another Hibernate error and decided to investigate. Below are some facts I found, which I expect will affect all of us.
> > 	- The "reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
> > This works well, except when the values of the "location", "title" or "authors" columns vary slightly while all the variants refer to the same PUBMED_ID. In that case we can't create or persist the RichSequence object.
> > Now, when you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List paramsList)" which is responsible for looking up the object in the database: if it finds one, it returns that object; otherwise it tries to persist the new object into the database.
> > The problem is with the following part of that method:
> > BioSqlRichObjectBuilder.java, lines 114-123:
> > else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> >     queryType = "DocRef";
> >     // convert List constructor to String representation for query
> >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> >     if (ourParamsList.size()<3) {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> >     } else {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> >     }
> > }
> > When Hibernate searches the database, it won't find the existing record in the "reference" table, because the two records differ under string comparison, so a new object is returned to the following piece of code in "GenbankFormat":
> > GenbankFormat.java, lines 447-455:
> > else {
> >     try {
> >         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> >         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> >         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> >     } catch (ChangeVetoException e) {
> >         throw new ParseException(e+", accession:"+accession);
> >     }
> > }
> > That object is then added to rlistener, and parsing moves on to the next part of the GenBank record. When BioJava next searches the database for a new crossref, it tries to persist the earlier object and gets a Hibernate exception for violating the unique constraint on the "dbxref_id" column.
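To make the failure mode concrete, here is a simplified, hypothetical in-memory model (`UniqueDbxrefDemo` is not a BioJava or Hibernate class): a reference table with a unique dbxref_id, fronted by a lookup that, like the DocRef query quoted above, only recognises exact string matches. It is a sketch under assumptions: it skips the point at which Hibernate actually writes the pending object (in this thread, during the next lookup) and instead fails immediately on insert.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical model, not BioJava/Hibernate code: a "reference" table with
// UNIQUE (dbxref_id), looked up by exact authors/location/title strings.
class UniqueDbxrefDemo {

    static final class Row {
        final int dbxrefId;
        final String authors;
        final String location;
        final String title;

        Row(int dbxrefId, String authors, String location, String title) {
            this.dbxrefId = dbxrefId;
            this.authors = authors;
            this.location = location;
            this.title = title;
        }
    }

    private final List<Row> rows = new ArrayList<>();
    private final Set<Integer> usedDbxrefIds = new HashSet<>();

    // Exact string comparison, as in "cr.authors = ? and cr.location = ? and cr.title = ?".
    Row findExact(String authors, String location, String title) {
        for (Row r : rows) {
            if (r.authors.equals(authors) && r.location.equals(location)
                    && Objects.equals(r.title, title)) {
                return r;
            }
        }
        return null;
    }

    // Plays the role of CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id).
    void insert(Row row) {
        if (!usedDbxrefIds.add(row.dbxrefId)) {
            throw new IllegalStateException("unique constraint violated: dbxref_id=" + row.dbxrefId);
        }
        rows.add(row);
    }

    // Look up first; insert a new row only when no exact match exists.
    Row getOrCreate(int dbxrefId, String authors, String location, String title) {
        Row found = findExact(authors, location, title);
        if (found != null) {
            return found;
        }
        Row created = new Row(dbxrefId, authors, location, title);
        insert(created);
        return created;
    }

    public static void main(String[] args) {
        UniqueDbxrefDemo db = new UniqueDbxrefDemo();
        // References 415 and 969 from the sample: same PubMed ID, titles differ
        // only in capitalisation (titles shortened here).
        String authors = "Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.";
        String location = "Infect. Immun. 74 (7), 3715-3726 (2006)";
        db.getOrCreate(16790744, authors, location, "Intrastrain Heterogeneity of the mgpB Gene");
        try {
            db.getOrCreate(16790744, authors, location, "Intrastrain heterogeneity of the mgpB gene");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // unique constraint violated: dbxref_id=16790744
        }
    }
}
```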
> >  
> > The only ways I see to get these records into the database are:
> > 		- The easiest solution, and the way I tested my theory, is to change the BioSQL schema to allow a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many naming variations, and this change lets us store that information too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
> > 		- The second solution, which is harder to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides whether a particular DocRef already exists in the database, i.e. by testing all plausible string variations of the authors, location and title of the DocRef being searched for. This has many complications and may slow down the creation of RichSequence objects when RichObjectFactory is linked to an active Hibernate session.
> >  
> > Example: below is a sample from my local BioSQL schema, which has the modification I suggest. (In the dbxref_id column I replaced the local dbxref_id from my database with the PubMed ID stored in the "dbxref" table, for easier reference in this email.)
> > Reference_id: 216 | Dbxref_id: 18554304 | crc: 9E940E01F4BE3CD0
> >     Location: FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> >     Title:    Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> >     Authors:  Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > 
> > Reference_id: 230 | Dbxref_id: 18554304 | crc: D3BC0C17F3F786C9
> >     Location: FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> >     Title:    Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> >     Authors:  Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > 
> > Reference_id: 415 | Dbxref_id: 16790744 | crc: 60AEDFA0CEEACC38
> >     Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> >     Title:    Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> >     Authors:  Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > 
> > Reference_id: 969 | Dbxref_id: 16790744 | crc: 4B1232999F6E8130
> >     Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> >     Title:    Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> >     Authors:  Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > 
> > Reference_id: 929 | Dbxref_id: 8688087 | crc: 3E79B40DD2AAA2B7
> >     Location: Science 273 (5278), 1058-1073 (1996)
> >     Title:    Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> >     Authors:  Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > 
> > Reference_id: 932 | Dbxref_id: 8688087 | crc: 094EB3384F8D6DE8
> >     Location: Science 273 (5278), 1058-1073 (1996)
> >     Title:    Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> >     Authors:  Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > 
> > Reference_id: 1426 | Dbxref_id: 10684935 | crc: 357648D8FD8C6C8A
> >     Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> >     Title:    Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> >     Authors:  Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> > 
> > Reference_id: 1481 | Dbxref_id: 10684935 | crc: 115411EB2DEE5654
> >     Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> >     Title:    Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> >     Authors:  Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> > 
> > Reference_id: 1497 | Dbxref_id: 14689165 | crc: 4D5D376EECCD186B
> >     Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> >     Title:    The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> >     Authors:  Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > 
> > Reference_id: 1501 | Dbxref_id: 14689165 | crc: 4D57954EECDED66B
> >     Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> >     Title:    The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> >     Authors:  Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > 
> > Reference_id: 1556 | Dbxref_id: 18060065 | crc: 698688FB6DB95247
> >     Location: PLoS ONE 2 (12), E1271 (2007)
> >     Title:    Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> >     Authors:  Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > 
> > Reference_id: 1559 | Dbxref_id: 18060065 | crc: E25E1BA99DB18F3D
> >     Location: PLoS ONE 2 (12), E1271 (2007)
> >     Title:    Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> >     Authors:  Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
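Deepak's second option, comparing references on normalised strings rather than raw ones, could look roughly like the sketch below. The normalisation rule (trim, collapse whitespace, lower-case) is an assumption made for illustration; neither BioJava nor BioSQL defines it, and `NormalisedDocRefKey` is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava: treats two references as the same
// paper when authors, location and title agree after a simple normalisation.
class NormalisedDocRefKey {

    // Assumed rule: trim, collapse runs of whitespace, lower-case.
    // A null value (e.g. a missing title) normalises to the empty string.
    static String normalise(String s) {
        if (s == null) {
            return "";
        }
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    static boolean sameReference(String authorsA, String locationA, String titleA,
                                 String authorsB, String locationB, String titleB) {
        return normalise(authorsA).equals(normalise(authorsB))
                && normalise(locationA).equals(normalise(locationB))
                && normalise(titleA).equals(normalise(titleB));
    }

    public static void main(String[] args) {
        String authors = "Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.";
        String location = "Infect. Immun. 74 (7), 3715-3726 (2006)";
        // References 415 and 969: titles differ only in capitalisation (shortened here).
        System.out.println(sameReference(
                authors, location, "Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium",
                authors, location, "Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium")); // true
        // References 216 and 230: same authors, different location strings (title omitted).
        String sato = "Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., "
                + "Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.";
        System.out.println(sameReference(
                sato, "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)", null,
                sato, "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)", null)); // false
    }
}
```

Applied to the sample above, this rule would merge only the pairs 415/969 (title capitalisation) and 1497/1501 ("del" vs "Del Rosario"). The pairs whose author lists differ (929/932, 1426/1481, 1556/1559) or whose location strings differ (216/230) would still look like different papers, which supports looking up the PubMed ID before falling back to string comparison.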
> >  
> > 	- The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > 		- This means that in the RichSequence object, some feature has a location object whose feature is set to null.
> > 		- My observations:
> > 			- It usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS regions of a GenBank record.
> > 			- After catching the Hibernate exception, I went through all the features and found that either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
> > 			- Below are screenshots from one of my tests:
> > 				- Object state before trying to persist the RichSequence object to the database:
> >  
> > <Mail Attachment.png>
> >
> > 		- Object state after trying to persist the RichSequence object to the database and landing in the Hibernate exception catch block:
> >  
> > 		  <Mail Attachment.png>
> >  
> > 		- So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why it happens.
> > 		- Some extra information to make things clearer:
> > 			- Below are the LOCUS lines of some GenBank records for which I know the location causing the error (i.e. the CDS region), together with its index in the richSequence.feature ArrayList and its line number in the GenBank record.
> > 				- LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
> > 					- richSequence.feature index: 2540; line number in the GenBank record: 22115
> > 				- LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
> > 					- richSequence.feature index: 127; line number in the GenBank record: 2137
> > 				- LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
> > 					- richSequence.feature index: 389; line number in the GenBank record: 3632
> > 				- LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
> > 					- richSequence.feature index: 47; line number in the GenBank record: 4841
> > 				- LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
> > 					- richSequence.feature index: 45; line number in the GenBank record: 442
> > 		- The complete exception message:
> > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> >         at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> >         at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> >         at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> >  
> >  
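Richard's pointer (look at how a CompoundRichLocation selects the feature from the SimpleRichLocations it is made of) suggests checking, before the sequence is saved, whether every member location ends up with a feature. The sketch below uses hypothetical stand-in classes (`CompoundFeatureSketch`, `SimpleLoc`, `CompoundLoc`), not the BioJava types, and shows one possible policy: pushing the compound location's feature down to each member, plus a pre-save check listing members whose feature is still null, i.e. the ones that would trip the not-null check on Location.feature. Whether BioJava should propagate the feature this way is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, not the BioJava classes: a compound location made of
// simple member locations, each of which must carry a non-null feature before
// it can be saved (mirroring the not-null mapping on Location.feature).
class CompoundFeatureSketch {

    static final class Feature {
        final String type;

        Feature(String type) {
            this.type = type;
        }
    }

    static final class SimpleLoc {
        final int start;
        final int end;
        Feature feature; // must be non-null before persisting

        SimpleLoc(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    static final class CompoundLoc {
        final List<SimpleLoc> members;
        Feature feature;

        CompoundLoc(List<SimpleLoc> members) {
            this.members = new ArrayList<>(members);
        }

        // One possible policy (an assumption): assigning a feature to the
        // compound location also assigns it to every member location.
        void setFeature(Feature f) {
            this.feature = f;
            for (SimpleLoc m : members) {
                m.feature = f;
            }
        }
    }

    // Pre-save check: members whose feature is still null are exactly the
    // ones that would fail a not-null check on the feature column.
    static List<SimpleLoc> membersWithoutFeature(CompoundLoc loc) {
        List<SimpleLoc> missing = new ArrayList<>();
        for (SimpleLoc m : loc.members) {
            if (m.feature == null) {
                missing.add(m);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A join(100..200,300..400)-style CDS location with two members.
        CompoundLoc join = new CompoundLoc(Arrays.asList(new SimpleLoc(100, 200), new SimpleLoc(300, 400)));
        System.out.println(membersWithoutFeature(join).size()); // 2: no feature assigned yet
        join.setFeature(new Feature("CDS"));
        System.out.println(membersWithoutFeature(join).size()); // 0
    }
}
```

Running such a check over every feature's locations before calling `session.save(...)` would at least report which locations are orphaned, instead of surfacing only as a PropertyValueException deep inside the cascade.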
> 
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> 
> 
> 
> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andreas at sdsc.edu  Thu Mar 25 16:47:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 25 Mar 2010 09:47:45 -0700
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with
	subject :( Hibernate Exception and suggestion for change in
	BioSqlSchema)
In-Reply-To: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
Message-ID: <59a41c431003250947g6ecd11cbw21c5be5858b9aa09@mail.gmail.com>

Excellent, thanks Richard and Deepak!
Andreas

On Thu, Mar 25, 2010 at 9:27 AM, Richard Holland
<holland at eaglegenomics.com>wrote:

> Patched and in subversion on the head in the new Biojava 3 code. I modified
> the code slightly to simplify it. There were also parallel changes required
> over in SimpleDocRef itself to enable it to continue working without being
> connected to BioSQL.
>
> On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
>
> > I am writing this email again because I didn't get any response on
> whether these bugs have been patched or whether the message got lost
> somewhere on the mailing list. I don't know how to apply these patches
> myself, so I am counting on you to apply them and reply so I know they're
> fixed.
> >
> >
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> > Hi
> > In response to the bug fix suggested by Richard, I have created some
> patches. They are needed to stop BioJava from processing the references in
> a GenBank record incorrectly, which causes further Hibernate exceptions.
> After applying the patches, the reference resolution code first tests the
> PubMed or MEDLINE ID; if there is no match it tests author/title/location;
> and if there is still no match it creates a new reference. I tested it with
> GenBank Release 175 and gained almost 3159 more records in my database.
> >
> > Can somebody please look at the second issue and fix it?
> > "
> > 2. I think that's a bug (compound locations with null features) but not
> sure why. Could be that the process of constructing a CompoundRichLocation
> is somehow losing the feature reference from the original
> SimpleRichLocation. Again I can't investigate until March - can someone else
> take a look at the code? (A good starting point would be to look at how a
> CompoundRichLocation decides to select the feature from the
> SimpleRichLocations it is made up from).
> > "
> >
> > Also, I am planning to build a bridge between BioSQL databases loaded
> using BioPerl and BioJava. Here is some of my investigation; can you
> suggest some direction on it?
> > Have a look at the attached files:
> > 1) Biojava_BioPerl_Diff.xls  ==> a view of the tables where a GenBank
> record is stored in a BioSQL instance by BioPerl and by BioJava
> > 2) GenbankRecord.doc  ==> a Word document with a GenBank record showing
> where its information goes in BioSQL when loaded by BioPerl and by BioJava
> > 3) BioSqlRichObjectBuilder.patch ==> patch for the
> BioSqlRichObjectBuilder.java class
> > 4) GenbankFormat.patch ==> patch for the GenbankFormat.java class
> >
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> >
> > -------- Original Message --------
> > Subject:      Re: Hibernate Exception and suggestion for change in BioSqlSchema
> > Date: Tue, 9 Feb 2010 20:34:32 +1300
> > From: Richard Holland <holland at eaglegenomics.com>
> > To:   Deepak Sheoran <sheoran143 at gmail.com>
> > CC:   biojava-l at biojava.org
> >
> > Hi. It's possible that your original email didn't make it to the list
> > because it is HTML format, and the list only accepts plain text.
> >
> > However, in answer to your two questions:
> >
> >   1. The code that does the resolution of references might be better if
> >      it looks up existing IDs rather than using author, title, location to
> >      identify existing records. I would suggest modifying it to a
> >      three-step process - test ID, then if no match then test
> >      author/title/location, then if still no match create a new
> >      reference. Could someone do that? (I'm unable to do anything until
> >      late March).
> >
> >   2. I think that's a bug (compound locations with null features) but not
> >      sure why. Could be that the process of constructing a
> >      CompoundRichLocation is somehow losing the feature reference from the
> >      original SimpleRichLocation. Again I can't investigate until March -
> >      can someone else take a look at the code? (A good starting point
> >      would be to look at how a CompoundRichLocation decides to select the
> >      feature from the SimpleRichLocations it is made up from).
> >
> > cheers,
> > Richard
> >
> > On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> >
> > >
> > > Hi Richard
> > >
> > > Below is an email I sent to the biojava-l mailing list, but it was
> > > never posted to the list and I did not get any response. Could you
> > > please look at it and tell me what the solution to the problem it
> > > describes might be?
> > >
> > >
> > > Thanks
> > > Deepak Sheoran
> > > -------- Original Message --------
> > > Subject:    Hibernate Exception and suggestion for change in BioSqlSchema
> > > Date:       Wed, 03 Feb 2010 08:07:35 -0600
> > > From:       Deepak Sheoran <sheoran143 at gmail.com>
> > > To:         biojava-l at lists.open-bio.org
> > >
> > > Hi guys,
> > >
> > > A couple of days ago I had a problem with a Hibernate exception. That
> > > exception has been resolved; the thread about it is here:
> > > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> > >
> > > Following Richard's suggestion in the thread above, I was able to
> > > resolve some of the issues, but then I got stuck on another Hibernate
> > > error. I decided to investigate, and below are the facts I found; I
> > > suspect they affect all of us.
> > >   - The "reference" table in the BioSQL schema has a unique constraint
> > >     on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE
> > >     (dbxref_id)), which means each dbxref_id can be used by only one row
> > >     in the reference table.
> > > This works well, except when the values of the "location", "title" and
> > > "authors" columns vary slightly while all the variants refer to the same
> > > PubMed ID. In that case we cannot create or persist the RichSequence
> > > object.
> > > When you tie RichObjectFactory to an active Hibernate session, the class
> > > "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List
> > > paramsList)" that looks the object up in the database. If it finds one,
> > > it returns that object; otherwise it tries to persist the new object
> > > into the database.
> > > But the problem is in this part of that method (lines 114-123):
> > >
> > >     else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> > >         queryType = "DocRef";
> > >         // convert List constructor to String representation for query
> > >         ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List) ourParamsList.get(0), true));
> > >         if (ourParamsList.size() < 3) {
> > >             queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> > >         } else {
> > >             queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> > >         }
> > >     }
> > >
> > > When Hibernate searches the database, it does not find the existing row
> > > in the "reference" table, because the two records differ under exact
> > > string comparison. So it returns a new object to the following piece of
> > > code in "GenbankFormat" (lines 447-455):
> > >
> > >     else {
> > >         try {
> > >             CrossRef cr = (CrossRef) RichObjectFactory.getObject(SimpleCrossRef.class, new Object[]{dbname, raccession, new Integer(0)});
> > >             RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> > >             rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> > >         } catch (ChangeVetoException e) {
> > >             throw new ParseException(e + ", accession:" + accession);
> > >         }
> > >     }
> > >
> > > We then add that object to rlistener and move on to the next part of the
> > > GenBank record. When BioJava next searches the database for a new
> > > cross-reference, it tries to persist the earlier object and gets a
> > > Hibernate exception for violating the unique constraint on the
> > > "dbxref_id" column.
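The failure described above can be reproduced without Hibernate. The following sketch is a hypothetical model, not BioJava or BioSQL code: a `Set<Long>` stands in for the `UNIQUE (dbxref_id)` constraint on the reference table, and `findExact` uses exact string equality on authors/location/title, as the HQL query quoted above does. Two rows for PubMed 18554304 that differ only in their location string miss each other on lookup, so the second insert violates the constraint. (In a real BioSQL database `dbxref_id` is a local key, not the PubMed ID; the PubMed ID is used here only to match the sample table.)

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of the failure; no Hibernate or BioSQL involved.
class UniqueDbxrefDemo {

    static final class Row {
        final long dbxrefId;
        final String authors;
        final String location;
        final String title;

        Row(long dbxrefId, String authors, String location, String title) {
            this.dbxrefId = dbxrefId;
            this.authors = authors;
            this.location = location;
            this.title = title;
        }
    }

    private final List<Row> rows = new ArrayList<>();
    // Models CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id).
    private final Set<Long> usedDbxrefIds = new HashSet<>();

    // Mirrors "cr.authors = ? and cr.location = ? and cr.title = ?": exact equality only
    // (Objects.equals also covers the "cr.title is null" branch).
    Row findExact(String authors, String location, String title) {
        for (Row r : rows) {
            if (r.authors.equals(authors) && r.location.equals(location)
                    && Objects.equals(r.title, title)) {
                return r;
            }
        }
        return null;
    }

    // Look up first; if nothing matches, insert a new row that must respect the constraint.
    Row getOrInsert(long dbxrefId, String authors, String location, String title) {
        Row hit = findExact(authors, location, title);
        if (hit != null) {
            return hit;
        }
        if (!usedDbxrefIds.add(dbxrefId)) {
            throw new IllegalStateException(
                    "unique constraint reference_dbxref_id_key violated for dbxref_id " + dbxrefId);
        }
        Row row = new Row(dbxrefId, authors, location, title);
        rows.add(row);
        return row;
    }

    public static void main(String[] args) {
        UniqueDbxrefDemo db = new UniqueDbxrefDemo();
        // Shortened versions of references 216 and 230 (same PubMed ID, different location text).
        String authors = "Sato,T., Matsumoto,K., Okumura,T.";
        String title = "Isolation of lactate-utilizing butyrate-producing bacteria";
        db.getOrInsert(18554304L, authors, "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)", title);
        try {
            db.getOrInsert(18554304L, authors,
                    "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)", title);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```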
> > >
> > > The only ways I see to get these records into the database are:
> > >   - The easiest solution, and the one I used to test my theory, is to
> > >     change the BioSQL schema to allow a many-to-one relation between the
> > >     "reference" and "dbxref" tables. This even makes sense, because one
> > >     paper can be cited with many different naming variations, and this
> > >     change lets us store that information too. But this is something the
> > >     BioSQL developers have to decide, and I don't know how to approach
> > >     them.
> > >   - The second solution, which is harder to implement, is to change how
> > >     "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)"
> > >     decides whether a particular DocRef already exists in the database,
> > >     i.e. by testing all possible string variations of the authors,
> > >     location and title of the DocRef being searched for. This has many
> > >     complications and may slow down the creation of RichSequence objects
> > >     when RichObjectFactory is linked to an active Hibernate session.
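A lighter-weight variant of the second solution is to normalize the strings before comparing them rather than enumerating every possible variation. The sketch below uses one possible normalization, chosen by assumption and not taken from BioJava: lowercase, replace punctuation with spaces, and collapse whitespace. Applied to the sample rows in this message, it would merge pairs that differ only in capitalization (references 415/969 and 1497/1501) but not pairs whose author initials or location strings differ (929/932, 1556/1559, 216/230), so it narrows the problem rather than solving it. Note that it also treats a null title and an empty title as equal.

```java
import java.util.Locale;

// Hypothetical helper; the normalization rules are an assumption, not BioJava behaviour.
class RefNormalizer {

    // Lowercase, turn ASCII punctuation into spaces, and collapse runs of whitespace.
    static String normalize(String s) {
        if (s == null) return "";
        String lower = s.toLowerCase(Locale.ROOT);
        String noPunct = lower.replaceAll("\\p{Punct}", " ");
        return noPunct.trim().replaceAll("\\s+", " ");
    }

    // True when two references agree on authors, title and location after normalization.
    static boolean sameReference(String authorsA, String titleA, String locationA,
                                 String authorsB, String titleB, String locationB) {
        return normalize(authorsA).equals(normalize(authorsB))
                && normalize(titleA).equals(normalize(titleB))
                && normalize(locationA).equals(normalize(locationB));
    }

    public static void main(String[] args) {
        String loc = "Infect. Immun. 74 (7), 3715-3726 (2006)";
        String authors = "Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.";
        // Shortened titles of references 415 and 969, which differ only in capitalization.
        boolean merged = sameReference(authors, "Intrastrain Heterogeneity of the mgpB Gene", loc,
                                       authors, "Intrastrain heterogeneity of the mgpB gene", loc);
        System.out.println(merged); // true
    }
}
```

In `buildObject` this kind of comparison would most likely have to run in Java over candidate rows (for example, all references sharing the same dbxref), since an exact-match HQL query cannot express it directly.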
> > >
> > > Example: below is a sample from my local BioSQL schema, which has the
> > > modification I suggested. (For easy reference outside my database, I
> > > replaced the local dbxref_id values in this table with the PubMed IDs
> > > stored in the "dbxref" table, so the Dbxref_id column shows PubMed IDs.)
> > > Reference_id: 216
> > > Dbxref_id: 18554304
> > > Location: FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> > > Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > > Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > > crc: 9E940E01F4BE3CD0
> > >
> > > Reference_id: 230
> > > Dbxref_id: 18554304
> > > Location: FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> > > Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > > Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > > crc: D3BC0C17F3F786C9
> > >
> > > Reference_id: 415
> > > Dbxref_id: 16790744
> > > Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title: Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> > > Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > > crc: 60AEDFA0CEEACC38
> > >
> > > Reference_id: 969
> > > Dbxref_id: 16790744
> > > Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title: Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> > > Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > > crc: 4B1232999F6E8130
> > >
> > > Reference_id: 929
> > > Dbxref_id: 8688087
> > > Location: Science 273 (5278), 1058-1073 (1996)
> > > Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > > Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > > crc: 3E79B40DD2AAA2B7
> > >
> > > Reference_id: 932
> > > Dbxref_id: 8688087
> > > Location: Science 273 (5278), 1058-1073 (1996)
> > > Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > > Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > > crc: 094EB3384F8D6DE8
> > >
> > > Reference_id: 1426
> > > Dbxref_id: 10684935
> > > Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > > Authors: Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> > > crc: 357648D8FD8C6C8A
> > >
> > > Reference_id: 1481
> > > Dbxref_id: 10684935
> > > Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > > Authors: Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> > > crc: 115411EB2DEE5654
> > >
> > > Reference_id: 1497
> > > Dbxref_id: 14689165
> > > Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > > crc: 4D5D376EECCD186B
> > >
> > > Reference_id: 1501
> > > Dbxref_id: 14689165
> > > Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > > crc: 4D57954EECDED66B
> > >
> > > Reference_id: 1556
> > > Dbxref_id: 18060065
> > > Location: PLoS ONE 2 (12), E1271 (2007)
> > > Title: Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> > > Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc: 698688FB6DB95247
> > >
> > > Reference_id: 1559
> > > Dbxref_id: 18060065
> > > Location: PLoS ONE 2 (12), E1271 (2007)
> > > Title: Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> > > Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc: E25E1BA99DB18F3D
> > >
> > >   - The second kind of error I got was:
> > >     org.hibernate.PropertyValueException: not-null property references
> > >     a null or transient value: Location.feature
> > >       - This means that some feature in the RichSequence object has a
> > >         location object whose feature is set to null.
> > >       - My observations:
> > >           - It usually occurs when you try to persist a RichSequence
> > >             object to the database, and it affects features that have a
> > >             CompoundRichLocation, usually "join" and "complement"
> > >             locations in the CDS regions of a GenBank record.
> > >           - After catching the Hibernate exception, I went through all
> > >             the features and found that either BioJava or Hibernate had
> > >             changed the object type of a CompoundRichLocation to
> > >             SimpleRichLocation and set its feature variable to null.
> > >           - Below are screenshots from one of my tests:
> > >               - Settings before trying to persist the RichSequence
> > >                 object to the database:
> > >
> > > <Mail Attachment.png>
> > >
> > >               - After trying to persist the RichSequence object to the
> > >                 database, inside the Hibernate exception catch block:
> > >
> > > <Mail Attachment.png>
> > >
> > >       - So my question is: why is this happening, and how can I stop it
> > >         or otherwise get these records into the database? I have no
> > >         clue why it happens.
> > >       - Some extra information to make things clearer:
> > >           - Below are the LOCUS lines of some GenBank records for which I
> > >             know where the location error occurs: the CDS region causing
> > >             the error, given as its index in the richSequence.feature
> > >             ArrayList and its line number in the GenBank record.
> > >               - LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
> > >                   - richSequence.feature index: 2540; line number in the GenBank record: 22115
> > >               - LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
> > >                   - richSequence.feature index: 127; line number in the GenBank record: 2137
> > >               - LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
> > >                   - richSequence.feature index: 389; line number in the GenBank record: 3632
> > >               - LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
> > >                   - richSequence.feature index: 47; line number in the GenBank record: 4841
> > >               - LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
> > >                   - richSequence.feature index: 45; line number in the GenBank record: 442
> > >       - The complete exception message:
> > > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > >         at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> > >         at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> > >         at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> > >
> > >
> >
> > --
> > Richard Holland, BSc MBCS
> > Operations and Delivery Director, Eagle Genomics Ltd
> > T: +44 (0)1223 654481 ext 3 | E:
> > holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> >
> >
> >
> >
> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>


From deepak.sheoran at orionbiosciences.com  Thu Mar 25 18:46:57 2010
From: deepak.sheoran at orionbiosciences.com (Deepak Sheoran)
Date: Thu, 25 Mar 2010 13:46:57 -0500
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with
 subject : ( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
Message-ID: <4BABAFA1.6090806@orionbiosciences.com>

That is the reason I was getting an error when I created a
RichSequence object without an active session to BioSQL. I had no
clue that I had created another bug while fixing one; thanks for
noticing and fixing it.

I am wondering whether we should propose BioPerl/BioJava/BioSQL
compatibility as one of the Google Summer of Code projects. I have a
vision for this, but I don't know the right way to begin. It could help
people who want to use BioJava but can't, because they are afraid of
losing their Perl code, which depends heavily on the Perl way of
loading the schema. Alternatively, we could come up with a hybrid
approach that takes the best from both languages.

Deepak Sheoran

On 3/25/2010 11:27 AM, Richard Holland wrote:
> Patched and in subversion on the head in the new Biojava 3 code. I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.
>
> On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
>
>    
>> I am writing this email again because I didn't get any response about whether these bugs have been patched or whether they got lost somewhere on the mailing list. I don't know how to apply these patches, so I am counting on you to apply them and let me know once they are fixed.
>>
>>
>>
>> Thanks
>> Deepak Sheoran
>>
>>
>> Hi
>> In response to the bug fix suggested by Richard, I have created some patches. They stop BioJava from processing the references in a GenBank record incorrectly, which causes further Hibernate exceptions. With the patches applied, the reference resolution code first tests the PubMed or MEDLINE ID; if there is no match, it tests author/title/location; and if there is still no match, it creates a new reference. I tested this with GenBank release 175 and gained almost 3159 more records in my database.
>>
>> Can somebody please look at the second issue and fix it?
>> "
>> 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
>> "
>>
>> I am also planning to build a bridge between BioSQL databases loaded with BioPerl and with BioJava. Below is some of my investigation so far; can you suggest a direction for it?
>> Please have a look at the attached files:
>> 1) Biojava_BioPerl_Diff.xls ==> shows the tables where a GenBank record is stored in a BioSQL instance by BioPerl and by BioJava
>> 2) GenbankRecord.doc ==> a Word document showing where each part of a GenBank record goes in BioSQL when loaded with BioPerl and BioJava
>> 3) BioSqlRichobjectBuilder.patch ==> patch for the BioSqlRichObjectBuilder.java class
>> 4) GenBankFormat.patch ==> patch for the GenbankFormat.java class
>>
>>
>> Thanks
>> Deepak Sheoran
>>
>>
>>
>> -------- Original Message --------
>> Subject:	Re: Hibernate Exception and suggestion for change in BioSqlSchema
>> Date:	Tue, 9 Feb 2010 20:34:32 +1300
>> From:	Richard Holland<holland at eaglegenomics.com>
>> To:	Deepak Sheoran<sheoran143 at gmail.com>
>> CC:	biojava-l at biojava.org
>>
>> Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.
>>
>> However, in answer to your two questions:
>>
>>    1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).
>>
>>    2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
>>
>> cheers,
>> Richard
>>
>> On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
>>
>>      
>>> Hi Richard
>>>
>>> Below is an email I sent to the biojava-l mailing list, but it was never posted to the list and I did not get any response. Could you please look at it and tell me what the solution to the problem it describes might be?
>>>
>>>
>>> Thanks
>>> Deepak Sheoran
>>> -------- Original Message --------
>>> Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
>>> Date:	Wed, 03 Feb 2010 08:07:35 -0600
>>> From:	Deepak Sheoran <sheoran143 at gmail.com>
>>> To:	biojava-l at lists.open-bio.org
>>>
>>> Hi guys,
>>>
>>> A couple of days ago I had a problem with a Hibernate exception. That exception has been resolved; the thread about it is here:
>>> http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
>>>
>>> Following Richard's suggestion in the thread above, I was able to resolve some of the issues, but then I got stuck on another Hibernate error. I decided to investigate, and below are the facts I found; I suspect they affect all of us.
>>>   - The "reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means each dbxref_id can be used by only one row in the reference table.
>>> This works well, except when the values of the "location", "title" and "authors" columns vary slightly while all the variants refer to the same PubMed ID. In that case we cannot create or persist the RichSequence object.
>>> When you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List paramsList)" that looks the object up in the database. If it finds one, it returns that object; otherwise it tries to persist the new object into the database.
>>> But the problem is in this part of that method (lines 114-123):
>>>
>>>     else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
>>>         queryType = "DocRef";
>>>         // convert List constructor to String representation for query
>>>         ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List) ourParamsList.get(0), true));
>>>         if (ourParamsList.size() < 3) {
>>>             queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>>>         } else {
>>>             queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>>>         }
>>>     }
>>>
>>> When Hibernate searches the database, it does not find the existing row in the "reference" table, because the two records differ under exact string comparison. So it returns a new object to the following piece of code in "GenbankFormat" (lines 447-455):
>>>
>>>     else {
>>>         try {
>>>             CrossRef cr = (CrossRef) RichObjectFactory.getObject(SimpleCrossRef.class, new Object[]{dbname, raccession, new Integer(0)});
>>>             RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>>>             rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>>>         } catch (ChangeVetoException e) {
>>>             throw new ParseException(e + ", accession:" + accession);
>>>         }
>>>     }
>>>
>>> We then add that object to rlistener and move on to the next part of the GenBank record. When BioJava next searches the database for a new cross-reference, it tries to persist the earlier object and gets a Hibernate exception for violating the unique constraint on the "dbxref_id" column.
>>>
>>> The only ways I see to get these records into the database are:
>>>   - The easiest solution, and the one I used to test my theory, is to change the BioSQL schema to allow a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can be cited with many different naming variations, and this change lets us store that information too. But this is something the BioSQL developers have to decide, and I don't know how to approach them.
>>>   - The second solution, which is harder to implement, is to change how "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides whether a particular DocRef already exists in the database, i.e. by testing all possible string variations of the authors, location and title of the DocRef being searched for. This has many complications and may slow down the creation of RichSequence objects when RichObjectFactory is linked to an active Hibernate session.
>>>
>>> Example: below is a sample from my local BioSQL schema, which has the modification I suggested. (For easy reference outside my database, I replaced the local dbxref_id values in this table with the PubMed IDs stored in the "dbxref" table, so the Dbxref_id column shows PubMed IDs.)
>>> Reference_id | Dbxref_id | Location | Title | Authors | crc
>>> 216 | 18554304 | FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | 9E940E01F4BE3CD0
>>> 230 | 18554304 | FEMS Microbiol. Ecol. 66 (3), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | D3BC0C17F3F786C9
>>> 415 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 60AEDFA0CEEACC38
>>> 969 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 4B1232999F6E8130
>>> 929 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 3E79B40DD2AAA2B7
>>> 932 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 094EB3384F8D6DE8
>>> 1426 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M. | 357648D8FD8C6C8A
>>> 1481 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C. | 115411EB2DEE5654
>>> 1497 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D5D376EECCD186B
>>> 1501 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D57954EECDED66B
>>> 1556 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | 698688FB6DB95247
>>> 1559 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | E25E1BA99DB18F3D
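The second solution above amounts to comparing references loosely instead of byte-for-byte. A minimal sketch of such a comparison follows; `DocRefKey`, `normalize` and `titleKey` are hypothetical names (not BioJava API), and this is not how `BioSqlRichObjectBuilder` currently decides whether a DocRef exists. Case-folding and stripping punctuation makes the titles of rows 415/969 and 1556/1559 match, but the author lists of rows 929/932 still differ (`Tomb,J.-F.` vs `Tomb,J.`, `Scott,J.L.` vs `Scott,J.D.`), which illustrates the complications mentioned.

```java
import java.text.Normalizer;
import java.util.Locale;

/**
 * Hypothetical helper (not part of BioJava): builds a loose comparison key
 * for a literature reference so that purely cosmetic variants of the same
 * paper (case, punctuation, accents, spacing) map to the same string.
 */
public class DocRefKey {

    /** Lower-cases, strips accents and punctuation, and collapses whitespace. */
    static String normalize(String s) {
        if (s == null) return "";
        // Decompose accented characters, then drop the combining marks.
        String t = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        // Case-fold independently of the JVM's default locale.
        t = t.toLowerCase(Locale.ROOT);
        // Turn every run of non-alphanumeric characters into a single space.
        return t.replaceAll("[^a-z0-9]+", " ").trim();
    }

    /** Key based on the title only; author lists vary too much to normalize safely. */
    static String titleKey(String title) {
        return normalize(title);
    }

    public static void main(String[] args) {
        String t415 = "Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive"
                + " In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination"
                + " with Repetitive Chromosomal Sequences";
        String t969 = "Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive"
                + " in vitro and in vivo and suggests that variation is generated via recombination"
                + " with repetitive chromosomal sequences";
        System.out.println("415 vs 969 titles match: " + titleKey(t415).equals(titleKey(t969)));
        System.out.println("Tomb,J.-F. vs Tomb,J. match: "
                + normalize("Tomb,J.-F.").equals(normalize("Tomb,J.")));
    }
}
```

Whether a title-only match is strong enough evidence that two records describe the same paper is a judgment call; in rows 216/230 the titles are identical but the Location strings differ.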
>>>
>>> 	- The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>>> 		- This means that in the RichSequence object some feature has a location object whose feature is set to null.
>>> 		- My observations:
>>> 			- It usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS regions of a GenBank record.
>>> 			- After catching the Hibernate exception I went through all the features and found that either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
>>> 			- Below are screenshots from one of my tests.
>>> 				- Settings before trying to persist the RichSequence object to the database:
>>>
>>> <Mail Attachment.png>
>>>
>>> 		- After trying to persist the RichSequence object to the database and landing in the Hibernate exception catch block:
>>>
>>> 		<Mail Attachment.png>
>>>
>>> 		- So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why it is happening.
>>> 		- Some extra information to make things clearer:
>>> 			- Below are some LOCUS lines from GenBank records for which I know where the location error is, i.e. the CDS region causing the error and its array index in the richSequence.feature ArrayList.
>>> 				- LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
>>> 					- richSequence.feature index: 2540; line number in the GenBank record: 22115
>>> 				- LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
>>> 					- richSequence.feature index: 127; line number in the GenBank record: 2137
>>> 				- LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
>>> 					- richSequence.feature index: 389; line number in the GenBank record: 3632
>>> 				- LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
>>> 					- richSequence.feature index: 47; line number in the GenBank record: 4841
>>> 				- LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
>>> 					- richSequence.feature index: 45; line number in the GenBank record: 442
>>> 		- The complete exception message:
>>> org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>>>          at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>>>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>>>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>>>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>>>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>>>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>>>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>>>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>>>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>>>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>>>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>>>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>>>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>>>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>>>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>>>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>>>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>>>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>>>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>>>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>>>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>>>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>>>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>>>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>>>          at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>>>          at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
>>>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>>>          at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
>>>          at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
>>>          at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
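The observation above suggests checking, just before the `session.save(...)` call, that every location, including each member of a compound (join/complement) location, still points back at its feature, since the mapping treats `Location.feature` as not-null. The sketch below uses simplified stand-in classes (`LocationCheck`, `Feature`, `Loc`, `findOrphans` and `attach` are all hypothetical, not BioJava's `RichFeature`/`RichLocation`); it only illustrates the recursive walk, and mapping it onto the real BioJava types is an assumption to verify against the BioJava javadoc.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-ins (NOT the BioJava classes) illustrating a pre-save
 * sanity check for the "not-null property references a null or transient
 * value: Location.feature" error.
 */
public class LocationCheck {

    static class Feature {
        final String name;
        Feature(String name) { this.name = name; }
    }

    static class Loc {
        final String label;
        Feature feature;                              // back-reference; must be non-null before saving
        final List<Loc> members = new ArrayList<>();  // non-empty only for compound locations
        Loc(String label, Feature feature) { this.label = label; this.feature = feature; }
    }

    /** Returns the labels of every location (recursively) whose feature is null. */
    static List<String> findOrphans(Loc root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Loc loc, List<String> out) {
        if (loc.feature == null) out.add(loc.label);
        for (Loc m : loc.members) collect(m, out);
    }

    /** Points the location and all of its members back at the given feature. */
    static void attach(Loc root, Feature f) {
        root.feature = f;
        for (Loc m : root.members) attach(m, f);
    }

    public static void main(String[] args) {
        Feature cds = new Feature("CDS");
        Loc join = new Loc("join(1..10,20..30)", cds);
        join.members.add(new Loc("1..10", null));   // a member that lost its back-reference
        join.members.add(new Loc("20..30", cds));
        System.out.println("orphans before: " + findOrphans(join));
        attach(join, cds);
        System.out.println("orphans after: " + findOrphans(join));
    }
}
```

Logging `findOrphans` for each feature before saving would at least show which GenBank entries (such as the LOCUS lines listed above) lose their back-references, without relying on the exception.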
>>>
>>>
>>>        
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>>
>> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>
>>      
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>    


From biopython at maubp.freeserve.co.uk  Thu Mar 25 22:16:55 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Mar 2010 22:16:55 +0000
Subject: [Biojava-dev] Bug fix for Biojava in regard to email with
	subject : ( Hibernate Exception and suggestion for change in
	BioSqlSchema)
In-Reply-To: <4BABAFA1.6090806@orionbiosciences.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
	<4BABAFA1.6090806@orionbiosciences.com>
Message-ID: <320fb6e01003251516w2977ab2h9869342f94576287@mail.gmail.com>

On Thu, Mar 25, 2010 at 6:46 PM, Deepak Sheoran
<deepak.sheoran at orionbiosciences.com> wrote:
>
> That is the reason I was getting an error when I was creating a RichSequence
> object without any active session to BioSQL. I didn't have a clue that I
> had created another bug by fixing one; thanks for noticing and fixing
> that.
>
> I am wondering whether we should propose BioPerl/BioJava/BioSQL compatibility as
> one of the Google Summer of Code projects. I have a vision for this, but don't
> know the right way to begin. This could help people who want to use BioJava
> but can't because they are afraid of losing their Perl code, which is heavily
> dependent on the Perl way of loading the schema. Or we could come up with a hybrid
> approach that takes the good parts of both languages.
>
> Deepak Sheoran

That is an interesting idea for GSoC, I wonder if we at Biopython
should do the same. I know of a few things where we differ from
BioPerl's BioSQL support (e.g. SwissProt comment lines).

[I take it we agree that bioperl-db is the de facto reference
implementation for mapping GenBank etc into BioSQL?]

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Mar 26 06:14:17 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Mar 2010 02:14:17 -0400
Subject: [Biojava-dev] [Bug 3035] New: ParseException thrown when parsing
	PDB file.
Message-ID: <bug-3035-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=3035

           Summary: ParseException thrown when parsing PDB file.
           Product: BioJava
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: structure
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: nakagawa-hiroyuki at mki.co.jp


When reading a PDB file using org.biojava.bio.structure.io.PDBFileReader on a
non-English platform, a java.text.ParseException is thrown:
java.text.ParseException: Unparseable date: "26-DEC-97"
        at java.text.DateFormat.parse(Unknown Source)
        at org.biojava.bio.structure.io.PDBFileParser.pdb_HEADER_Handler(PDBFileParser.java:433)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2067)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:1963)
        at org.biojava.bio.structure.io.PDBFileReader.getStructure(PDBFileReader.java:486)
        at org.biojava.bio.structure.io.PDBFileReader.getStructure(PDBFileReader.java:466)
        at Test.main(Test.java:9)

To reproduce this symptom:
1.      Set your operating system's default locale to a non-English one (e.g.
Japanese).
2.      Then run the test code below.
Or simply run the test code with the option "-Duser.language=ja":
> java -Duser.language=ja Test

----Begin Test.java ----
import org.biojava.bio.structure.io.PDBFileReader;
import org.biojava.bio.structure.Structure;
public class Test {
        public static void main(String[] args) {
                String filename =  "1a2b.pdb" ;
                PDBFileReader pdbreader = new PDBFileReader();
                try{
                        Structure structure = pdbreader.getStructure(filename);
                } catch (Exception e){
                        e.printStackTrace();
                }
        }
}
----End Test.java ----

The cause is that java.text.SimpleDateFormat, created without an explicit locale, cannot parse
the PDB-style "dd-MMM-yy" date format under some non-English locales.
I attached a patch to correct this problem.

---- Begin PDBFileParser.java.diff ----
*** .\biojava-1.7.1\src\org\biojava\bio\structure\io\PDBFileParser.java.orig   2010-01-24 22:35:24.000000000 +0900
--- .\biojava-1.7.1\src\org\biojava\bio\structure\io\PDBFileParser.java   2010-03-19 11:34:28.571551900 +0900
***************
*** 271,277 ****
                current_compound = new Compound();
                dbrefs        = new ArrayList<DBRef>();

!               dateFormat = new SimpleDateFormat("dd-MMM-yy");
                atomCount = 0;
                atomOverflow = false;

--- 271,277 ----
                current_compound = new Compound();
                dbrefs        = new ArrayList<DBRef>();

!               dateFormat = new SimpleDateFormat("dd-MMM-yy", java.util.Locale.ENGLISH);
                atomCount = 0;
                atomOverflow = false;

---- End PDBFileParser.java.diff ----
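A standalone way to check the patched behaviour without a PDB file is sketched below; the class and method names `PdbDateCheck` and `parsePdbDate` are hypothetical. It builds the formatter the same way the patch does, with `Locale.ENGLISH`, so the month abbreviation `DEC` parses even when the JVM runs with `-Duser.language=ja` (SimpleDateFormat matches month names case-insensitively, so the upper-case PDB form is accepted).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;

/**
 * Minimal standalone check of the locale fix: parse a PDB HEADER date with an
 * explicit English locale so the result does not depend on the JVM's default locale.
 */
public class PdbDateCheck {

    /** Parses a PDB-style "dd-MMM-yy" date such as "26-DEC-97". */
    static Date parsePdbDate(String s) throws ParseException {
        // Locale.ENGLISH pins the month abbreviations to JAN..DEC regardless of user.language.
        SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH);
        return fmt.parse(s);
    }

    public static void main(String[] args) throws ParseException {
        Calendar cal = Calendar.getInstance();
        cal.setTime(parsePdbDate("26-DEC-97"));
        System.out.println(cal.get(Calendar.YEAR) + "-"
                + (cal.get(Calendar.MONTH) + 1) + "-"
                + cal.get(Calendar.DAY_OF_MONTH));
    }
}
```

Running it as `java -Duser.language=ja PdbDateCheck` should print the same date as with the default locale. Note that the two-digit year is resolved by SimpleDateFormat's default century window, so "97" becomes 1997.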


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Mar 26 06:18:26 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Mar 2010 02:18:26 -0400
Subject: [Biojava-dev] [Bug 3035] ParseException thrown when parsing PDB
	file.
In-Reply-To: <bug-3035-485@http.bugzilla.open-bio.org/>
Message-ID: <201003260618.o2Q6IQEV023480@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3035


------- Comment #1 from nakagawa-hiroyuki at mki.co.jp  2010-03-26 02:18 EST -------
Created an attachment (id=1467)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1467&action=view)
A patch to correct this problem


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Mar 26 16:25:14 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Mar 2010 12:25:14 -0400
Subject: [Biojava-dev] [Bug 3035] ParseException thrown when parsing PDB
	file.
In-Reply-To: <bug-3035-485@http.bugzilla.open-bio.org/>
Message-ID: <201003261625.o2QGPEVe012950@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3035


andreas at sdsc.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Mar 26 16:27:56 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Mar 2010 12:27:56 -0400
Subject: [Biojava-dev] [Bug 3035] ParseException thrown when parsing PDB
	file.
In-Reply-To: <bug-3035-485@http.bugzilla.open-bio.org/>
Message-ID: <201003261627.o2QGRu2r013123@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3035


andreas at sdsc.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from andreas at sdsc.edu  2010-03-26 12:27 EST -------
Applied the user-provided patch; the problem should be fixed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From andreas at sdsc.edu  Mon Mar 29 02:02:49 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 28 Mar 2010 19:02:49 -0700
Subject: [Biojava-dev] Biojava3 structure
In-Reply-To: <C842AAAA-DF3B-4EB8-B240-8F9E76CFAD20@scripps.edu>
References: <C842AAAA-DF3B-4EB8-B240-8F9E76CFAD20@scripps.edu>
Message-ID: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com>

Hi Scooter,

At present the structure modules depend on the alignment module and on
the (old) core module. This is for aligning ATOM and SEQRES residues in
PDB files, and for the Smith-Waterman-alignment-based 3D structure
superposition. If we target a BioJava 3 release in about a month, I don't
think it will be possible to break this dependency out, mainly because the alignment
module is still based on the BioJava 1 code base. Overall I think that the
core module should probably still be part of the BioJava 3 release. Any
opinions on that?

Andreas

On Sun, Mar 28, 2010 at 3:06 PM, Scooter Willis <HWillis at scripps.edu> wrote:

> Andreas
>
> I needed to do some work with a PDB file so started to use the structure
> library. It looks like it depends on all the old biojava code. Mainly the
> structure exceptions that extend bioexception is the first thing tripping me
> up. Should the biojava3-structure module have any external dependencies or
> am I working with the wrong package?
>
> Thanks
>
> Scooter