From chapmanb at 50mail.com  Fri Apr  2 09:07:06 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 2 Apr 2010 09:07:06 -0400
Subject: [Biojava-dev] BOSC and OpenBio solution challenge reminder -- April
	15th
Message-ID: <20100402130706.GJ36623@sobchak.mgh.harvard.edu>

Hello all;
A friendly reminder that the abstract deadline for the Bioinformatics Open
Source Conference (BOSC) is coming up on April 15th:

http://www.open-bio.org/wiki/BOSC_2010

This is a great opportunity to discuss code and biology with fellow
developers.

One session which I'd like to emphasize is the OpenBio Solution
Challenge, a section of talks that describes how to solve practical
problems in bioinformatics using a variety of approaches:

http://www.open-bio.org/wiki/SolutionChallenge

Any toolkit developers who are interested in giving a talk are
encouraged to submit an abstract for the challenge. We have some initial
project ideas on the page and welcome your feedback for other useful
workflows that would emphasize the advantages of using open source
toolkits to solve biological problems. Please copy messages to the
OpenBio mailing list as a central point for discussion and
questions:

http://lists.open-bio.org/mailman/listinfo/open-bio-l

Looking forward to seeing everyone in July,
Brad

BOSC contact and dates:
Date: July 9-10, 2010
Location: Boston, Massachusetts, USA
BOSC 2010 web site: http://www.open-bio.org/wiki/BOSC_2010
Abstract submission via Open Conference System site:  http://events.open-bio.org/BOSC2010/openconf.php
E-mail: bosc at open-bio.org
Bosc-announce list:  http://lists.open-bio.org/mailman/listinfo/bosc-announce

Important Dates
April 15: Abstract deadline
May 5:  Notification of accepted abstracts
May 28: Early Registration Discount Cut-off date
July 8-9:  Codefest 2010
July 9-10: BOSC 2010
August 15:  Manuscript deadline for BOSC 2010 Proceedings published in BMC Bioinformatics

From andreas at sdsc.edu  Fri Apr  2 13:25:39 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 2 Apr 2010 10:25:39 -0700
Subject: [Biojava-dev] BOSC
Message-ID: <h2n59a41c431004021025sf638add5v78097bcf0e8ade40@mail.gmail.com>

Hi,

Who is going to BOSC this year, and who wants to present a BioJava talk?

Andreas

-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From holland at eaglegenomics.com  Fri Apr  2 15:37:06 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 2 Apr 2010 20:37:06 +0100
Subject: [Biojava-dev] BOSC
In-Reply-To: <h2n59a41c431004021025sf638add5v78097bcf0e8ade40@mail.gmail.com>
References: <h2n59a41c431004021025sf638add5v78097bcf0e8ade40@mail.gmail.com>
Message-ID: <D301D60F-0C2B-4EDB-90DD-4E700CC0FBA3@eaglegenomics.com>

I will be there but for various reasons I can't talk this year.

On 2 Apr 2010, at 18:25, Andreas Prlic wrote:

> Hi,
> 
> who is going to BOSC this year and who wants to present a BioJava talk?
> 
> Andreas
> 
> -- 
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From heuermh at acm.org  Fri Apr  2 23:23:15 2010
From: heuermh at acm.org (Michael Heuer)
Date: Fri, 2 Apr 2010 23:23:15 -0400 (EDT)
Subject: [Biojava-dev] BOSC
In-Reply-To: <h2n59a41c431004021025sf638add5v78097bcf0e8ade40@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.1004022320310.10023-100000@shell3.shore.net>

Andreas Prlic wrote:

> who is going to BOSC this year and who wants to present a BioJava talk?

I will be there.  I'm probably not the best person to present this time
around.

   michael


From aradwen at gmail.com  Sat Apr  3 06:18:40 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Sat, 3 Apr 2010 12:18:40 +0200
Subject: [Biojava-dev] Protein sequence composition
Message-ID: <s2me591b1bd1004030318u7584bb09j405d1b224266df4c@mail.gmail.com>

Hello,

I'm writing an application that processes protein sequences, and I am using
BioJava for a couple of things.
One of the processing steps is to parse protein multi-FASTA files and handle
the sequences one after the other. One of my goals is to calculate
composition. By composition I mean that, for a given protein sequence, I want
to know the fraction of residues that fall into each of the groups below, and
then the mean and standard deviation of those fractions:

PAGST
EDNQ
LIVM
KRH
C

example :

protein fasta file :

>SEQ1

DVSFRLSGATSSSYGVFISNLRKALPNERKLYDIPLLRSSLPGSQRYALI
HLTNYADETISVAIDVTNVYIMGYRAGDTSYFFNEASATEAAKYVFKDAM
RKVTLPYSGNYERLQTAAGKIRENIPLGLPALDSAITTLFYYNANSAASA
LMVLIQSTSEAARYKFIEQQIGKRVDKTFLPSLAIISLENSWSALSKQIQ
IASTNNGQFESPVVLINAQNQRVTITNVDAGVVTSNIALLLNRNNMA

>SEQ2

IFPKQYPIINFTTAGATVQSYTNFIRAVRGRLTTGADVRHEIPVLPNRVG
LPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSAYFFHPDNQED
AEAITHLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEAISA
LYYYSTGGTQLPTLARSFIICIQMISEAARFQYIEGEMRTRIRYNRRSAP
DPSVITLENSWGRLSTAIQESNQGAFASPIQLQRRNGSKFSVYDVSILIP
IIALMVYRCAPPPSSQF

I would like to
1/ parse SEQ1 and calculate the composition of, for example, the PAGST
residues (number of residues / length of the sequence)
2/ do the same thing for SEQ2
3/ return the mean of the values from both sequences
4/ return the standard deviation of these values.


I can do this in plain Java, but since I am already using BioJava I would
like to know whether BioJava can do it, and if so which classes to use.

Cheers

From chapman at cs.wisc.edu  Sat Apr  3 09:08:23 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Sat, 03 Apr 2010 08:08:23 -0500
Subject: [Biojava-dev] Protein sequence composition
In-Reply-To: <s2me591b1bd1004030318u7584bb09j405d1b224266df4c@mail.gmail.com>
References: <s2me591b1bd1004030318u7584bb09j405d1b224266df4c@mail.gmail.com>
Message-ID: <4BB73DC7.2020000@cs.wisc.edu>

Hi Radwen,

The example below solves most of what you asked for.  It may not be the most 
elegant solution, but it should get you started in the right direction.

Saving the proteins into sample.fasta and running the following command:
 > java ProteinComposition sample.fasta PAGST EDNQ LIVM KRH C

produces the output:
SEQ1	247:	0.36032388	0.19433199	0.25506073	0.097165994	0.0
SEQ2	267:	0.3445693	0.20224719	0.23595506	0.101123594	0.007490637

Take care,
Mark

-- ProteinComposition.java --

import java.io.*;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.SequenceDB;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.symbol.*;

@SuppressWarnings("deprecation")
public class ProteinComposition {

   /**
    * Determines the composition of proteins in a Fasta file.
    * @param args <filename> <residues>...
    *   <filename>: file name of the Fasta file
    *   <residues>: group(s) of one or more amino acid residues; statistics are
    *               printed out for each group
    */
   public static void main(String[] args) {
     try {

       // load Fasta file into memory
       BufferedInputStream is = new BufferedInputStream(
           new FileInputStream(args[0]));
       Alphabet alpha = AlphabetManager.alphabetForName("PROTEIN");
       SequenceDB db = SeqIOTools.readFasta(is, alpha);

       // load command line arguments into memory
       SymbolList[] res = new SymbolList[args.length-1];
       for (int a = 1; a < args.length; a++)
         res[a-1] = ProteinTools.createProtein(args[a]);

       // store length and composition of each protein
       int[] lengths = new int[db.ids().size()];
       int[][] counts = new int[lengths.length][res.length];
       float[][] means = new float[lengths.length][res.length];

       // iterate over proteins in Fasta file
       SequenceIterator sI = db.sequenceIterator();
       for (int s = 0; sI.hasNext(); s++) {
         Sequence seq = sI.nextSequence();
         lengths[s] = seq.length();

         // iterate over each amino acid
         for (Object sr : seq.toList())

           // check for amino acid in each residue group
           for (int a = 1; a < args.length; a++)

             // iterate over each residue in group
             for (Object r : res[a-1].toList())

               // increment count if amino acid has a match in residue group
               if (((Symbol) r).getMatches().contains((Symbol) sr)) {
                 counts[s][a-1]++;
                 break;
               }

         // print "name length: composition" for each protein
         System.out.print(seq.getName() + "\t" + seq.length() + ":");
         for (int a = 1; a < args.length; a++) {
           means[s][a-1] = (float) counts[s][a-1] / lengths[s];
           System.out.print("\t" + means[s][a-1]);
         }
         System.out.println();
       }

     } catch (FileNotFoundException ex) {
       System.err.println("Problem reading file...");
       ex.printStackTrace();
     } catch (BioException ex) {
       System.err.println("File not in fasta format or wrong alphabet...");
       ex.printStackTrace();
     } catch (NoSuchElementException ex) {
       System.err.println("No fasta sequences in the file...");
       ex.printStackTrace();
     }
   }

}
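
The listing above prints one fraction per residue group and sequence, but it
does not cover steps 3 and 4 of the original question (the mean and standard
deviation across sequences). The following is a minimal plain-Java sketch of
those two steps, not part of BioJava. The class name CompositionStats and its
method names are made up for illustration, and it assumes the sample standard
deviation (n - 1 denominator) is wanted, which the thread does not specify.

```java
import java.util.Arrays;

// Hypothetical helper class; none of these names come from BioJava.
class CompositionStats {

    /** Fraction of residues in 'sequence' that belong to 'group', e.g. "PAGST". */
    static double fraction(String sequence, String group) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < sequence.length(); i++) {
            // Plain character comparison; assumes upper-case one-letter codes.
            if (group.indexOf(sequence.charAt(i)) >= 0) {
                hits++;
            }
        }
        return (double) hits / sequence.length();
    }

    /** Arithmetic mean; returns 0 for an empty array. */
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(0.0);
    }

    /** Sample standard deviation (assumed n - 1 denominator); 0 for fewer than two values. */
    static double sampleStdDev(double[] values) {
        if (values.length < 2) {
            return 0.0;
        }
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return Math.sqrt(sumSq / (values.length - 1));
    }

    public static void main(String[] args) {
        // PAGST fractions for SEQ1 and SEQ2 as printed in the output above.
        double[] pagst = {0.36032388, 0.3445693};
        System.out.printf("PAGST mean=%.4f sd=%.4f%n", mean(pagst), sampleStdDev(pagst));
    }
}
```

The same two functions could be fed directly from the means[][] array in
ProteinComposition, one column per residue group.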



From andreas.prlic at gmail.com  Sat Apr  3 11:20:58 2010
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Sat, 3 Apr 2010 08:20:58 -0700
Subject: [Biojava-dev] BOSC
In-Reply-To: <Pine.GSO.4.44.1004022320310.10023-100000@shell3.shore.net>
References: <Pine.GSO.4.44.1004022320310.10023-100000@shell3.shore.net>
Message-ID: <2E000562-884F-4172-A94D-61488C605A9F@gmail.com>

I am planning to attend 3d-SIG this year and will be mentioning  
biojava there...

Andreas

On 2 Apr 2010, at 20:23, Michael Heuer <heuermh at acm.org> wrote:

> Andreas Prlic wrote:
>
>> who is going to BOSC this year and who wants to present a BioJava  
>> talk?
>
> I will be there.  I'm probably not the best person to present this  
> time
> around.
>
>   michael
>

From sheoran143 at gmail.com  Sun Apr 11 15:16:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 14:16:29 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
Message-ID: <4BC2200D.8000109@gmail.com>

Hi,

There is a very fundamental issue in the SimpleNCBITaxon class, because of
which it produces the wrong taxonomy hierarchy. I explain what I have found
below; let me know what you think of it, and suggest how to fix it.

1) The columns in the taxon table are (taxon_id, ncbi_taxon_id,
parent_taxon_id, node_rank, genetic_code, mito_genetic_code, left_value,
right_value).
2) The class SimpleNCBITaxon assumes that "parent_taxon_id" holds the
ncbi_taxon_id of the parent of the current entry, but that is not true.
What "parent_taxon_id" actually holds is the taxon_id of the row that
carries the parent's ncbi_taxon_id.

<property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
<property name="nodeRank" column="node_rank"/>
<property name="geneticCode" column="genetic_code"/>
<property name="mitoGeneticCode" column="mito_genetic_code"/>
<property name="leftValue" column="left_value"/>
<property name="rightValue" column="right_value"/>
<property name="parentNCBITaxID" column="parent_taxon_id"/>      -----
this is not correct: the parent_taxon_id column stores the taxon_id of the
row that holds the parent's ncbi_taxon_id for the current entry

Thanks
Deepak Sheoran


From holland at eaglegenomics.com  Sun Apr 11 15:53:06 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sun, 11 Apr 2010 20:53:06 +0100
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <4BC2200D.8000109@gmail.com>
References: <4BC2200D.8000109@gmail.com>
Message-ID: <B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>

I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).

thanks,
Richard


--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From sheoran143 at gmail.com  Sun Apr 11 17:08:22 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 16:08:22 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
Message-ID: <4BC23A46.7090304@gmail.com>

I am using the same table with both the BioJava and BioPerl taxon programs,
and the output I get is below:

*Biojava:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage
I get is
             Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia
australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum
var. haydenii.

BioJava's process of finding names:
11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240
(wrong way of doing things)

*Bioperl:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage
I get is
           Retroviridae; Orthoretrovirinae; Alpharetrovirus;
unclassified Alpharetrovirus.

BioPerl's process of finding names:
11876==>353825==>153057==>327045==>11632   (right way of doing things)

Hint: BioJava searches the ncbi_taxon_id column with the value from
parent_taxon_id, whereas BioPerl searches the taxon_id column with the same
value.

*Taxon and taxon_name table content relevant to this discussion:*

taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
2901      3609           276240           genus      Rhamnus                             scientific name
3610      4403           3609             species    Platanus occidentalis               scientific name
29052     48579          4403             species    Suillus placidus                    scientific name
114412    143975         48579            species    Diadasia australis                  scientific name
143976    176516         143975           species    Arnicastrum guerrerense             scientific name
30680     50447          176516           family     Labiduridae                         scientific name
254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
9394      11632          17394            family     Retroviridae                        scientific name
277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
122448    153057         277861           genus      Alpharetrovirus                     scientific name
301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
9584      11876          301952           species    Avian sarcoma virus                 scientific name


Thanks
Deepak



From sheoran143 at gmail.com  Sun Apr 11 18:48:00 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 17:48:00 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <4BC23A46.7090304@gmail.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
	<4BC23A46.7090304@gmail.com>
Message-ID: <4BC251A0.4090602@gmail.com>

If we don't want to change the current code in BioJava but still want to
fix this bug, I have found a way:
1) We can change one of the Hibernate mapping files, "Taxon.hbm.xml", and
replace the line
<property name="parentNCBITaxID" column="parent_taxon_id"/>
     with
<property name="parentNCBITaxID" formula="(select tax.ncbi_taxon_id from taxon tax where tax.taxon_id = parent_taxon_id)"/>

With this change to the Hibernate mapping I get the correct lineage for
ncbi_taxon_id = 11876 (Avian sarcoma virus), which is
              Viruses; Retro-transcribing viruses; Retroviridae;
Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.

2) A possible issue is the taxonomy loader class, which wants to insert a
value for the parent taxon ID into the taxon table. I think that won't be
possible if we make this change to the Hibernate config file.

Deepak Sheoran




From holland at eaglegenomics.com  Mon Apr 12 03:07:55 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 12 Apr 2010 08:07:55 +0100
Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To: <D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
	<4BC23A46.7090304@gmail.com>
	<D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
Message-ID: <E7FB88D1-52D9-496C-86FA-738419FFF579@eaglegenomics.com>

Incidentally, BioJava's approach matches the description in the BioSQL docs at:

 http://biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME

(first example SQL statement - find the taxon id of the parent taxon for 'Homo sapiens' using a self-join)

The BioPerl/BioSQL load_ncbi_taxonomy.pl script however does not match this description.

cheers,
Richard


--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Mon Apr 12 02:57:57 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 12 Apr 2010 07:57:57 +0100
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <4BC23A46.7090304@gmail.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
	<4BC23A46.7090304@gmail.com>
Message-ID: <D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>

Thanks Deepak. 

I've had a look at the code and I believe it's due to the different ways in which BioJava and BioPerl load the taxon table.

BioJava sets the ncbi_taxon_id and parent_taxon_id columns based on the values from the NCBI taxonomy file. The taxon_id column in BioJava is a meaningless auto-generated value that is never used.

BioPerl however is generating taxon_id values and linking them by setting parent_taxon_id to the generated value. The parent value from the NCBI taxonomy file is therefore replaced with the BioPerl generated parent ID, meaning that instead of linking from parent_taxon_id to ncbi_taxon_id as per BioJava, the link is to taxon_id instead. (I'm basing this comment on looking at load_ncbi_taxonomy.pl from the BioSQL archives.)

I believe if you load the taxonomy table using BioJava, you should see BioJava giving correct behaviour. Likewise if you load it using BioPerl, BioPerl will behave correctly. But if you load with one then query with the other, you'll get incorrect results.

This sounds like a case for discussion on both lists - a matter of standardisation between the two projects. Not quickly/easily solvable for now.
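
To make the mismatch concrete, here is a minimal sketch (not BioJava or BioPerl code) that walks the sample rows quoted below under both conventions. The class name `TaxonLineageDemo`, the `Row` type and the `lineage` method are hypothetical names made up for this illustration; only the row values come from the table in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of BioJava, BioPerl or BioSQL.
final class TaxonLineageDemo {

    /** One taxon row joined with its scientific name. */
    static final class Row {
        final int taxonId;
        final int ncbiTaxonId;
        final int parentTaxonId;
        final String name;

        Row(int taxonId, int ncbiTaxonId, int parentTaxonId, String name) {
            this.taxonId = taxonId;
            this.ncbiTaxonId = ncbiTaxonId;
            this.parentTaxonId = parentTaxonId;
            this.name = name;
        }
    }

    /** Sample rows copied from the table quoted in this thread. */
    static final List<Row> ROWS = Arrays.asList(
        new Row(2901, 3609, 276240, "Rhamnus"),
        new Row(3610, 4403, 3609, "Platanus occidentalis"),
        new Row(29052, 48579, 4403, "Suillus placidus"),
        new Row(114412, 143975, 48579, "Diadasia australis"),
        new Row(143976, 176516, 143975, "Arnicastrum guerrerense"),
        new Row(30680, 50447, 176516, "Labiduridae"),
        new Row(254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
        new Row(9394, 11632, 17394, "Retroviridae"),
        new Row(277861, 327045, 9394, "Orthoretrovirinae"),
        new Row(122448, 153057, 277861, "Alpharetrovirus"),
        new Row(301952, 353825, 122448, "unclassified Alpharetrovirus"),
        new Row(9584, 11876, 301952, "Avian sarcoma virus"));

    /**
     * Collects ancestor names, root-most first, for the given ncbi_taxon_id.
     * If parentIsLocalId is true, parent_taxon_id is looked up in taxon_id;
     * otherwise it is looked up in ncbi_taxon_id.
     */
    static List<String> lineage(List<Row> rows, int ncbiTaxonId, boolean parentIsLocalId) {
        Map<Integer, Row> byLocalId = new HashMap<>();
        Map<Integer, Row> byNcbiId = new HashMap<>();
        for (Row r : rows) {
            byLocalId.put(r.taxonId, r);
            byNcbiId.put(r.ncbiTaxonId, r);
        }
        Map<Integer, Row> parentIndex = parentIsLocalId ? byLocalId : byNcbiId;

        List<String> names = new ArrayList<>();
        Row current = byNcbiId.get(ncbiTaxonId);
        // The step limit guards against cycles in malformed data.
        for (int steps = 0; current != null && steps < rows.size(); steps++) {
            Row parent = parentIndex.get(current.parentTaxonId);
            if (parent == null) {
                break; // parent not present in this sample
            }
            names.add(0, parent.name);
            current = parent;
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("parent -> taxon_id:      " + lineage(ROWS, 11876, true));
        System.out.println("parent -> ncbi_taxon_id: " + lineage(ROWS, 11876, false));
    }
}
```

On these rows the first lookup yields the Retroviridae lineage and the second yields the unrelated plant and insect names, so the same table gives different answers depending only on which column parent_taxon_id is matched against.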

cheers,
Richard

On 11 Apr 2010, at 22:08, Deepak Sheoran wrote:

> I am using the same table with the BioJava and BioPerl taxon programs, and the output I get is below:
> 
> BioJava:
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is
>             Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. haydenii.
> 
> BioJava process of finding names: 11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240   (wrong way of doing things)
> 
> BioPerl:
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is
>           Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.
> 
> BioPerl process of finding names: 11876==>353825==>153057==>327045==>11632   (right way of doing things)
> 
> Hint: BioJava searches the ncbi_taxon_id column with a value from parent_taxon_id, whereas BioPerl searches the taxon_id column with a value from parent_taxon_id.
> 
> Taxon and taxon_name table content relevant to this discussion:
> 
> taxon_id	ncbi_taxon_id	parent_taxon_id	node_rank	name	name_class
> 2901	3609	276240	genus	Rhamnus	scientific name
> 3610	4403	3609	species	Platanus occidentalis	scientific name
> 29052	48579	4403	species	Suillus placidus	scientific name
> 114412	143975	48579	species	Diadasia australis	scientific name
> 143976	176516	143975	species	Arnicastrum guerrerense	scientific name
> 30680	50447	176516	family	Labiduridae	scientific name
> 254757	301952	50447	varietas	Oreostemma alpigenum var. haydenii	scientific name
> 9394	11632	17394	family	Retroviridae	scientific name
> 277861	327045	9394	subfamily	Orthoretrovirinae	scientific name
> 122448	153057	277861	genus	Alpharetrovirus	scientific name
> 301952	353825	122448	no rank	unclassified Alpharetrovirus	scientific name
> 9584	11876	301952	species	Avian sarcoma virus	scientific name
> 
> Thanks
> Deepak 
> 
> On 4/11/2010 2:53 PM, Richard Holland wrote:
>> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>> 
>> thanks,
>> Richard
>> 
>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>> 
>>   
>> 
>>> Hi,
>>> 
>>> There is a very fundamental issue in the SimpleNCBITaxon class, because of which it is producing the wrong taxonomy hierarchy. I am explaining what I have found; let me know what you guys think of it, and let me suggest how to fix it.
>>> 
>>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>>> 2) In the class SimpleNCBITaxon we assume that "parent_taxon_id" holds the parent's ncbi_taxon_id for the current ncbi_taxon_id value, but that is not true. The value that "parent_taxon_id" holds is the "taxon_id" of the row whose ncbi_taxon_id is the parent of the current ncbi_taxon_id.
>>> 
>>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>>> <property name="nodeRank" column="node_rank"/>
>>> <property name="geneticCode" column="genetic_code"/>
>>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>>> <property name="leftValue" column="left_value"/>
>>> <property name="rightValue" column="right_value"/>
>>> <property name="parentNCBITaxID" column="parent_taxon_id"/>      ----- this is not correct: the column parent_taxon_id stores the taxon_id of the row that holds the parent's ncbi_taxon_id for the current entry
>>> 
>>> Thanks
>>> Deepak Sheoran
>>> 
>>> 
>>>     
>>> 
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: 
>> holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>> 
>> 
>>   
>> 
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From trevor.paterson at roslin.ed.ac.uk  Tue Apr 13 07:41:01 2010
From: trevor.paterson at roslin.ed.ac.uk (trevor paterson (RI))
Date: Tue, 13 Apr 2010 12:41:01 +0100
Subject: [Biojava-dev] Biojava3 structure
In-Reply-To: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com>
Message-ID: <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk>

Andreas

I am trying to do an anonymous checkout of the whole BioJava 3 trunk and it is failing on the structure module.

I can't even do a copy command.

The src/main tree seems corrupted, throwing an error:
Error: Decompression of svndiff data failed  

Trevor Paterson PhD
new email trevor.paterson at roslin.ed.ac.uk

Bioinformatics 
The Roslin Institute
The Royal (Dick) School of Veterinary Studies
University of Edinburgh
Scotland EH25 9PS
phone +44 (0)131 5274197
http://www.roslin.ed.ac.uk
http://www.resspecies.org
http://www.thearkdb.org
Please consider the environment before printing this e-mail

The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336
Disclaimer:This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. 

 
> -----Original Message-----
> From: biojava-dev-bounces at lists.open-bio.org 
> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of 
> Andreas Prlic
> Sent: 29 March 2010 03:03
> To: Scooter Willis
> Cc: biojava-dev
> Subject: Re: [Biojava-dev] Biojava3 structure
> 
> Hi Scooter,
> 
> at the present the structure modules depend on the alignment 
> module and on the (old) core module.  This is for aligning 
> ATOM and SEQRES residues in the PDB files, and for the Smith 
> Waterman alignment based 3D structure superposition. If we 
> target a release of biojava 3 in about a month, I don't think 
> it will be possible to break this out, mainly because the 
> alignment module is still based on the biojava 1 code base. 
> Overall I think that the core module probably should still be 
> part of the BioJava 3 release. Any opinions on that?
> 
> Andreas
> 
> On Sun, Mar 28, 2010 at 3:06 PM, Scooter Willis 
> <HWillis at scripps.edu> wrote:
> 
> > Andreas
> >
> > I needed to do some work with a PDB file so started to use the 
> > structure library. It looks like it depends on all the old biojava 
> > code. Mainly the structure exceptions that extend 
> bioexception is the 
> > first thing tripping me up. Should the biojava3-structure 
> module have 
> > any external dependencies or am I working with the wrong package?
> >
> > Thanks
> >
> > Scooter
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> 

From andreas at sdsc.edu  Tue Apr 13 10:04:20 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 13 Apr 2010 07:04:20 -0700
Subject: [Biojava-dev] Biojava3 structure
In-Reply-To: <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk>
References: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com>
	<050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk>
Message-ID: <g2p59a41c431004130704i9c4b8ca7h77b18da7e2bb8808@mail.gmail.com>

Hi Trevor,

I can confirm the same behaviour from our anonymous SVN. Developer SVN
seems to be ok and I also ran an svnadmin verify without problems. I
suppose we are having issues with the anonymous SVN server again...
I'll ask the OBF helpdesk to take another look ...

Can you try and let us know if a checkout via svn or git from GitHub
works for you in the meantime? E.g.:

svn co http://svn.github.com/biojava/biojava.git ./biojava

Thanks,

Andreas


On Tue, Apr 13, 2010 at 4:41 AM, trevor paterson (RI)
<trevor.paterson at roslin.ed.ac.uk> wrote:
> Andreas
>
> I am trying to do an anonymous checkout of the whole BioJava 3 trunk and it is failing on the structure module.
>
> I can't even do a copy command.
>
> The src/main tree seems corrupted, throwing an error:
> Error: Decompression of svndiff data failed
>
> Trevor Paterson PhD
> new email trevor.paterson at roslin.ed.ac.uk
>
> Bioinformatics
> The Roslin Institute
> The Royal (Dick) School of Veterinary Studies
> University of Edinburgh
> Scotland EH25 9PS
> phone +44 (0)131 5274197
> http://www.roslin.ed.ac.uk
> http://www.resspecies.org
> http://www.thearkdb.org
> Please consider the environment before printing this e-mail
>
> The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336
> Disclaimer:This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.
>
>
>
>> -----Original Message-----
>> From: biojava-dev-bounces at lists.open-bio.org
>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of
>> Andreas Prlic
>> Sent: 29 March 2010 03:03
>> To: Scooter Willis
>> Cc: biojava-dev
>> Subject: Re: [Biojava-dev] Biojava3 structure
>>
>> Hi Scooter,
>>
>> at the present the structure modules depend on the alignment
>> module and on the (old) core module.  This is for aligning
>> ATOM and SEQRES residues in the PDB files, and for the Smith
>> Waterman alignment based 3D structure superposition. If we
>> target a release of biojava 3 in about a month, I don't think
>> it will be possible to break this out, mainly because the
>> alignment module is still based on the biojava 1 code base.
>> Overall I think that the core module probably should still be
>> part of the BioJava 3 release. Any opinions on that?
>>
>> Andreas
>>
>> On Sun, Mar 28, 2010 at 3:06 PM, Scooter Willis
>> <HWillis at scripps.edu> wrote:
>>
>> > Andreas
>> >
>> > I needed to do some work with a PDB file so started to use the
>> > structure library. It looks like it depends on all the old biojava
>> > code. Mainly the structure exceptions that extend
>> bioexception is the
>> > first thing tripping me up. Should the biojava3-structure
>> module have
>> > any external dependencies or am I working with the wrong package?
>> >
>> > Thanks
>> >
>> > Scooter
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


From biopython at maubp.freeserve.co.uk  Thu Apr 15 13:54:56 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Apr 2010 18:54:56 +0100
Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To: <D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
	<4BC23A46.7090304@gmail.com>
	<D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
Message-ID: <m2o320fb6e01004151054rcb57a28fvad135dffbe35d5fa@mail.gmail.com>

Hi,

I've CC'd this to the BioSQL mailing list for cross project
discussion.

On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland  wrote:
> Thanks Deepak.
>
> I've had a look at the code and I believe it's due to the
> different ways in which BioJava and BioPerl load the
> taxon table.
>
> BioJava sets the ncbi_taxon_id and parent_taxon_id
> columns based on the values from the NCBI taxonomy
> file. The taxon_id column in BioJava is a meaningless
> auto-generated value that is never used.
>
> BioPerl however is generating taxon_id values and
> linking them by setting parent_taxon_id to the
> generated value. The parent value from the NCBI
> taxonomy file is therefore replaced with the BioPerl
> generated parent ID, meaning that instead of linking
> from parent_taxon_id to ncbi_taxon_id as per BioJava,
> the link is to taxon_id instead. (I'm basing this
> comment on looking at load_ncbi_taxonomy.pl from
> the BioSQL archives.)

Note that old versions of load_ncbi_taxonomy.pl
(which is part of BioSQL, not part of BioPerl) would
set taxon_id equal to ncbi_taxon_id, see:
http://bugzilla.open-bio.org/show_bug.cgi?id=2470

This may help explain the confusion.

> I believe if you load the taxonomy table using BioJava,
> you should see BioJava giving correct behaviour.
> Likewise if you load it using BioPerl, BioPerl will
> behave correctly. But if you load with one then query
> with the other, you'll get incorrect results.
>
> This sounds like a case for discussion on both lists -
> a matter of standardisation between the two projects.
> Not quickly/easily solvable for now.

It's not just two projects (BioPerl & BioJava) (grin).
It's at least five projects (BioSQL itself plus BioRuby
and Biopython).

I'm not sure about BioRuby's implementation, but
currently I think BioJava is the odd one out - BioPerl,
Biopython, and the BioSQL's load_ncbi_taxonomy.pl
all make entries in parent_taxon_id reference the
automatically generated taxon_id (please correct
me if I am wrong).
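
One way to see which convention an already loaded table follows is to test where the parent_taxon_id values resolve. The following is a rough sketch under stated assumptions, not code from any of the Bio* projects: the class `ParentConventionCheck`, the `Convention` enum and the `NO_PARENT` marker (-1, standing in for a NULL parent) are hypothetical, and the local IDs 10 to 12 in the usage note are invented.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical diagnostic; not part of BioSQL or any Bio* toolkit.
final class ParentConventionCheck {

    /** Assumed marker for "this row has no parent" (a real table would use NULL). */
    static final int NO_PARENT = -1;

    enum Convention { LOCAL_TAXON_ID, NCBI_TAXON_ID, BOTH, NEITHER }

    /**
     * Each row is {taxon_id, ncbi_taxon_id, parent_taxon_id}.
     * Reports in which column every parent_taxon_id value can be found.
     */
    static Convention detect(int[][] rows) {
        Set<Integer> localIds = new HashSet<>();
        Set<Integer> ncbiIds = new HashSet<>();
        for (int[] r : rows) {
            localIds.add(r[0]);
            ncbiIds.add(r[1]);
        }

        boolean allResolveLocally = true;
        boolean allResolveByNcbi = true;
        for (int[] r : rows) {
            int parent = r[2];
            if (parent == NO_PARENT) {
                continue; // root rows say nothing about the convention
            }
            if (!localIds.contains(parent)) {
                allResolveLocally = false;
            }
            if (!ncbiIds.contains(parent)) {
                allResolveByNcbi = false;
            }
        }

        if (allResolveLocally && allResolveByNcbi) {
            return Convention.BOTH;
        }
        if (allResolveLocally) {
            return Convention.LOCAL_TAXON_ID;
        }
        if (allResolveByNcbi) {
            return Convention.NCBI_TAXON_ID;
        }
        return Convention.NEITHER;
    }
}
```

For example, `{{10, 11632, -1}, {11, 327045, 10}, {12, 153057, 11}}` is reported as LOCAL_TAXON_ID, while a table where taxon_id equals ncbi_taxon_id in every row (as with the old script behaviour mentioned above) is reported as BOTH. A partial extract with missing parents, such as the sample rows earlier in this thread, comes back as NEITHER, so the check is only meaningful on a complete table.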

My personal view is that bioperl-db is the reference
implementation and should be followed in the event
of any ambiguity within BioSQL. In this particular
case, there is actually a BioSQL script to check
against too (load_ncbi_taxonomy.pl).

Hopefully Hilmar can give us an official verdict...

Peter

From andreas at sdsc.edu  Fri Apr 16 13:39:37 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 16 Apr 2010 10:39:37 -0700
Subject: [Biojava-dev] Biojava3-genetics
In-Reply-To: <4BC806F4.3090302@wur.nl>
References: <4BC806F4.3090302@wur.nl>
Message-ID: <r2n59a41c431004161039hd93b268eu159de8a6659d969f@mail.gmail.com>

Hi Richard,

any contribution is welcome. What do you have in mind in particular? Perhaps
there is already something there along those lines...

Andreas

On Thu, Apr 15, 2010 at 11:43 PM, Richard Finkers <Richard.Finkers at wur.nl> wrote:

> Dear List,
>
> I would be interested in adding a module for genetic analysis to the
> biojava3 project. Are there others who are interested in this as well and
> with who should I discuss this further?
>
> Thanks,
> Richard
>
>
> --
> Dr. Richard Finkers
> Researcher Plant Breeding
> Wageningen UR Plant Breeding
> P.O. Box 16, 6700 AA, Wageningen, The Netherlands
> Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB
> Wageningen, The Netherlands
> Tel. +31-317-484165 Fax +31-317-418094
> http://www.plantbreeding.wur.nl/
> https://www.eu-sol.wur.nl/
> https://cbsgdbase.wur.nl/
> http://solgenomics.wur.nl/
> http://www.disclaimer-uk.wur.nl/
>
>


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From sheoran143 at gmail.com  Fri Apr 16 14:43:59 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Fri, 16 Apr 2010 13:43:59 -0500
Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To: <m2o320fb6e01004151054rcb57a28fvad135dffbe35d5fa@mail.gmail.com>
References: <4BC2200D.8000109@gmail.com>	
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>	
	<4BC23A46.7090304@gmail.com>	
	<D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
	<m2o320fb6e01004151054rcb57a28fvad135dffbe35d5fa@mail.gmail.com>
Message-ID: <4BC8AFEF.70107@gmail.com>

In my experience we should make use of taxon_id here, because it is a
unique key in a local instance of BioSQL. ncbi_taxon_id should be used
for mapping purposes only, so that a person can map a local taxon_id to
an ncbi_taxon_id; otherwise it defeats the sole purpose of having
taxon_id as the primary key of the taxon table. I think the main goal
when BioSQL was designed was to make it independent of any other
organization such as GenBank or NCBI, with ncbi_taxon_id as a feature
that lets us map a number given by a known authority (ncbi_taxon_id)
to a local number (taxon_id).
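
As a rough sketch of that mapping idea (an illustration, not how any existing loader is implemented), a loader could assign local taxon_id values in a first pass and then rewrite each NCBI parent reference to the matching local ID in a second pass. The class `LocalTaxonIdAssigner`, the `NO_PARENT` marker (-1 in place of NULL) and the starting ID are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-pass loader sketch; not taken from BioSQL or bioperl-db.
final class LocalTaxonIdAssigner {

    /** Assumed marker for a parent that is not in the input (NULL in a real table). */
    static final int NO_PARENT = -1;

    /**
     * Input rows are {ncbi_taxon_id, parent ncbi_taxon_id}.
     * Output rows are {taxon_id, ncbi_taxon_id, parent_taxon_id}, where
     * parent_taxon_id refers to the locally assigned taxon_id.
     */
    static List<int[]> assign(int[][] ncbiNodes, int firstLocalId) {
        // Pass 1: give every NCBI node a local surrogate key.
        Map<Integer, Integer> localIdByNcbiId = new HashMap<>();
        int nextLocalId = firstLocalId;
        for (int[] node : ncbiNodes) {
            if (!localIdByNcbiId.containsKey(node[0])) {
                localIdByNcbiId.put(node[0], nextLocalId++);
            }
        }

        // Pass 2: translate each NCBI parent reference into a local one.
        List<int[]> rows = new ArrayList<>();
        for (int[] node : ncbiNodes) {
            Integer parentLocalId = localIdByNcbiId.get(node[1]);
            rows.add(new int[] {
                localIdByNcbiId.get(node[0]),
                node[0],
                parentLocalId == null ? NO_PARENT : parentLocalId
            });
        }
        return rows;
    }
}
```

With this layout ncbi_taxon_id stays available purely as a cross-reference, and lineage queries only ever follow local keys.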

Deepak Sheoran

On 4/15/2010 12:54 PM, Peter wrote:
> Hi,
>
> I've CC'd this to the BioSQL mailing list for cross project
> discussion.
>
> On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland  wrote:
>    
>> Thanks Deepak.
>>
>> I've had a look at the code and I believe it's due to the
>> different ways in which BioJava and BioPerl load the
>> taxon table.
>>
>> BioJava sets the ncbi_taxon_id and parent_taxon_id
>> columns based on the values from the NCBI taxonomy
>> file. The taxon_id column in BioJava is a meaningless
>> auto-generated value that is never used.
>>
>> BioPerl however is generating taxon_id values and
>> linking them by setting parent_taxon_id to the
>> generated value. The parent value from the NCBI
>> taxonomy file is therefore replaced with the BioPerl
>> generated parent ID, meaning that instead of linking
>> from parent_taxon_id to ncbi_taxon_id as per BioJava,
>> the link is to taxon_id instead. (I'm basing this
>> comment on looking at load_ncbi_taxonomy.pl from
>> the BioSQL archives.)
>>      
> Note that old versions of load_ncbi_taxonomy.pl
> (which is part of BioSQL, not part of BioPerl) would
> set taxon_id equal to ncbi_taxon_id, see:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2470
>
> This may help explain the confusion.
>
>    
>> I believe if you load the taxonomy table using BioJava,
>> you should see BioJava giving correct behaviour.
>> Likewise if you load it using BioPerl, BioPerl will
>> behave correctly. But if you load with one then query
>> with the other, you'll get incorrect results.
>>
>> This sounds like a case for discussion on both lists -
>> a matter of standardisation between the two projects.
>> Not quickly/easily solvable for now.
>>      
> It's not just two projects (BioPerl & BioJava) (grin).
> It's at least five projects (BioSQL itself plus BioRuby
> and Biopython).
>
> I'm not sure about BioRuby's implementation, but
> currently I think BioJava is the odd one out - BioPerl,
> Biopython, and the BioSQL's load_ncbi_taxonomy.pl
> all make entries in parent_taxon_id reference the
> automatically generated taxon_id (please correct
> me if I am wrong).
>
> My personal view is that bioperl-db is the reference
> implementation and should be followed in the event
> of any ambiguity within BioSQL. In this particular
> case, there is actually a BioSQL script to check
> against too (load_ncbi_taxonomy.pl).
>
> Hopefully Hilmar can give us an official verdict...
>
> Peter
>    


From sylvain.foisy at diploide.net  Sat Apr 17 10:00:07 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Sat, 17 Apr 2010 10:00:07 -0400 (EDT)
Subject: [Biojava-dev] Eclipse + maven woes...
Message-ID: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>

Hi,

Again, I feel stupid asking these newbie questions... I finally got my hands
on a new MacBook Pro and am re-installing the apps to get stuff moving. As
usual (I am sorry to say...), Eclipse and Maven are giving me a fit when I
try to check out the developer's tree.

I have installed the latest Subversion and Maven plugins. When I want to
create a new project, I try the following:

1) I right click to select "New > Other..." in the Navigator panel;

2) I select "SVN > Project from SVN", which leads me to a window where the
 location of the developer's tree is in svn+ssh; in the window that comes
up next, I use this URL to get the "Finish" button activated:

 svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk

3) After that, I choose the "Check out as a project configured using the New
Project Wizard" option, which pops up the window where I select "Maven > Maven
Project".

4) I get a "New Maven Project" window where I select the default. The window
then changes to a "Select archetype" where I also use the default
selections.

5) This is where I can't seem to move forward... The window that pops
up asks me for an Artefact ID. I am clueless about what to put... The
process stops there :-(

Maven is probably a cool tool but its learning curve is pretty steep...
Shouldn't all this be automatic after "Maven > Maven Project"?

Thanks in advance. I'll put the solution into the wiki ;-)

Sylvain

===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net

===================================================================


From heuermh at acm.org  Sun Apr 18 23:33:00 2010
From: heuermh at acm.org (Michael Heuer)
Date: Sun, 18 Apr 2010 23:33:00 -0400 (EDT)
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>
Message-ID: <Pine.GSO.4.44.1004182329080.28020-100000@shell3.shore.net>

Sylvain Foisy wrote:

> Again, I feel stupid asking these newbie questions... I finally got my hands
> on a new MacBook Pro and am re-installing the apps to get stuff moving. As
> usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a
> checkout of the developer's tree.
>
> I have installed the latest Subversion and Maven plugins. When I want to
> create a new project, I try the following:
>
> 1) I right click to select "New > Other..." in the Navigator panel;
>
> 2) I select "SVN > Project from SVN", which leads me to a window where the
>  location of the developer's tree is in svn+ssh; in the window that comes
> up next, I use this URL to get the "Finish" button activated:
>
>  svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk
>
> 3) After that, I choose the "Check out as a project configured using the New
> Project Wizard", which pop the window where I select "Maven > Maven
> Project".
>
> 4) I get a "New Maven Project" window where I select the default. The window
> then changes to a "Select archetype" where I also use the default
> selections.

This last step doesn't sound right, here Eclipse is creating a brand new
Maven project for you instead of creating a Maven-based project from the
metadata already in subversion.

In the SVN window you should see "Check out as Maven Project" when you
right-click, unless that has changed with newer versions of the maven
plugin.

   michael


From andreas at sdsc.edu  Mon Apr 19 00:17:17 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 18 Apr 2010 21:17:17 -0700
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>
References: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>
Message-ID: <m2t59a41c431004182117xe69b55ffj6d13c37ff2c7e50a@mail.gmail.com>

Hi Sylvain,

The place to start the checkout in eclipse is the SVN repository browser.
There you can do a right-click on the biojava/trunk folder and check out as
a Maven project.

Andreas

On Sat, Apr 17, 2010 at 7:00 AM, Sylvain Foisy
<sylvain.foisy at diploide.net> wrote:

> Hi,
>
> Again, I feel stupid asking these newbie questions... I finally got my hands
> on a new MacBook Pro and am re-installing the apps to get stuff moving. As
> usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a
> checkout of the developer's tree.
>
> I have installed the latest Subversion and Maven plugins. When I want to
> create a new project, I try the following:
>
> 1) I right click to select "New > Other..." in the Navigator panel;
>
> 2) I select "SVN > Project from SVN", which leads me to a window where the
>  location of the developer's tree is in svn+ssh; in the window that comes
> up next, I use this URL to get the "Finish" button activated:
>
>  svn+ssh://
> dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk
>
> 3) After that, I choose the "Check out as a project configured using the
> New
> Project Wizard", which pop the window where I select "Maven > Maven
> Project".
>
> 4) I get a "New Maven Project" window where I select the default. The
> window
> then changes to a "Select archetype" where I also use the default
> selections.
>
> 5) This is where I can't seem to be moving forward... The window that pops
> out ask me for an Artefact ID. I am clueless about what to put... The
> process stops there :-(
>
> Maven is probably a cool tool but its learning curve is pretty steep...
> Shouldn't all this be automatic after "Maven > Maven Project"
>
> Thanks in advance. I'll put the solution into the wiki ;-)
>
> Sylvain
>
> ===================================================================
>
>  Sylvain Foisy, Ph. D.
>  Consultant Bio-informatique / Bioinformatics
>  Diploide.net - TI pour la vie / IT for Life
>
>  Courriel: sylvain.foisy at diploide.net
>  Web: http://www.diploide.net
>
> ===================================================================
>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From sylvain.foisy at diploide.net  Mon Apr 19 16:47:43 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Mon, 19 Apr 2010 16:47:43 -0400
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To: <m2t59a41c431004182117xe69b55ffj6d13c37ff2c7e50a@mail.gmail.com>
Message-ID: <C7F239AF.16DB7%sylvain.foisy@diploide.net>

Hi Andreas,

I finally got something working but it wasn't automatic... Switching to the
SVN Repositories perspective, I right-clicked on trunk and selected
"Checkout...". After downloading the code, I had to right-click the
biojava-live project that was now shown in the Java Browsing perspective and
select "m2 Maven > Enable Dependency Management" to get it working. When I
tried the "Check out as..." option, a window popped up with "Check out Maven
projects with SCM" pre-selected and I was stuck in the Group ID/Artefact ID
mayhem.

Thanks for the time. Back to coding ;-)

Sylvain

On 19/04/10 00:17, "[NAME]" <[ADDRESS]> wrote:

> Hi Sylvain,
> 
> The place to start the checkout in eclipse is the SVN repository browser.
> There you can do a right-click on the biojava/trunk folder and check out as a
> Maven project. 
> 
> Andreas
> 
> On Sat, Apr 17, 2010 at 7:00 AM, Sylvain Foisy <sylvain.foisy at diploide.net>
> wrote:
>> Hi,
>> 
>> Again, I feel stupid asking these newbie questions... I finally got my hands
>> on a new MacBook Pro and am re-installing the apps to get stuff moving. As
>> usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a
>> checkout of the developer's tree.
>> 
>> I have installed the latest Subversion and Maven plugins. When I want to
>> create a new project, I try the following:
>> 
>> 1) I right click to select "New > Other..." in the Navigator panel;
>> 
>> 2) I select "SVN > Project from SVN", which leads me to a window where the
>>  location of the developer's tree is in svn+ssh; in the window that comes
>> up next, I use this URL to get the "Finish" button activated:
>> 
>>  svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk
>> 
>> 3) After that, I choose the "Check out as a project configured using the New
>> Project Wizard", which pop the window where I select "Maven > Maven
>> Project".
>> 
>> 4) I get a "New Maven Project" window where I select the default. The window
>> then changes to a "Select archetype" where I also use the default
>> selections.
>> 
>> 5) This is where I can't seem to be moving forward... The window that pops
>> out ask me for an Artefact ID. I am clueless about what to put... The
>> process stops there :-(
>> 
>> Maven is probably a cool tool but its learning curve is pretty steep...
>> Shouldn't all this be automatic after "Maven > Maven Project"
>> 
>> Thanks in advance. I'll put the solution into the wiki ;-)
>> 
>> Sylvain
>> 
>> ===================================================================
>> 
>>  Sylvain Foisy, Ph. D.
>>  Consultant Bio-informatique / Bioinformatics
>>  Diploide.net - TI pour la vie / IT for Life
>> 
>>  Courriel: sylvain.foisy at diploide.net
>>  Web: http://www.diploide.net
>> 
>> ===================================================================
>> 
>> 
>> 
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> 
> 


From andreas at sdsc.edu  Tue Apr 27 01:33:51 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 26 Apr 2010 22:33:51 -0700
Subject: [Biojava-dev] accepted GSoC projects
Message-ID: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>

Dear all,

Google has released the results for GSoC: Congratulations to Mark Chapman
and Jianjiong Gao for having been accepted to work on the MSA and PTM
projects for BioJava! Let's start the "community bonding" process (
http://en.flossmanuals.net/GSoCMentoring/MindtheGap )  and we all are
looking forward to work with you on this during the summer. The Mentors and
co-mentors will be Peter Rose for the PTM and Scooter Willis and Kyle
Ellrott for the MSA project (and me).

I want to thank all of of you who submitted proposals or showed interest in
other ways for the Google Summer of Code. We hope you are not too
disappointed if your application did not get accepted this time. We had a
large number (52) applications and the the overall quality of the
submissions was very high. We would like to stay in touch with you and we
hope that you are interested in BioJava also beyond the scope of GSoC. There
are a number of different ways how to contribute:  We are always looking for
people who provide code and patches to further improve our library, help out
with the documentation on the Wiki page, or answer questions on the mailing
lists.

Let's all give Mark and Jianjiong a warm welcome to the BioJava community.
For those of you who are interested in following the progress of the
projects: as usual, the development-related discussions are going to be on
the biojava-dev list.

Happy coding!

Andreas

From jianjiong.gao at gmail.com  Tue Apr 27 15:13:12 2010
From: jianjiong.gao at gmail.com (Jianjiong Gao)
Date: Tue, 27 Apr 2010 14:13:12 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
Message-ID: <h2kc82264f51004271213u1ea78e1bq29184a65b6315cbe@mail.gmail.com>

Dear Dr. Prlic and Everyone,

Thanks for the warm welcome. I am so glad that I have the chance to
work with the BioJava community this summer. I would like to briefly
introduce myself. My name is Jianjiong (JJ) Gao. I am a PhD student in
Computer Science at the University of Missouri, Columbia. My studies
focus on bioinformatics, specifically computational proteomics and
PTMs.

I came across BioJava about two years ago when I was working on a
plugin for Cytoscape, and was attracted by the idea of providing a
generic Java API for bioinformatics applications. I was thinking maybe
someday I could do some coding for BioJava. And now I have the chance
:)

Best Regards,
-JJ



From chapman at cs.wisc.edu  Wed Apr 28 00:18:25 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Tue, 27 Apr 2010 23:18:25 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
Message-ID: <4BD7B711.9090108@cs.wisc.edu>

Hi all,

Thank you to Google, Open Bioinformatics Foundation, BioJava, and my mentors for 
this opportunity.  As a short introduction, I am Mark Chapman, a graduate 
student in Computer Sciences at the University of Wisconsin - Madison.  My focus 
is in artificial intelligence and bioinformatics.  This summer, I will add a 
Multiple Sequence Alignment module to BioJava.

My first task will be to update the alignment module to BioJava3 and to design 
the interface for MSA.  My second goal is to implement a progressive MSA styled 
after ClustalW.  After that, I will add alternative routines for each step.

Any ideas for the MSA project as well as more sources of programming wisdom are 
quite welcome.  For example, Andreas suggested a series about Java parallelism 
and lazy execution 
(http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/). 
  I also noted a useful tip for iterative development 
(http://en.flossmanuals.net/GSoCMentoring/Workflow).

Thanks again,
Mark



From andreas at sdsc.edu  Wed Apr 28 13:31:58 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 28 Apr 2010 10:31:58 -0700
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <4BD7B711.9090108@cs.wisc.edu>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
Message-ID: <w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>

> Any ideas for the MSA project as well as more sources of programming wisdom
> are quite welcome.  For example, Andreas suggested a series about Java
> parallelism and lazy execution (
> http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/).
>


Credit for the links goes to Scooter, who recommended them ;-)  My general
recommendation is to read Joshua Bloch's "Effective Java"
( http://java.sun.com/docs/books/effective/ ). It is a collection of rules that
should help in avoiding some frequently made mistakes...

Andreas




-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From HWillis at scripps.edu  Wed Apr 28 13:57:14 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 28 Apr 2010 13:57:14 -0400
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
	<w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>
Message-ID: <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>

Andreas

Those links were sent to me by Mark Southern, who sits a couple of doors down
and is a past BioJava contributor for the sequence viewer. We should avoid
bringing in any external parallel frameworks, but at a minimum give ourselves
enough abstraction, with a backend multi-threaded job-processing approach, to
take advantage of a multi-processor box and a cluster via Terracotta.  If the
abstraction of the jobs and the mapping of resources is generic enough, then
that allows different implementations in various cluster environments for
those who have found the next best thing in parallel computing!
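
One way to picture the job abstraction described above is the minimal sketch below. It is an illustration only, not an existing BioJava API: the names JobBackend, LocalPoolBackend and JobBackendDemo are hypothetical. It relies solely on java.util.concurrent, so no external parallel framework is pulled in; a cluster-backed implementation would be another class behind the same interface.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical abstraction: callers submit jobs without knowing where they run.
interface JobBackend {
    <T> Future<T> submit(Callable<T> job);
    void shutdown();
}

// One possible backend: a fixed-size thread pool on the local machine.
class LocalPoolBackend implements JobBackend {
    private final ExecutorService pool;

    LocalPoolBackend(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    @Override
    public <T> Future<T> submit(Callable<T> job) {
        return pool.submit(job);
    }

    @Override
    public void shutdown() {
        pool.shutdown();
    }
}

class JobBackendDemo {
    // Runs a trivial job (squaring a number) on whichever backend is supplied.
    static int squareOn(JobBackend backend, final int x) {
        try {
            Future<Integer> result = backend.submit(new Callable<Integer>() {
                @Override
                public Integer call() {
                    return x * x;
                }
            });
            return result.get(); // blocks until the job has finished
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            backend.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squareOn(new LocalPoolBackend(2), 7));
    }
}
```

The calling code depends only on the interface, so swapping the local pool for a different execution environment would not touch the code that defines the jobs.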

Scooter



From quantum7 at gmail.com  Wed Apr 28 15:06:40 2010
From: quantum7 at gmail.com (Spencer Bliven)
Date: Wed, 28 Apr 2010 12:06:40 -0700
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <4BD7B711.9090108@cs.wisc.edu>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
Message-ID: <p2kdf1d36d1004281206r9e86ab01p8cd2b406667aa0ba@mail.gmail.com>

Mark-

Welcome to the BioJava community! Adding multiple sequence alignments will
be a nice feature for the library.

One suggestion I have is to make any data structures you create for multiple
alignments as general as possible, and to think about whether the special
cases can still be represented. For instance, can you store an alignment
where some of the sequence is unknown (e.g. {ABCD, ABXD})? Can you store an
alignment where only a subset of the sequences is defined? I recently had
to represent an alignment like this:
ABCD EFGH
EFGH ABCD
This sort of alignment can't be written using just gaps; I had to make a new
structure to store pairs {(A,A), (B,B), ...} and rewrite much of the
existing alignment functionality based on that.
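
As a rough illustration of the pair-based idea, here is a minimal sketch. It is not the structure described in this message, whose details are not given; the class PairAlignment and its methods are hypothetical. It stores an alignment as explicit index pairs and builds the ABCD EFGH / EFGH ABCD case as a rotation of one sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an alignment stored as explicit (i, j) index pairs
// instead of two gapped strings.
class PairAlignment {
    private final String first;
    private final String second;
    private final List<int[]> pairs = new ArrayList<>();

    PairAlignment(String first, String second) {
        this.first = first;
        this.second = second;
    }

    // Records that position i of the first sequence aligns with position j of the second.
    void align(int i, int j) {
        if (i < 0 || i >= first.length() || j < 0 || j >= second.length()) {
            throw new IndexOutOfBoundsException("pair (" + i + "," + j + ")");
        }
        pairs.add(new int[] {i, j});
    }

    int size() {
        return pairs.size();
    }

    // Number of aligned pairs whose residues are identical.
    int identities() {
        int count = 0;
        for (int[] p : pairs) {
            if (first.charAt(p[0]) == second.charAt(p[1])) {
                count++;
            }
        }
        return count;
    }

    // Aligns a sequence with its own rotation by 'shift' positions,
    // which cannot be expressed by inserting gaps alone.
    static PairAlignment circularPermutation(String s, int shift) {
        String rotated = s.substring(shift) + s.substring(0, shift);
        PairAlignment a = new PairAlignment(s, rotated);
        int n = s.length();
        for (int i = 0; i < n; i++) {
            a.align(i, (i - shift + n) % n);
        }
        return a;
    }

    public static void main(String[] args) {
        PairAlignment a = circularPermutation("ABCDEFGH", 4);
        System.out.println(a.size() + " pairs, " + a.identities() + " identical");
    }
}
```

Because the pairs need not be in increasing order in both sequences, the same structure also covers ordinary gapped alignments, where they simply happen to be monotonic.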

Anyway, I don't mean to get bogged down in specific examples or exceptions.
I just wanted to point out that there are a lot of methods which can be used
to define some sort of alignment between a set of sequences, and it would be
nice if the BioJava alignment package was general enough to accommodate such
methods in the future without reinventing the wheel.

Cheers!
Spencer

P.S. I ran into such weird alignments while working on structural
alignments, which are not well behaved like traditional multiple sequence
alignments. Andreas knows all about both types of alignment, and can
probably judge better than I how much generality is worth spending your time
on.



From chapman at cs.wisc.edu  Wed Apr 28 21:09:07 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Wed, 28 Apr 2010 20:09:07 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
	<w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>
	<6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
Message-ID: <4BD8DC33.7010607@cs.wisc.edu>

Here is a summary of the concurrency lessons I learned that are useful with or 
without the functional programming paradigm --

1: implement Callable<T> to submit tasks for concurrent/parallel/lazy execution
  - call() methods just wrap a call to the computation-intensive method
2: share a fixed-size thread pool with a task queue to avoid
  - overhead of thread creation/destruction,
  - too many simultaneous threads, and
  - most blocking issues
3: place thread-blocking Future<T>.get() calls within tasks later in the queue
  - while(!Future<T>.isDone()) Thread.yield(); may also help keep the pool active
4: execution in a task queue also enables easier logging and progress listening
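
Points 1 and 2 above can be sketched with the standard java.util.concurrent classes. This is a minimal illustration, not MSA module code: the class PooledTasks and the sumOfSquares stand-in computation are hypothetical. For simplicity the blocking get() calls happen in the submitting thread rather than inside later tasks as in point 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledTasks {

    // Stand-in for a computation-intensive method (hypothetical).
    static long sumOfSquares(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) {
            total += (long) k * k;
        }
        return total;
    }

    // Point 1: call() only wraps the expensive method.
    static final class SumOfSquaresTask implements Callable<Long> {
        private final int n;

        SumOfSquaresTask(int n) {
            this.n = n;
        }

        @Override
        public Long call() {
            return sumOfSquares(n);
        }
    }

    // Point 2: one fixed-size pool with an internal task queue serves all tasks.
    static List<Long> runAll(int[] inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int n : inputs) {
                futures.add(pool.submit(new SumOfSquaresTask(n)));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // blocks until that task is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(new int[] {3, 10, 100}, 2)); // [14, 385, 338350]
    }
}
```

Results come back in submission order because the futures are read in the order they were queued, regardless of which task finishes first.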

There are two obvious places concurrent execution will fit in the MSA module --

1: building the distance matrix
  - queue pairwise alignment/scoring tasks in loop over all sequence pairs
2: progressive alignment
  - queue profile-profile alignment tasks in post-order traversal of guide tree
    (from leaves to root)
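
The first of these two places, the distance matrix, might look roughly like the sketch below. It is an assumption-laden illustration: DistanceMatrixSketch is a hypothetical class, and its distance() method is a placeholder (fraction of mismatching positions between equal-length strings) standing in for a real pairwise alignment score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DistanceMatrixSketch {

    // Placeholder score (an assumption): fraction of mismatching positions
    // between two equal-length strings. A real module would align first.
    static double distance(String a, String b) {
        int mismatches = 0;
        for (int k = 0; k < a.length(); k++) {
            if (a.charAt(k) != b.charAt(k)) {
                mismatches++;
            }
        }
        return (double) mismatches / a.length();
    }

    // One queued task per sequence pair.
    static final class PairTask implements Callable<Double> {
        private final String a;
        private final String b;

        PairTask(String a, String b) {
            this.a = a;
            this.b = b;
        }

        @Override
        public Double call() {
            return distance(a, b);
        }
    }

    static double[][] build(List<String> seqs, int threads) {
        int n = seqs.size();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Queue a task for every pair i < j.
            List<Future<Double>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    futures.add(pool.submit(new PairTask(seqs.get(i), seqs.get(j))));
                }
            }
            // Collect results in the same order and fill the symmetric matrix.
            double[][] d = new double[n][n];
            int next = 0;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    double v = futures.get(next++).get();
                    d[i][j] = v;
                    d[j][i] = v;
                }
            }
            return d;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[][] d = build(Arrays.asList("ACGT", "ACGA", "TCGA"), 2);
        System.out.println(Arrays.deepToString(d));
    }
}
```

The pairwise tasks are independent of each other, which is why this step parallelizes so easily; the progressive alignment step is harder because each profile-profile task depends on its children in the guide tree.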

All our library copies of "Effective Java" are checked out, so I ordered a copy 
for my personal library.  The sample chapter on generics sold me.

Mark



From andreas at sdsc.edu  Fri Apr 30 11:29:03 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 08:29:03 -0700
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <4BD8DC33.7010607@cs.wisc.edu>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
	<w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>
	<6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
	<4BD8DC33.7010607@cs.wisc.edu>
Message-ID: <n2j59a41c431004300829n5155bcd0i4fbacb8e219a43ce@mail.gmail.com>

Hi Mark and Jianjiong,

By now you should have received your login info for the development
SVN server. I suggest the following things as next steps:


*) If you have not done so already, sign up to the biojava-l and biojava-dev
mailing lists

*) Get a biojava checkout from the developmental SVN server.

*) Add the LGPL license javadoc header
http://www.biojava.org/wiki/BioJava3_license to the templates in your IDE.

*) Take a look at the JUnit tests and add a new test for something that is
related to your projects

*) Take a look at the Wiki pages (e.g.
http://www.biojava.org/wiki/BioJava:CookBook ), get an account on the wiki
and improve one of the documentation pages

*) Take a look at the javadocs at http://www.biojava.org/docs/api/index.html

Andreas

From andreas at sdsc.edu  Fri Apr 30 11:44:25 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 08:44:25 -0700
Subject: [Biojava-dev] biojava SVN
Message-ID: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>

Hi,

The BioJava SVN has not been fully compiling ever since the Hackathon. I
guess things were quite in flux over the last few months, and it is now time
to make sure SVN fully compiles again.  There are a few things we need to
figure out to get there:

* Jar files for libraries that are not in a public Maven repository. Jules:
at some point you indicated that we might be able to get such jar files
hosted by the EBI Maven repository. Do you think that is still a
possibility, and could you get a few libraries into it? In particular, that
would be Jmol, Astex, and probably one or two other jar files. That would
make the BioJava checkout process much smoother and not require a developer
to manually install jars for full functionality.

* We have a couple of modules that are fragmented and broken. These are
historic leftovers from when we started the refactoring process. If all the
functionality has been moved into the new biojava3-core module, I would vote
for removing the modules starting with sequence*

Andreas


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From ayates at ebi.ac.uk  Fri Apr 30 11:48:01 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Apr 2010 16:48:01 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
References: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
Message-ID: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>

Does anyone know how hard it would be to get these into the public Maven repository? The EBI repo is all well & good, but updating it relies on BioJava always having a committer at the EBI. Now I know that is very likely, but is it something we can rely on?

Andy


-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From ayates at ebi.ac.uk  Fri Apr 30 11:57:12 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Apr 2010 16:57:12 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To: <CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>
References: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
	<475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
	<CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>
Message-ID: <3C3AAC8F-5C03-44C1-B121-7808C0612A65@ebi.ac.uk>

As far as I remember you 'can' have one set up manually. I think I offered one hand-developed from one of my projects. In fact, FYI:

http://code.google.com/p/dbcon/source/browse/#svn/trunk/maven-repo

It just requires the correct structure in place & it works. I went for it being hosted in SVN because there's an HTTP interface to it offered by Google. The EBI Maven repo is just a public HTTP directory. It's been some years since I did a deployment there, but it's not hard to do & we should be able to do it locally & sync it to SVN.
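
For reference, the "correct structure" of a Maven 2 style repository is a directory tree derived from each artifact's coordinates: the groupId with dots turned into slashes, then the artifactId, then the version, then a file named artifactId-version.extension. The sketch below computes that relative path. The class MavenLayout is hypothetical, and the coordinates in main are made-up examples, not the real coordinates of any library mentioned in this thread.

```java
// Hypothetical helper showing where a file must sit in a hand-built
// Maven 2 style repository served over plain HTTP.
class MavenLayout {

    // Relative path: groupId (dots to slashes) / artifactId / version / artifactId-version.extension
    static String artifactPath(String groupId, String artifactId,
                               String version, String extension) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        // Example coordinates are illustrative only.
        System.out.println(artifactPath("org.example", "viewer", "1.0", "jar"));
        System.out.println(artifactPath("org.example", "viewer", "1.0", "pom"));
    }
}
```

With the coordinates above, the jar would be expected at org/example/viewer/1.0/viewer-1.0.jar under the repository root, with the matching .pom file next to it.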

Andy

On 30 Apr 2010, at 16:50, Richard Holland wrote:

> Could a small MVN repo be set up at OBF?
> 
> 
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From andreas at sdsc.edu  Fri Apr 30 12:27:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 09:27:09 -0700
Subject: [Biojava-dev] biojava SVN
In-Reply-To: <CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>
References: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
	<475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
	<CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>
Message-ID: <h2t59a41c431004300927gd9e48b6ag8415a16df49ba9f9@mail.gmail.com>

> Could a small MVN repo be set up at OBF?

I am pretty sure we could do that. Is anybody volunteering? I can help with
getting the necessary permissions... Does anybody know of good documentation
on how to set this up?

Andreas


On Fri, Apr 30, 2010 at 8:50 AM, Richard Holland
<holland at eaglegenomics.com>wrote:

> Could a small MVN repo be set up at OBF?
>
> On 30 Apr 2010, at 16:48, Andy Yates wrote:
>
> > Does anyone know how hard it would be to get these into the public maven
> repository? The EBI repo is all well & good but updating it relies on
> BioJava always having a committer at the EBI. Now I know that is a very
> likely statement but is it something we can rely on?
> >
> > Andy
> >
> > On 30 Apr 2010, at 16:44, Andreas Prlic wrote:
> >
> >> Hi,
> >>
> >> The BioJava SVN has not been fully compiling ever since the Hackathon. I
> >> guess things were quite in flux the last months and it is now time to
> make
> >> sure SVN fully compiles again.  There is a few things we need to figure
> out
> >> in order for that:
> >>
> >> * Jar files for libraries that are not in a public Maven repository.
> Jules :
> >> at some point you indicated that we might be able to get such jar files
> >> hosted by the EBI Maven repository. Do you think that is still an
> >> possibility and could you get a few libraries into that? In particular
> that
> >> would be Jmol, Astex, and probably one or two other Jar files. That
> would
> >> make the BioJava checkout process much smoother and not require a
> developer
> >> to manually install jars for full functionality.
> >>
> >> * We have a couple of modules that are fragmented and broken. This is
> due to
> >> historic leftovers from when we started the re-factoring process. If all
> the
> >> functionality has been moved into the new biojava3-core module, I would
> vote
> >> for removing the modules starting with sequence*
> >>
> >> Andreas
> >>
> >>
> >> --
> >> -----------------------------------------------------------------------
> >> Dr. Andreas Prlic
> >> Senior Scientist, RCSB PDB Protein Data Bank
> >> University of California, San Diego
> >> (+1) 858.246.0526
> >> -----------------------------------------------------------------------
> >
> > --
> > Andrew Yates                   Ensembl Genomes Engineer
> > EMBL-EBI                       Tel: +44-(0)1223-492538
> > Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> > Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
> >
> >
> >
> >
> >
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From holland at eaglegenomics.com  Fri Apr 30 11:50:52 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 30 Apr 2010 16:50:52 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
References: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
	<475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
Message-ID: <CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>

Could a small MVN repo be set up at OBF?

On 30 Apr 2010, at 16:48, Andy Yates wrote:

> Does anyone know how hard it would be to get these into the public maven repository? The EBI repo is all well & good but updating it relies on BioJava always having a committer at the EBI. Now I know that is a very likely statement but is it something we can rely on?
> 
> Andy
> 
> On 30 Apr 2010, at 16:44, Andreas Prlic wrote:
> 
>> Hi,
>> 
>> The BioJava SVN has not been fully compiling ever since the Hackathon. I
>> guess things were quite in flux the last months and it is now time to make
>> sure SVN fully compiles again.  There is a few things we need to figure out
>> in order for that:
>> 
>> * Jar files for libraries that are not in a public Maven repository. Jules :
>> at some point you indicated that we might be able to get such jar files
>> hosted by the EBI Maven repository. Do you think that is still an
>> possibility and could you get a few libraries into that? In particular that
>> would be Jmol, Astex, and probably one or two other Jar files. That would
>> make the BioJava checkout process much smoother and not require a developer
>> to manually install jars for full functionality.
>> 
>> * We have a couple of modules that are fragmented and broken. This is due to
>> historic leftovers from when we started the re-factoring process. If all the
>> functionality has been moved into the new biojava3-core module, I would vote
>> for removing the modules starting with sequence*
>> 
>> Andreas
>> 
>> 
>> -- 
>> -----------------------------------------------------------------------
>> Dr. Andreas Prlic
>> Senior Scientist, RCSB PDB Protein Data Bank
>> University of California, San Diego
>> (+1) 858.246.0526
>> -----------------------------------------------------------------------
> 
> -- 
> Andrew Yates                   Ensembl Genomes Engineer
> EMBL-EBI                       Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
> 
> 
> 
> 
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From chapmanb at 50mail.com  Fri Apr  2 13:07:06 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 2 Apr 2010 09:07:06 -0400
Subject: [Biojava-dev] BOSC and OpenBio solution challenge reminder -- April
	15th
Message-ID: <20100402130706.GJ36623@sobchak.mgh.harvard.edu>

Hello all;
A friendly reminder that the deadline for the Bioinformatics Open
Source Conference (BOSC) is coming up on April 15th:

http://www.open-bio.org/wiki/BOSC_2010

This is a great opportunity to discuss code and biology with fellow
developers.

One session which I'd like to emphasize is the OpenBio Solution
Challenge, a section of talks that describes how to solve practical
problems in bioinformatics using a variety of approaches:

http://www.open-bio.org/wiki/SolutionChallenge

Any toolkit developers who are interested in giving a talk are
encouraged to submit an abstract for the challenge. We have some initial
project ideas on the page and welcome your feedback for other useful
workflows that would emphasize the advantages of using open source
toolkits to solve biological problems. Please copy messages to the
OpenBio mailing list as a central point for discussion and
questions:

http://lists.open-bio.org/mailman/listinfo/open-bio-l

Looking forward to seeing everyone in July,
Brad

BOSC contact and dates:
Date: July 9-10, 2010
Location: Boston, Massachusetts, USA
BOSC 2010 web site: http://www.open-bio.org/wiki/BOSC_2010
Abstract submission via Open Conference System site:  http://events.open-bio.org/BOSC2010/openconf.php
E-mail: bosc at open-bio.org
Bosc-announce list:  http://lists.open-bio.org/mailman/listinfo/bosc-announce

Important Dates
April 15: Abstract deadline
May 5:  Notification of accepted abstracts
May 28: Early Registration Discount Cut-off date
July 8-9:  Codefest 2010
July 9-10: BOSC 2010
August 15:  Manuscript deadline for BOSC 2010 Proceedings published in BMC Bioinformatics


From andreas at sdsc.edu  Fri Apr  2 17:25:39 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 2 Apr 2010 10:25:39 -0700
Subject: [Biojava-dev] BOSC
Message-ID: <h2n59a41c431004021025sf638add5v78097bcf0e8ade40@mail.gmail.com>

Hi,

who is going to BOSC this year and who wants to present a BioJava talk?

Andreas

-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


From holland at eaglegenomics.com  Fri Apr  2 19:37:06 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 2 Apr 2010 20:37:06 +0100
Subject: [Biojava-dev] BOSC
In-Reply-To: <h2n59a41c431004021025sf638add5v78097bcf0e8ade40@mail.gmail.com>
References: <h2n59a41c431004021025sf638add5v78097bcf0e8ade40@mail.gmail.com>
Message-ID: <D301D60F-0C2B-4EDB-90DD-4E700CC0FBA3@eaglegenomics.com>

I will be there but for various reasons I can't talk this year.

On 2 Apr 2010, at 18:25, Andreas Prlic wrote:

> Hi,
> 
> who is going to BOSC this year and who wants to present a BioJava talk?
> 
> Andreas
> 
> -- 
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From heuermh at acm.org  Sat Apr  3 03:23:15 2010
From: heuermh at acm.org (Michael Heuer)
Date: Fri, 2 Apr 2010 23:23:15 -0400 (EDT)
Subject: [Biojava-dev] BOSC
In-Reply-To: <h2n59a41c431004021025sf638add5v78097bcf0e8ade40@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.1004022320310.10023-100000@shell3.shore.net>

Andreas Prlic wrote:

> who is going to BOSC this year and who wants to present a BioJava talk?

I will be there.  I'm probably not the best person to present this time
around.

   michael


From aradwen at gmail.com  Sat Apr  3 10:18:40 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Sat, 3 Apr 2010 12:18:40 +0200
Subject: [Biojava-dev] Protein sequence composition
Message-ID: <s2me591b1bd1004030318u7584bb09j405d1b224266df4c@mail.gmail.com>

Hello,

I'm writing an application that processes protein sequences, and I am using
BioJava for a couple of things.
One of these processing steps is to parse protein multi-FASTA files and treat
the sequences one after the other. One of my goals is to calculate
composition. By composition I mean that, for a given protein sequence, I am
interested in the mean and the standard deviation of the composition of
these groups:

PAGST
EDNQ
LIVM
KRH
C

example :

protein fasta file :

>SEQ1

DVSFRLSGATSSSYGVFISNLRKALPNERKLYDIPLLRSSLPGSQRYALI
HLTNYADETISVAIDVTNVYIMGYRAGDTSYFFNEASATEAAKYVFKDAM
RKVTLPYSGNYERLQTAAGKIRENIPLGLPALDSAITTLFYYNANSAASA
LMVLIQSTSEAARYKFIEQQIGKRVDKTFLPSLAIISLENSWSALSKQIQ
IASTNNGQFESPVVLINAQNQRVTITNVDAGVVTSNIALLLNRNNMA

>SEQ2

IFPKQYPIINFTTAGATVQSYTNFIRAVRGRLTTGADVRHEIPVLPNRVG
LPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSAYFFHPDNQED
AEAITHLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEAISA
LYYYSTGGTQLPTLARSFIICIQMISEAARFQYIEGEMRTRIRYNRRSAP
DPSVITLENSWGRLSTAIQESNQGAFASPIQLQRRNGSKFSVYDVSILIP
IIALMVYRCAPPPSSQF

I would like to
1/ parse SEQ1 to calculate the composition mean of PAGST residues, for
example (number of residues / length of the sequence)
2/ do the same thing for SEQ2
3/ return the average mean of both sequences
4/ return the standard deviation of these values.


I can do it by writing standard Java code, but I would like to know (as I am
using BioJava already) whether BioJava can do this, and if so which classes
or instances to use.

Cheers


From chapman at cs.wisc.edu  Sat Apr  3 13:08:23 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Sat, 03 Apr 2010 08:08:23 -0500
Subject: [Biojava-dev] Protein sequence composition
In-Reply-To: <s2me591b1bd1004030318u7584bb09j405d1b224266df4c@mail.gmail.com>
References: <s2me591b1bd1004030318u7584bb09j405d1b224266df4c@mail.gmail.com>
Message-ID: <4BB73DC7.2020000@cs.wisc.edu>

Hi Radwen,

The example below solves most of what you asked for.  It may not be the most 
elegant solution, but it should get you started in the right direction.

Saving the proteins into sample.fasta and running the following command:
 > java ProteinComposition sample.fasta PAGST EDNQ LIVM KRH C

produces the output:
SEQ1	247:	0.36032388	0.19433199	0.25506073	0.097165994	0.0
SEQ2	267:	0.3445693	0.20224719	0.23595506	0.101123594	0.007490637

Take care,
Mark

-- ProteinComposition.java --

import java.io.*;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.SequenceDB;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.symbol.*;

@SuppressWarnings("deprecation")
public class ProteinComposition {

   /**
    * Determines the composition of proteins in a Fasta file.
    * @param args <filename> <residues>...
    *   <filename>: file name of the Fasta file
    *   <residues>: group(s) of one or more amino acid residues, statistics are
    *     printed out for each group
    */
   public static void main(String[] args) {
     try {

       // load Fasta file into memory
       BufferedInputStream is = new BufferedInputStream(new FileInputStream(args[0]));
       Alphabet alpha = AlphabetManager.alphabetForName("PROTEIN");
       SequenceDB db = SeqIOTools.readFasta(is, alpha);

       // load command line arguments into memory
       SymbolList[] res = new SymbolList[args.length-1];
       for (int a = 1; a < args.length; a++)
         res[a-1] = ProteinTools.createProtein(args[a]);

       // store length and composition of each protein
       int[] lengths = new int[db.ids().size()];
       int[][] counts = new int[lengths.length][res.length];
       float[][] means = new float[lengths.length][res.length];

       // iterate over proteins in Fasta file
       SequenceIterator sI = db.sequenceIterator();
       for (int s = 0; sI.hasNext(); s++) {
         Sequence seq = sI.nextSequence();
         lengths[s] = seq.length();

         // iterate over each amino acid
         for (Object sr : seq.toList())

           // check for amino acid in each residue group
           for (int a = 1; a < args.length; a++)

             // iterate over each residue in group
             for (Object r : res[a-1].toList())

               // increment count if amino acid has a match in residue group
               if (((Symbol) r).getMatches().contains((Symbol) sr)) {
                 counts[s][a-1]++;
                 break;
               }

         // print "name length: composition" for each protein
         System.out.print(seq.getName() + "\t" + seq.length() + ":");
         for (int a = 1; a < args.length; a++)
           System.out.print("\t" + (means[s][a-1] = (float) counts[s][a-1] / lengths[s]));
         System.out.println();
       }

     } catch (FileNotFoundException ex) {
       System.err.println("Problem reading file...");
       ex.printStackTrace();
     } catch (BioException ex) {
       System.err.println("File not in fasta format or wrong alphabet...");
       ex.printStackTrace();
     } catch (NoSuchElementException ex) {
       System.err.println("No fasta sequences in the file...");
       ex.printStackTrace();
     }
   }

}
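
The program above prints the per-sequence fraction for each residue group, which covers steps 1 and 2 of the original question. Steps 3 and 4 (the average across sequences and its standard deviation) could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical and not part of BioJava, the sequences in main are toy data rather than the ones from the thread, and the population form of the standard deviation (dividing by n) is assumed.

```java
public class CompositionStats {

    /** Fraction of residues in seq that belong to group, e.g. "PAGST". */
    public static double fraction(String seq, String group) {
        int count = 0;
        for (char c : seq.toCharArray()) {
            if (group.indexOf(c) >= 0) {
                count++;
            }
        }
        return (double) count / seq.length();
    }

    /** Arithmetic mean of the values. */
    public static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population standard deviation (divides by n, not n - 1). */
    public static double stdDev(double[] values) {
        double m = mean(values);
        double squares = 0.0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / values.length);
    }

    public static void main(String[] args) {
        // Toy sequences (assumption), standing in for the parsed Fasta records.
        String[] seqs = { "PAGSTKKKKK", "PPPPPPPPKK" };
        double[] fractions = new double[seqs.length];
        for (int s = 0; s < seqs.length; s++) {
            fractions[s] = fraction(seqs[s], "PAGST");
        }
        System.out.println("mean  = " + mean(fractions));
        System.out.println("stdev = " + stdDev(fractions));
    }
}
```

In ProteinComposition the same two helpers could be applied to one column of the means array once the loop over sequences has finished. With only two sequences the sample form (dividing by n - 1) may be the more appropriate choice.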


On 4/3/2010 5:18 AM, Radwen Aniba wrote:
> Hello,
>
> I'm writing an application that treats protein sequences, and I am using
> Biojava for a couple of things.
> One of these processings is to parse protein multifasta files, and treat the
> sequences one after the other. One of my purposes is to calculate
> composition. By composition I mean that I am interested to know in a given
> protein sequence what is the mean and the standard deviation composition of
> these groups :
>
> PAGST
> EDNQ
> LIVM
> KRH
> C
>
> example :
>
> protein fasta file :
>
>> SEQ1
>
> DVSFRLSGATSSSYGVFISNLRKALPNERKLYDIPLLRSSLPGSQRYALI
> HLTNYADETISVAIDVTNVYIMGYRAGDTSYFFNEASATEAAKYVFKDAM
> RKVTLPYSGNYERLQTAAGKIRENIPLGLPALDSAITTLFYYNANSAASA
> LMVLIQSTSEAARYKFIEQQIGKRVDKTFLPSLAIISLENSWSALSKQIQ
> IASTNNGQFESPVVLINAQNQRVTITNVDAGVVTSNIALLLNRNNMA
>
>> SEQ2
>
> IFPKQYPIINFTTAGATVQSYTNFIRAVRGRLTTGADVRHEIPVLPNRVG
> LPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSAYFFHPDNQED
> AEAITHLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEAISA
> LYYYSTGGTQLPTLARSFIICIQMISEAARFQYIEGEMRTRIRYNRRSAP
> DPSVITLENSWGRLSTAIQESNQGAFASPIQLQRRNGSKFSVYDVSILIP
> IIALMVYRCAPPPSSQF
>
> I would like to
> 1/ parse SEQ1 to calculate the composition mean of PAGST residues for
> example ( number of residus/ length of the sequence)
> 2/ do same thing for SEQ2
> 3 / return the average mean of both sequences
> 4/ Return standard deviation of these values.
>
>
> I can do it writing a standard java code, but I would like to know (as I am
> using biojava already) if this is possible or not ( Which class / instances
> to use)
>
> Cheers


From andreas.prlic at gmail.com  Sat Apr  3 15:20:58 2010
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Sat, 3 Apr 2010 08:20:58 -0700
Subject: [Biojava-dev] BOSC
In-Reply-To: <Pine.GSO.4.44.1004022320310.10023-100000@shell3.shore.net>
References: <Pine.GSO.4.44.1004022320310.10023-100000@shell3.shore.net>
Message-ID: <2E000562-884F-4172-A94D-61488C605A9F@gmail.com>

I am planning to attend 3d-SIG this year and will be mentioning  
biojava there...

Andreas

On 2 Apr 2010, at 20:23, Michael Heuer <heuermh at acm.org> wrote:

> Andreas Prlic wrote:
>
>> who is going to BOSC this year and who wants to present a BioJava  
>> talk?
>
> I will be there.  I'm probably not the best person to present this  
> time
> around.
>
>   michael
>


From sheoran143 at gmail.com  Sun Apr 11 19:16:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 14:16:29 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
Message-ID: <4BC2200D.8000109@gmail.com>

Hi,

There is a very fundamental issue in the SimpleNCBITaxon class, because of
which it is producing the wrong taxonomy hierarchy. I am explaining what I
have found; let me know what you think of it, and suggest how to fix it.

1) Columns in the taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id,
node_rank, genetic_code, mito_genetic_code, left_value, right_value)
2) In the class SimpleNCBITaxon we assume that "parent_taxon_id" holds the
parent's ncbi_taxon_id for the current ncbi_taxon_id value, but that is not
true. The value stored in "parent_taxon_id" is the "taxon_id" of the row
that holds the parent's ncbi_taxon_id.

<property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
<property name="nodeRank" column="node_rank"/>
<property name="geneticCode" column="genetic_code"/>
<property name="mitoGeneticCode" column="mito_genetic_code"/>
<property name="leftValue" column="left_value"/>
<property name="rightValue" column="right_value"/>
<property name="parentNCBITaxID" column="parent_taxon_id"/>      -----
this is not correct: the column parent_taxon_id stores the taxon_id of the
row that holds the parent's ncbi_taxon_id for the current entry

Thanks
Deepak Sheoran


From holland at eaglegenomics.com  Sun Apr 11 19:53:06 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sun, 11 Apr 2010 20:53:06 +0100
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <4BC2200D.8000109@gmail.com>
References: <4BC2200D.8000109@gmail.com>
Message-ID: <B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>

I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).

thanks,
Richard

On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:

> Hi,
> 
> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it.
> 
> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id.
> 
> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
> <property name="nodeRank" column="node_rank"/>
> <property name="geneticCode" column="genetic_code"/>
> <property name="mitoGeneticCode" column="mito_genetic_code"/>
> <property name="leftValue" column="left_value"/>
> <property name="rightValue" column="right_value"/>
> <property name="parentNCBITaxID" column="parent_taxon_id"/>      ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry
> 
> Thanks
> Deepak Sheoran
> 
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From sheoran143 at gmail.com  Sun Apr 11 21:08:22 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 16:08:22 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
Message-ID: <4BC23A46.7090304@gmail.com>

I am using the same table with the BioJava and BioPerl taxon programs, and
the output I get is below:

*Biojava:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage
I get is
             Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia
australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum
var. haydenii.

Biojava process of finding names:
11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240
(wrong way of doing things)

*Bioperl:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage
I get is
           Retroviridae; Orthoretrovirinae; Alpharetrovirus;
unclassified Alpharetrovirus.

Bioperl process of finding names:
11876==>353825==>153057==>327045==>11632   (right way of doing things)

Hint: BioJava searches the ncbi_taxon_id column with the value from
parent_taxon_id, whereas BioPerl searches the taxon_id column with the
value from parent_taxon_id.

*Taxon and Taxon_name Table content which is being relevant  in discussion:*

taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
2901      3609           276240           genus      Rhamnus                             scientific name
3610      4403           3609             species    Platanus occidentalis               scientific name
29052     48579          4403             species    Suillus placidus                    scientific name
114412    143975         48579            species    Diadasia australis                  scientific name
143976    176516         143975           species    Arnicastrum guerrerense             scientific name
30680     50447          176516           family     Labiduridae                         scientific name
254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
9394      11632          17394            family     Retroviridae                        scientific name
277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
122448    153057         277861           genus      Alpharetrovirus                     scientific name
301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
9584      11876          301952           species    Avian sarcoma virus                 scientific name
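
The difference between the two lookups can be reproduced with a small stand-alone Java sketch over the rows listed in this message. It is not BioJava or BioPerl code: the class, field, and method names are hypothetical, and the rows are copied by hand from the table. The only thing that changes between the two calls is which column parent_taxon_id is matched against.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaxonLineage {

    /** One joined taxon / taxon_name row (hypothetical holder class). */
    static final class Row {
        final int taxonId;
        final int ncbiTaxonId;
        final int parentTaxonId;
        final String name;

        Row(int taxonId, int ncbiTaxonId, int parentTaxonId, String name) {
            this.taxonId = taxonId;
            this.ncbiTaxonId = ncbiTaxonId;
            this.parentTaxonId = parentTaxonId;
            this.name = name;
        }
    }

    // Rows copied from the table in this message.
    static final Row[] ROWS = {
        new Row(2901, 3609, 276240, "Rhamnus"),
        new Row(3610, 4403, 3609, "Platanus occidentalis"),
        new Row(29052, 48579, 4403, "Suillus placidus"),
        new Row(114412, 143975, 48579, "Diadasia australis"),
        new Row(143976, 176516, 143975, "Arnicastrum guerrerense"),
        new Row(30680, 50447, 176516, "Labiduridae"),
        new Row(254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
        new Row(9394, 11632, 17394, "Retroviridae"),
        new Row(277861, 327045, 9394, "Orthoretrovirinae"),
        new Row(122448, 153057, 277861, "Alpharetrovirus"),
        new Row(301952, 353825, 122448, "unclassified Alpharetrovirus"),
        new Row(9584, 11876, 301952, "Avian sarcoma virus"),
    };

    /**
     * Returns ancestor names, nearest first, for the row with the given NCBI id.
     * If parentIsInternalId is true, parent_taxon_id is matched against taxon_id;
     * otherwise it is matched against ncbi_taxon_id. The walk stops at the first
     * parent that is not present in ROWS.
     */
    public static List<String> ancestors(int ncbiTaxonId, boolean parentIsInternalId) {
        Map<Integer, Row> byTaxonId = new HashMap<>();
        Map<Integer, Row> byNcbiId = new HashMap<>();
        for (Row r : ROWS) {
            byTaxonId.put(r.taxonId, r);
            byNcbiId.put(r.ncbiTaxonId, r);
        }
        Map<Integer, Row> index = parentIsInternalId ? byTaxonId : byNcbiId;

        List<String> names = new ArrayList<>();
        Row current = byNcbiId.get(ncbiTaxonId);
        // The size check guards against a cycle in malformed data.
        while (current != null && names.size() < ROWS.length) {
            Row parent = index.get(current.parentTaxonId);
            if (parent == null) {
                break;
            }
            names.add(parent.name);
            current = parent;
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("match on ncbi_taxon_id: " + ancestors(11876, false));
        System.out.println("match on taxon_id:      " + ancestors(11876, true));
    }
}
```

Matching on ncbi_taxon_id walks into the plant and insect rows, while matching on taxon_id stays within the retrovirus rows. Which of the two is correct for a given database depends on how the taxon table was loaded.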


Thanks
Deepak

On 4/11/2010 2:53 PM, Richard Holland wrote:
> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>
> thanks,
> Richard
>
> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>
>    
>> Hi,
>>
>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it.
>>
>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id.
>>
>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>> <property name="nodeRank" column="node_rank"/>
>> <property name="geneticCode" column="genetic_code"/>
>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>> <property name="leftValue" column="left_value"/>
>> <property name="rightValue" column="right_value"/>
>> <property name="parentNCBITaxID" column="parent_taxon_id"/>       ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry
>>
>> Thanks
>> Deepak Sheoran
>>
>>
>>      
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>    


From sheoran143 at gmail.com  Sun Apr 11 22:48:00 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 17:48:00 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <4BC23A46.7090304@gmail.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
	<4BC23A46.7090304@gmail.com>
Message-ID: <4BC251A0.4090602@gmail.com>

If we don't want to change the current code in BioJava and still want to
fix this bug, I have found a way:
1) We can do this by changing one of the Hibernate files, "Taxon.hbm.xml",
and replacing the line
<property name="parentNCBITaxID" column="parent_taxon_id"/>
     with
<property name="parentNCBITaxID" formula="(select tax.ncbi_taxon_id from taxon tax where tax.taxon_id = parent_taxon_id)"/>

With the above change to the Hibernate mapping I am able to get the
correct lineage for ncbi_taxon_id = 11876 (Avian sarcoma virus), which is
              Viruses; Retro-transcribing viruses; Retroviridae;
Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.

2) A possible issue is the taxonomy loader class, which wants to insert a
value for the parent taxon ID into the taxon table; I think that won't be
possible once we make this change to the Hibernate config file.

Deepak Sheoran


On 4/11/2010 4:08 PM, Deepak Sheoran wrote:
> I am using same table with biojava and bioperl taxon program and the 
> output I get is below:
>
> *Biojava:*
> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the 
> lineage i get is
>             Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia 
> australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum 
> var. haydenii.
>
> Biojava process of finding names: 
> 11876==>3019252==>50447==>176516==>143975==>48579==>4403==>3609==>276240   
> (wrong way of doing things)
>
> *Bioperl:*
> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the 
> lineage i get is
>           Retroviridae; Orthoretrovirinae; Alpharetrovirus; 
> unclassified  Alpharetrovirus.
>
> Bioperl process of finding names: 
> 11876==>353825==>153057==>327045==>11632   (Right way of doing things)
>
> Hint: biojava search ncbi_taxon_id column with a value from 
> parent_taxon_id where bioperl search taxon_id column with a value from 
> parent_taxon_id.
>
> *Taxon and Taxon_name Table content which is being relevant  in 
> discussion:*
>
> taxon_id 	ncbi_taxon_id 	parent_taxon_id 	node_rank 	name 	name_class
> 2901 	3609 	276240 	genus 	Rhamnus 	scientific name
> 3610 	4403 	3609 	species 	Platanus occidentalis 	scientific name
> 29052 	48579 	4403 	species 	Suillus placidus 	scientific name
> 114412 	143975 	48579 	species 	Diadasia australis 	scientific name
> 143976 	176516 	143975 	species 	Arnicastrum guerrerense 	scientific name
> 30680 	50447 	176516 	family 	Labiduridae 	scientific name
> 254757 	301952 	50447 	varietas 	Oreostemma alpigenum var. haydenii 
> scientific name
> 9394 	11632 	17394 	family 	Retroviridae 	scientific name
> 277861 	327045 	9394 	subfamily 	Orthoretrovirinae 	scientific name
> 122448 	153057 	277861 	genus 	Alpharetrovirus 	scientific name
> 301952 	353825 	122448 	no rank 	unclassified Alpharetrovirus 
> scientific name
> 9584
> 	11876
> 	301952
> 	species
> 	Avian sarcoma virus
> 	scientifice name
>
>
> Thanks
> Deepak
>
> On 4/11/2010 2:53 PM, Richard Holland wrote:
>> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>>
>> thanks,
>> Richard
>>
>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>>
>>    
>>> Hi,
>>>
>>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it.
>>>
>>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id.
>>>
>>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>>> <property name="nodeRank" column="node_rank"/>
>>> <property name="geneticCode" column="genetic_code"/>
>>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>>> <property name="leftValue" column="left_value"/>
>>> <property name="rightValue" column="right_value"/>
>>> <property name="parentNCBITaxID" column="parent_taxon_id"/>       ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry
>>>
>>> Thanks
>>> Deepak Sheoran
>>>
>>>
>>>      
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>    
>


From holland at eaglegenomics.com  Mon Apr 12 07:07:55 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 12 Apr 2010 08:07:55 +0100
Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To: <D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
	<4BC23A46.7090304@gmail.com>
	<D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
Message-ID: <E7FB88D1-52D9-496C-86FA-738419FFF579@eaglegenomics.com>

Incidentally, BioJava's approach matches the description in the BioSQL docs at:

 http://biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME

(first example SQL statement - find the taxon id of the parent taxon for 'Homo sapiens' using a self-join)

The BioPerl/BioSQL load_ncbi_taxonomy.pl script however does not match this description.
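
To illustrate the two readings of parent_taxon_id, here is a minimal sketch of the parent lookup as a self-join, run against an in-memory SQLite database. The schema is a reduced, hypothetical version of the BioSQL taxon and taxon_name tables (only the columns needed here), the three rows come from the table quoted in this thread, and the helper names (build_db, parent_name) are made up for the example. It is not the SQL from the BioSQL docs, only the same idea.

```python
import sqlite3

# Three rows taken from the table quoted in this thread:
# (taxon_id, ncbi_taxon_id, parent_taxon_id, scientific name)
ROWS = [
    (9584, 11876, 301952, "Avian sarcoma virus"),
    (301952, 353825, 122448, "unclassified Alpharetrovirus"),
    (254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
]


def build_db():
    """Create a reduced taxon/taxon_name schema (assumption: only the
    columns relevant to this discussion) and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, "
        "ncbi_taxon_id INTEGER, parent_taxon_id INTEGER)"
    )
    conn.execute(
        "CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT)"
    )
    for taxon_id, ncbi_id, parent_id, name in ROWS:
        conn.execute(
            "INSERT INTO taxon VALUES (?, ?, ?)", (taxon_id, ncbi_id, parent_id)
        )
        conn.execute(
            "INSERT INTO taxon_name VALUES (?, ?, 'scientific name')",
            (taxon_id, name),
        )
    return conn


def parent_name(conn, child_name, parent_key):
    """Self-join taxon to itself. parent_key is the column that
    parent_taxon_id is assumed to reference."""
    if parent_key not in ("taxon_id", "ncbi_taxon_id"):
        raise ValueError(parent_key)
    sql = f"""
        SELECT pn.name
        FROM taxon_name AS cn
        JOIN taxon AS c ON c.taxon_id = cn.taxon_id
        JOIN taxon AS p ON p.{parent_key} = c.parent_taxon_id
        JOIN taxon_name AS pn ON pn.taxon_id = p.taxon_id
        WHERE cn.name = ?
    """
    row = conn.execute(sql, (child_name,)).fetchone()
    return row[0] if row else None


conn = build_db()
# parent_taxon_id read as a reference to taxon_id
print(parent_name(conn, "Avian sarcoma virus", "taxon_id"))
# parent_taxon_id read as a reference to ncbi_taxon_id
print(parent_name(conn, "Avian sarcoma virus", "ncbi_taxon_id"))
```

With these rows the first call finds "unclassified Alpharetrovirus" and the second finds "Oreostemma alpigenum var. haydenii", so the same table gives two different parents depending on which column the join targets.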

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Mon Apr 12 06:57:57 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 12 Apr 2010 07:57:57 +0100
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <4BC23A46.7090304@gmail.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
	<4BC23A46.7090304@gmail.com>
Message-ID: <D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>

Thanks Deepak. 

I've had a look at the code and I believe it's due to the different ways in which BioJava and BioPerl load the taxon table. 

BioJava sets the ncbi_taxon_id and parent_taxon_id columns based on the values from the NCBI taxonomy file. The taxon_id column in BioJava is a meaningless auto-generated value that is never used.

BioPerl however is generating taxon_id values and linking them by setting parent_taxon_id to the generated value. The parent value from the NCBI taxonomy file is therefore replaced with the BioPerl generated parent ID, meaning that instead of linking from parent_taxon_id to ncbi_taxon_id as per BioJava, the link is to taxon_id instead. (I'm basing this comment on looking at load_ncbi_taxonomy.pl from the BioSQL archives.)

I believe if you load the taxonomy table using BioJava, you should see BioJava giving correct behaviour. Likewise if you load it using BioPerl, BioPerl will behave correctly. But if you load with one then query with the other, you'll get incorrect results.
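
To make the mismatch concrete, here is a small sketch that walks the lineage of ncbi_taxon_id 11876 over the rows Deepak posted, once treating parent_taxon_id as a reference to taxon_id and once as a reference to ncbi_taxon_id. The function name (lineage) and the tuple layout are made up for illustration; this is not how either toolkit is implemented, only the lookup rule each one appears to follow.

```python
# Rows copied from the table quoted in this thread:
# (taxon_id, ncbi_taxon_id, parent_taxon_id, scientific name)
ROWS = [
    (2901, 3609, 276240, "Rhamnus"),
    (3610, 4403, 3609, "Platanus occidentalis"),
    (29052, 48579, 4403, "Suillus placidus"),
    (114412, 143975, 48579, "Diadasia australis"),
    (143976, 176516, 143975, "Arnicastrum guerrerense"),
    (30680, 50447, 176516, "Labiduridae"),
    (254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
    (9394, 11632, 17394, "Retroviridae"),
    (277861, 327045, 9394, "Orthoretrovirinae"),
    (122448, 153057, 277861, "Alpharetrovirus"),
    (301952, 353825, 122448, "unclassified Alpharetrovirus"),
    (9584, 11876, 301952, "Avian sarcoma virus"),
]


def lineage(ncbi_id, parent_key):
    """Return the ancestor names (root first) of the taxon with the given
    ncbi_taxon_id. parent_key names the column that parent_taxon_id is
    assumed to point at: "taxon_id" or "ncbi_taxon_id"."""
    if parent_key not in ("taxon_id", "ncbi_taxon_id"):
        raise ValueError(parent_key)
    column = 0 if parent_key == "taxon_id" else 1
    by_key = {row[column]: row for row in ROWS}
    by_ncbi = {row[1]: row for row in ROWS}

    names = []
    seen = set()  # guard against cycles in malformed data
    row = by_ncbi[ncbi_id]
    while True:
        parent = by_key.get(row[2])
        if parent is None or parent[0] in seen:
            break  # ran off the sample table, or looped
        seen.add(parent[0])
        names.append(parent[3])
        row = parent
    return "; ".join(reversed(names))


print(lineage(11876, "taxon_id"))
print(lineage(11876, "ncbi_taxon_id"))
```

On this sample the taxon_id reading gives the retrovirus lineage Deepak reports from BioPerl, and the ncbi_taxon_id reading gives the plant/insect lineage he reports from BioJava, which fits the explanation that the table was loaded under one convention and queried under the other.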

This sounds like a case for discussion on both lists - a matter of standardisation between the two projects. Not quickly/easily solvable for now.

cheers,
Richard

On 11 Apr 2010, at 22:08, Deepak Sheoran wrote:

> I am using the same table with the BioJava and BioPerl taxon programs, and the output I get is below:
> 
> BioJava:
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is
>             Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. haydenii.
> 
> BioJava process of finding names: 11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240   (wrong way of doing things)
> 
> BioPerl:
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is
>           Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.
> 
> BioPerl process of finding names: 11876==>353825==>153057==>327045==>11632   (right way of doing things)
> 
> Hint: BioJava searches the ncbi_taxon_id column with a value from parent_taxon_id, whereas BioPerl searches the taxon_id column with a value from parent_taxon_id.
> 
> Taxon and taxon_name table content relevant to this discussion:
> 
> taxon_id	ncbi_taxon_id	parent_taxon_id	node_rank	name	name_class
> 2901	3609	276240	genus	Rhamnus	scientific name
> 3610	4403	3609	species	Platanus occidentalis	scientific name
> 29052	48579	4403	species	Suillus placidus	scientific name
> 114412	143975	48579	species	Diadasia australis	scientific name
> 143976	176516	143975	species	Arnicastrum guerrerense	scientific name
> 30680	50447	176516	family	Labiduridae	scientific name
> 254757	301952	50447	varietas	Oreostemma alpigenum var. haydenii	scientific name
> 9394	11632	17394	family	Retroviridae	scientific name
> 277861	327045	9394	subfamily	Orthoretrovirinae	scientific name
> 122448	153057	277861	genus	Alpharetrovirus	scientific name
> 301952	353825	122448	no rank	unclassified Alpharetrovirus	scientific name
> 9584	11876	301952	species	Avian sarcoma virus	scientific name
> 
> Thanks
> Deepak 
> 
> On 4/11/2010 2:53 PM, Richard Holland wrote:
>> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>> 
>> thanks,
>> Richard
>> 
>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>> 
>>   
>> 
>>> Hi,
>>> 
>>> There is a very fundamental issue in the SimpleNCBITaxon class, because of which it is producing the wrong taxonomy hierarchy. I am explaining what I have found; let me know what you guys think of it, and suggest to me how to fix it.
>>> 
>>> 1) Columns in the taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, node_rank, genetic_code, mito_genetic_code, left_value, right_value)
>>> 2) In the class SimpleNCBITaxon we assume that "parent_taxon_id" holds the parent's ncbi_taxon_id for the current ncbi_taxon_id value, but that is not true. The value that "parent_taxon_id" holds is the "taxon_id" of the row that has the parent's ncbi_taxon_id for the current ncbi_taxon_id.
>>> 
>>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>>> <property name="nodeRank" column="node_rank"/>
>>> <property name="geneticCode" column="genetic_code"/>
>>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>>> <property name="leftValue" column="left_value"/>
>>> <property name="rightValue" column="right_value"/>
>>> <property name="parentNCBITaxID" column="parent_taxon_id"/>      ----- this is not correct: the parent_taxon_id column stores the taxon_id of the row that has the parent's ncbi_taxon_id for the current entry
>>> 
>>> Thanks
>>> Deepak Sheoran
>>> 
>>> 
>>>     
>>> 
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: 
>> holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>> 
>> 
>>   
>> 
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From trevor.paterson at roslin.ed.ac.uk  Tue Apr 13 11:41:01 2010
From: trevor.paterson at roslin.ed.ac.uk (trevor paterson (RI))
Date: Tue, 13 Apr 2010 12:41:01 +0100
Subject: [Biojava-dev] Biojava3 structure
In-Reply-To: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com>
Message-ID: <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk>

Andreas

I am trying to do an anonymous checkout of the whole BioJava 3 trunk and it is failing on the structure module.

I can't even do a copy command.

The src/main tree seems corrupted - throwing an error:
Error: Decompression of svndiff data failed

Trevor Paterson PhD
new email trevor.paterson at roslin.ed.ac.uk

Bioinformatics 
The Roslin Institute
The Royal (Dick) School of Veterinary Studies
University of Edinburgh
Scotland EH25 9PS
phone +44 (0)131 5274197
http://www.roslin.ed.ac.uk
http://www.resspecies.org
http://www.thearkdb.org

 
> -----Original Message-----
> From: biojava-dev-bounces at lists.open-bio.org 
> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of 
> Andreas Prlic
> Sent: 29 March 2010 03:03
> To: Scooter Willis
> Cc: biojava-dev
> Subject: Re: [Biojava-dev] Biojava3 structure
> 
> Hi Scooter,
> 
> at the present the structure modules depend on the alignment 
> module and on the (old) core module.  This is for aligning 
> ATOM and SEQRES residues in the PDB files, and for the Smith 
> Waterman alignment based 3D structure superposition. If we 
> target a release of biojava 3 in about a month, I don't think 
> it will be possible to break this out, mainly because the 
> alignment module is still based on the biojava 1 code base. 
> Overall I think that the core module probably should still be 
> part of the BioJava 3 release. Any opinions on that?
> 
> Andreas
> 
> On Sun, Mar 28, 2010 at 3:06 PM, Scooter Willis 
> <HWillis at scripps.edu> wrote:
> 
> > Andreas
> >
> > I needed to do some work with a PDB file so started to use the 
> > structure library. It looks like it depends on all the old biojava 
> > code. Mainly the structure exceptions that extend 
> bioexception is the 
> > first thing tripping me up. Should the biojava3-structure 
> module have 
> > any external dependencies or am I working with the wrong package?
> >
> > Thanks
> >
> > Scooter
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> 


From andreas at sdsc.edu  Tue Apr 13 14:04:20 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 13 Apr 2010 07:04:20 -0700
Subject: [Biojava-dev] Biojava3 structure
In-Reply-To: <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk>
References: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com>
	<050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk>
Message-ID: <g2p59a41c431004130704i9c4b8ca7h77b18da7e2bb8808@mail.gmail.com>

Hi Trevor,

I can confirm the same behaviour from our anonymous SVN. Developer SVN
seems to be OK and I also ran an svnadmin verify without problems. I
suppose we are having issues with the anonymous SVN server again...
I'll ask the OBF helpdesk to take another look...

Can you try and let us know whether a checkout via svn or git from GitHub
works for you in the meantime? E.g.

svn co http://svn.github.com/biojava/biojava.git ./biojava

Thanks,

Andreas


On Tue, Apr 13, 2010 at 4:41 AM, trevor paterson (RI)
<trevor.paterson at roslin.ed.ac.uk> wrote:
> Andreas
>
> I am trying to do an anonymous checkout of the whole BioJava 3 trunk and it is failing on the structure module
>
> I can't even do a copy command
>
> the src/main tree seems corrupted - throwing an error
> Error: Decompression of svndiff data failed


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


From biopython at maubp.freeserve.co.uk  Thu Apr 15 17:54:56 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Apr 2010 18:54:56 +0100
Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To: <D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
References: <4BC2200D.8000109@gmail.com>
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>
	<4BC23A46.7090304@gmail.com>
	<D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
Message-ID: <m2o320fb6e01004151054rcb57a28fvad135dffbe35d5fa@mail.gmail.com>

Hi,

I've CC'd this to the BioSQL mailing list for cross project
discussion.

On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland  wrote:
> Thanks Deepak.
>
> I've had a look at the code and I believe it's due to the
> different ways in which BioJava and BioPerl load the
> taxon table.
>
> BioJava sets the ncbi_taxon_id and parent_taxon_id
> columns based on the values from the NCBI taxonomy
> file. The taxon_id column in BioJava is a meaningless
> auto-generated value that is never used.
>
> BioPerl however is generating taxon_id values and
> linking them by setting parent_taxon_id to the
> generated value. The parent value from the NCBI
> taxonomy file is therefore replaced with the BioPerl
> generated parent ID, meaning that instead of linking
> from parent_taxon_id to ncbi_taxon_id as per BioJava,
> the link is to taxon_id instead. (I'm basing this
> comment on looking at load_ncbi_taxonomy.pl from
> the BioSQL archives.)

Note that old versions of load_ncbi_taxonomy.pl
(which is part of BioSQL, not part of BioPerl) would
set taxon_id equal to ncbi_taxon_id, see:
http://bugzilla.open-bio.org/show_bug.cgi?id=2470

This may help explain the confusion.

> I believe if you load the taxonomy table using BioJava,
> you should see BioJava giving correct behaviour.
> Likewise if you load it using BioPerl, BioPerl will
> behave correctly. But if you load with one then query
> with the other, you'll get incorrect results.
>
> This sounds like a case for discussion on both lists -
> a matter of standardisation between the two projects.
> Not quickly/easily solvable for now.

It's not just two projects (BioPerl & BioJava) (grin).
It's at least five projects (BioSQL itself plus BioRuby
and Biopython).

I'm not sure about BioRuby's implementation, but
currently I think BioJava is the odd one out - BioPerl,
Biopython, and BioSQL's load_ncbi_taxonomy.pl
all make entries in parent_taxon_id reference the
automatically generated taxon_id (please correct
me if I am wrong).

My personal view is that bioperl-db is the reference
implementation and should be followed in the event
of any ambiguity within BioSQL. In this particular
case, there is actually a BioSQL script to check
against too (load_ncbi_taxonomy.pl).

Hopefully Hilmar can give us an official verdict...

Peter


From andreas at sdsc.edu  Fri Apr 16 17:39:37 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 16 Apr 2010 10:39:37 -0700
Subject: [Biojava-dev] Biojava3-genetics
In-Reply-To: <4BC806F4.3090302@wur.nl>
References: <4BC806F4.3090302@wur.nl>
Message-ID: <r2n59a41c431004161039hd93b268eu159de8a6659d969f@mail.gmail.com>

Hi Richard,

any contribution is welcome. What do you have in mind in particular? Perhaps
there is already something there along those lines...

Andreas

On Thu, Apr 15, 2010 at 11:43 PM, Richard Finkers <Richard.Finkers at wur.nl> wrote:

> Dear List,
>
> I would be interested in adding a module for genetic analysis to the
> biojava3 project. Are there others who are interested in this as well, and
> with whom should I discuss this further?
>
> Thanks,
> Richard
>
>
> --
> Dr. Richard Finkers
> Researcher Plant Breeding
> Wageningen UR Plant Breeding
> P.O. Box 16, 6700 AA, Wageningen, The Netherlands
> Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB
> Wageningen, The Netherlands
> Tel. +31-317-484165 Fax +31-317-418094
> http://www.plantbreeding.wur.nl/
> https://www.eu-sol.wur.nl/
> https://cbsgdbase.wur.nl/
> http://solgenomics.wur.nl/
> http://www.disclaimer-uk.wur.nl/
>
>


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


From sheoran143 at gmail.com  Fri Apr 16 18:43:59 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Fri, 16 Apr 2010 13:43:59 -0500
Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To: <m2o320fb6e01004151054rcb57a28fvad135dffbe35d5fa@mail.gmail.com>
References: <4BC2200D.8000109@gmail.com>	
	<B2DECC5B-650E-434E-8955-5F02DB4297AC@eaglegenomics.com>	
	<4BC23A46.7090304@gmail.com>	
	<D75549FE-6866-4397-ACEE-A897C719C441@eaglegenomics.com>
	<m2o320fb6e01004151054rcb57a28fvad135dffbe35d5fa@mail.gmail.com>
Message-ID: <4BC8AFEF.70107@gmail.com>

In my experience, we should make use of taxon_id, because it is a
unique key within a local instance of BioSQL. ncbi_taxon_id should be
used only for mapping, so that a person can map their local taxon_id
to an ncbi_taxon_id; otherwise it defeats the whole purpose of having
taxon_id as the primary key of the taxon table. I think the main goal
when BioSQL was designed was to make it independent of any other
organization such as GenBank or NCBI, while still letting us map a
number given by a known authority (ncbi_taxon_id) to a local number
(taxon_id).

Deepak Sheoran



From sylvain.foisy at diploide.net  Sat Apr 17 14:00:07 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Sat, 17 Apr 2010 10:00:07 -0400 (EDT)
Subject: [Biojava-dev] Eclipse + maven woes...
Message-ID: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>

Hi,

Again, I feel stupid asking these newbie questions... I finally got my hands
on a new MacBook Pro and am re-installing the apps to get stuff moving. As
usual (I am sorry to say...), Eclipse and Maven are giving me fits when I try
to check out the developer's tree.

I have installed the latest Subversion and Maven plugins. When I want to
create a new project, I try the following:

1) I right click to select "New > Other..." in the Navigator panel;

2) I select "SVN > Project from SVN", which leads me to a window where the
 location of the developer's tree is in svn+ssh; in the window that comes
up next, I use this URL to get the "Finish" button activated:

 svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk

3) After that, I choose "Check out as a project configured using the New
Project Wizard", which pops up the window where I select "Maven > Maven
Project".

4) I get a "New Maven Project" window where I select the default. The window
then changes to a "Select archetype" where I also use the default
selections.

5) This is where I can't seem to move forward... The window that pops
up asks me for an Artifact ID. I am clueless about what to put... The
process stops there :-(

Maven is probably a cool tool but its learning curve is pretty steep...
Shouldn't all this be automatic after "Maven > Maven Project"?

Thanks in advance. I'll put the solution into the wiki ;-)

Sylvain

===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net

===================================================================


From heuermh at acm.org  Mon Apr 19 03:33:00 2010
From: heuermh at acm.org (Michael Heuer)
Date: Sun, 18 Apr 2010 23:33:00 -0400 (EDT)
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>
Message-ID: <Pine.GSO.4.44.1004182329080.28020-100000@shell3.shore.net>

Sylvain Foisy wrote:

> Again, I feel stupid asking these newbie questions... I finally got my hands
> on a new MacBook Pro and am re-installing the apps to get stuff moving. As
> usual (I am sorry to say...), Eclipse and Maven are giving me fits when I try
> to check out the developer's tree.
>
> I have installed the latest Subversion and Maven plugins. When I want to
> create a new project, I try the following:
>
> 1) I right click to select "New > Other..." in the Navigator panel;
>
> 2) I select "SVN > Project from SVN", which leads me to a window where the
>  location of the developer's tree is in svn+ssh; in the window that comes
> up next, I use this URL to get the "Finish" button activated:
>
>  svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk
>
> 3) After that, I choose "Check out as a project configured using the New
> Project Wizard", which pops up the window where I select "Maven > Maven
> Project".
>
> 4) I get a "New Maven Project" window where I select the default. The window
> then changes to a "Select archetype" where I also use the default
> selections.

This last step doesn't sound right: here Eclipse is creating a brand new
Maven project for you instead of creating a Maven-based project from the
metadata already in Subversion.

In the SVN window you should see "Check out as Maven Project" when you
right-click, unless that has changed with newer versions of the maven
plugin.

   michael


From andreas at sdsc.edu  Mon Apr 19 04:17:17 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 18 Apr 2010 21:17:17 -0700
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>
References: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>
Message-ID: <m2t59a41c431004182117xe69b55ffj6d13c37ff2c7e50a@mail.gmail.com>

Hi Sylvain,

The place to start the checkout in eclipse is the SVN repository browser.
There you can do a right-click on the biojava/trunk folder and check out as
a Maven project.

Andreas

On Sat, Apr 17, 2010 at 7:00 AM, Sylvain Foisy
<sylvain.foisy at diploide.net>wrote:

> Hi,
>
> Again, I feel stupid asking these newbie questions... I finally got my hands
> on a new MacBook Pro and am re-installing the apps to get stuff moving. As
> usual (I am sorry to say...), Eclipse and Maven are giving me fits when I
> try to do a checkout of the developer's tree.
>
> I have installed the latest Subversion and Maven plugins. When I want to
> create a new project, I try the following:
>
> 1) I right click to select "New > Other..." in the Navigator panel;
>
> 2) I select "SVN > Project from SVN", which leads me to a window where the
>  location of the developer's tree is in svn+ssh; in the window that comes
> up next, I use this URL to get the "Finish" button activated:
>
>  svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk
>
> 3) After that, I choose the "Check out as a project configured using the New
> Project Wizard", which pops up the window where I select "Maven > Maven
> Project".
>
> 4) I get a "New Maven Project" window where I select the default. The window
> then changes to a "Select archetype" where I also use the default
> selections.
>
> 5) This is where I can't seem to move forward... The window that pops
> up asks me for an Artifact ID. I am clueless about what to put... The
> process stops there :-(
>
> Maven is probably a cool tool but its learning curve is pretty steep...
> Shouldn't all this be automatic after "Maven > Maven Project"?
>
> Thanks in advance. I'll put the solution into the wiki ;-)
>
> Sylvain
>
> ===================================================================
>
>  Sylvain Foisy, Ph. D.
>  Consultant Bio-informatique / Bioinformatics
>  Diploide.net - TI pour la vie / IT for Life
>
>  Courriel: sylvain.foisy at diploide.net
>  Web: http://www.diploide.net
>
> ===================================================================


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


From sylvain.foisy at diploide.net  Mon Apr 19 20:47:43 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Mon, 19 Apr 2010 16:47:43 -0400
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To: <m2t59a41c431004182117xe69b55ffj6d13c37ff2c7e50a@mail.gmail.com>
Message-ID: <C7F239AF.16DB7%sylvain.foisy@diploide.net>

Hi Andreas,

I finally got something working, but it wasn't automatic... Switching to the
SVN Repositories perspective, I right-clicked on trunk and selected
"Checkout...". After downloading the code, I had to right-click the
biojava-live project that now appeared in the Java Browsing perspective and
select "m2 Maven > Enable Dependency Management" to get it working. If I
tried the "Check out as..." option, a window would pop up with "Check out
Maven projects with SCM" pre-selected, and I would be stuck in the Group
ID/Artifact ID mayhem.

Thanks for the time. Back to coding ;-)

Sylvain



From andreas at sdsc.edu  Tue Apr 27 05:33:51 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 26 Apr 2010 22:33:51 -0700
Subject: [Biojava-dev] accepted GSoC projects
Message-ID: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>

Dear all,

Google has released the results for GSoC: Congratulations to Mark Chapman
and Jianjiong Gao for having been accepted to work on the MSA and PTM
projects for BioJava! Let's start the "community bonding" process (
http://en.flossmanuals.net/GSoCMentoring/MindtheGap ); we are all
looking forward to working with you on this during the summer. The mentors
and co-mentors will be Peter Rose for the PTM project and Scooter Willis and
Kyle Ellrott for the MSA project (and me).

I want to thank all of you who submitted proposals or showed interest in
other ways in the Google Summer of Code. We hope you are not too
disappointed if your application was not accepted this time. We had a
large number (52) of applications, and the overall quality of the
submissions was very high. We would like to stay in touch with you, and we
hope that you remain interested in BioJava beyond the scope of GSoC. There
are a number of ways to contribute: we are always looking for
people who provide code and patches to further improve our library, help out
with the documentation on the Wiki, or answer questions on the mailing
lists.

Let's all give Mark and Jianjiong a warm welcome to the BioJava community.
For those of you who are interested in following the progress of the
projects: as usual, the development-related discussions will take place on
the biojava-dev list.

Happy coding!

Andreas


From jianjiong.gao at gmail.com  Tue Apr 27 19:13:12 2010
From: jianjiong.gao at gmail.com (Jianjiong Gao)
Date: Tue, 27 Apr 2010 14:13:12 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
Message-ID: <h2kc82264f51004271213u1ea78e1bq29184a65b6315cbe@mail.gmail.com>

Dear Dr. Prlic and Everyone,

Thanks for the warm welcome. I am so glad that I have the chance to
work with the BioJava community this summer. I would like to briefly
introduce myself. My name is Jianjiong (JJ) Gao. I am a PhD student in
Computer Science at the University of Missouri, Columbia. My studies
focus on bioinformatics, specifically computational proteomics and
PTMs.

I came across BioJava about two years ago when I was working on a
plugin for Cytoscape, and was attracted by the idea of providing a
generic Java API for bioinformatics applications. I thought that maybe
someday I could do some coding for BioJava. And now I have the chance
:)

Best Regards,
-JJ



From chapman at cs.wisc.edu  Wed Apr 28 04:18:25 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Tue, 27 Apr 2010 23:18:25 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
Message-ID: <4BD7B711.9090108@cs.wisc.edu>

Hi all,

Thank you to Google, Open Bioinformatics Foundation, BioJava, and my mentors for 
this opportunity.  As a short introduction, I am Mark Chapman, a graduate 
student in Computer Sciences at the University of Wisconsin - Madison.  My focus 
is in artificial intelligence and bioinformatics.  This summer, I will add a 
Multiple Sequence Alignment module to BioJava.

My first task will be to update the alignment module to BioJava3 and to design
the interface for MSA.  My second goal is to implement a progressive MSA styled
after ClustalW.  After that, I will add alternative routines for each step.

Any ideas for the MSA project, as well as more sources of programming wisdom,
are quite welcome.  For example, Andreas suggested a series about Java
parallelism and lazy execution
(http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/).
I also noted a useful tip for iterative development
(http://en.flossmanuals.net/GSoCMentoring/Workflow).

Thanks again,
Mark




From andreas at sdsc.edu  Wed Apr 28 17:31:58 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 28 Apr 2010 10:31:58 -0700
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <4BD7B711.9090108@cs.wisc.edu>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
Message-ID: <w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>

> Any ideas for the MSA project as well as more sources of programming wisdom
> are quite welcome.  For example, Andreas suggested a series about Java
> parallelism and lazy execution (
> http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/).
>


Credit for the links goes to Scooter, who recommended them ;-)  My general
recommendation is to read Joshua Bloch's "Effective Java"
( http://java.sun.com/docs/books/effective/ ). It is a collection of rules
that should help in avoiding some frequently made mistakes...

Andreas




-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


From HWillis at scripps.edu  Wed Apr 28 17:57:14 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 28 Apr 2010 13:57:14 -0400
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
	<w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>
Message-ID: <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>

Andreas

Those links were sent to me by Mark Southern, who sits a couple of doors down
and is a past BioJava contributor for the sequence viewer. We should avoid
bringing in any external parallel frameworks, but at a minimum give ourselves
enough abstraction with a backend multi-threaded job-processing approach to
take advantage of a multi-processor box and a cluster via Terracotta.  If the
abstraction of the jobs and the mapping of resources is generic enough, then
that allows different implementations in various cluster environments for
those who have found the next best thing in parallel computing!

Scooter



From quantum7 at gmail.com  Wed Apr 28 19:06:40 2010
From: quantum7 at gmail.com (Spencer Bliven)
Date: Wed, 28 Apr 2010 12:06:40 -0700
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <4BD7B711.9090108@cs.wisc.edu>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
Message-ID: <p2kdf1d36d1004281206r9e86ab01p8cd2b406667aa0ba@mail.gmail.com>

Mark-

Welcome to the BioJava community! Adding multiple sequence alignments will
be a nice feature for the library.

One suggestion I have is to make any data structures you create for multiple
alignments as general as possible, and to think about whether the special
cases can still be represented. For instance, can you store an alignment
where some of the sequence is unknown (e.g. {ABCD, ABXD})? Can you store an
alignment where only a subset of the sequences is defined? I recently had
to represent an alignment like this:
ABCD EFGH
EFGH ABCD
This sort of alignment can't be written using just gaps; I had to make a new
structure to store pairs {(A,A), (B,B), ...} and rewrite much of the
existing alignment functionality based on that.
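
A minimal sketch of the pair-based representation described above, assuming
0-based residue indices and the two example rows written without the space
(ABCDEFGH and EFGHABCD). The names PairAlignmentSketch, AlignedPair,
allIdentical and isSequential are hypothetical and are not part of the BioJava
alignment package; the sketch only illustrates that a list of index pairs can
hold an alignment that gaps alone cannot express.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names come from BioJava.
final class PairAlignmentSketch {

    // One aligned position: 0-based indices into the first and second sequence.
    static final class AlignedPair {
        final int first;
        final int second;

        AlignedPair(int first, int second) {
            this.first = first;
            this.second = second;
        }
    }

    private final String seqA;
    private final String seqB;
    private final List<AlignedPair> pairs = new ArrayList<>();

    PairAlignmentSketch(String seqA, String seqB) {
        this.seqA = seqA;
        this.seqB = seqB;
    }

    // Record that residue i of the first sequence aligns to residue j of the second.
    void align(int i, int j) {
        if (i < 0 || i >= seqA.length() || j < 0 || j >= seqB.length()) {
            throw new IndexOutOfBoundsException("pair (" + i + "," + j + ") is out of range");
        }
        pairs.add(new AlignedPair(i, j));
    }

    List<AlignedPair> pairs() {
        return Collections.unmodifiableList(pairs);
    }

    // True if every aligned pair joins identical residues.
    boolean allIdentical() {
        for (AlignedPair p : pairs) {
            if (seqA.charAt(p.first) != seqB.charAt(p.second)) {
                return false;
            }
        }
        return true;
    }

    // True if the pairs increase strictly in both sequences at once,
    // i.e. the alignment could also be written as two gapped rows.
    boolean isSequential() {
        List<AlignedPair> sorted = new ArrayList<>(pairs);
        sorted.sort((p, q) -> Integer.compare(p.first, q.first));
        for (int k = 1; k < sorted.size(); k++) {
            AlignedPair prev = sorted.get(k - 1);
            AlignedPair cur = sorted.get(k);
            if (cur.first <= prev.first || cur.second <= prev.second) {
                return false;
            }
        }
        return true;
    }

    // The circular-permutation example: each residue of ABCDEFGH is paired
    // with the identical residue of EFGHABCD, four positions further on.
    static PairAlignmentSketch circularPermutation() {
        String a = "ABCDEFGH";
        String b = "EFGHABCD";
        PairAlignmentSketch aln = new PairAlignmentSketch(a, b);
        for (int i = 0; i < a.length(); i++) {
            aln.align(i, (i + 4) % b.length());
        }
        return aln;
    }

    public static void main(String[] args) {
        PairAlignmentSketch aln = circularPermutation();
        System.out.println("identical residues: " + aln.allIdentical());
        System.out.println("expressible with gaps only: " + aln.isSequential());
    }
}
```

A check like isSequential() is one possible way for gap-based code to detect
alignments it cannot render as gapped rows; whether that generality is worth
supporting is the open question raised here.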

Anyway, I don't mean to get bogged down in specific examples or exceptions.
I just wanted to point out that there are a lot of methods which can be used
to define some sort of alignment between a set of sequences, and it would be
nice if the BioJava alignment package was general enough to accommodate such
methods in the future without reinventing the wheel.

Cheers!
Spencer

P.S. I ran into such weird alignments while working on structural
alignments, which are not well behaved like traditional multiple sequence
alignments. Andreas knows all about both types of alignment, and can
probably judge better than I how much generality is worth spending your time
on.




From chapman at cs.wisc.edu  Thu Apr 29 01:09:07 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Wed, 28 Apr 2010 20:09:07 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
	<w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>
	<6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
Message-ID: <4BD8DC33.7010607@cs.wisc.edu>

Here is a summary of the concurrency lessons I learned that are useful with or 
without the functional programming paradigm --

1: implement Callable<T> to submit tasks for concurrent/parallel/lazy execution
  - call() methods just wrap a call to the computation intensive method
2: share a fixed size thread pool with task queue to avoid
  - overhead of thread creation/destruction,
  - too many simultaneous threads, and
  - most blocking issues
3: place thread blocking Future<T>.get() calls within tasks later in the queue
  - while(!Future<T>.isDone()) Thread.yield(); may also help keep the pool active
4: execution in a task queue also enables easier logging and progress listening
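
The lessons above can be sketched with java.util.concurrent roughly as
follows. The class names and the sumOfSquares placeholder are hypothetical
stand-ins for a computation-intensive method, not BioJava code. The sketch
assumes that tasks are submitted in dependency order: a fixed pool takes tasks
from its queue in submission order, so a task that blocks on earlier futures
only starts once those tasks have already been picked up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from BioJava.
final class TaskQueueSketch {

    // Stand-in for a computation-intensive method.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Lesson 1: call() only wraps the call to the expensive method.
    static final class SquaresTask implements Callable<Long> {
        private final int n;

        SquaresTask(int n) {
            this.n = n;
        }

        @Override
        public Long call() {
            return sumOfSquares(n);
        }
    }

    // Lesson 3: the blocking get() calls sit inside a task that is queued
    // after the tasks it depends on.
    static final class AddTask implements Callable<Long> {
        private final Future<Long> left;
        private final Future<Long> right;

        AddTask(Future<Long> left, Future<Long> right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public Long call() throws Exception {
            return left.get() + right.get();
        }
    }

    static long run(int a, int b, int threads) {
        // Lesson 2: one shared, fixed-size pool with its own task queue.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Future<Long> fa = pool.submit(new SquaresTask(a));
            Future<Long> fb = pool.submit(new SquaresTask(b));
            Future<Long> sum = pool.submit(new AddTask(fa, fb));
            return sum.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(3, 4, 2)); // 14 + 30 = 44
    }
}
```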

There are two obvious places concurrent execution will fit in the MSA module --

1: building the distance matrix
  - queue pairwise alignment/scoring tasks in loop over all sequence pairs
2: progressive alignment
  - queue profile-profile alignment tasks in a postfix (post-order) traversal of
    the guide tree (from leaves to root)
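
For the first of these two places, a rough sketch of queuing one scoring task
per sequence pair might look like the following. The distance() method is a
deliberately naive placeholder (fraction of mismatched positions, with no
alignment at all), and DistanceMatrixSketch and build() are hypothetical
names; a real module would call its pairwise alignment and scoring code
instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from BioJava.
final class DistanceMatrixSketch {

    // Placeholder score: mismatched positions divided by the longer length.
    // Positions beyond the shorter sequence count as mismatches.
    static double distance(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 0.0;
        }
        int shorter = Math.min(a.length(), b.length());
        int mismatches = longer - shorter;
        for (int i = 0; i < shorter; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                mismatches++;
            }
        }
        return (double) mismatches / longer;
    }

    static double[][] build(List<String> seqs, int threads) {
        int n = seqs.size();
        double[][] matrix = new double[n][n];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Queue one task per unordered pair (i < j), in row-major order.
            List<Future<Double>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    final String a = seqs.get(i);
                    final String b = seqs.get(j);
                    Callable<Double> task = () -> distance(a, b);
                    futures.add(pool.submit(task));
                }
            }
            // Collect results in the same order and mirror them across the diagonal.
            int k = 0;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    double d = futures.get(k++).get();
                    matrix[i][j] = d;
                    matrix[j][i] = d;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] m = build(java.util.Arrays.asList("ABCD", "ABXD", "WXYZ"), 2);
        System.out.println(java.util.Arrays.deepToString(m));
    }
}
```

All tasks are queued before the first get() call, so the blocking happens
only on the submitting thread and the pool stays busy.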

All our library copies of "Effective Java" are checked out, so I ordered a copy 
for my personal library.  The sample chapter on generics sold me.

Mark




From andreas at sdsc.edu  Fri Apr 30 15:29:03 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 08:29:03 -0700
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <4BD8DC33.7010607@cs.wisc.edu>
References: <u2w59a41c431004262233xe5553c17je23c2b42a3aae81d@mail.gmail.com>
	<4BD7B711.9090108@cs.wisc.edu>
	<w2g59a41c431004281031oe53560d6j2826a4cf4e5cb24d@mail.gmail.com>
	<6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
	<4BD8DC33.7010607@cs.wisc.edu>
Message-ID: <n2j59a41c431004300829n5155bcd0i4fbacb8e219a43ce@mail.gmail.com>

Hi Mark and Jianjiong,

By now you should have received your login info for the development
SVN server. I suggest the following as next steps:


*) If you have not done so already, sign up to the biojava-l and biojava-dev
mailing lists.

*) Get a BioJava checkout from the development SVN server.

*) Add the LGPL license javadoc header (
http://www.biojava.org/wiki/BioJava3_license ) to the templates in your IDE.

*) Take a look at the JUnit tests and add a new test for something that is
related to your project.

*) Take a look at the Wiki pages (e.g.
http://www.biojava.org/wiki/BioJava:CookBook ), get an account on the wiki,
and improve one of the documentation pages.

*) Take a look at the javadocs at http://www.biojava.org/docs/api/index.html

Andreas


From andreas at sdsc.edu  Fri Apr 30 15:44:25 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 08:44:25 -0700
Subject: [Biojava-dev] biojava SVN
Message-ID: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>

Hi,

The BioJava SVN has not been fully compiling ever since the Hackathon. I
guess things were quite in flux the last months and it is now time to make
sure SVN fully compiles again.  There is a few things we need to figure out
in order for that:

* Jar files for libraries that are not in a public Maven repository. Jules :
at some point you indicated that we might be able to get such jar files
hosted by the EBI Maven repository. Do you think that is still an
possibility and could you get a few libraries into that? In particular that
would be Jmol, Astex, and probably one or two other Jar files. That would
make the BioJava checkout process much smoother and not require a developer
to manually install jars for full functionality.

* We have a couple of modules that are fragmented and broken. These are
historic leftovers from when we started the refactoring process. If all the
functionality has been moved into the new biojava3-core module, I would vote
for removing the modules starting with sequence*

Andreas


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


From ayates at ebi.ac.uk  Fri Apr 30 15:48:01 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Apr 2010 16:48:01 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
References: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
Message-ID: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>

Does anyone know how hard it would be to get these into the public Maven repository? The EBI repo is all well & good, but updating it relies on BioJava always having a committer at the EBI. Now I know that is very likely to remain the case, but is it something we can rely on?

Andy

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From ayates at ebi.ac.uk  Fri Apr 30 15:57:12 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Apr 2010 16:57:12 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To: <CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>
References: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
	<475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
	<CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>
Message-ID: <3C3AAC8F-5C03-44C1-B121-7808C0612A65@ebi.ac.uk>

As far as I remember you 'can' have one set up manually. I think I offered one hand-developed from one of my projects. In fact, FYI:

http://code.google.com/p/dbcon/source/browse/#svn/trunk/maven-repo

It just requires the correct structure in place & it works. I went for it being hosted in SVN because there's an HTTP interface to it offered by Google. The EBI Maven repo is just a public HTTP directory. It's been some years since I did a deployment there, but it's not hard to do & we should be able to do it locally & sync it to SVN.
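
To illustrate what "the correct structure" means here: a Maven 2 repository
served over plain HTTP is just a directory tree in which each artifact sits at
groupId (dots turned into slashes) / artifactId / version /
artifactId-version.extension. The sketch below only computes such paths. The
class name, the base URL and the Jmol coordinates are made up for
illustration and are not the coordinates of any real deployment.

```java
public class MavenRepoLayout {

    /**
     * Relative path of an artifact in the standard Maven 2 repository layout:
     * groupId with dots replaced by slashes, then artifactId, then version,
     * then the file named artifactId-version.extension.
     */
    public static String artifactPath(String groupId, String artifactId,
                                      String version, String extension) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        // Hypothetical base URL and coordinates, for illustration only.
        String base = "http://example.org/maven-repo/";
        // A hand-built repository needs at least the jar and its pom.
        System.out.println(base + artifactPath("org.jmol", "jmol", "11.8", "jar"));
        System.out.println(base + artifactPath("org.jmol", "jmol", "11.8", "pom"));
    }
}
```

Any HTTP-visible directory (an SVN tree included) that follows this layout
can then be listed as a repository in a project's pom.xml.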

Andy

On 30 Apr 2010, at 16:50, Richard Holland wrote:

> Could a small MVN repo be set up at OBF?

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/


From andreas at sdsc.edu  Fri Apr 30 16:27:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 09:27:09 -0700
Subject: [Biojava-dev] biojava SVN
In-Reply-To: <CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>
References: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
	<475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
	<CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>
Message-ID: <h2t59a41c431004300927gd9e48b6ag8415a16df49ba9f9@mail.gmail.com>

> Could a small MVN repo be set up at OBF?

I am pretty sure we could do that. Anybody volunteering? I can help with
getting the necessary permissions... Does anybody know of good documentation
on how to set this up?

Andreas



-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


From holland at eaglegenomics.com  Fri Apr 30 15:50:52 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 30 Apr 2010 16:50:52 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
References: <u2o59a41c431004300844nc419c2cey235efb2f9d3fc7f2@mail.gmail.com>
	<475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
Message-ID: <CFB078AF-778C-4271-AEC7-7EFB17FA8E17@eaglegenomics.com>

Could a small MVN repo be set up at OBF?

On 30 Apr 2010, at 16:48, Andy Yates wrote:

> Does anyone know how hard it would be to get these into the public maven repository? The EBI repo is all well & good but updating it relies on BioJava always having a committer at the EBI. Now I know that is a very likely statement but is it something we can rely on?
> 
> Andy

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/