[Biojava-l] Issue with SimpleNCBITaxon class
Deepak Sheoran
sheoran143 at gmail.com
Sun Apr 11 22:48:00 UTC 2010
If we don't want to change the current BioJava code and still want to
fix this bug, I have found a way:
1) We can change one of the Hibernate mapping files, "Taxon.hbm.xml", and
replace the line
<property name="parentNCBITaxID" column="parent_taxon_id"/>
with
<property name="parentNCBITaxID" formula="(select tax.ncbi_taxon_id from
taxon tax where tax.taxon_id = parent_taxon_id)"/>
With this change to the Hibernate mapping I get the
correct lineage for ncbi_taxon_id = 11876 (Avian sarcoma virus), which is
Viruses; Retro-transcribing viruses; Retroviridae;
Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.
2) The possible problem is with the taxonomy loader class, which wants to
write a value for parent_taxon_id into the taxon table. I think that won't
be possible after this change to the Hibernate config file, because a
property mapped with a formula is read-only and is never written back.
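The formula in point 1 can be pictured with a small in-memory sketch. It is not BioJava or Hibernate code: `ParentNcbiLookup`, `TaxonRow`, `byTaxonId` and `parentNcbiTaxId` are hypothetical names, and the two rows are copied from the table quoted further down. It resolves parent_taxon_id through the taxon_id key, which is what the subquery does, instead of treating the raw column value as an NCBI ID.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not BioJava or Hibernate code. It mimics what the formula
//   (select tax.ncbi_taxon_id from taxon tax where tax.taxon_id = parent_taxon_id)
// computes for one row, using in-memory rows instead of the database.
class ParentNcbiLookup {

    // One row of the taxon table, reduced to the three columns that matter here.
    static final class TaxonRow {
        final int taxonId;           // taxon_id (internal primary key)
        final int ncbiTaxonId;       // ncbi_taxon_id
        final Integer parentTaxonId; // parent_taxon_id, null if the row has no parent

        TaxonRow(int taxonId, int ncbiTaxonId, Integer parentTaxonId) {
            this.taxonId = taxonId;
            this.ncbiTaxonId = ncbiTaxonId;
            this.parentTaxonId = parentTaxonId;
        }
    }

    // Index rows by taxon_id, the column the subquery filters on.
    static Map<Integer, TaxonRow> byTaxonId(List<TaxonRow> rows) {
        Map<Integer, TaxonRow> index = new HashMap<>();
        for (TaxonRow r : rows) {
            index.put(r.taxonId, r);
        }
        return index;
    }

    // Parent's ncbi_taxon_id, or null when the subquery would find no row.
    static Integer parentNcbiTaxId(TaxonRow row, Map<Integer, TaxonRow> index) {
        if (row.parentTaxonId == null) {
            return null;
        }
        TaxonRow parent = index.get(row.parentTaxonId);
        return parent == null ? null : parent.ncbiTaxonId;
    }

    public static void main(String[] args) {
        List<TaxonRow> rows = Arrays.asList(
            new TaxonRow(301952, 353825, 122448), // unclassified Alpharetrovirus
            new TaxonRow(9584, 11876, 301952));   // Avian sarcoma virus
        TaxonRow avian = rows.get(1);
        // Raw column value: 301952, which as an NCBI ID is an unrelated plant variety.
        System.out.println(avian.parentTaxonId);
        // Resolved through taxon_id: 353825 (unclassified Alpharetrovirus).
        System.out.println(parentNcbiTaxId(avian, byTaxonId(rows)));
    }
}
```

Running `main` prints 301952 and then 353825.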
Deepak Sheoran
On 4/11/2010 4:08 PM, Deepak Sheoran wrote:
> I am using the same table with the BioJava and BioPerl taxon programs, and
> the output I get is below:
>
> *Biojava:*
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the
> lineage I get is
> Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia
> australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum
> var. haydenii.
>
> BioJava's path when looking up names:
> 11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240
> (the wrong way of doing things)
>
> *Bioperl:*
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the
> lineage I get is
> Retroviridae; Orthoretrovirinae; Alpharetrovirus;
> unclassified Alpharetrovirus.
>
> BioPerl's path when looking up names:
> 11876==>353825==>153057==>327045==>11632 (the right way of doing things)
>
> Hint: BioJava looks up the parent_taxon_id value in the ncbi_taxon_id
> column, whereas BioPerl looks up the parent_taxon_id value in the
> taxon_id column.
>
> *Contents of the taxon and taxon_name tables relevant to this
> discussion:*
>
> taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
> 2901      3609           276240           genus      Rhamnus                             scientific name
> 3610      4403           3609             species    Platanus occidentalis               scientific name
> 29052     48579          4403             species    Suillus placidus                    scientific name
> 114412    143975         48579            species    Diadasia australis                  scientific name
> 143976    176516         143975           species    Arnicastrum guerrerense             scientific name
> 30680     50447          176516           family     Labiduridae                         scientific name
> 254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
> 9394      11632          17394            family     Retroviridae                        scientific name
> 277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
> 122448    153057         277861           genus      Alpharetrovirus                     scientific name
> 301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
> 9584      11876          301952           species    Avian sarcoma virus                 scientific name
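The hint above can be checked against this table with a small Java sketch. It is not BioJava or BioPerl code: `LineageWalk`, `walkByNcbiTaxonId` and `walkByTaxonId` are hypothetical names, and only the rows listed above are loaded. Both walks start from the row with ncbi_taxon_id = 11876 and differ only in which column parent_taxon_id is matched against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: walks up the taxon table in the two ways described above.
class LineageWalk {

    static final class TaxonRow {
        final int taxonId;       // taxon_id (internal primary key)
        final int ncbiTaxonId;   // ncbi_taxon_id
        final int parentTaxonId; // parent_taxon_id as stored in the table

        TaxonRow(int taxonId, int ncbiTaxonId, int parentTaxonId) {
            this.taxonId = taxonId;
            this.ncbiTaxonId = ncbiTaxonId;
            this.parentTaxonId = parentTaxonId;
        }
    }

    // Rows from the table above: (taxon_id, ncbi_taxon_id, parent_taxon_id).
    static final List<TaxonRow> ROWS = Arrays.asList(
        new TaxonRow(2901, 3609, 276240),
        new TaxonRow(3610, 4403, 3609),
        new TaxonRow(29052, 48579, 4403),
        new TaxonRow(114412, 143975, 48579),
        new TaxonRow(143976, 176516, 143975),
        new TaxonRow(30680, 50447, 176516),
        new TaxonRow(254757, 301952, 50447),
        new TaxonRow(9394, 11632, 17394),
        new TaxonRow(277861, 327045, 9394),
        new TaxonRow(122448, 153057, 277861),
        new TaxonRow(301952, 353825, 122448),
        new TaxonRow(9584, 11876, 301952));

    static Map<Integer, TaxonRow> indexBy(Function<TaxonRow, Integer> key) {
        Map<Integer, TaxonRow> index = new HashMap<>();
        for (TaxonRow r : ROWS) {
            index.put(key.apply(r), r);
        }
        return index;
    }

    // Starts at the row whose ncbi_taxon_id is startNcbiId and repeatedly looks up
    // parent_taxon_id in parentIndex. Returns the NCBI IDs visited, leaf first.
    // Stops when the parent is not in the table or a row would be visited twice.
    static List<Integer> walk(int startNcbiId, Map<Integer, TaxonRow> parentIndex) {
        List<Integer> path = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        TaxonRow current = indexBy(r -> r.ncbiTaxonId).get(startNcbiId);
        while (current != null && seen.add(current.taxonId)) {
            path.add(current.ncbiTaxonId);
            current = parentIndex.get(current.parentTaxonId);
        }
        return path;
    }

    // Lookup described for BioJava: parent_taxon_id matched against ncbi_taxon_id.
    static List<Integer> walkByNcbiTaxonId(int startNcbiId) {
        return walk(startNcbiId, indexBy(r -> r.ncbiTaxonId));
    }

    // Lookup described for BioPerl: parent_taxon_id matched against taxon_id.
    static List<Integer> walkByTaxonId(int startNcbiId) {
        return walk(startNcbiId, indexBy(r -> r.taxonId));
    }

    public static void main(String[] args) {
        System.out.println(walkByNcbiTaxonId(11876));
        System.out.println(walkByTaxonId(11876));
    }
}
```

Running `main` prints `[11876, 301952, 50447, 176516, 143975, 48579, 4403, 3609]` and then `[11876, 353825, 153057, 327045, 11632]`, matching the two paths above. Each walk stops where the parent row is not part of this excerpt (276240 and 17394 respectively).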
>
>
> Thanks
> Deepak
>
> On 4/11/2010 2:53 PM, Richard Holland wrote:
>> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>>
>> thanks,
>> Richard
>>
>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>>
>>
>>> Hi,
>>>
>>> There is a very fundamental issue in the SimpleNCBITaxon class that causes it to produce a wrong taxonomy hierarchy. I am explaining what I have found; let me know what you guys think of it, and suggest how to fix it.
>>>
>>> 1) The columns in the taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, node_rank, genetic_code, mito_genetic_code, left_value, right_value).
>>> 2) The SimpleNCBITaxon class assumes that "parent_taxon_id" holds the parent's ncbi_taxon_id, but that is not true. "parent_taxon_id" holds the taxon_id of the parent row, and the parent's NCBI taxon ID is the ncbi_taxon_id stored in that row.
>>>
>>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>>> <property name="nodeRank" column="node_rank"/>
>>> <property name="geneticCode" column="genetic_code"/>
>>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>>> <property name="leftValue" column="left_value"/>
>>> <property name="rightValue" column="right_value"/>
>>> <property name="parentNCBITaxID" column="parent_taxon_id"/> ----- this mapping is not correct: parent_taxon_id stores the taxon_id of the parent row, not the parent's ncbi_taxon_id
>>>
>>> Thanks
>>> Deepak Sheoran
>>>
>>>
>>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>