[Biojava-l] Issue with SimpleNCBITaxon class
Deepak Sheoran
sheoran143 at gmail.com
Sun Apr 11 21:08:22 UTC 2010
I am using the same tables with both the BioJava and BioPerl taxon programs,
and the output I get is below:
*Biojava:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage
I get is
Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia
australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum
var. haydenii.
BioJava process of finding names:
11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240
(wrong way of doing things)
*Bioperl:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage
I get is
Retroviridae; Orthoretrovirinae; Alpharetrovirus;
unclassified Alpharetrovirus.
Bioperl process of finding names:
11876==>353825==>153057==>327045==>11632 (Right way of doing things)
Hint: BioJava searches the ncbi_taxon_id column with a value from
parent_taxon_id, whereas BioPerl searches the taxon_id column with a value
from parent_taxon_id.
*Taxon and taxon_name table content relevant to this discussion:*
taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
2901      3609           276240           genus      Rhamnus                             scientific name
3610      4403           3609             species    Platanus occidentalis               scientific name
29052     48579          4403             species    Suillus placidus                    scientific name
114412    143975         48579            species    Diadasia australis                  scientific name
143976    176516         143975           species    Arnicastrum guerrerense             scientific name
30680     50447          176516           family     Labiduridae                         scientific name
254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
9394      11632          17394            family     Retroviridae                        scientific name
277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
122448    153057         277861           genus      Alpharetrovirus                     scientific name
301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
9584      11876          301952           species    Avian sarcoma virus                 scientific name
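
The two lookup strategies can be reproduced against the rows listed above. The following is only a minimal sketch and is not BioJava or BioPerl code: the class TaxonLineageSketch, the Taxon holder, and the lineage() and lookup() helpers are hypothetical names made up for illustration. It walks up from ncbi_taxon_id 11876 and resolves each parent_taxon_id either against ncbi_taxon_id (the behaviour described for BioJava) or against taxon_id (the behaviour described for BioPerl).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of BioJava or BioPerl.
class TaxonLineageSketch {

    // One row of the taxon / taxon_name listing from this message.
    static final class Taxon {
        final int taxonId, ncbiTaxonId, parentTaxonId;
        final String name;
        Taxon(int taxonId, int ncbiTaxonId, int parentTaxonId, String name) {
            this.taxonId = taxonId;
            this.ncbiTaxonId = ncbiTaxonId;
            this.parentTaxonId = parentTaxonId;
            this.name = name;
        }
    }

    static final List<Taxon> ROWS = Arrays.asList(
        new Taxon(2901, 3609, 276240, "Rhamnus"),
        new Taxon(3610, 4403, 3609, "Platanus occidentalis"),
        new Taxon(29052, 48579, 4403, "Suillus placidus"),
        new Taxon(114412, 143975, 48579, "Diadasia australis"),
        new Taxon(143976, 176516, 143975, "Arnicastrum guerrerense"),
        new Taxon(30680, 50447, 176516, "Labiduridae"),
        new Taxon(254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
        new Taxon(9394, 11632, 17394, "Retroviridae"),
        new Taxon(277861, 327045, 9394, "Orthoretrovirinae"),
        new Taxon(122448, 153057, 277861, "Alpharetrovirus"),
        new Taxon(301952, 353825, 122448, "unclassified Alpharetrovirus"),
        new Taxon(9584, 11876, 301952, "Avian sarcoma virus"));

    // Find a row by ncbi_taxon_id (matchNcbiId = true) or by taxon_id (false).
    static Taxon lookup(int key, boolean matchNcbiId) {
        for (Taxon t : ROWS) {
            int candidate = matchNcbiId ? t.ncbiTaxonId : t.taxonId;
            if (candidate == key) {
                return t;
            }
        }
        return null; // parent is not among the listed rows
    }

    // Collect ancestor names, root first, for the entry with this ncbi_taxon_id.
    // parentIsNcbiId = true  : resolve parent_taxon_id against ncbi_taxon_id
    // parentIsNcbiId = false : resolve parent_taxon_id against taxon_id
    static List<String> lineage(int ncbiTaxonId, boolean parentIsNcbiId) {
        List<String> names = new ArrayList<>();
        Taxon start = lookup(ncbiTaxonId, true);
        Taxon current = (start == null) ? null : lookup(start.parentTaxonId, parentIsNcbiId);
        while (current != null) {
            names.add(0, current.name);
            current = lookup(current.parentTaxonId, parentIsNcbiId);
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the Rhamnus ... Oreostemma chain reported for BioJava.
        System.out.println(String.join("; ", lineage(11876, true)));
        // Prints the Retroviridae ... unclassified Alpharetrovirus chain reported for BioPerl.
        System.out.println(String.join("; ", lineage(11876, false)));
    }
}
```

With only these twelve rows loaded, the first call gives the plant and insect lineage shown under BioJava and the second gives the retrovirus lineage shown under BioPerl, which is consistent with parent_taxon_id being a reference to taxon_id rather than to ncbi_taxon_id.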
Thanks
Deepak
On 4/11/2010 2:53 PM, Richard Holland wrote:
> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>
> thanks,
> Richard
>
> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>
>
>> Hi,
>>
>> There is a very fundamental issue in the SimpleNCBITaxon class, because of which it produces the wrong taxonomy hierarchy. I am explaining what I have found; let me know what you guys think of it, and let me suggest how to fix it.
>>
>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>> 2) The SimpleNCBITaxon class assumes that "parent_taxon_id" holds the parent's ncbi_taxon_id for the current ncbi_taxon_id, but that is not true. The value in "parent_taxon_id" is the "taxon_id" of the row whose ncbi_taxon_id is the parent of the current entry.
>>
>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>> <property name="nodeRank" column="node_rank"/>
>> <property name="geneticCode" column="genetic_code"/>
>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>> <property name="leftValue" column="left_value"/>
>> <property name="rightValue" column="right_value"/>
>> <property name="parentNCBITaxID" column="parent_taxon_id"/> ----- this is not correct: the parent_taxon_id column stores the taxon_id of the row that holds the parent's ncbi_taxon_id for the current entry
>>
>> Thanks
>> Deepak Sheoran
>>
>>
>>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>