[BioSQL-l] parent_taxon_id of a root node
Hilmar Lapp
hlapp at gmx.net
Sat Nov 15 18:34:45 UTC 2008
Sorry Peter - it looks like this slipped my attention (Oct was crazy).
Thanks for raising it again. I agree with you, this looks like a bug.
Would you mind filing it?
It's possible that this has secretly been assumed as policy and hence led
to some people identifying the root node by equating parent and
taxon_id, but surely this sounds like the wrong way of doing it, so it
deserves fixing.
-hilmar
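
For illustration only, here is a rough sketch of how the two root conventions could be detected together, and how self-parent roots could be normalised to NULL. It is not the actual fix to load_ncbi_taxonomy.pl. It assumes a simplified taxon table with just taxon_id and parent_taxon_id (the real BioSQL table has more columns) and uses Python's sqlite3 module with a throwaway in-memory database. The function names and demo rows are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory taxon table (simplified, hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?)",
        [
            (1, 1),      # self-parent root, as load_ncbi_taxonomy.pl creates it
            (2, 1),
            (3, 2),
            (10, None),  # NULL-parent root, as Biopython creates it
            (11, 10),
        ],
    )
    return conn


def find_roots(conn):
    """Return root taxon_ids under either convention (NULL or self-parent)."""
    rows = conn.execute(
        "SELECT taxon_id FROM taxon "
        "WHERE parent_taxon_id IS NULL OR parent_taxon_id = taxon_id "
        "ORDER BY taxon_id"
    )
    return [row[0] for row in rows]


def null_out_self_parents(conn):
    """Set parent_taxon_id to NULL for self-parent rows; return rows changed."""
    cur = conn.execute(
        "UPDATE taxon SET parent_taxon_id = NULL WHERE parent_taxon_id = taxon_id"
    )
    conn.commit()
    return cur.rowcount
```

Any client code that identifies the root by comparing parent_taxon_id with taxon_id would need to switch to an IS NULL test (or accept both) before such an update is applied.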
On Nov 14, 2008, at 3:48 PM, Peter wrote:
> On Fri, Oct 3, 2008, I wrote:
>>
>> Hello all,
>>
>> I was puzzled to find the BioSQL script load_ncbi_taxonomy.pl will set
>> the parent_taxon_id of the NCBI root node in the taxon table to point
>> to itself. I would have expected this to be NULL indicating no
>> parent. If someone is using the database directly, extracting a
>> lineage could trigger an infinite loop. Can anyone explain the
>> rationale here?
>>
>> Note that when Biopython adds entries to the taxon table, it uses NULL
>> for a root node. When retrieving sequences from a BioSQL database,
>> Biopython does cope with a root node with a NULL parent or a
>> self-parent - would it be safe to assume BioPerl and Java can also cope
>> with both situations?
>>
>> Thanks,
>>
>> Peter
>>
>
> Hi again,
>
> I thought I'd raise this question again (as I didn't see any response
> last time), as I've just been bitten by the self-parent taxon problem
> this afternoon. This was for a simple web front end to part of a
> BioSQL database using SQLAlchemy in Python - but that's not important.
>
> I was using a simple loop to build up lineages, which was working fine
> until I ran load_ncbi_taxonomy.pl and suddenly my program seemed to
> just time out. I'd forgotten about the self-parent root nodes used by
> load_ncbi_taxonomy.pl which had triggered an infinite loop.
>
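
As a sketch of the kind of lineage loop described above, the version below stops on either root convention (NULL parent or self-parent). It is not Peter's original code. It assumes a simplified taxon table with only taxon_id and parent_taxon_id, uses Python's sqlite3 module rather than SQLAlchemy to stay self-contained, and the function names and demo rows are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory taxon table (simplified, hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 2), (10, None), (11, 10)],
    )
    return conn


def lineage(conn, taxon_id):
    """Return taxon_ids from the given node up to its root, node first.

    The walk ends when the parent is NULL, when a node has already been
    visited (which covers a self-parent root and any longer cycle), or
    when the taxon_id is not in the table.
    """
    path = []
    seen = set()
    current = taxon_id
    while current is not None and current not in seen:
        row = conn.execute(
            "SELECT parent_taxon_id FROM taxon WHERE taxon_id = ?", (current,)
        ).fetchone()
        if row is None:  # unknown taxon_id: stop rather than guess
            break
        seen.add(current)
        path.append(current)
        current = row[0]
    return path
```

A naive loop that only tests for a NULL parent never terminates on the self-parent root (taxon 1 in the demo data); tracking visited nodes is what prevents that here.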
> I hit another (less serious) problem stemming from these self-parent
> root nodes when I wanted to generate a list of sub-lineages (child
> entries), essentially:
>
> SELECT * FROM taxon WHERE parent_taxon_id=12345;
>
> When calling this on a root node, I had to modify this to explicitly
> exclude itself from the children:
>
> SELECT * FROM taxon WHERE parent_taxon_id=12345 AND taxon_id<>12345;
>
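
The modified query above can be wrapped so the caller does not need to know whether the node is a root. This is a minimal sketch under the same assumptions as before: a simplified taxon table with only taxon_id and parent_taxon_id, Python's sqlite3 module, and hypothetical function names and demo rows.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory taxon table (simplified, hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 2), (10, None), (11, 10)],
    )
    return conn


def children(conn, taxon_id):
    """Return the direct children of a taxon, never including the taxon itself."""
    rows = conn.execute(
        "SELECT taxon_id FROM taxon "
        "WHERE parent_taxon_id = ? AND taxon_id <> ? "
        "ORDER BY taxon_id",
        (taxon_id, taxon_id),
    )
    return [row[0] for row in rows]
```

The extra taxon_id <> ? condition is harmless for non-root nodes and for NULL-parent roots, so the same query works under both conventions.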
> So to repeat my earlier question, is there a reason why
> parent_taxon_id isn't just NULL for root nodes? Was this a deliberate
> design choice - because if not, I think this could be regarded as a
> bug in load_ncbi_taxonomy.pl.
>
> Thanks
>
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================