[Bioperl-l] HomoloGene

Karger, Amir AKarger@CuraGen.com
Wed, 20 Feb 2002 08:51:48 -0500


> -----Original Message-----
> From: Andrew Macgregor [mailto:andrew@anatomy.otago.ac.nz]
>
> I'm just going by the readme. It looks to me that the unigene field 
> in either the first or the second organism can be blank.
> 
> >-The fourth (LocusLink ID), fifth (UniGene ID), and sixth (Accession
> >number) fields correspond to the first organism.  One or both of UG ID
> >and LL ID may be present.  Locus Link and UniGene are in one-to-one
> >correspondence in the latter case, so no ambiguity arises through the
> >choice of set identifier.
> >-The seventh(LL), eighth(UG), and ninth(Accession) fields correspond
> >to the second organism.
> 
> This is the case in the example you give. It has LL and unigene for 
> org 1 but no accession number then LL but no unigene number or 
> accession number for org 2.

Oh, absolutely. I'm 100% fine with leaving a UG out. The example I gave was
supposed to demonstrate that most of the time, when they leave out an ID, they
do exactly the right thing. That row follows the README exactly: LL is in
field 7, UG is in field 8, and either one may be empty.

But let's put two rows from the file together, with a few extra spaces added
to make it more obvious. I line up the '|' characters, and...

Xl|Hs|t| |1091 |AB045628 |LL.9698 |153834   |D43951 |75.35
Xl|Dm|B| |1091 |AB045628 |        |LL.41094 |       |68.27

The first line follows the rules just fine. The second, though, puts the LL
ID into the column that the UG should be in. Why don't they put the LL into
the seventh column, and leave the eighth UG column blank? Or -- if there's
some reason that absolutely requires putting LL in column 8 -- can't they at
least tell us in the README that they'll do so?
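
For anyone writing a parser, here is a minimal sketch of one way to guard
against this. It is not Bioperl code and not anything NCBI documents. It
assumes ten '|'-separated fields per row, as in the two rows above, and it
assumes that a value starting with "LL." sitting in a UG column next to an
empty LL column is really a misplaced LocusLink ID. The function names, the
dictionary keys, and the guesses that field 3 is some kind of type code and
field 10 a score are all mine; the README excerpt quoted here doesn't say.

```python
LL_PREFIX = "LL."  # assumption: LocusLink IDs carry this prefix, as in the rows above


def fix_ids(ll_id, ug_id):
    """Return (ll_id, ug_id, moved).

    If the LL slot is empty and the UG slot holds an "LL."-prefixed value,
    treat it as a misplaced LocusLink ID and move it into the LL slot.
    """
    if not ll_id and ug_id.startswith(LL_PREFIX):
        return ug_id, "", True
    return ll_id, ug_id, False


def parse_row(line):
    """Parse one '|'-separated row into a dict (hypothetical key names)."""
    fields = [f.strip() for f in line.rstrip("\n").split("|")]
    if len(fields) != 10:
        raise ValueError("expected 10 fields, got %d" % len(fields))

    # README: fields 4-6 are LL, UG, accession for the first organism,
    # fields 7-9 are the same three for the second organism.
    ll1, ug1, moved1 = fix_ids(fields[3], fields[4])
    ll2, ug2, moved2 = fix_ids(fields[6], fields[7])

    return {
        "org1": fields[0],
        "org2": fields[1],
        "type": fields[2],    # assumption: meaning not given in the quoted README
        "ll1": ll1, "ug1": ug1, "acc1": fields[5],
        "ll2": ll2, "ug2": ug2, "acc2": fields[8],
        "score": fields[9],   # assumption: meaning not given in the quoted README
        "repaired": moved1 or moved2,  # True if an LL ID was moved out of a UG column
    }


if __name__ == "__main__":
    rows = [
        "Xl|Hs|t| |1091 |AB045628 |LL.9698 |153834   |D43951 |75.35",
        "Xl|Dm|B| |1091 |AB045628 |        |LL.41094 |       |68.27",
    ]
    for row in rows:
        rec = parse_row(row)
        print(rec["org2"], rec["ll2"], rec["ug2"], rec["repaired"])
```

The "repaired" flag is there so a caller can log or count the rows that
needed fixing instead of having them silently rewritten.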

Anyway, I guess this isn't terribly Bioperlish, except in terms of a warning
for writing parsers.

-Amir
CuraGen Corporation
 