[Biojava-dev] fetching obsolete/superseding files
Spencer Bliven
sbliven at ucsd.edu
Wed Apr 27 01:58:03 UTC 2011
Amr,
Try checking idStatus again now. The latest PDB website version just went
into production this afternoon. I currently see
<record structureId="3CAT" status="OBSOLETE" replaces="1CAT"
replacedBy="8CAT 7CAT"/>
I merged in the code you sent me a few days ago for PDBFileReader and for
the caching in PDBStatus. I didn't switch PDBStatus from SAX to DOM because
I had already fixed that bug in another way by the time I got your code
(thanks for pointing it out). I also added methods to AtomCache to match the
setFetch* methods in PDBFileReader. I wrote some tests in TestAtomCache and
it seems to be working great.
Thanks for your contributions!
-Spencer
On Tue, Apr 26, 2011 at 2:55 AM, Amr AL-Hossary
<amr_alhossary at hotmail.com> wrote:
> The bug was fixed for "replaces", but "replacedBy" is not yet fixed.
> Here is the current result:
>
>
> <idStatus>
> <record structureId="1HHB" status="OBSOLETE" replacedBy="4HHB"/>
> <record structureId="2HHB" status="CURRENT" replaces="1HHB"/>
> <record structureId="3HHB" status="CURRENT" replaces="1HHB"/>
> <record structureId="4HHB" status="CURRENT" replaces="1HHB"/>
> <record structureId="1CAT" status="OBSOLETE" replacedBy="8CAT"/>
> <record structureId="3CAT" status="OBSOLETE" replaces="1CAT" replacedBy="8CAT"/>
> <record structureId="7CAT" status="CURRENT" replaces="3CAT"/>
> <record structureId="8CAT" status="CURRENT" replaces="3CAT"/>
> <record structureId="1KSA" status="OBSOLETE" replacedBy="3ENI"/>
> <record structureId="3ENI" status="CURRENT" replaces="1M50 1KSA"/>
> <record structureId="1M50" status="OBSOLETE" replacedBy="3ENI"/>
> </idStatus>
>
> Did you receive my previous mail, Dr. Andreas?
>
> Amr
>
> --------------------------------------------------
> From: "Amr AL-Hossary" <amr_alhossary at hotmail.com>
> Sent: Tuesday, April 26, 2011 5:03 AM
> To: "Spencer Bliven" <sbliven at ucsd.edu>; "Andreas Prlic" <andreas at sdsc.edu>
> Cc: <biojava-dev at lists.open-bio.org>
> Subject: Re: [Biojava-dev] fetching obsolete/superseding files
>
> Thanks Spencer,
> This explains a lot.
> In that case, the current implementation you provided is right, and the
> recursion flag is justified.
>
> No, I don't have write access yet, but Dr. Andreas promised to grant
> me access after my second contribution.
>
>> the list of status messages comes from looking at the internals of the PDB
>> website
>
> Do you have access to the web service implementation?
>
> Amr
>>
>>
>> From: Spencer Bliven
>> Sent: Tuesday, April 26, 2011 1:53 AM
>> To: Andreas Prlic
>> Cc: Amr AL-Hossary ; biojava-dev at lists.open-bio.org
>> Subject: Re: [Biojava-dev] fetching obsolete/superseding files
>>
>>
>> Hey all,
>>
>> I think we are converging on a consistent model of PDB precedence. This
>> was obscured previously by the bug in how the idStatus page listed only a
>> single 'replacedBy' entry. Andreas has fixed this, and the fix should go live
>> tomorrow. I'll write some unit tests and update BioJava at the same
>> time. Here is how things will work:
>>
>> PDB supersessions form a directed acyclic graph, where edges point from
>> an obsolete ID to the entry that directly superseded it. Each record
>> returned by idStatus may carry a "replaces" attribute, a space-delimited
>> list of incoming edges, and a "replacedBy" attribute, a space-delimited
>> list of outgoing edges. Two examples:
>>
>> <idStatus>
>> <record structureId="1CAT" status="OBSOLETE" replacedBy="3CAT"/>
>> <record structureId="3CAT" status="OBSOLETE" replaces="1CAT"
>> replacedBy="8CAT 7CAT"/>
>> <record structureId="7CAT" status="CURRENT" replaces="3CAT"/>
>> <record structureId="8CAT" status="CURRENT" replaces="3CAT"/>
>>
>> <record structureId="1KSA" status="OBSOLETE" replacedBy="3ENI"/>
>> <record structureId="3ENI" status="CURRENT" replaces="1M50 1KSA"/>
>> <record structureId="1M50" status="OBSOLETE" replacedBy="3ENI"/>
>> </idStatus>
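A minimal sketch of reading records like the ones above into the two edge lists described (incoming "replaces", outgoing "replacedBy"). It uses the JDK's DOM parser purely for brevity; per this thread, BioJava's PDBStatus parses the response with SAX. The class IdStatusGraph and its fields are hypothetical illustrations, not BioJava API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of BioJava): turns an idStatus XML response
// into per-ID edge lists of the supersession DAG.
class IdStatusGraph {
    final Map<String, String> status = new HashMap<>();           // structureId -> "CURRENT", "OBSOLETE", ...
    final Map<String, List<String>> replaces = new HashMap<>();   // incoming edges (older IDs)
    final Map<String, List<String>> replacedBy = new HashMap<>(); // outgoing edges (newer IDs)

    // Split a space-delimited attribute; getAttribute() returns "" when the attribute is absent.
    static List<String> splitIds(String attr) {
        String trimmed = attr.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    static IdStatusGraph parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            IdStatusGraph g = new IdStatusGraph();
            NodeList records = doc.getElementsByTagName("record");
            for (int i = 0; i < records.getLength(); i++) {
                Element r = (Element) records.item(i);
                String id = r.getAttribute("structureId");
                g.status.put(id, r.getAttribute("status"));
                g.replaces.put(id, splitIds(r.getAttribute("replaces")));
                g.replacedBy.put(id, splitIds(r.getAttribute("replacedBy")));
            }
            return g;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse idStatus XML", e);
        }
    }
}
```

For the 3CAT record above, replacedBy.get("3CAT") would hold [8CAT, 7CAT] and replaces.get("3CAT") would hold [1CAT]; a missing attribute becomes an empty list rather than null.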
>>
>> The non-recursive versions of getReplaces/getReplacement just get the
>> incoming/outgoing edges for a single node and require only a single REST
>> query. The recursive versions do a depth-first search up (via "replaces")
>> or down (via "replacedBy") the graph and return a list of all nodes reached.
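The difference between the non-recursive and recursive lookups can be sketched over an in-memory edge map that stands in for the REST query. SupersessionSearch, direct and recursive are hypothetical names, not the BioJava methods (those are getReplaces/getReplacement). Passing a replacedBy map walks down the graph; passing a replaces map walks up it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: edges maps a PDB ID to its direct neighbours
// (either its "replacedBy" or its "replaces" list).
class SupersessionSearch {
    // Non-recursive: just the direct neighbours of one node (one query in the real service).
    static List<String> direct(Map<String, List<String>> edges, String id) {
        return edges.getOrDefault(id, List.of());
    }

    // Recursive: depth-first search returning every node reachable from id (id itself excluded).
    static List<String> recursive(Map<String, List<String>> edges, String id) {
        List<String> reached = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(id);
        dfs(edges, id, seen, reached);
        return reached;
    }

    private static void dfs(Map<String, List<String>> edges, String id,
                            Set<String> seen, List<String> reached) {
        for (String next : direct(edges, id)) {
            // In a DAG two paths can lead to the same entry; visit each node only once.
            if (seen.add(next)) {
                reached.add(next);
                dfs(edges, next, seen, reached);
            }
        }
    }
}
```

With the CAT records, walking down from 1CAT reaches 3CAT, 8CAT and 7CAT in that order; walking up from 3ENI through the replaces edges reaches 1M50 and 1KSA.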
>>
>> Finally, the getCurrent() method should consistently return a single PDB
>> ID from among the results of recursive-getReplacement. To be consistent with
>> the old REST implementation, this will be the PDB ID that occurs last
>> alphabetically. Thus getCurrent(1HHB) will give 4HHB rather than 2HHB or
>> 3HHB, getCurrent(1CAT) will give 8CAT, and getCurrent(7CAT) will give 7CAT.
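The getCurrent rule described above can be sketched as: gather the starting ID plus everything reachable through "replacedBy", keep the CURRENT entries, and return the alphabetically last one. This is a hypothetical illustration (CurrentId is not a BioJava class) that assumes the status and replacedBy maps have already been built from idStatus; "alphabetical" is taken to mean String.compareTo order, which for upper-case PDB IDs is plain character order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the "alphabetically last current ID" rule.
class CurrentId {
    static Optional<String> getCurrent(Map<String, String> status,
                                       Map<String, List<String>> replacedBy, String id) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push(id);                      // the start node counts too, so a current ID maps to itself
        Set<String> seen = new HashSet<>();
        String best = null;
        while (!stack.isEmpty()) {           // iterative depth-first traversal
            String cur = stack.pop();
            if (!seen.add(cur)) continue;
            if ("CURRENT".equals(status.get(cur)) && (best == null || cur.compareTo(best) > 0)) {
                best = cur;                  // keep the alphabetically last CURRENT ID seen so far
            }
            for (String next : replacedBy.getOrDefault(cur, List.of())) {
                stack.push(next);
            }
        }
        return Optional.ofNullable(best);   // empty if no CURRENT entry is reachable
    }
}
```

Because the result is a maximum over every reachable node, the traversal order does not affect the answer.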
>>
>> Amr, I understand what you were thinking with the getNewestCurrent
>> method. It is appealing to think of 4HHB as the representative for all four
>> structures. However, there is a good reason that 2HHB and 3HHB are still
>> marked as current, and I think it is misleading to include a method that
>> favors 4HHB over other current IDs because it is alphabetically higher. We
>> should probably leave this method out of biojava.
>>
>>
>> Does anything seem wrong with this model of supersession? In
>> particular, does this address your question about the need for the recursion
>> flag, Amr? My plan is to commit the BioJava changes shortly. Amr, do you
>> mind if I merge in your patch with the caching and PDBFileReader updates?
>> Do you have write access to SVN? Great code there!
>>
>> Finally, the list of status messages comes from looking at the internals
>> of the PDB website. I haven't come across any examples of them myself to
>> test with. Many seem to be temporary statuses, for publication holds and the
>> like. I'm content to ignore them until someone requests something specific.
>>
>> -Spencer
>>
>>
>>
>> On Mon, Apr 25, 2011 at 2:22 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>
>> Hi Amr,
>>
>>
>> > Anyway, the web service returns at most ONE PDB ID per record (please
>> > inspect the result returned by this query
>> > http://www.rcsb.org/pdb/rest/idStatus?structureId=1HHB,2HHB,3HHB,4HHB).
>>
>>
>> I believe that is a bug. I just fixed it, and the fix should become
>> available with tomorrow's website update (around 00 UTC).
>>
>>
>> > This way, I believe the best way to get the most recent ID is to read
>> > the replacedBy attribute of the superseded record (e.g. from 3HHB to
>> > 1HHB and then from 1HHB to 4HHB).
>>
>>
>> Hope this will be simpler with the updated URL response.
>>
>>
>> Andreas
>>
>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>>
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>>