[BioSQL-l] dates and terms
Hilmar Lapp
hlapp at gnf.org
Thu Mar 31 15:39:36 EST 2005
Bioperl-db stores these similarly, but the term is 'date_changed' which
basically comes from Bioperl's Bio::Seq::RichSeq.
You can compare these dates but it's hard to do so universally for a
search against the database. There is a scriptlet
scripts/biosql/update-on-new-date.pl in the bioperl-db repository that
shows a pretty straightforward approach for comparison. It uses
Date::Parse which does a nice job of detecting most date formats
automatically.
The formats being used are actually I believe not dramatically
different. In UniProt, they look like the following:
DT 01-NOV-1995 (Rel. 32, Created)
DT 01-OCT-1996 (Rel. 34, Last sequence update)
DT 28-FEB-2003 (Rel. 41, Last annotation update)
and these get stored as an array with the following elements
01-NOV-1995 (Rel. 32, Created)
01-OCT-1996 (Rel. 34, Last sequence update)
28-FEB-2003 (Rel. 41, Last annotation update)
Date::Parse will just ignore the non-date stuff in parentheses. I don't
know whether there's a similarly convenient library in Java.
In Oracle you can specify the date format when converting. So, the
following would take everything up to the first space character and
convert it assuming the format used above:
1 select to_date(decode(instr('01-NOV-1995 (Rel. 32, Created)',' '),
2 0,
3 '01-NOV-1995 (Rel. 32, Created)',
4 substr('01-NOV-1995 (Rel. 32, Created)',
5 1,
6 instr('01-NOV-1995 (Rel. 32,
Created)',
7 ' ')
8 )
9 ),
10 'dd-mon-yyyy')
11* from dual
SQL> /
TO_DATE(DECODE(IN
-----------------
11/01/95 00:00:00
1 row selected.
The DECODE() protects from cases when there is nothing following the
date.
If this looks too messy, hide it in a function:
1 CREATE OR REPLACE
2 FUNCTION biosql_to_date(qual_value IN VARCHAR2,
3 date_format IN VARCHAR2 DEFAULT
'dd-mon-yyyy')
4 RETURN DATE
5 IS
6 spacepos INTEGER;
7 BEGIN
8 spacepos := INSTR(qual_value,' ');
9 IF spacepos = 0 THEN
10 RETURN TO_DATE(qual_value,date_format);
11 END IF;
12 RETURN TO_DATE(SUBSTR(qual_value,1,spacepos),
13 date_format);
14* END;
SQL> /
Function created.
Elapsed: 00:00:00.60
SQL> select biosql_to_date('01-NOV-1995 (Rel. 32, Created)') from dual;
BIOSQL_TO_DATE('0
-----------------
11/01/95 00:00:00
1 row selected.
Elapsed: 00:00:00.01
SQL> select biosql_to_date('01-NOV-1995') from dual;
BIOSQL_TO_DATE('0
-----------------
11/01/95 00:00:00
1 row selected.
Elapsed: 00:00:00.01
Obviously, if you query using this or a similar function, the query
optimizer will do a full table scan and not use an index on
bioentry_qualifier_value. However, you can create a function index on
bioentry_qualifier_value.value using the above function, and queries
using the same function will then be indexed. In that case you would
need to make a small amendment to the function above by catching the
exception that results from parsing strings that aren't dates and then
return NULL instead. (Oracle does not index NULLs. Unlike in
PostgreSQL, you cannot have partial indexes in Oracle AFAIK, i.e., in
Oracle the index creation statement cannot contain a WHERE clause.)
Does this help?
-hilmar
On Mar 31, 2005, at 1:07 AM, mark.schreiber at novartis.com wrote:
> Hello -
>
> Many records that might be stored in BioSQL have associated date
> fields.
> Biojava stores these as value in bioentry_qualifier_value with the
> term_id
> pointing to the Term for date.
>
> This seems to place a serious limitation on searching by date. I would
> like to be able to search for sequences entered between X and Y or
> before
> X etc. Has anyone come up with a workaround for date operations on
> VarChar2 or Strings?
>
> Thanks
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>
>
> ______________________________________________________________________
> The Novartis email address format has changed to
> firstname.lastname at novartis.com. Please update your address book
> accordingly.
> ______________________________________________________________________
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
More information about the BioSQL-l
mailing list