Daniel Swarbrick
2011-05-31 13:23:35 UTC
I can almost hear the collective sigh as this topic once again rears
up ;-)
I am currently developing an app with Django that uses the SQLite
backend, and I noticed that Django stores DateTime fields as naive
(eg. non TZ-aware), local timestamps, making these databases non-
portable to servers running in different timezones.
I know this problem has been discussed several times before, but there
seems to have been some confusion in the past about how Postgres
stores TIMESTAMP values. Postgres's documentation states:
"For TIMESTAMP WITH TIME ZONE, the internally stored value is always
in UTC... An input value that has an explicit time zone specified is
converted to UTC using the appropriate offset for that time zone. If
no time zone is stated in the input string, then it is assumed to be
in the time zone indicated by the system's [or client connection's]
timezone parameter, and is converted to UTC using the offset for the
timezone zone."
It goes on:
"When a TIMESTAMP WITH TIME ZONE value is output, it is always
converted from UTC to the current timezone zone, and displayed as
local time in that zone."
Ok, so we've established that although Postgres TIMESTAMP WITH TIME
ZONE is TZ-aware, the internal storage is UTC. This pretty much
follows the rule for filesystem timestamps too - convert everything
back to UTC for internal storage.
MySQL has two column types capable of storing a DateTime - the
DATETIME and TIMESTAMP types. The DATETIME type can be likened to a
Python naive datetime (or a Postgres TIMESTAMP (without time zone)),
whereas the MySQL TIMESTAMP type once again stores values internally
as UTC, and when values are retrieved (SELECTed), they are displayed
as a local time for the client's current timezone.
Let me demonstrate. This was done on a server running the UTC+12
timezone.
mysql> SELECT @@global.time_zone, @@session.time_zone;
+--------------------+---------------------+
| @@global.time_zone | @@session.time_zone |
+--------------------+---------------------+
| SYSTEM | SYSTEM |
+--------------------+---------------------+
mysql> create table foo ( col_dt datetime, col_ts timestamp );
mysql> insert into foo values (now(), now());
mysql> select * from foo;
+---------------------+---------------------+
| col_dt | col_ts |
+---------------------+---------------------+
| 2011-06-01 00:19:43 | 2011-06-01 00:19:43 |
+---------------------+---------------------+
Now, if I change the client session's timezone...
mysql> set session time_zone = "+02:00";
mysql> select * from foo;
+---------------------+---------------------+
| col_dt | col_ts |
+---------------------+---------------------+
| 2011-06-01 00:19:43 | 2011-05-31 14:19:43 |
+---------------------+---------------------+
The naive DATETIME value is unchanged, and technically now incorrect
for the client session's timezone. The TIMESTAMP value has been
correctly adjusted however, to represent the same point in time, but
in the client session's new timezone.
In django.db.backends.postgresql_psycopg2.creation, we can see that a
DateTime field type is mapped to a TZ-aware "timestamp with time zone"
Postgres column type. However, in django.db.backends.mysql.creation,
we can see that the DateTime field type is mapped to a naive
"datetime" MySQL column type. Why is this?
Ok, the MySQL documentation states that a DATETIME column type can
store dates in the range '1000-01-01' to '9999-12-31'. However the
TIMESTAMP column type can only store dates in the range '1970-01-01
00:00:01' UTC to '2038-01-19 03:14:07' UTC, like a true Unix epoch
timestamp (seemingly regardless whether it's on 32 bit or 64 bit
platform).
Postgres on the other hand (even when using a "timestamp with time
zone") has a broader range of 4713 BC to 294276 AD. Ok, so it's fairly
obvious, MySQL's timestamp support is inferior to that of Postgres.
Let's not get sidetracked though.
Coming back to the original topic of timestamps in SQLite... One only
has to Google for "sqlite timestamp timezone", to see how often
developers are bitten by naive timestamp storage - even on iPhones! It
seems that a lot of apps store datetimes in SQLite as local, naive
values, assuming that that DB is never going to be moved to a
different timezone.
The way I see it, there are a few options for storage of timestamps in
SQLite (whose docs clearly acknowledge that it does not officially
support a timestamp column type).
1. Store timestamps as a Unix epoch value (integer). This is fast, and
will ensure correct date sorting. On the other hand, it's not really
human-friendly if manually browsing table data in SQLite.
2. Store timestamps as a string, but converted to UTC, eg. "2011-05-31
12:19:43". Since the TZ is not obvious from that string, Django would
need to be aware that it was UTC, and apply the appropriate offset to
convert to a local time. This means adding a dependency like pytz to
Django.
3. Store timestamps as a string, but append a timezone, eg.
"2011-06-01 00:19:43 +12:00". This would appear to be the safest,
least disruptive solution, since apps that don't use multiple
timezones would not notice any difference. The Python datetime values
would still be in the local timezone, but would no longer be naive.
Apps that needed to work with multiple timezones would now have the
information required to convert these local times back to a UTC value,
which is a lot safer for doing date arithmetic (especially when
crossing daylight saving transitions). One caveat however - SQL
sorting of timestamps that used a variety of UTC offsets would not
necessarily be correct, since it will be a simply string-sorting
algorithm.
Incidentally, "2011-05-31 12:19:43 +02:00" is not necessarily the same
as "2011-05-31 12:19:43" in "Europe/Berlin" timezone. The Olson/tzdata
timezone names are more than just a UTC offset - they also infer
whether daylight saving is active.
In the absence of SQLite handling the conversion and internal storage
as UTC itself automatically, I think it's important that Django add
that crucial UTC offset information to the timestamp-strings, or store
all timestamps as UTC. At least then the information is normalized,
and individual app developers can decide if they want to add full
timezone conversion machinery to their apps.
up ;-)
I am currently developing an app with Django that uses the SQLite
backend, and I noticed that Django stores DateTime fields as naive
(eg. non TZ-aware), local timestamps, making these databases non-
portable to servers running in different timezones.
I know this problem has been discussed several times before, but there
seems to have been some confusion in the past about how Postgres
stores TIMESTAMP values. Postgres's documentation states:
"For TIMESTAMP WITH TIME ZONE, the internally stored value is always
in UTC... An input value that has an explicit time zone specified is
converted to UTC using the appropriate offset for that time zone. If
no time zone is stated in the input string, then it is assumed to be
in the time zone indicated by the system's [or client connection's]
timezone parameter, and is converted to UTC using the offset for the
timezone zone."
It goes on:
"When a TIMESTAMP WITH TIME ZONE value is output, it is always
converted from UTC to the current timezone zone, and displayed as
local time in that zone."
Ok, so we've established that although Postgres TIMESTAMP WITH TIME
ZONE is TZ-aware, the internal storage is UTC. This pretty much
follows the rule for filesystem timestamps too - convert everything
back to UTC for internal storage.
MySQL has two column types capable of storing a DateTime - the
DATETIME and TIMESTAMP types. The DATETIME type can be likened to a
Python naive datetime (or a Postgres TIMESTAMP (without time zone)),
whereas the MySQL TIMESTAMP type once again stores values internally
as UTC, and when values are retrieved (SELECTed), they are displayed
as a local time for the client's current timezone.
Let me demonstrate. This was done on a server running the UTC+12
timezone.
mysql> SELECT @@global.time_zone, @@session.time_zone;
+--------------------+---------------------+
| @@global.time_zone | @@session.time_zone |
+--------------------+---------------------+
| SYSTEM | SYSTEM |
+--------------------+---------------------+
mysql> create table foo ( col_dt datetime, col_ts timestamp );
mysql> insert into foo values (now(), now());
mysql> select * from foo;
+---------------------+---------------------+
| col_dt | col_ts |
+---------------------+---------------------+
| 2011-06-01 00:19:43 | 2011-06-01 00:19:43 |
+---------------------+---------------------+
Now, if I change the client session's timezone...
mysql> set session time_zone = "+02:00";
mysql> select * from foo;
+---------------------+---------------------+
| col_dt | col_ts |
+---------------------+---------------------+
| 2011-06-01 00:19:43 | 2011-05-31 14:19:43 |
+---------------------+---------------------+
The naive DATETIME value is unchanged, and technically now incorrect
for the client session's timezone. The TIMESTAMP value has been
correctly adjusted however, to represent the same point in time, but
in the client session's new timezone.
In django.db.backends.postgresql_psycopg2.creation, we can see that a
DateTime field type is mapped to a TZ-aware "timestamp with time zone"
Postgres column type. However, in django.db.backends.mysql.creation,
we can see that the DateTime field type is mapped to a naive
"datetime" MySQL column type. Why is this?
Ok, the MySQL documentation states that a DATETIME column type can
store dates in the range '1000-01-01' to '9999-12-31'. However the
TIMESTAMP column type can only store dates in the range '1970-01-01
00:00:01' UTC to '2038-01-19 03:14:07' UTC, like a true Unix epoch
timestamp (seemingly regardless whether it's on 32 bit or 64 bit
platform).
Postgres on the other hand (even when using a "timestamp with time
zone") has a broader range of 4713 BC to 294276 AD. Ok, so it's fairly
obvious, MySQL's timestamp support is inferior to that of Postgres.
Let's not get sidetracked though.
Coming back to the original topic of timestamps in SQLite... One only
has to Google for "sqlite timestamp timezone", to see how often
developers are bitten by naive timestamp storage - even on iPhones! It
seems that a lot of apps store datetimes in SQLite as local, naive
values, assuming that that DB is never going to be moved to a
different timezone.
The way I see it, there are a few options for storage of timestamps in
SQLite (whose docs clearly acknowledge that it does not officially
support a timestamp column type).
1. Store timestamps as a Unix epoch value (integer). This is fast, and
will ensure correct date sorting. On the other hand, it's not really
human-friendly if manually browsing table data in SQLite.
2. Store timestamps as a string, but converted to UTC, eg. "2011-05-31
12:19:43". Since the TZ is not obvious from that string, Django would
need to be aware that it was UTC, and apply the appropriate offset to
convert to a local time. This means adding a dependency like pytz to
Django.
3. Store timestamps as a string, but append a timezone, eg.
"2011-06-01 00:19:43 +12:00". This would appear to be the safest,
least disruptive solution, since apps that don't use multiple
timezones would not notice any difference. The Python datetime values
would still be in the local timezone, but would no longer be naive.
Apps that needed to work with multiple timezones would now have the
information required to convert these local times back to a UTC value,
which is a lot safer for doing date arithmetic (especially when
crossing daylight saving transitions). One caveat however - SQL
sorting of timestamps that used a variety of UTC offsets would not
necessarily be correct, since it will be a simply string-sorting
algorithm.
Incidentally, "2011-05-31 12:19:43 +02:00" is not necessarily the same
as "2011-05-31 12:19:43" in "Europe/Berlin" timezone. The Olson/tzdata
timezone names are more than just a UTC offset - they also infer
whether daylight saving is active.
In the absence of SQLite handling the conversion and internal storage
as UTC itself automatically, I think it's important that Django add
that crucial UTC offset information to the timestamp-strings, or store
all timestamps as UTC. At least then the information is normalized,
and individual app developers can decide if they want to add full
timezone conversion machinery to their apps.
--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to django-developers+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to django-developers+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.