[Be-devel] storage of time information is locale dependent

2014-01-25 15:24:34 UTC

Dear All,

I'm completely new to BE, but am very interested in getting it to work
as I really like the concept.
I tried the latest release (1.1.1) a few weeks ago, but after getting
database corruption, I decided to try the latest from git (commit 49808..).

I immediately ran into a problem, though, when running "be init" followed
by "be new test1":
C:\utv\eget\t>be new test1
ERROR:
'utf8' codec can't decode byte 0xf6 in position 1: invalid start byte
You should set a locale that supports unicode, e.g.
export LANG=en_US.utf8
See http://docs.python.org/library/locale.html for details

As you can see from my command prompt, I tried this on Windows:
Win7/64-bit with Swedish locale, to be more specific.

After some research, I found the offending part in
libbe/util/utility.py, line 105:
RFC_2822_TIME_FMT = "%a, %d %b %Y %H:%M:%S +0000"

this is later used in
def time_to_str(time_val):

I believe that the problem is that on Windows, the C library used by
Python emits an ANSI-encoded string. The %a format gives the Swedish
abbreviation, which is "lö" today. Later on, the JSON serialization has
a problem with this.
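
The suspected failure can be reproduced in isolation. This is only a
minimal sketch in Python 3 syntax, assuming the offending bytes are
Windows-1252 text, where byte 0xf6 is "ö". The helper try_decode is a
hypothetical name for illustration and is not part of BE.

```python
# Bytes that a Windows-1252 ("ANSI") C library would presumably produce for
# the Swedish abbreviation; hard-coded so the sketch runs under any locale.
raw = b"l\xf6"


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return "error: %s" % err


# Decoding as UTF-8 fails, because 0xf6 is not a valid UTF-8 start byte.
utf8_result = try_decode(raw, "utf-8")
# Decoding with the assumed code page succeeds.
cp1252_result = try_decode(raw, "cp1252")

print(utf8_result)
print(cp1252_result == "l\u00f6")
```
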
My first thought was to try to get a unicode string instead of a more or
less arbitrarily encoded string, but after digging around a little more, I
found out that the generated string is actually used in the stored database:
C:\utv\eget\t>grep time .be -r
.be/6a36cd03-4a5f-44a0-ba9c-c5bf3dbd7add/bugs/d23969fb-3ae2-4e7e-ac7e-5abec0b1b32a/values:
"time": "25 jan 2014 15:03:59 +0000"
.be/6a36cd03-4a5f-44a0-ba9c-c5bf3dbd7add/bugs/f4c10efd-d4c2-4fe7-8630-0b8c998eb46d/values:
"time": "25 jan 2014 13:24:13 +0000"
[I patched the RFC_2822_TIME_FMT format string to get BE running, by
removing the %a]
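
Removing %a avoids the crash, but %b is presumably still locale
dependent, as the lowercase "jan" in the stored values suggests. One
possible alternative, sketched here as an assumption and not as what BE
does, is the standard library's email.utils.formatdate, which builds the
day and month names from fixed English tables. The function name
time_to_str_fixed is hypothetical.

```python
from email.utils import formatdate


def time_to_str_fixed(time_val):
    """Format seconds since the epoch as an RFC 2822 style date in UTC.

    formatdate() does not consult the process locale, so the day and month
    abbreviations are always the English ones.
    """
    return formatdate(time_val, localtime=False, usegmt=False)


# 1390662239 is 2014-01-25 15:03:59 UTC, the first timestamp in the
# grep output above.
stamp = time_to_str_fixed(1390662239)
print(stamp)
```

The trailing zone field that formatdate emits is not the literal "+0000"
from RFC_2822_TIME_FMT, so any parsing code would need to accept it.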

My main question, after this somewhat lengthy explanation, is whether the
serialized format for dates is really meant to be locale dependent. My
guess is that it could cause problems for users who run different locales
and access the same data. My suggestion would be to stick to the
offset_since_epoch format (in UTC) for storage, and use the formatted
version solely for user interaction.
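
The suggestion could look roughly like the sketch below. It is not a
patch against BE: the names to_storage and to_display are hypothetical,
and only the format string is taken from utility.py.

```python
import calendar
import time

# Display format copied from RFC_2822_TIME_FMT; used for output only.
DISPLAY_FMT = "%a, %d %b %Y %H:%M:%S +0000"


def to_storage(utc_struct):
    """Convert a UTC struct_time to integer seconds since the epoch."""
    return calendar.timegm(utc_struct)


def to_display(epoch_seconds):
    """Format stored seconds for the user; names may follow the locale."""
    return time.strftime(DISPLAY_FMT, time.gmtime(epoch_seconds))


# A purely numeric format keeps parsing independent of the locale.
parsed = time.strptime("2014-01-25 15:03:59", "%Y-%m-%d %H:%M:%S")
stored = to_storage(parsed)      # what would be written to the values file
roundtrip = time.gmtime(stored)  # what would be read back

print(stored)
print(to_display(stored))
```

With this split, the stored value is a plain integer that reads the same
under every locale, and only the output of to_display varies.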

As I'm completely new to BE, I hesitate a little before trying to
suggest a patch which changes the internal storage format. But if the
above seems like a good idea, and if I could get some pointers/ideas from
someone more experienced with the codebase, I could try to make a fix
for this problem.

Sidenote: I see an email on Nov 25, "Crash when showing a bug in a
different locale", which I suspect could be because of the same problem.

Best Regards,
Jesper