[Templates] Plugins and Unicode confusion

Trond Michelsen trondmm-tt@crusaders.no
Fri, 17 Feb 2006 01:32:33 +0100


On Fri, Feb 17, 2006 at 12:56:21AM +0100, Bernhard Graf wrote:
> On Friday 17 February 2006 00:28, Tatsuhiko Miyagawa wrote:
> > On 2/16/06, Bernhard Graf <tt@augensalat.de> wrote:
>>>> Yeah, POSIX.
>>> Then why does this work:
>>> perl -MPOSIX -e 'print POSIX::strftime("%B",0,0,0,1,2,106), "\n";'
>>> März
>>
>> Because that's a latin-1 string, which is okay to print out to a terminal
>> on its own. The problem occurs when you concatenate latin-1 bytes and
>> utf-8 bytes.
> 
> No. My terminal uses utf-8 too. See my first posting.
> 
> "März" from within the utf-8 encoded template is printed OK, while März 
> from T::P::Date is displayed broken: My utf-8 terminal thinks latin 
> chars "är" (two bytes) is one utf-8 char.

That's a bit odd, and not just because "är" is an illegal utf-8 sequence:
the latin-1 byte for "ä" (0xe4) should signal the start of a three-byte
utf-8 character, not a two-byte one.
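To make the byte-level argument concrete, here is a rough sketch that needs no locale setup. It assumes a POSIX shell plus od, tr and an iconv that rejects malformed input (glibc and GNU libiconv do). The variable names are made up for illustration.

```shell
# "März" as ISO-8859-1 bytes; octal 344 is 0xe4, the latin-1 code for "ä".
latin1_hex=$(printf 'M\344rz' | od -An -t x1 | tr -d ' \n')
echo "$latin1_hex"      # 4de4727a

# 0xe4 is 11100100 in binary. The top four bits, 1110, mark the lead byte
# of a three-byte utf-8 sequence, so two 10xxxxxx bytes would have to follow.
lead_bits=$(( (0xE4 >> 4) & 0xF ))
echo "$lead_bits"       # 14, i.e. binary 1110

# "r" (0x72) and "z" (0x7a) are plain ASCII, not continuation bytes, so a
# strict utf-8 decoder should refuse the input. Assumption: this iconv
# exits non-zero on an illegal sequence.
if printf 'M\344rz' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    valid=yes
else
    valid=no
fi
echo "valid utf-8: $valid"
```

A terminal that swallows "är" as a single character is therefore being lenient about malformed input rather than decoding a real two-byte sequence.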

>> Maybe you could try de_DE.UTF-8 if your system has that locale.
> ~> echo $LANG
> de_DE.UTF-8

Would you mind piping your output through od, just to verify?

$ LC_ALL=de_DE.UTF-8 perl -MPOSIX -le 'print POSIX::strftime("%B",0,0,0,1,2,106)' | od -t c
0000000   M   Ã   ¤   r   z  \n
0000006

$ LC_ALL=de_DE.UTF-8 perl -MPOSIX -le 'print POSIX::strftime("%B",0,0,0,1,2,106)' | od -t x1
0000000 4d c3 a4 72 7a 0a
0000006

$ LC_ALL=de_DE.ISO-8859-1 perl -MPOSIX -le 'print POSIX::strftime("%B",0,0,0,1,2,106)' | od -t c 
0000000   M   ä   r   z  \n
0000005

$ LC_ALL=de_DE.ISO-8859-1 perl -MPOSIX -le 'print POSIX::strftime("%B",0,0,0,1,2,106)' | od -t x1
0000000 4d e4 72 7a 0a
0000005
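If the de_DE locales are not installed on the machine at hand, the same two dumps can be reproduced without strftime. This is only a sketch: it assumes iconv knows the ISO-8859-1 and UTF-8 charset names, and the helper names latin1_bytes and hex are invented for the example.

```shell
# "März\n" as ISO-8859-1 bytes (octal 344 = 0xe4 = "ä").
latin1_bytes() { printf 'M\344rz\n'; }

# Hex dump with od's address column and all whitespace stripped,
# which makes the two results easy to compare.
hex() { od -An -t x1 | tr -d ' \n'; }

latin1_dump=$(latin1_bytes | hex)
utf8_dump=$(latin1_bytes | iconv -f ISO-8859-1 -t UTF-8 | hex)

echo "ISO-8859-1: $latin1_dump"   # 4de4727a0a
echo "UTF-8:      $utf8_dump"     # 4dc3a4727a0a
```

The only difference is e4 versus c3 a4, so if your od output shows e4 while $LANG says de_DE.UTF-8, the month name is not coming from the utf-8 locale.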

-- 
Trond Michelsen