[Templates] Patch: Full unicode support for TT under 5.8

Mark Fowler mark@twoshortplanks.com
Sun, 27 Jun 2004 14:49:57 +0100 (BST)



--1388560483-641160645-1088344197=:8700
Content-Type: TEXT/PLAIN; charset=US-ASCII

Hello List.

Yes, I've not been reading this list for a long time now.  Bad me.
Doesn't mean I've not been working at TT though...

Attached is a patch (and a test for it) that lets the Template Toolkit
work properly with Unicode under Perl 5.8.  This means that:

 a) As long as you put a BOM at the start of the document, you can now
    have proper Unicode templates.  Using the correct BOM means that your
    templates can be encoded in any of UTF-8, UTF-16 (either byte order)
    or UTF-32 (either byte order) and TT will now Do The Right Thing when
    it comes to automatically decoding your input.

 b) As a consequence, your Unicode templates are now truly Unicode,
    not just a string of bytes.  Things like taking the length of the
    string will work.  Concatenating the result of the template with a
    Perl string that contains chars higher than 255 will now work
    properly.  (Previously Perl had no way of knowing that the template
    already contained Unicode byte sequences, so it would promote each
    byte in the template to its corresponding Latin-1 char rather than
    combining each sequence into one char as it should have.)

 c) No matter how you got Unicode into your compiled template (be it from
    the original template file or from a utf8-flagged constant from
    NAMESPACES), cached templates that contain chars over 255 will be
    written to disk in utf8 with the 'use utf8' pragma prepended.
    This stops the annoying problem where the first Apache process would
    load and continue to process utf8 data correctly, but all other
    processes would load the cached template from disk, incorrectly
    assume that its bytes were Latin-1 chars rather than utf8 byte
    sequences, and render the output badly.
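
For anyone who'd rather not decode the attachment, here is a minimal
standalone sketch of the idea behind (a) to (c).  It is not the patch
itself: decode_by_bom is a hypothetical name used only here, and the
sample text is the "moose" string borrowed from the attached test.  It
assumes Perl 5.8 or later with the core Encode and File::Temp modules.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# BOM table.  Order matters: the UTF-32LE BOM starts with the same
# two bytes as the UTF-16LE BOM, so the 32-bit BOMs are tried first.
my @BOMS = (
    [ 'UTF-8'    => "\xEF\xBB\xBF"     ],
    [ 'UTF-32BE' => "\x00\x00\xFE\xFF" ],
    [ 'UTF-32LE' => "\xFF\xFE\x00\x00" ],
    [ 'UTF-16BE' => "\xFE\xFF"         ],
    [ 'UTF-16LE' => "\xFF\xFE"         ],
);

# decode_by_bom: hypothetical helper, not part of TT.  Strips a
# leading BOM and decodes the rest; without a BOM the input is
# returned untouched.
sub decode_by_bom {
    my ($bytes) = @_;
    for my $pair (@BOMS) {
        my ($enc, $bom) = @$pair;
        next unless substr($bytes, 0, length $bom) eq $bom;
        my $body = substr($bytes, length $bom);
        return Encode::decode($enc, $body);
    }
    return $bytes;
}

# (a) + (b): a UTF-8 template with a BOM becomes real characters
my $raw  = "\xEF\xBB\xBFm\xC3\xB8\xC3\xB8se\xE2\x80\xA6";
my $text = decode_by_bom($raw);
printf "%d bytes in, %d chars out\n", length($raw), length($text);

# (c): write compiled code to disk so that it reloads correctly
my $code = qq{"$text";\n};
my ($fh, $file) = tempfile(UNLINK => 1);
if (utf8::is_utf8($code)) {
    $code = "use utf8;\n\n$code";   # tell perl the source is UTF-8
    binmode $fh, ':utf8';           # and write it out as UTF-8
}
print {$fh} $code;
close $fh;

# a fresh load of the cached file sees the same characters
my $loaded = do $file;
print "cache round trip ", ($loaded eq $text ? "ok" : "BROKEN"), "\n";
```

Run as is, this should report 13 bytes in and 6 chars out, followed by
"cache round trip ok".  Dropping the 'use utf8' line from the cached
code reproduces the second-process problem described in (c).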

Can someone please check this over for me?  It's now early Sunday
afternoon and I was playing around with this in the small hours of the
morning (pesky jet-lag) so there may be a bucket-load of errors in here.

In particular, could someone still running Perl 5.6 or Perl 5.005 in
anger check that this patch doesn't cause problems for them?  And if
anyone's got any nice juicy real-world benchmarks, I'd love to see how
much in practice this slows down someone using just Latin-1 templates.
(It shouldn't be by much: it just has to check that there's no BOM on
the front.)

Mark.

-- 
#!/usr/bin/perl -T
use strict;
use warnings;
print q{Mark Fowler, mark@twoshortplanks.com, http://twoshortplanks.com/};
--1388560483-641160645-1088344197=:8700
Content-Type: TEXT/plain; name="unicode.diff"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.LNX.4.55.0406271449570.8700@gan.twoshortplanks.com>
Content-Description: 
Content-Disposition: attachment; filename="unicode.diff"

SW5kZXg6IGxpYi9UZW1wbGF0ZS9Qcm92aWRlci5wbQ0KPT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PQ0KUkNTIGZpbGU6IC90ZW1wbGF0ZS10b29sa2l0L1RlbXBs
YXRlMi9saWIvVGVtcGxhdGUvUHJvdmlkZXIucG0sdg0KcmV0cmlldmluZyBy
ZXZpc2lvbiAyLjgwDQpkaWZmIC11IC1yMi44MCBQcm92aWRlci5wbQ0KLS0t
IGxpYi9UZW1wbGF0ZS9Qcm92aWRlci5wbQkyMDA0LzAxLzMwIDE5OjMyOjI4
CTIuODANCisrKyBsaWIvVGVtcGxhdGUvUHJvdmlkZXIucG0JMjAwNC8wNi8y
NyAxMzoxMDo1Mw0KQEAgLTYyOCw2ICs2MjgsNyBAQA0KICAgICAgICAgZWxz
aWYgKHJlZiAkbmFtZSkgew0KICAgICAgICAgICAgICMgLi4ub3IgYSBHTE9C
IG9yIGZpbGUgaGFuZGxlLi4uDQogICAgICAgICAgICAgbXkgJHRleHQgPSA8
JG5hbWU+Ow0KKyAgICAgICAgICAgICR0ZXh0ID0gJHNlbGYtPl9kZWNvZGUo
JHRleHQpIGlmICRdID4gNS4wMDc7DQogICAgICAgICAgICAgJGRhdGEgPSB7
DQogICAgICAgICAgICAgICAgIG5hbWUgPT4gZGVmaW5lZCAkYWxpYXMgPyAk
YWxpYXMgOiAnaW5wdXQgZmlsZSBoYW5kbGUnLA0KICAgICAgICAgICAgICAg
ICB0ZXh0ID0+ICR0ZXh0LA0KQEAgLTYzOCw2ICs2MzksNyBAQA0KICAgICAg
ICAgZWxzaWYgKC1mICRuYW1lKSB7DQogICAgICAgICAgICAgaWYgKG9wZW4o
RkgsICRuYW1lKSkgew0KICAgICAgICAgICAgICAgICBteSAkdGV4dCA9IDxG
SD47DQorICAgICAgICAgICAgICAgICR0ZXh0ID0gJHNlbGYtPl9kZWNvZGUo
JHRleHQpIGlmICRdID4gNS4wMDc7DQogICAgICAgICAgICAgICAgICRkYXRh
ID0gew0KICAgICAgICAgICAgICAgICAgICAgbmFtZSA9PiAkYWxpYXMsDQog
ICAgICAgICAgICAgICAgICAgICBwYXRoID0+ICRuYW1lLA0KQEAgLTk2Niw2
ICs5NjgsNTcgQEANCiAJfQ0KICAgICB9DQogfQ0KKw0KKyMtLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0NCisjIF9kZWNvZGUNCisjDQorIyBEZWNvZGVz
IGVuY29kZWQgdW5pY29kZSB0ZXh0IHRoYXQgc3RhcnRzIHdpdGggYSBCT00g
YW5kDQorIyB0dXJucyBpdCBpbnRvIHBlcmwncyBpbnRlcm5hbCByZXByZXNl
bnRhdGlvbg0KKyMtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0NCisNCitt
eSAkYm9tcyA9IFsNCisgJ1VURi04JyAgICA9PiAiXHh7ZWZ9XHh7YmJ9XHh7
YmZ9IiwNCisgJ1VURi0zMkJFJyA9PiAiXHh7MH1ceHswfVx4e2ZlfVx4e2Zm
fSIsDQorICdVVEYtMzJMRScgPT4gIlx4e2ZmfVx4e2ZlfVx4ezB9XHh7MH0i
LA0KKyAnVVRGLTE2QkUnID0+ICJceHtmZX1ceHtmZn0iLA0KKyAnVVRGLTE2
TEUnID0+ICJceHtmZn1ceHtmZX0iLA0KK107DQorDQorIyBoYWNrIHNvIHRo
YXQgJ3VzZSBieXRlcycgd2lsbCBjb21waWxlIG9uIHBlcmxzIGVhcmxpZXIg
dGhhbiA1LjYNCisjIGV2ZW4gdGhvdWdoIF9kZWNvZGUgaXMgbmV2ZXIgY2Fs
bGVkIG9uIHRob3NlIHN5c3RlbXMNCitCRUdJTiB7IGlmICgkXSA8IDUuMDA2
KSB7IHBhY2thZ2UgYnl0ZXM7ICRJTkN7J2J5dGVzLnBtJ30gPSAxOyB9IH0N
CisNCitzdWIgX2RlY29kZQ0KK3sNCisgIHVzZSBieXRlczsNCisNCisgIG15
ICRzZWxmICAgPSBzaGlmdDsNCisgIG15ICRzdHJpbmcgPSBzaGlmdDsNCisN
CisgICMgdHJ5IGFsbCB0aGUgQk9NcyBpbiBvcmRlciBsb29raW5nIGZvciBv
bmUgKG9yZGVyIGlzIGltcG9ydGFudA0KKyAgIyAzMmJpdCBCT01zIGxvb2sg
bGlrZSAxNmJpdCBCT01zKQ0KKyAgbXkgJGNvdW50ID0gMDsNCisgIHdoaWxl
ICgkY291bnQgPCBAeyAkYm9tcyB9KQ0KKyAgew0KKyAgICBteSAkZW5jID0g
JGJvbXMtPlskY291bnRdOw0KKyAgICBteSAkYm9tID0gJGJvbXMtPlskY291
bnQrMV07DQorDQorICAgICMgZG9lcyB0aGUgc3RyaW5nIHN0YXJ0IHdpdGgg
dGhlIGJvbT8NCisgICAgaWYgKCRib20gZXEgc3Vic3RyKCRzdHJpbmcsIDAs
IGxlbmd0aCgkYm9tKSkpDQorICAgIHsNCisgICAgICAjIGRlY29kZSBpdCBh
bmQgaGFuZCBpdCBiYWNrDQorICAgICAgcmVxdWlyZSBFbmNvZGU7DQorICAg
ICAgcmV0dXJuIEVuY29kZTo6ZGVjb2RlKCRlbmMsIHN1YnN0cigkc3RyaW5n
LCBsZW5ndGgoJGJvbSkpLCAxKTsNCisgICAgfQ0KKw0KKyAgICAkY291bnQg
Kz0gMjsNCisgIH0NCisNCisgICMgbm8gYm9tcyBtYXRjaGVkLCBtdXN0IGJl
IGEgbm9uIHVuaWNvZGUgc3RyaW5nDQorICAjIGp1c3QgcmV0dXJuIGl0IGFz
IGl0IGlzDQorICByZXR1cm4gJHN0cmluZzsNCit9DQorDQogDQogMTsNCiAN
CkluZGV4OiBsaWIvVGVtcGxhdGUvRG9jdW1lbnQucG0NCj09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT0NClJDUyBmaWxlOiAvdGVtcGxhdGUtdG9vbGtpdC9UZW1w
bGF0ZTIvbGliL1RlbXBsYXRlL0RvY3VtZW50LnBtLHYNCnJldHJpZXZpbmcg
cmV2aXNpb24gMi43Mg0KZGlmZiAtdSAtcjIuNzIgRG9jdW1lbnQucG0NCi0t
LSBsaWIvVGVtcGxhdGUvRG9jdW1lbnQucG0JMjAwNC8wMS8zMCAxOTozMjoy
NQkyLjcyDQorKysgbGliL1RlbXBsYXRlL0RvY3VtZW50LnBtCTIwMDQvMDYv
MjcgMTM6MTA6NTQNCkBAIC0yODAsNyArMjgwLDEyIEBADQogICAgICAgICAo
JGZoLCAkdG1wZmlsZSkgPSBGaWxlOjpUZW1wOjp0ZW1wZmlsZSggDQogICAg
ICAgICAgICAgRElSID0+IEZpbGU6OkJhc2VuYW1lOjpkaXJuYW1lKCRmaWxl
KSANCiAgICAgICAgICk7DQotCXByaW50ICRmaCAkY2xhc3MtPmFzX3Blcmwo
JGNvbnRlbnQpIHx8IGRpZSAkITsNCisJbXkgJHBlcmxjb2RlID0gJGNsYXNz
LT5hc19wZXJsKCRjb250ZW50KSB8fCBkaWUgJCE7DQorICAgICAgICBpZiAo
JF0gPiA1LjAwNyAmJiB1dGY4Ojppc191dGY4KCRwZXJsY29kZSkpIHsNCisg
ICAgICAgICAgJHBlcmxjb2RlID0gInVzZSB1dGY4O1xuXG4kcGVybGNvZGUi
Ow0KKyAgICAgICAgICBiaW5tb2RlICRmaCwgIjp1dGY4IjsNCisgICAgICAg
IH0NCisJcHJpbnQgJGZoICRwZXJsY29kZTsNCiAJY2xvc2UoJGZoKTsNCiAg
ICAgfTsNCiAgICAgcmV0dXJuICRjbGFzcy0+ZXJyb3IoJEApIGlmICRAOw0K

--1388560483-641160645-1088344197=:8700
Content-Type: APPLICATION/x-troff; name="unicode.t"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.LNX.4.55.0406271449571.8700@gan.twoshortplanks.com>
Content-Description: 
Content-Disposition: attachment; filename="unicode.t"

IyEvdXNyL2Jpbi9wZXJsCgp1c2Ugc3RyaWN0Owp1c2Ugd2FybmluZ3M7CgpC
RUdJTiB7CiAgdW5sZXNzICgkXSA+IDUuMDA3KQogIHsKICAgIHByaW50ICIx
Li4wICMgU2tpcCBwZXJsIDwgNS44IGNhbid0IGRvIHVuaWNvZGUgd2VsbCBl
bm91Z2hcbiI7CiAgICBleGl0OwogIH0KfQoKdXNlIFRlbXBsYXRlOwoKdXNl
IEZpbGU6OlRlbXAgcXcodGVtcGZpbGUgdGVtcGRpcik7CnVzZSBGaWxlOjpT
cGVjOjpGdW5jdGlvbnM7CnVzZSBDd2Q7Cgp1c2UgVGVzdDo6TW9yZSB0ZXN0
cyA9PiAyMDsKCiMgVGhpcyBpcyAnbW9vc2UuLi4nICh3aXRoIHNsYXNoZXMg
aW4gdGhlICdvJ3MgdGhlbSwgYW5kIHRoZSAnLi4uJwojIGFzIG9uZSBjaGFy
KS4KbXkgJG1vb3NlID0gIm1ceHtmOH1ceHtmOH1zZVx4ezIwMjZ9IjsKCiMg
cmlnaHQsIGNyZWF0ZSBzb21lIHRlbXBsYXRlcyBpbiB2YXJpb3VzIGVuY29k
aW5ncyBieSBoYW5kCiMgKGl0J3MgdGhlIG9ubHkgd2F5IHRvIGJlIDEwMCUg
c3VyZSB0aGV5IGNvbnRhaW50IHRoZSByaWdodCB0ZXh0KQpteSAlZW5jb2Rl
ZF90ZXh0ID0gKAogJ1VURi04JyAgICA9PiAiXHh7ZWZ9XHh7YmJ9XHh7YmZ9
bVx4e2MzfVx4e2I4fVx4e2MzfVx4e2I4fXNlXHh7ZTJ9XHh7ODB9XHh7YTZ9
IiwKICdVVEYtMTZCRScgPT4gIlx4e2ZlfVx4e2ZmfVx4ezB9bVx4ezB9XHh7
Zjh9XHh7MH1ceHtmOH1ceHswfXNceHswfWUgJiIsCiAnVVRGLTE2TEUnID0+
ICJceHtmZn1ceHtmZX1tXHh7MH1ceHtmOH1ceHswfVx4e2Y4fVx4ezB9c1x4
ezB9ZVx4ezB9JiAiLAogJ1VURi0zMkJFJyA9PiAiXHh7MH1ceHswfVx4e2Zl
fVx4e2ZmfVx4ezB9XHh7MH1ceHswfW1ceHswfVx4ezB9XHh7MH1ceHtmOH1c
eHswfVx4ezB9XHh7MH1ceHtmOH1ceHswfVx4ezB9XHh7MH1zXHh7MH1ceHsw
fVx4ezB9ZVx4ezB9XHh7MH0gJiIsCiAnVVRGLTMyTEUnID0+ICJceHtmZn1c
eHtmZX1ceHswfVx4ezB9bVx4ezB9XHh7MH1ceHswfVx4e2Y4fVx4ezB9XHh7
MH1ceHswfVx4e2Y4fVx4ezB9XHh7MH1ceHswfXNceHswfVx4ezB9XHh7MH1l
XHh7MH1ceHswfVx4ezB9JiBceHswfVx4ezB9IiwKKTsKCiMgd3JpdGUgdGhv
c2UgdmFyaWFibGVzIHRvIHRlbXAgZmlsZXMgaW4gYSB0ZW1wIGRpcmVjdG9y
eQpteSAlZmlsZW5hbWVzID0gKAogIG1hcCB7ICRfID0+IHdyaXRlX3RvX3Rl
bXBfZmlsZSgKICAgICAgICAgICAgICAgIGZpbGVuYW1lID0+ICRfLAogICAg
ICAgICAgICAgICAgdGV4dCAgICAgPT4gJGVuY29kZWRfdGV4dHsgJF8gfSwK
ICAgICAgICAgICAgICAgICMgdW5jb21tZW50IHRvIGNyZWF0ZSBmaWxlcyBp
biBjd2QKICAgICAgICAgICAgICAgICMgZGlyICAgICAgPT4gY3dkLAogICAg
ICAgICAgICAgICkKICAgfSBrZXlzICVlbmNvZGVkX3RleHQKKTsKCm15ICR0
ZW1wZGlyID0gY3JlYXRlX2NhY2hlX2RpcigpOwoKIyBzZXR1cCB0ZW1wbGF0
ZSB0b29sa2l0IGFuZCB0ZXN0IGFsbCB0aGUgZW5jb2RpbmdzCm15ICR0dCA9
IHNldHVwX3R0KCB0ZW1wZGlyID0+ICR0ZW1wZGlyICk7CnRlc3RfaXQoImZp
cnN0IHRyeSIsICR0dCwgXCVmaWxlbmFtZXMsICRtb29zZSk7CnRlc3RfaXQo
ImluIG1lbW9yeSIsICR0dCwgXCVmaWxlbmFtZXMsICRtb29zZSk7CgojIG9r
YXksIG5vdyB3ZSB0ZXN0IGV2ZXJ5dGhpbmcgYWdhaW4gdG8gc2VlIGlmIHRo
ZSBjYWNoZSBmaWxlCiMgd2FzIHdyaXR0ZW4gaW4gYSBjb25zaXNhbnQgc3Rh
dGUKJHR0ID0gc2V0dXBfdHQoIHRlbXBkaXIgPT4gJHRlbXBkaXIgKTsKdGVz
dF9pdCgiZnJvbSBjYWNoZSIsICR0dCwgXCVmaWxlbmFtZXMsICRtb29zZSk7
CnRlc3RfaXQoImluIGNhY2hlLCBpbiBtZW1vcnkiLCAkdHQsIFwlZmlsZW5h
bWVzLCAkbW9vc2UpOwoKIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMj
IyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIwoK
c3ViIGNyZWF0ZV9jYWNoZV9kaXIKIHsgcmV0dXJuIHRlbXBkaXIoIENMRUFO
VVAgPT4gMSApOyB9CgpzdWIgc2V0dXBfdHQKewogIG15ICVhcmdzID0gQF87
CiAgcmV0dXJuIFRlbXBsYXRlLT5uZXcoIEFCU09MVVRFID0+IDEsCiAgICAg
ICAgICAgICAgICAgICAgICAgIENPTVBJTEVfRElSID0+ICRhcmdze3RlbXBk
aXJ9LAogICAgICAgICAgICAgICAgICAgICAgICBDT01QSUxFX0VYVCA9PiAi
LnR0Y2FjaGUiKTsKfQoKc3ViIHRlc3RfaXQKewogIGxvY2FsICRUZXN0OjpC
dWlsZGVyOjpMZXZlbCA9ICRUZXN0OjpCdWlsZGVyOjpMZXZlbCArIDE7Cgog
IG15ICRuYW1lICAgICAgPSBzaGlmdDsKICBteSAkdHQgICAgICAgID0gc2hp
ZnQ7CiAgbXkgJGZpbGVuYW1lcyA9IHNoaWZ0OwogIG15ICRzdHJpbmcgICAg
PSBzaGlmdDsKCiAgZm9yZWFjaCBteSAkZW5jb2RpbmcgKGtleXMgJXsgJGZp
bGVuYW1lcyB9KQogIHsKICAgIG15ICRvdXRwdXQ7CiAgICAkdHQtPnByb2Nl
c3MoJGZpbGVuYW1lcy0+eyAkZW5jb2RpbmcgfSx7fSxcJG91dHB1dCkKICAg
ICAgb3IgJG91dHB1dCA9ICR0dC0+ZXJyb3I7CiAgICBpcyhyZWFzY2lpZnko
JG91dHB1dCksIHJlYXNjaWlmeSgkc3RyaW5nKSwgIiRuYW1lIC0gJGVuY29k
aW5nIik7CiAgfQp9CgojIGVzY2FwZSBhbGwgdGhlIGhpZ2ggYW5kIGxvdyBj
aGFycyB0byBceHsuLn0gc2VxdWVuY2VzCnN1YiByZWFzY2lpZnkKewogIG15
ICRzdHJpbmcgPSBzaGlmdDsKICAkc3RyaW5nID0gam9pbiAnJywgbWFwIHsK
ICAgbXkgJG9yZCA9IG9yZCgkXyk7CiAgICAoJG9yZCA+IDEyNyB8fCAoJG9y
ZCA8IDMyICYmICRvcmQgIT0gMTApKQogICAgID8gc3ByaW50ZiAnXHh7JXh9
JywgJG9yZAogICAgIDogJF8KICB9IHNwbGl0IC8vLCAkc3RyaW5nOwogIHJl
dHVybiAkc3RyaW5nOwp9CgpzdWIgd3JpdGVfdG9fdGVtcF9maWxlCnsKICBt
eSAlYXJncyA9IEBfOwoKICAjIHVzZSBhIHRlbXAgZGlyIHVubGVzcyBvbmUg
d2FzIHNwZWNpZmllZC4gIFdlIGF1dG9tYXRpY2FsbHkKICAjIGRlbGV0ZSB0
aGUgY29udGVudHMgd2hlbiB3ZSdyZSBkb25lIHdpdGggdGhlIHRlbXBkaXIs
IHdoZXJlCiAgIyBvdGhlcndpc2Ugd2UganVzdCBsZWF2ZSB0aGUgZmlsZXMg
bHlpbmcgYXJvdW5kLgogIHVubGVzcyAoZXhpc3RzICRhcmdze2Rpcn0pCiAg
IHsgJGFyZ3N7ZGlyfSA9IHRlbXBkaXIoIENMRUFOVVAgPT4gMSApIH0KCiAg
IyB3b3JrIG91dCB3aGVyZSB3ZSdyZSBnb2luZyB0byBzdG9yZSBpdAogIG15
ICR0ZW1wX2ZpbGVuYW1lID0gY2F0ZmlsZSgkYXJnc3tkaXJ9LCAkYXJnc3tm
aWxlbmFtZX0pOwoKICAjIG9wZW4gYSBmaWxlaGFuZGxlIHdpdGggc29tZSBQ
ZXJsSU8gbWFnaWMgdG8gY29udmVydCBkYXRhIGludG8KICAjIHRoZSBjb3Jy
ZWN0IGVuY29kaW5nIHdpdGggdGhlIGNvcnJlY3QgQk9NIG9uIHRoZSBmcm9u
dAogIG9wZW4gbXkgJHRlbXBfZmgsICI+OnJhdyIsICR0ZW1wX2ZpbGVuYW1l
CiAgIG9yIGRpZSAiQ2FuJ3Qgd3JpdGUgdG8gJyR0ZW1wX2ZpbGVuYW1lJzog
JCEiOwoKICAjIHdyaXRlIHRoZSBkYXRhIG91dAogIHByaW50ICR0ZW1wX2Zo
ICRhcmdze3RleHR9OwogIGNsb3NlICR0ZW1wX2ZoOwoKICAjIHJldHVybiB3
aGVyZSB3ZSd2ZSBjcmVhdGVkIGl0CiAgcmV0dXJuICR0ZW1wX2ZpbGVuYW1l
Owp9Cg==

--1388560483-641160645-1088344197=:8700--