[Templates] Processing templates from variables
Paul Seamons
mail@seamons.com
Tue, 7 Aug 2007 12:42:50 -0600
> Why MD5 and not SHA1? MD5 is known weak.
I'm sure that using SHA1 could be made configurable.
I'm sure I'm digging a hole for myself here - but...
- MD5 is weak for cryptographic and security purposes - there is still
utility in other areas.
- Collisions are certainly possible with MD5. Collisions on human-generated
strings (ASCII) are rare - if they are even possible (without a hundred
monkeys typing for eons).
- A collision attack has already been announced for SHA1. If MD5 isn't
acceptable, then possibly SHA1 isn't either.
(http://www.schneier.com/blog/archives/2005/02/sha1_broken.html)
- MD5 still represents a larger install base than SHA1.
So - it is in theory possible that there could be an exploit by using MD5.
The exploit would involve a hacker being able to have an arbitrary, most
likely binary, string passed to the eval filter. The eval filter would then
generate an MD5 sum and cache the result under it. If the hacker's string
were cached first, then any template eval'ing a string that resolves to the
same MD5 would get the hacker's string instead of its own.
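
To make the "first one in wins" behaviour concrete, here is a minimal sketch
in Python. It is not Template Toolkit's code: the class name, the method
names and the algorithm parameter are all made up for illustration. A real
MD5 collision is not constructed here; the attacker's entry is planted under
the legitimate string's key to simulate one.

```python
import hashlib


class DigestKeyedCache:
    """Hypothetical stand-in for a template cache keyed by a digest."""

    def __init__(self, algorithm="md5"):
        # Assumption: the digest is configurable, e.g. "md5" or "sha1".
        self.algorithm = algorithm
        self.entries = {}

    def key_for(self, text):
        # The cache key is the hex digest of the string being eval'ed.
        data = text.encode("utf-8")
        return hashlib.new(self.algorithm, data).hexdigest()

    def fetch_or_store(self, text):
        # setdefault keeps whatever was stored first under this key,
        # so a colliding string that arrives later never replaces it.
        return self.entries.setdefault(self.key_for(text), text)


cache = DigestKeyedCache()
legit = "[% user.name %]"

# Simulate a collision: pretend the attacker's string hashed to the same
# key as the legitimate one and reached the cache first.
cache.entries[cache.key_for(legit)] = "attacker template"

served = cache.fetch_or_store(legit)  # the planted entry comes back
```

Passing "sha1" instead of "md5" changes only the key length; the lookup
logic, and therefore the shape of the exploit, stays the same.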
So for the exploit to work (in no particular order):
1) The hacker has to be able to upload a string that will be used by the
system.
2) The string probably has to be binary.
3) The hacker has to get their file into the cache first.
4) The hacker has to know that you are using string eval.
5) The hacker has to know the MD5 sum of a string that you are using.
Looking at the points:
For point 1 - the point of any interactive website is that users can upload
strings - either by file upload or form submission or even by simple URL
strings. It is very possible for a hacker to get information onto a
website. The question then is how that information is used. I could
envision a template designer trying to format the user's submitted data -
this is very common. OK - so point 1 is certainly doable.
For point 2 - ok - that is a dumb point - all strings are binary really
(strings of bits). However, strings that are going to be good candidates for
collisions are going to be non-ASCII. I guess Unicode is non-ASCII too. But
at any rate the application would have to skip validation of the user's
submitted data for the string to be used.
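
The kind of validation meant here could be as simple as the check below. It
is only a sketch in Python, the function name is made up, and it assumes the
application can live with printable ASCII only (which would also reject
legitimate Unicode input).

```python
import string

# Printable ASCII: letters, digits, punctuation and whitespace.
ALLOWED = set(string.printable)


def is_plain_ascii(text):
    """Return True if every character in text is printable ASCII."""
    return all(ch in ALLOWED for ch in text)
```

A string that fails this check would simply never be handed to the eval
filter, which takes binary collision candidates out of play.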
For point 3 - the hacker would have little control over whether his string
was used first. In mod_perl cases the caches have to be pre-initialized and
would already hold the correct entries. In other cases, the user's data could
end up getting cached first - assuming that you are caching the user's data.
For points 4 and 5 - I guess it is an open source world, and there are
certainly projects that have their source code available for all to see.
These projects could easily be reviewed for strings that are passed to eval.
So it is possible - if the user's data is not validated, is then passed to the
eval filter, and is used before strings generating the same MD5.
Collisions have been found for MD5, but I still use MD5 on a regular basis
with no worries - so long as I'm using it in safe applications. I think
Template hashsums are a safe application (at least for the next three
months).
Paul
http://www.accessdata.com/media/en_US/print/papers/wp.MD5_Collisions.en_us.pdf