Comparto con ustedes esta nota que leí aquí.
Collisions in the MD5 cryptographic hash function
It is now well-known that the crytographic hash function MD5 has been broken. In March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they describe an algorithm that can find two different sequences of 128 bytes with the same MD5 hash. One famous such pair is the following: d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89and d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89Each of these blocks has MD5 hash 79054025255fb1a26e4bc422aef54eb4. Ben Laurie has a nice website that visualizes this MD5 collision. For a non-technical, though slightly outdated, introduction to hash functions, see Steve Friedle's Illustrated Guide. |
As we will explain below, the algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file. Several people have used this technique to create pairs of interesting files with identical MD5 hashes:
|
The following is an improvement of Diaz's example, which does not need a special extractor. Here are two pairs of executable programs (one pair runs on Windows, one pair on Linux). These programs must be run from the console. |
How it works
The above files were generated by exploiting two facts: the block structure of the MD5 function, and the fact that Wang and Yu's technique works for an arbitrary initialization vector. To understand what this means, it is useful to have a general idea of how the MD5 function processes its input. This is done by an iteration method known as the Merkle-Damgard method. A given input file is first padded so that its length will be a multiple of 64 bytes. It is then divided into individual 64-byte blocks M0, M1, ..., Mn-1. The MD5 hash is computed by computing a sequence of 16-byte states s0, ..., sn, according to the rule: The method of Wang and Yu makes it possible, for a given initialization vector s, to find two pairs of blocks M,M' and N,N', such that f(f(s, M), M') = f(f(s, N), N'). It is important that this works for any initialization vector s, and not just for the standard initialization vector s0. Combining these observations, it is possible to find pairs of files of arbitrary length, which are identical except for 128 bytes somewhere in the middle of the file, and which have identical MD5 hash. Indeed, let us write the two files as sequences of 64-byte blocks: The blocks at the beginning of the files, M0, ..., Mi-1, can be chosen arbitrarily. Suppose that the internal state of the MD5 hash function after processing these blocks is si. Now we can apply Wang and Yu's method to the initialization vector si, to find two pairs of blocks Mi, Mi+1 and Ni, Ni+1, such that si+2 = f(f(si, Mi), Mi+1) = f(f(si, Ni), Ni+1). This guarantees that the internal state si+2 after the i+2st block will be the same for the two files. Finally, the remaining blocks Mi+2, ..., Mn can again be chosen arbitrarily. So how can we use this technique to produce a pair of programs (or postscript files) that have identical MD5 hash, yet behave in arbitrary different ways? This is simple. All we have to do is write the two programs like this:
Program 1: if (data1 == data1) then { good_program } else { evil_program } and arrange things so that "data1" = Mi, Mi+1 and "data2" = Ni, Ni+1 in the above scheme. This can even be done in a compiled program, by first compiling it with dummy values for data1 and data2, and later replacing them with the properly computed values. |
Here, you can download the software that I used to create MD5-colliding executable files.
Quick usage instructions:Note for Windows users: the below instructions are for Unix/Linux. On Windows, you may have to append ".exe" to the names of executable files. Also, to use "make", you must have the GNU tools installed and working.
|
Fuente: Peter Selinger's WebPage.
0 comentarios:
Post a Comment