This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages)
(Learn how and when to remove this template message)No issues specified. Please specify issues, or remove this template. |
MAC address anonymization performs a one-way function on a MAC address so that the result may be used in tracking systems for reporting and the general public, while making it nearly impossible to obtain the original MAC address from the result. The idea is that this process allows companies like Google,[1] Apple[2] and iInside[3] - which track users movements via computer hardware to simultaneously preserve the identities of the people they are tracking, as well as the hardware itself.
An example of MAC address anonymization would be to use a simple hash algorithm. Given an address of 11:22:33:44:55:66
, the MD5 hash algorithm produces eb341820cd3a3485461a61b1e97d31b1
(32 hexadecimal digits).[4]
An address only one character different (11:22:33:44:55:67
) produces 391907146439938c9821856fa181052e
,[5] an entirely different hash due to the avalanche effect.
Tracking companies rely on the assumption that address anonymization is akin to encryption. Given a message, and an encryption method that is well known to both the encoder and potential decryptor, modern encryption methods (such as Advanced Encryption Standard (AES) or RSA) will yield a result that is unbreakable in practice.
The problem lies in the fact that there are only 248 (281,474,976,710,656) possible MAC addresses. Given the encoding algorithm, an index can easily be created for each possible address. By using rainbow table compression, the index can be made small enough to be portable. Building the index is an embarrassingly parallel problem, and so the work can be accelerated greatly e.g. by renting a large amount of cloud computing resources temporarily.
For example, if a single CPU can compute 1,000,000 encrypted MACs per second, then generating the full table takes 8.9 CPU-years. With a fleet of 1,000 CPUs, this would only take around 78 hours. Using a rainbow table with a "depth" of 1,000,000 hashes per entry, the resulting table would only contain a few hundred million entries (a few GB) and require 0.5 seconds (on average, ignoring I/O time) to reverse any encrypted MAC into its original form.
One approach to mitigate this attack would be to use a deliberately slow one-way function for MAC addresses, such as a slow Key derivation function (KDF). For instance, if the KDF were tuned to require 0.1 seconds per MAC address anonymization operation (on a typical consumer CPU), generating a rainbow table would require 892,000 CPU-years.
Where data protection law requires anonymization, the method used should exclude any possibility of the original MAC address to be identified. Some companies truncate IPv4 addresses by removing the final octet, thus in effect retaining information about the user's ISP or subnet, but not directly identifying the individual. The activity could then originate from any of 254 IP addresses. This may not always be enough to guarantee anonymization.[6]