Carry-less Multiplication (CLMUL) is an extension to the x86 instruction set used by microprocessors from Intel and AMD. It was proposed by Intel in March 2008[1] and first made available in the Intel Westmere processors announced in early 2010. Mathematically, the instruction implements multiplication of polynomials over the finite field GF(2), where the bitstring [math]a_0a_1\ldots a_{63}[/math] represents the polynomial [math]a_0 + a_1X + a_2X^2 + \cdots + a_{63}X^{63}[/math]. The CLMUL instruction also allows a more efficient implementation of the closely related multiplication in larger finite fields GF(2^k) than the traditional instruction set does.[2]
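As an illustration of the polynomial view, the following is a minimal software sketch of carry-less multiplication, not the hardware instruction itself. It assumes polynomials are held in Python integers with bit i as the coefficient of X^i; the function name `clmul` is chosen here for illustration only.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers.

    Bit i of each integer is taken as the coefficient of X^i, so the
    result is the product of the two polynomials over GF(2).
    """
    result = 0
    shift = 0
    while b:
        if b & 1:
            # Addition of coefficients in GF(2) is XOR, so no carry propagates.
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


# (X + 1) * (X + 1) = X^2 + 1 over GF(2), whereas the integer product 3 * 3 is 9.
print(bin(clmul(0b11, 0b11)))  # 0b101
```

The only difference from ordinary shift-and-add multiplication is that the partial products are combined with XOR instead of addition.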
One use of these instructions is to speed up applications doing block cipher encryption in Galois/Counter Mode (GCM), which depends on multiplication in the finite field GF(2^k) (GF(2^128) in the case of GCM). Another application is the fast calculation of CRC values,[3] including those used to implement the LZ77 sliding-window DEFLATE algorithm in zlib and pngcrush.[4]
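Both applications reduce a carry-less product modulo a fixed polynomial. The sketch below shows that relationship in software under stated assumptions: the helper names (`poly_mod`, `gf_mul`, `crc_remainder`) are hypothetical, the small field GF(2^8) with the AES polynomial stands in for GCM's much larger GF(2^128), and the CRC is the plain variant with zero initial value and no bit reflection. Optimized implementations built on PCLMULQDQ use folding and precomputed constants rather than this bit-by-bit reduction.

```python
def clmul(a: int, b: int) -> int:
    """Software model of carry-less multiplication (bit i = coefficient of X^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(a: int, modulus: int) -> int:
    """Remainder of polynomial a divided by modulus, both over GF(2)."""
    degree = modulus.bit_length() - 1
    while a.bit_length() > degree:
        # Cancel the leading term of a by XOR-ing in a shifted copy of modulus.
        a ^= modulus << (a.bit_length() - 1 - degree)
    return a


def gf_mul(a: int, b: int, modulus: int) -> int:
    """Multiply in GF(2^k): carry-less product reduced by an irreducible polynomial."""
    return poly_mod(clmul(a, b), modulus)


def crc_remainder(message: bytes, modulus: int) -> int:
    """Plain CRC: remainder of message(X) * X^k modulo a degree-k polynomial."""
    degree = modulus.bit_length() - 1
    return poly_mod(int.from_bytes(message, "big") << degree, modulus)


# GF(2^8) with the AES polynomial X^8 + X^4 + X^3 + X + 1 (illustrative choice).
print(hex(gf_mul(0x57, 0x83, 0x11B)))  # 0xc1
# CRC-8 with polynomial X^8 + X^2 + X + 1, zero initial value, no bit reflection.
print(hex(crc_remainder(b"123456789", 0x107)))  # 0xf4
```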
ARMv8 also has a version of CLMUL (the PMULL instruction). SPARC calls its version XMULX, for "XOR multiplication".
New instructions
The instruction computes the 128-bit carry-less product of two 64-bit values. The destination is a 128-bit XMM register, which also supplies the first operand. The source may be another XMM register or a memory location. An immediate operand specifies which 64-bit halves of the 128-bit operands are multiplied: bit 0 selects the half of the destination and bit 4 selects the half of the source. Mnemonics for specific values of the immediate operand are also defined:
| Instruction | Opcode | Description |
|---|---|---|
| PCLMULQDQ xmmreg,xmmrm,imm | [rmi: 66 0f 3a 44 /r ib] | Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2). |
| PCLMULLQLQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 00] | Multiply the low halves of the two registers. |
| PCLMULHQLQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 01] | Multiply the high half of the destination register by the low half of the source register. |
| PCLMULLQHQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 10] | Multiply the low half of the destination register by the high half of the source register. |
| PCLMULHQHQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 11] | Multiply the high halves of the two registers. |
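The effect of the immediate operand can be sketched with a small software model. This is an illustration only: 128-bit register contents are assumed to be held in Python integers, and the function names `clmul64` and `pclmulqdq` are chosen here for readability rather than taken from any library.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values; the result fits in 127 bits."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i
    return result


def pclmulqdq(dest: int, src: int, imm8: int) -> int:
    """Software model: dest and src are 128-bit register values held as ints."""
    # Bit 0 of the immediate picks the half of dest, bit 4 the half of src.
    a = (dest >> 64) & MASK64 if imm8 & 0x01 else dest & MASK64
    b = (src >> 64) & MASK64 if imm8 & 0x10 else src & MASK64
    return clmul64(a, b)


dest = (0x3 << 64) | 0x5  # high half 0x3, low half 0x5
src = (0x7 << 64) | 0x2   # high half 0x7, low half 0x2

print(hex(pclmulqdq(dest, src, 0x00)))  # low(dest)  * low(src)  -> 0xa
print(hex(pclmulqdq(dest, src, 0x01)))  # high(dest) * low(src)  -> 0x6
print(hex(pclmulqdq(dest, src, 0x10)))  # low(dest)  * high(src) -> 0x1b
print(hex(pclmulqdq(dest, src, 0x11)))  # high(dest) * high(src) -> 0x9
```

The immediates 0x00, 0x01, 0x10 and 0x11 correspond to the four named mnemonics, in the order PCLMULLQLQDQ, PCLMULHQLQDQ, PCLMULLQHQDQ and PCLMULHQHQDQ.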
An EVEX-encoded vectorized version (VPCLMULQDQ) is available in AVX-512.
CPUs with CLMUL instruction set
- Intel:
  - Westmere processor (March 2010)
  - Sandy Bridge processor
  - Ivy Bridge processor
  - Haswell processor
  - Broadwell processor (with increased throughput and lower latency[5])
  - Skylake (and later) processor
  - Goldmont processor
- AMD:
  - Jaguar-based processors and newer[6]
  - Puma-based processors and newer
  - "Heavy Equipment" processors
    - Bulldozer-based processors[7]
    - Piledriver-based processors
    - Steamroller-based processors
    - Excavator-based processors and newer
  - Zen processors
  - Zen+ processors
  - Zen2 (and later) processors
The presence of the CLMUL instruction set can be checked by testing one of the CPU feature bits: executing CPUID with EAX = 1 reports PCLMULQDQ support in bit 1 of ECX.
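A hedged sketch of such a check follows. Python cannot execute CPUID directly, so the first helper assumes the ECX value from CPUID leaf 1 has already been obtained by other means, and the second assumes text in the format of Linux's /proc/cpuinfo, where the feature is listed under the flag name `pclmulqdq`. Both function names and the sample inputs are hypothetical.

```python
PCLMULQDQ_BIT = 1  # CPUID leaf 1 (EAX = 1), register ECX, bit 1


def ecx_reports_pclmulqdq(ecx: int) -> bool:
    """Test the PCLMULQDQ feature bit in an ECX value returned by CPUID leaf 1."""
    return bool((ecx >> PCLMULQDQ_BIT) & 1)


def cpuinfo_reports_pclmulqdq(cpuinfo_text: str) -> bool:
    """Look for the 'pclmulqdq' flag in text formatted like Linux /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "pclmulqdq" in value.split():
            return True
    return False


# Hypothetical sample inputs, for illustration only.
print(ecx_reports_pclmulqdq(0x00000002))  # True
print(cpuinfo_reports_pclmulqdq("flags\t\t: fpu sse2 pclmulqdq aes"))  # True
```

On a Linux system the second helper could be fed the contents of /proc/cpuinfo; compiled languages typically read the same bit through a compiler-provided CPUID wrapper.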
See also
- Finite field arithmetic
- AES instruction set
- FMA3 instruction set
- FMA4 instruction set
- AVX instruction set
References
- "Intel Software Network". Intel. Archived from the original on 2008-04-07. https://web.archive.org/web/20080407095317/http://softwareprojects.intel.com/avx/. Retrieved 2008-04-05.
- "Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode – Rev 2.02". Intel. 2014-04-20. http://software.intel.com/en-us/articles/intel-carry-less-multiplication-instruction-and-its-usage-for-computing-the-gcm-mode/.
- "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ". http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf.
- Vlad Krasnov (2015-07-08). "Fighting Cancer: The Unexpected Benefit Of Open Sourcing Our Code". CloudFlare. https://blog.cloudflare.com/cloudflare-fights-cancer/. Retrieved 2016-09-04.
- Johan De Gelas (2017-03-31). "The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads". Anandtech. p. 3. http://www.anandtech.com/show/10158/the-intel-xeon-e5-v4-review.
- "Slide detailing improvements of Jaguar over Bobcat". AMD. http://www.slideshare.net/AMDPhil/bobcat-to-jaguarv2. Retrieved 2013-08-03.
- Dave Christie (2009-05-06). "Striking a balance". AMD Developer blogs. Archived from the original on 2013-11-09. https://archive.today/20131109140737/http://developer.amd.com/2009/05/06/striking-a-balance/. Retrieved 2011-03-11.
Original source: https://en.wikipedia.org/wiki/CLMUL instruction set.