CLMUL instruction set

Carry-less Multiplication (CLMUL) is an extension to the x86 instruction set used by microprocessors from Intel and AMD. It was proposed by Intel in March 2008^[1] and first made available in the Intel Westmere processors announced in early 2010. Mathematically, the instruction implements multiplication of polynomials over the finite field GF(2), where the bitstring $a_0 a_1 \ldots a_{63}$ represents the polynomial $a_0 + a_1 X + a_2 X^2 + \cdots + a_{63} X^{63}$. The CLMUL instruction also allows a more efficient implementation of multiplication in the closely related larger finite fields GF(2^k) than the traditional instruction set does.^[2]
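The following Python sketch models the arithmetic described above. It assumes, following the bitstring convention, that bit i of an integer holds the coefficient of X^i; shifted partial products are combined with XOR instead of addition, so no carries propagate. The names `clmul` and `clmul64` are illustrative only, and the loop is a reference model, not a description of how the hardware computes the product.

```python
MASK64 = (1 << 64) - 1  # largest 64-bit unsigned value


def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative ints viewed as GF(2) polynomials.

    Bit i of each int is the coefficient of X**i. Shifted copies of `a` are
    combined with XOR instead of addition, so no carries propagate.
    """
    result = 0
    while b:
        if b & 1:        # coefficient of the current power of X in b is 1
            result ^= a  # add (XOR) the matching shifted copy of a
        a <<= 1          # multiply a by X
        b >>= 1          # move on to the next coefficient of b
    return result


def clmul64(a: int, b: int) -> int:
    """Model of a 64 x 64 -> 128-bit carry-less multiply."""
    if not (0 <= a <= MASK64 and 0 <= b <= MASK64):
        raise ValueError("operands must be 64-bit unsigned values")
    return clmul(a, b)  # degree <= 126, so the result fits in 128 bits


# (X + 1)(X + 1) = X^2 + 2X + 1 = X^2 + 1 over GF(2), since 2X = 0.
print(clmul64(0b11, 0b11))  # 5, whereas the ordinary product 3 * 3 is 9
print(hex(clmul64(MASK64, MASK64)))  # only even powers of X survive
```

Squaring over GF(2) spreads the bits of the operand to even positions, which is why the all-ones operand produces the alternating pattern 0x5555...5555.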

One use of these instructions is to speed up block cipher encryption in Galois/Counter Mode (GCM), which depends on multiplication in the finite field GF(2^128). Another application is the fast calculation of CRC values,^[3] including those used by zlib's implementation of the LZ77 sliding-window DEFLATE algorithm and by pngcrush.^[4]
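Multiplication in GF(2^k) can be expressed as a carry-less product followed by reduction modulo an irreducible polynomial of degree k; CLMUL accelerates the first step. The sketch below is a hypothetical illustration that uses the small AES field GF(2^8), with modulus X^8 + X^4 + X^3 + X + 1, so results can be checked by hand. GCM itself works in GF(2^128) with modulus X^128 + X^7 + X^2 + X + 1 and a bit-reflected operand ordering that is not modeled here, and optimized implementations typically also perform the reduction with further carry-less multiplications and shifts. The helper names are assumptions for illustration.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (XOR-accumulating) product of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(value: int, modulus: int) -> int:
    """Remainder of `value` divided by `modulus`, both as GF(2) polynomials."""
    deg = modulus.bit_length() - 1
    while value.bit_length() - 1 >= deg:
        shift = value.bit_length() - 1 - deg
        value ^= modulus << shift  # subtract (XOR) a shifted copy of the modulus
    return value


def gf2k_mul(a: int, b: int, modulus: int) -> int:
    """Multiply in GF(2^k) = GF(2)[X] / (modulus), where k = deg(modulus)."""
    return poly_mod(clmul(a, b), modulus)  # carry-less product, then reduce


AES_POLY = 0x11B  # X^8 + X^4 + X^3 + X + 1
print(hex(gf2k_mul(0x57, 0x83, AES_POLY)))  # 0xc1
```

The same two-step structure (carry-less multiply, then reduce) applies to larger fields; only the operand width and the modulus change.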

ARMv8 also has a version of CLMUL (the PMULL and PMULL2 instructions). SPARC calls its version XMULX, for "XOR multiplication".

New instructions

The instruction computes the 128-bit carry-less product of two 64-bit values. The destination is a 128-bit XMM register, which also supplies the first source operand; the second source may be another XMM register or a memory operand. An immediate operand specifies which 64-bit halves of the two 128-bit operands are multiplied. Pseudo-mnemonics that encode specific values of the immediate operand are also defined:

Instruction	Opcode	Description
`PCLMULQDQ xmmreg,xmmrm,imm`	`[rmi: 66 0f 3a 44 /r ib]`	Perform a carry-less multiplication of two 64-bit polynomials over GF(2); the immediate selects which halves are used.
`PCLMULLQLQDQ xmmreg,xmmrm`	`[rm: 66 0f 3a 44 /r 00]`	Multiply the low halves of the two registers.
`PCLMULHQLQDQ xmmreg,xmmrm`	`[rm: 66 0f 3a 44 /r 01]`	Multiply the high half of the destination register by the low half of the source register.
`PCLMULLQHQDQ xmmreg,xmmrm`	`[rm: 66 0f 3a 44 /r 10]`	Multiply the low half of the destination register by the high half of the source register.
`PCLMULHQHQDQ xmmreg,xmmrm`	`[rm: 66 0f 3a 44 /r 11]`	Multiply the high halves of the two registers.
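The effect of the immediate can be modeled in software. This sketch assumes the encoding given in Intel's manuals: bit 0 of the immediate selects the low (0) or high (1) quadword of the destination/first source, bit 4 selects the quadword of the second source, and the remaining bits are ignored, which is why the pseudo-mnemonics use 0x00, 0x01, 0x10 and 0x11. Registers are represented as 128-bit Python integers, and `pclmulqdq` is a hypothetical model, not an API.

```python
MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Carry-less (XOR-accumulating) product of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pclmulqdq(dest: int, src: int, imm8: int) -> int:
    """Hypothetical model of PCLMULQDQ dest, src, imm8 on 128-bit values.

    Returns the 128-bit carry-less product that would be written to dest.
    """
    # Bit 0 of imm8: 0 = low quadword of dest, 1 = high quadword of dest.
    a = ((dest >> 64) & MASK64) if imm8 & 0x01 else (dest & MASK64)
    # Bit 4 of imm8: 0 = low quadword of src, 1 = high quadword of src.
    b = ((src >> 64) & MASK64) if imm8 & 0x10 else (src & MASK64)
    return clmul(a, b)


dest = (0b11 << 64) | 0b10  # high half = X + 1, low half = X
src = (0b100 << 64) | 0b1   # high half = X^2,   low half = 1
for name, imm in [("PCLMULLQLQDQ", 0x00), ("PCLMULHQLQDQ", 0x01),
                  ("PCLMULLQHQDQ", 0x10), ("PCLMULHQHQDQ", 0x11)]:
    print(name, bin(pclmulqdq(dest, src, imm)))
```

Choosing distinct polynomials for each half makes it easy to see which pair each pseudo-mnemonic multiplies.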

A vectorized version, VPCLMULQDQ, is available in EVEX-encoded form with AVX-512.
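As a rough model of the wider form: VPCLMULQDQ with 256- or 512-bit operands applies the same half-selection and carry-less multiplication independently in each 128-bit lane, using one immediate for all lanes. The sketch below represents a vector register as a list of 128-bit lane values (an assumption of this model), so 1, 2 or 4 lanes correspond to 128-, 256- or 512-bit operands; the function names are hypothetical.

```python
from typing import List

MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Carry-less (XOR-accumulating) product of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pclmul_lane(x: int, y: int, imm8: int) -> int:
    """One 128-bit lane: imm8 bit 0 picks x's quadword, bit 4 picks y's."""
    a = ((x >> 64) & MASK64) if imm8 & 0x01 else (x & MASK64)
    b = ((y >> 64) & MASK64) if imm8 & 0x10 else (y & MASK64)
    return clmul(a, b)


def vpclmulqdq(xs: List[int], ys: List[int], imm8: int) -> List[int]:
    """Hypothetical lane-wise model; each list element is one 128-bit lane."""
    if len(xs) != len(ys) or len(xs) not in (1, 2, 4):
        raise ValueError("expected 1, 2 or 4 lanes (128, 256 or 512 bits)")
    return [pclmul_lane(x, y, imm8) for x, y in zip(xs, ys)]


# Two lanes (256 bits). imm8 = 0x01 selects x's high half and y's low half.
print(vpclmulqdq([3, 5 << 64], [3, 5], 0x01))
```

Because every lane uses the same immediate, lane 0 here multiplies the (zero) high half of 3, while lane 1 multiplies 5 by 5 carry-lessly.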

CPUs with CLMUL instruction set

Intel
- Westmere processor (March 2010)
- Sandy Bridge processor
- Ivy Bridge processor
- Haswell processor
- Broadwell processor (with increased throughput and lower latency^[5])
- Skylake (and later) processor
- Goldmont processor
AMD
- Jaguar-based processors and newer^[6]
- Puma-based processors and newer
- "Heavy Equipment" processors
  - Bulldozer-based processors^[7]
  - Piledriver-based processors
  - Steamroller-based processors
  - Excavator-based processors and newer
- Zen processors
- Zen+ processors
- Zen2 (and later) processors

The presence of the CLMUL instruction set can be checked by testing the PCLMULQDQ feature bit, bit 1 of the ECX register returned by the CPUID instruction with EAX = 1.
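A minimal sketch of the feature test, assuming the caller already has the ECX value returned by CPUID with EAX = 1 (Python has no portable CPUID call), plus a Linux-only fallback that looks for the `pclmulqdq` flag in /proc/cpuinfo on x86 systems. The function names are hypothetical; C programs typically query CPUID through compiler headers such as GCC's `<cpuid.h>`.

```python
from typing import Optional

PCLMULQDQ_ECX_BIT = 1  # CPUID with EAX = 1: ECX bit 1 reports PCLMULQDQ


def has_pclmulqdq_from_ecx(ecx: int) -> bool:
    """Test the PCLMULQDQ feature bit in a CPUID leaf-1 ECX value."""
    return bool((ecx >> PCLMULQDQ_ECX_BIT) & 1)


def has_pclmulqdq_from_cpuinfo(text: str) -> bool:
    """Look for 'pclmulqdq' on any 'flags' line of x86 Linux /proc/cpuinfo."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "pclmulqdq" in value.split():
            return True
    return False


def host_has_pclmulqdq() -> Optional[bool]:
    """Return True/False on Linux, or None if /proc/cpuinfo is unavailable."""
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            return has_pclmulqdq_from_cpuinfo(f.read())
    except OSError:
        return None


print(host_has_pclmulqdq())
```

Checking the bit directly is the authoritative test; the /proc/cpuinfo parse is only a convenience for scripts that cannot execute CPUID.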

References

[1] "Intel Software Network". Intel. Archived from the original on 2008-04-07. https://web.archive.org/web/20080407095317/http://softwareprojects.intel.com/avx/. Retrieved 2008-04-05.
[2] "Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode – Rev 2.02". Intel. 2014-04-20. http://software.intel.com/en-us/articles/intel-carry-less-multiplication-instruction-and-its-usage-for-computing-the-gcm-mode/.
[3] "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ". http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf.
[4] Vlad Krasnov (2015-07-08). "Fighting Cancer: The Unexpected Benefit Of Open Sourcing Our Code". CloudFlare. https://blog.cloudflare.com/cloudflare-fights-cancer/. Retrieved 2016-09-04.
[5] Johan De Gelas (2017-03-31). "The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads". Anandtech. p. 3. http://www.anandtech.com/show/10158/the-intel-xeon-e5-v4-review.
[6] "Slide detailing improvements of Jaguar over Bobcat". AMD. http://www.slideshare.net/AMDPhil/bobcat-to-jaguarv2. Retrieved 2013-08-03.
[7] Dave Christie (2009-05-06). "Striking a balance". AMD Developer blogs. Archived from the original on 2013-11-09. https://archive.today/20131109140737/http://developer.amd.com/2009/05/06/striking-a-balance/. Retrieved 2011-03-11.

Original source: https://en.wikipedia.org/wiki/CLMUL instruction set.
