CLMUL instruction set

From Handwiki

Carry-less Multiplication (CLMUL) is an extension to the x86 instruction set used by microprocessors from Intel and AMD. It was proposed by Intel in March 2008[1] and first made available in the Intel Westmere processors announced in early 2010. Mathematically, the instruction implements multiplication of polynomials over the finite field GF(2), where the bitstring [math]\displaystyle{ a_0a_1\ldots a_{63} }[/math] represents the polynomial [math]\displaystyle{ a_0 + a_1X + a_2X^2 + \cdots + a_{63}X^{63} }[/math]. The CLMUL instruction also allows a more efficient implementation of the closely related multiplication in larger finite fields GF(2^k) than the traditional instruction set does.[2]
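
The polynomial view above can be made concrete with a small software model. The following is a minimal sketch in Python, not the hardware algorithm: the function name clmul64 is hypothetical, and it assumes that bit i of an integer holds the coefficient of X^i. Each set bit of one operand contributes a shifted copy of the other, and the copies are combined with XOR instead of addition, so no carries propagate.

```python
def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values (hypothetical helper).

    Bit i of each integer is taken as the coefficient of X^i, so the
    result is the product of the two polynomials over GF(2). It fits
    in 127 bits.
    """
    if not (0 <= a < 1 << 64 and 0 <= b < 1 << 64):
        raise ValueError("operands must fit in 64 bits")
    result = 0
    shift = 0
    while b:
        if b & 1:
            # Add (XOR) a copy of a shifted to this bit position.
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


if __name__ == "__main__":
    # (1 + X) * (1 + X) = 1 + X^2 over GF(2); ordinary 3 * 3 would give 9.
    print(bin(clmul64(0b11, 0b11)))
```

The only difference from schoolbook binary multiplication is the use of XOR for accumulation, which is why the operation is called carry-less.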

One use of these instructions is to speed up applications doing block cipher encryption in Galois/Counter Mode, which depends on multiplication in the finite field GF(2^k). Another application is the fast calculation of CRC values,[3] including those used to implement the LZ77 sliding window DEFLATE algorithm in zlib and pngcrush.[4]
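
Multiplication in GF(2^k) is a carry-less product followed by reduction modulo a fixed polynomial of degree k. The sketch below illustrates this in Python for the small field GF(2^8) with the AES polynomial X^8 + X^4 + X^3 + X + 1 (0x11B), chosen only because its test values are easy to check by hand. Galois/Counter Mode works in GF(2^128) with its own polynomial and bit ordering, which this sketch does not reproduce. The names clmul, poly_mod and gf_mul are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(value: int, modulus: int) -> int:
    """Remainder of value divided by modulus, both as polynomials over GF(2)."""
    if modulus == 0:
        raise ValueError("modulus must be non-zero")
    degree = modulus.bit_length() - 1
    while value.bit_length() - 1 >= degree:
        # Cancel the leading term by XOR-ing in an aligned copy of the modulus.
        value ^= modulus << (value.bit_length() - 1 - degree)
    return value


def gf_mul(a: int, b: int, modulus: int) -> int:
    """Multiply a and b in the field GF(2)[X] / (modulus)."""
    return poly_mod(clmul(a, b), modulus)


AES_POLY = 0x11B  # X^8 + X^4 + X^3 + X + 1

if __name__ == "__main__":
    print(hex(gf_mul(0x57, 0x83, AES_POLY)))
```

The same remainder step, applied to a message polynomial instead of a product, is the core of a CRC computation.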

ARMv8 also has a version of CLMUL. SPARC calls its version XMULX, for "XOR multiplication".

New instructions

The instruction computes the 128-bit carry-less product of two 64-bit values. The destination is a 128-bit XMM register. The source may be another XMM register or memory. An immediate operand specifies which halves of the 128-bit operands are multiplied. Mnemonics for specific values of the immediate operand are also defined:

Instruction | Opcode | Description
PCLMULQDQ xmmreg,xmmrm,imm | [rmi: 66 0f 3a 44 /r ib] | Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2).
PCLMULLQLQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 00] | Multiply the low halves of the two registers.
PCLMULHQLQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 01] | Multiply the high half of the destination register by the low half of the source register.
PCLMULLQHQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 10] | Multiply the low half of the destination register by the high half of the source register.
PCLMULHQHQDQ xmmreg,xmmrm | [rm: 66 0f 3a 44 /r 11] | Multiply the high halves of the two registers.
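
The way the immediate operand selects halves can be sketched with a software model. This is an illustrative Python model, not production code: 128-bit registers are represented as plain integers, the function names are hypothetical, and the selector layout (bit 0 of the immediate picks the half of the destination, bit 4 picks the half of the source) follows the hexadecimal values 00, 01, 10 and 11 in the table above.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pclmulqdq(dst: int, src: int, imm8: int) -> int:
    """Model of PCLMULQDQ on 128-bit integers (hypothetical helper).

    imm8 bit 0 selects the high (1) or low (0) quadword of dst,
    imm8 bit 4 selects the high (1) or low (0) quadword of src.
    """
    a = (dst >> 64) & MASK64 if imm8 & 0x01 else dst & MASK64
    b = (src >> 64) & MASK64 if imm8 & 0x10 else src & MASK64
    return clmul64(a, b)


if __name__ == "__main__":
    dst = (0b11 << 64) | 0b101   # high half 0b11, low half 0b101
    src = (0b111 << 64) | 0b11   # high half 0b111, low half 0b11
    for imm8 in (0x00, 0x01, 0x10, 0x11):
        print(hex(imm8), bin(pclmulqdq(dst, src, imm8)))
```

Under this model the four named mnemonics correspond to calling pclmulqdq with the immediate fixed at 0x00, 0x01, 0x10 and 0x11 respectively.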

An EVEX-encoded vectorized version (VPCLMULQDQ) is available with AVX-512.
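
As a rough illustration of what "vectorized" means here, the sketch below models a 512-bit register as four independent 128-bit lanes and applies the same half-selecting carry-less multiply to each lane. It is a simplified Python model under stated assumptions: the function name vpclmulqdq and its three-argument form are hypothetical, registers are plain integers with lane 0 in the least significant 128 bits, and masking and other EVEX features are not modelled.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def vpclmulqdq(src1: int, src2: int, imm8: int, lanes: int = 4) -> int:
    """Lane-wise model: one 64x64 carry-less multiply per 128-bit lane."""
    result = 0
    for lane in range(lanes):
        x = (src1 >> (128 * lane)) & MASK128
        y = (src2 >> (128 * lane)) & MASK128
        # The same immediate selects the halves in every lane.
        a = (x >> 64) if imm8 & 0x01 else (x & MASK64)
        b = (y >> 64) if imm8 & 0x10 else (y & MASK64)
        result |= clmul64(a, b) << (128 * lane)
    return result
```

Each lane yields a full 128-bit product, so four multiplications complete per call in this model instead of one.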

CPUs with CLMUL instruction set

  • Intel:
    • Westmere processor (March 2010).
    • Sandy Bridge processor
    • Ivy Bridge processor
    • Haswell processor
    • Broadwell processor (with increased throughput and lower latency[5])
    • Skylake (and later) processor
    • Goldmont processor
  • AMD:
    • Jaguar-based processors and newer [6]
    • Puma-based processors and newer
    • "Heavy Equipment" processors
      • Bulldozer-based processors [7]
      • Piledriver-based processors
      • Steamroller-based processors
      • Excavator-based processors and newer
    • Zen processors
    • Zen+ processors
    • Zen 2 (and later) processors

The presence of the CLMUL instruction set can be checked by testing a CPU feature bit: executing CPUID with EAX=1 reports PCLMULQDQ support in bit 1 of ECX.
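
User-space programs often do not execute CPUID themselves but rely on what the operating system reports. The following is a minimal sketch for Linux on x86, where /proc/cpuinfo lists the feature as the flag pclmulqdq. The function names are hypothetical, and the approach is an assumption about the platform: on other operating systems or architectures it simply reports False.

```python
def cpuinfo_has_flag(cpuinfo_text: str, flag: str = "pclmulqdq") -> bool:
    """Return True if any 'flags' line in the given cpuinfo text lists flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


def has_clmul(path: str = "/proc/cpuinfo") -> bool:
    """Best-effort check; returns False if the file cannot be read."""
    try:
        with open(path, encoding="ascii", errors="replace") as f:
            return cpuinfo_has_flag(f.read())
    except OSError:
        return False


if __name__ == "__main__":
    print("CLMUL available:", has_clmul())
```

Splitting the flags line into whole words avoids false matches on flags that merely contain the same substring, such as vpclmulqdq.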

See also

  • Finite field arithmetic
  • AES instruction set
  • FMA3 instruction set
  • FMA4 instruction set
  • AVX instruction set

References

  1. "Intel Software Network". Intel. Archived from the original on 2008-04-07. https://web.archive.org/web/20080407095317/http://softwareprojects.intel.com/avx/. Retrieved 2008-04-05. 
  2. "Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode – Rev 2.02". Intel. 2014-04-20. http://software.intel.com/en-us/articles/intel-carry-less-multiplication-instruction-and-its-usage-for-computing-the-gcm-mode/. 
  3. "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ". http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf. 
  4. Vlad Krasnov (2015-07-08). "Fighting Cancer: The Unexpected Benefit Of Open Sourcing Our Code". CloudFlare. https://blog.cloudflare.com/cloudflare-fights-cancer/. Retrieved 2016-09-04. 
  5. Johan De Gelas (2017-03-31). "The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads". Anandtech. p. 3. http://www.anandtech.com/show/10158/the-intel-xeon-e5-v4-review. 
  6. "Slide detailing improvements of Jaguar over Bobcat". AMD. http://www.slideshare.net/AMDPhil/bobcat-to-jaguarv2. Retrieved August 3, 2013. 
  7. Dave Christie (6 May 2009). "Striking a balance". AMD Developer blogs. Archived from the original on 9 November 2013. https://archive.today/20131109140737/http://developer.amd.com/2009/05/06/striking-a-balance/. Retrieved 2011-03-11. 



Last modified: 05/15/2024 13:59:38
Source: https://handwiki.org/wiki/CLMUL_instruction_set | License: CC BY-SA 3.0
