Release date | December 7, 2017 |
---|---|
Codename | Volta |
Fabrication process | TSMC 12 nm (FinFET) |
Cards | Enthusiast |
History | |
Predecessor | Pascal |
Variant | Turing (consumer, professional) |
Successor | Ampere (consumer, professional) |
Support status | Supported |
Volta is the codename, but not the trademark,[1] for a GPU microarchitecture developed by Nvidia, succeeding Pascal. It was first announced on a roadmap in March 2013,[2] although the first product was not announced until May 2017.[3] The architecture is named after the 18th–19th century Italian chemist and physicist Alessandro Volta. It was Nvidia's first microarchitecture to feature Tensor Cores, specially designed cores that deliver higher deep learning performance than regular CUDA cores.[4] The architecture is produced with TSMC's 12 nm FinFET process. The Ampere microarchitecture is the successor to Volta.
The first graphics card to use it was the datacenter Tesla V100, for example as part of the Nvidia DGX-1 system.[3] It has also been used in the Quadro GV100 and Titan V. There were no mainstream GeForce graphics cards based on Volta.
After two USPTO proceedings,[5][6] Nvidia lost its Volta trademark application in the field of artificial intelligence on July 3, 2023. The Volta trademark[7] remains owned by Volta Robots, a company specializing in AI and vision algorithms for robots and unmanned vehicles.
Architectural improvements of the Volta architecture include the following:
Comparison of Compute Capability: GP100 vs GV100 vs GA100[15]
GPU features | Nvidia Tesla P100 | Nvidia Tesla V100 | Nvidia A100 |
---|---|---|---|
GPU codename | GP100 | GV100 | GA100 |
GPU architecture | Nvidia Pascal | Nvidia Volta | Nvidia Ampere |
Compute capability | 6.0 | 7.0 | 8.0 |
Threads / warp | 32 | 32 | 32 |
Max warps / SM | 64 | 64 | 64 |
Max threads / SM | 2048 | 2048 | 2048 |
Max thread blocks / SM | 32 | 32 | 32 |
Max 32-bit registers / SM | 65536 | 65536 | 65536 |
Max registers / block | 65536 | 65536 | 65536 |
Max registers / thread | 255 | 255 | 255 |
Max thread block size | 1024 | 1024 | 1024 |
FP32 cores / SM | 64 | 64 | 64 |
Ratio of SM registers to FP32 cores | 1024 | 1024 | 1024 |
Shared Memory Size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB |
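
Some rows in the table above follow arithmetically from others. The following is a minimal sketch of that arithmetic; the dictionary layout and the function name `derived_limits` are illustrative assumptions, not part of any Nvidia API, and the input values are copied from the table.

```python
# Per-SM limits copied from the table above (identical for GP100, GV100 and GA100).
SM_LIMITS = {
    "threads_per_warp": 32,
    "max_warps_per_sm": 64,
    "registers_per_sm": 65536,  # 32-bit registers
    "fp32_cores_per_sm": 64,
}


def derived_limits(limits):
    """Recompute the table rows that follow from the other rows."""
    # Max threads / SM = threads per warp * max warps per SM.
    max_threads = limits["threads_per_warp"] * limits["max_warps_per_sm"]
    # Ratio of SM registers to FP32 cores.
    regs_per_core = limits["registers_per_sm"] // limits["fp32_cores_per_sm"]
    # Registers left per thread when every thread slot of the SM is in use.
    regs_per_thread_full = limits["registers_per_sm"] // max_threads
    return {
        "max_threads_per_sm": max_threads,
        "registers_per_fp32_core": regs_per_core,
        "registers_per_thread_at_full_occupancy": regs_per_thread_full,
    }


print(derived_limits(SM_LIMITS))
```

Under these figures, a kernel that needs more than 32 registers per thread cannot keep all 2048 thread slots of an SM occupied, even though the per-thread limit is 255. Real occupancy also depends on shared memory use and register allocation granularity, which this sketch ignores.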
Comparison of Precision Support Matrix[16][17]
GPU | CUDA core FP16 | CUDA core FP32 | CUDA core FP64 | CUDA core INT1 | CUDA core INT4 | CUDA core INT8 | CUDA core TF32 | CUDA core BF16 | Tensor core FP16 | Tensor core FP32 | Tensor core FP64 | Tensor core INT1 | Tensor core INT4 | Tensor core INT8 | Tensor core TF32 | Tensor core BF16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Nvidia Tesla P4 | No | Yes | Yes | No | No | Yes | No | No | No | No | No | No | No | No | No | No |
Nvidia P100 | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No |
Nvidia Volta | Yes | Yes | Yes | No | No | Yes | No | No | Yes | No | No | No | No | No | No | No |
Nvidia Turing | Yes | Yes | Yes | No | No | No | No | No | Yes | No | No | Yes | Yes | Yes | No | No |
Nvidia A100 | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
Comparison of Decode Performance
Concurrent streams | H.264 decode (1080p30) | H.265 (HEVC) decode (1080p30) | VP9 decode (1080p30) |
---|---|---|---|
V100 | 16 | 22 | 22 |
A100 | 75 | 157 | 108 |
Volta is also the GPU microarchitecture in the Xavier generation of Tegra SoCs, which targets self-driving cars.[18][19]
At its annual GPU Technology Conference keynote on May 10, 2017, Nvidia officially announced the Volta microarchitecture along with the Tesla V100.[3] The Volta GV100 GPU is built on a 12 nm process and uses HBM2 memory with 900 GB/s of bandwidth.[20]
Nvidia officially announced the Nvidia TITAN V on December 7, 2017.[21][22]
Nvidia officially announced the Quadro GV100 on March 27, 2018.[23]
Model | Launch | Code name(s) | Fab (nm) | Transistors (billion) | Die size (mm²) | Bus interface | Core config: CUDA core[c] | Core config: Tensor core[d] | SM count[a] | Graphics Processing Clusters[b] | L2 cache size (MiB) | Base core clock (MHz) | Boost clock (MHz) | Memory clock (MT/s) | Pixel fillrate (GP/s) | Texture fillrate (GT/s) | Memory size (GiB) | Memory bandwidth (GB/s) | Memory bus type | Memory bus width (bit) | Single precision GFLOPS (boost) | Double precision GFLOPS (boost) | Half precision GFLOPS (boost) | TDP (watts) | NVLink support | Launch price, MSRP (USD) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Nvidia Titan V[24] | December 7, 2017 | GV100-400-A1 | TSMC 12 nm | 21.1 | 815 | PCIe 3.0 ×16 | 5120:320:96 | 640 | 80 | 6 | 4.5 | 1200 | 1455 | 1700 | 139.7 | 465.6 | 12 | 652.8 | HBM2 | 3072 | 12288 (14899) | 6144 (7450) | 24576 (29798) | 250 | No | $2,999 |
Nvidia Quadro GV100[25] | March 27, 2018 | GV100 | TSMC 12 nm | 21.1 | 815 | PCIe 3.0 ×16 | 5120:320:128 | 640 | 80 | 6 | 6 | 1132 | 1628 | 1696 | 208.4 | 521 | 32 | 868.4 | HBM2 | 4096 | 11592 (16671) | 5796 (8335) | 23183 (33341) | 250 | Yes | $8,999 |
Nvidia Titan V CEO Edition[26][27] | June 21, 2018 | GV100 | TSMC 12 nm | 21.1 | 815 | PCIe 3.0 ×16 | 5120:320:128 | 640 | 80 | 6 | 6 | 1200 | 1455 | 1700 | 186.2 | 465.6 | 32 | 870.4 | HBM2 | 4096 | 12288 (14899) | 6144 (7450) | 24576 (29798) | 250 | Yes | N/A |
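
The fill rate and processing power figures in the product table can be reproduced from the core configuration and clock speeds. The sketch below assumes the core configuration is read as CUDA cores : texture units : render output units, that each CUDA core completes one fused multiply-add (two floating-point operations) per clock, and that FP64 and FP16 run at half and twice the FP32 rate respectively, as the table's own ratios suggest. The function name `peak_rates` is hypothetical.

```python
def peak_rates(cuda_cores, texture_units, rops, clock_mhz):
    """Theoretical peak rates for a given core configuration and clock."""
    # One FMA per core per clock counts as two floating-point operations.
    fp32_gflops = 2 * cuda_cores * clock_mhz / 1000
    return {
        "fp32_gflops": fp32_gflops,
        "fp64_gflops": fp32_gflops / 2,  # assumed 1:2 rate, as in the table
        "fp16_gflops": fp32_gflops * 2,  # assumed 2:1 rate, as in the table
        "pixel_gp_s": rops * clock_mhz / 1000,
        "texture_gt_s": texture_units * clock_mhz / 1000,
    }


# Titan V (5120:320:96) at its 1455 MHz boost clock.
titan_v = peak_rates(5120, 320, 96, 1455)
# Quadro GV100 (5120:320:128) at its 1628 MHz boost clock.
quadro_gv100 = peak_rates(5120, 320, 128, 1628)

for name, rates in (("Titan V", titan_v), ("Quadro GV100", quadro_gv100)):
    print(name, {key: round(value, 1) for key, value in rates.items()})
```

These are theoretical peaks at the listed clocks; sustained throughput in real workloads is typically lower.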
Volta GPUs are also used for GPGPU compute in the Summit and Sierra supercomputers.[28][29] They connect to the POWER9 CPUs via NVLink 2.0, which supports cache coherency and can thereby improve GPGPU performance.[30][11][31]
Comparison of accelerators used in DGX:[32][33][34]
Model | Architecture | Socket | FP32 CUDA cores | FP64 cores (excl. tensor) | Mixed INT32/FP32 cores | INT32 cores | Boost clock | Memory clock | Memory bus width | Memory bandwidth | VRAM | Single precision (FP32) | Double precision (FP64) | INT8 (non-tensor) | INT8 dense tensor | INT32 | FP4 dense tensor | FP16 | FP16 dense tensor | bfloat16 dense tensor | TensorFloat-32 (TF32) dense tensor | FP64 dense tensor | Interconnect (NVLink) | GPU | L1 Cache | L2 Cache | TDP | Die size | Transistor count | Process | Launched |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
B200 | Blackwell | SXM6 | N/A | N/A | N/A | N/A | N/A | 8 Gbit/s HBM3e | 8192-bit | 8 TB/sec | 192 GB HBM3e | N/A | N/A | N/A | 4.5 POPS | N/A | 9 PFLOPS | N/A | 2.25 PFLOPS | 2.25 PFLOPS | 1.2 PFLOPS | 40 TFLOPS | 1.8 TB/sec | GB100 | N/A | N/A | 1000 W | N/A | 208 B | TSMC 4NP | Q4 2024 (expected) |
B100 | Blackwell | SXM6 | N/A | N/A | N/A | N/A | N/A | 8 Gbit/s HBM3e | 8192-bit | 8 TB/sec | 192 GB HBM3e | N/A | N/A | N/A | 3.5 POPS | N/A | 7 PFLOPS | N/A | 1.98 PFLOPS | 1.98 PFLOPS | 989 TFLOPS | 30 TFLOPS | 1.8 TB/sec | GB100 | N/A | N/A | 700 W | N/A | 208 B | TSMC 4NP | |
H200 | Hopper | SXM5 | 16896 | 4608 | 16896 | N/A | 1980 MHz | 6.3 Gbit/s HBM3e | 6144-bit | 4.8 TB/sec | 141 GB HBM3e | 67 TFLOPS | 34 TFLOPS | N/A | 1.98 POPS | N/A | N/A | N/A | 990 TFLOPS | 990 TFLOPS | 495 TFLOPS | 67 TFLOPS | 900 GB/sec | GH100 | 25344 KB (192 KB × 132) | 51200 KB | 1000 W | 814 mm2 | 80 B | TSMC 4N | Q3 2023 |
H100 | Hopper | SXM5 | 16896 | 4608 | 16896 | N/A | 1980 MHz | 5.2 Gbit/s HBM3 | 5120-bit | 3.35 TB/sec | 80 GB HBM3 | 67 TFLOPS | 34 TFLOPS | N/A | 1.98 POPS | N/A | N/A | N/A | 990 TFLOPS | 990 TFLOPS | 495 TFLOPS | 67 TFLOPS | 900 GB/sec | GH100 | 25344 KB (192 KB × 132) | 51200 KB | 700 W | 814 mm2 | 80 B | TSMC 4N | Q3 2022 |
A100 80GB | Ampere | SXM4 | 6912 | 3456 | 6912 | N/A | 1410 MHz | 3.2 Gbit/s HBM2e | 5120-bit | 2.04 TB/sec | 80 GB HBM2e | 19.5 TFLOPS | 9.7 TFLOPS | N/A | 624 TOPS | 19.5 TOPS | N/A | 78 TFLOPS | 312 TFLOPS | 312 TFLOPS | 156 TFLOPS | 19.5 TFLOPS | 600 GB/sec | GA100 | 20736 KB (192 KB × 108) | 40960 KB | 400 W | 826 mm2 | 54.2 B | TSMC N7 | Q1 2020 |
A100 40GB | Ampere | SXM4 | 6912 | 3456 | 6912 | N/A | 1410 MHz | 2.4 Gbit/s HBM2 | 5120-bit | 1.52 TB/sec | 40 GB HBM2 | 19.5 TFLOPS | 9.7 TFLOPS | N/A | 624 TOPS | 19.5 TOPS | N/A | 78 TFLOPS | 312 TFLOPS | 312 TFLOPS | 156 TFLOPS | 19.5 TFLOPS | 600 GB/sec | GA100 | 20736 KB (192 KB × 108) | 40960 KB | 400 W | 826 mm2 | 54.2 B | TSMC N7 | |
V100 32GB | Volta | SXM3 | 5120 | 2560 | N/A | 5120 | 1530 MHz | 1.75 Gbit/s HBM2 | 4096-bit | 900 GB/sec | 32 GB HBM2 | 15.7 TFLOPS | 7.8 TFLOPS | 62 TOPS | N/A | 15.7 TOPS | N/A | 31.4 TFLOPS | 125 TFLOPS | N/A | N/A | N/A | 300 GB/sec | GV100 | 10240 KB (128 KB × 80) | 6144 KB | 350 W | 815 mm2 | 21.1 B | TSMC 12FFN | Q3 2017 |
V100 16GB | Volta | SXM2 | 5120 | 2560 | N/A | 5120 | 1530 MHz | 1.75 Gbit/s HBM2 | 4096-bit | 900 GB/sec | 16 GB HBM2 | 15.7 TFLOPS | 7.8 TFLOPS | 62 TOPS | N/A | 15.7 TOPS | N/A | 31.4 TFLOPS | 125 TFLOPS | N/A | N/A | N/A | 300 GB/sec | GV100 | 10240 KB (128 KB × 80) | 6144 KB | 300 W | 815 mm2 | 21.1 B | TSMC 12FFN | |
P100 | Pascal | SXM/SXM2 | N/A | 1792 | 3584 | N/A | 1480 MHz | 1.4 Gbit/s HBM2 | 4096-bit | 720 GB/sec | 16 GB HBM2 | 10.6 TFLOPS | 5.3 TFLOPS | N/A | N/A | N/A | N/A | 21.2 TFLOPS | N/A | N/A | N/A | N/A | 160 GB/sec | GP100 | 1344 KB (24 KB × 56) | 4096 KB | 300 W | 610 mm2 | 15.3 B | TSMC 16FF+ | Q2 2016 |
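
The memory bandwidth and L1 cache columns of the accelerator table can be cross-checked with simple arithmetic. The sketch below assumes that the "Memory clock" column gives the per-pin data rate in Gbit/s, and the function names are illustrative. The listed bandwidths are rounded marketing figures, so the computed values only match approximately.

```python
def hbm_bandwidth_gb_s(pin_rate_gbit_s, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin rate times bus width, in bytes."""
    return pin_rate_gbit_s * bus_width_bits / 8


def total_l1_kb(l1_per_sm_kb, sm_count):
    """Aggregate L1 cache across all streaming multiprocessors, in KB."""
    return l1_per_sm_kb * sm_count


# V100: 1.75 Gbit/s per pin on a 4096-bit bus, 128 KB of L1 on each of 80 SMs.
v100_bandwidth = hbm_bandwidth_gb_s(1.75, 4096)  # table lists 900 GB/sec
v100_l1 = total_l1_kb(128, 80)                   # table lists 10240 KB

# P100: 1.4 Gbit/s per pin on a 4096-bit bus, 24 KB of L1 on each of 56 SMs.
p100_bandwidth = hbm_bandwidth_gb_s(1.4, 4096)   # table lists 720 GB/sec
p100_l1 = total_l1_kb(24, 56)                    # table lists 1344 KB

print(v100_bandwidth, v100_l1, p100_bandwidth, p100_l1)
```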