General information | |
---|---|
Launched | 2011 |
Performance | |
Max. CPU clock rate | 2.85 GHz to 3.0 GHz |
Cache | |
L1 cache | 8×(16+16) kB |
L2 cache | 8×128 kB |
L3 cache | 4 MB |
Architecture and classification | |
Technology node | 40 nm |
Instruction set | SPARC V9 |
Physical specifications | |
Cores | 8 |
Products, models, variants | |
Core name | S3 |
History | |
Predecessor | SPARC T3 |
Successor | SPARC T5 |
The SPARC T4 is a SPARC multicore microprocessor introduced in 2011 by Oracle Corporation. The processor is designed to offer high multithreaded performance (8 threads per core, with 8 cores per chip) as well as high single-threaded performance from the same chip.[1] The chip is the fourth-generation[2] processor in the T-Series family; Sun Microsystems brought the first T-Series processor, the UltraSPARC T1, to market in 2005.
The chip is the first Sun/Oracle SPARC chip to use dynamic threading[3] and out-of-order execution.[4] It incorporates one floating-point unit and one dedicated cryptographic unit per core.[2] The cores implement the 64-bit SPARC Version 9 architecture, run at frequencies between 2.85 GHz and 3.0 GHz, and are built in a 40 nm process with a die size of 403 mm² (0.625 sq in).[1]
An eight-core chip with eight threads per core, built in a 40 nm process and running at 2.5 GHz, was described in Sun Microsystems' 2009 processor roadmap. It was codenamed "Yosemite Falls" and given an expected release date of late 2011. The processor was expected to introduce a new microarchitecture, codenamed "VT Core". The technology news website The Register speculated that this chip would be named "T4", being the successor to the SPARC T3.[5] The Yosemite Falls CPU remained on Oracle Corporation's processor roadmap after the company acquired Sun in early 2010.[6] In December 2010, Oracle's VP of hardware development confirmed that the T4 processor was designed for improved per-thread performance, would have eight cores, and was expected to be released within one year.[7][8]
The processor design was presented at the 2011 Hot Chips conference.[9] The cores (renamed "S3" from "VT") include a dual-issue, 16-stage integer pipeline and an 11-cycle floating-point pipeline, both improvements over the previous ("S2") core used in the SPARC T3 processor. Each core has its own 16 KB data and 16 KB instruction L1 caches and a unified 128 KB L2 cache. All eight cores share a 4 MB L3 cache, and the total transistor count is approximately 855 million.[9] The design was the first Sun/Oracle SPARC processor with out-of-order execution[10] and the first processor in the SPARC T-Series family able to issue more than one instruction per cycle to a core's execution units.[11]
The T4 processor was officially introduced as part of Oracle's SPARC T4 servers in September 2011.[12] The initial single-processor T4-1 rack server ran at 2.85 GHz.[3] The dual-processor T4-2 ran at the same 2.85 GHz frequency, and the quad-processor T4-4 server ran at 3.0 GHz.[13]
The SPARC S3 core also includes a thread priority mechanism (called "dynamic threading") whereby each thread is allocated resources based on need, giving increased performance.[9] Most S3 core resources are shared among all active threads, of which there can be up to 8. Shared resources include branch prediction structures, various buffer entries, and out-of-order execution resources. Static resource allocation reserves resources for each thread according to a fixed policy, whether or not the thread can use them. Dynamic threading instead allocates these resources to the threads that are ready and will use them, thus improving performance.[4]
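The contrast between static and dynamic allocation can be illustrated with a toy model. This is only a sketch: the function names, the even-split policy, and the 128-entry shared buffer are hypothetical and chosen for illustration, not taken from the S3 design, whose actual allocation policy is not described here.

```python
def static_allocation(total_entries, ready):
    """Reserve an equal share for every thread, ready or not (toy model)."""
    share = total_entries // len(ready)
    return [share] * len(ready)


def dynamic_allocation(total_entries, ready):
    """Split the entries among ready threads only (toy model)."""
    active = [i for i, is_ready in enumerate(ready) if is_ready]
    alloc = [0] * len(ready)
    if not active:
        return alloc
    share, extra = divmod(total_entries, len(active))
    for rank, thread in enumerate(active):
        # Hand any leftover entries to the first few ready threads.
        alloc[thread] = share + (1 if rank < extra else 0)
    return alloc


def usable_entries(alloc, ready):
    """Entries held by threads that can actually use them."""
    return sum(n for n, is_ready in zip(alloc, ready) if is_ready)


# Hypothetical scenario: 8 hardware threads, only 2 of them ready to run.
ready = [True, True, False, False, False, False, False, False]
TOTAL = 128  # assumed buffer size, for illustration only

static = static_allocation(TOTAL, ready)
dynamic = dynamic_allocation(TOTAL, ready)

print("static usable: ", usable_entries(static, ready))
print("dynamic usable:", usable_entries(dynamic, ready))
```

In this model, the static scheme leaves most of the buffer reserved for idle threads, while the dynamic scheme makes the whole buffer available to the two ready threads.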
Cryptographic performance was also increased over the T3 chip by design improvements, including a new set of cryptographic instructions.[8] The per-core cryptographic coprocessors of the UltraSPARC T2 and T3 were replaced with in-core accelerators and instruction-based cryptography. The implementation is designed to achieve wire-speed encryption and decryption on the SPARC T4's 10 Gbit/s Ethernet ports.[4]
The architectural changes are claimed to deliver a 5x improvement in single-thread integer performance[9] and twice the per-thread throughput performance compared to the previous-generation T3.[4] The published SPECjvm2008 results are 454 ops/m for the 16-core T4-2[14] and 321 ops/m for the 32-core T3-2,[15] which corresponds to a ratio of about 2.8x in performance per core.
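The per-core ratio follows from dividing each published score by the core count of the system that produced it. The short sketch below reproduces the arithmetic using only the figures quoted above; the helper name `per_core` is introduced here for illustration.

```python
def per_core(score_ops_per_min, cores):
    """Throughput per core for a given benchmark score."""
    return score_ops_per_min / cores


t4_2 = per_core(454, 16)  # SPARC T4-2: 2 chips x 8 cores
t3_2 = per_core(321, 32)  # SPARC T3-2: 2 chips x 16 cores
ratio = t4_2 / t3_2

print(f"T4-2: {t4_2:.2f} ops/m per core")
print(f"T3-2: {t3_2:.2f} ops/m per core")
print(f"ratio: {ratio:.1f}x")
```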