For plug-and-play, low-latency, scalable performance, the GroqCard accelerator packages a single GroqChip™ processor into a standard PCIe Gen4 x16 form factor for hassle-free server integration. With up to 11 RealScale™ chip-to-chip connections and an internal software-defined network, GroqCard enables near-linear multi-server and multi-rack scalability without the need for external switches.
Key Features
Fully deterministic processor provides predictable and repeatable performance with no run-to-run variation.
230 MB of on-die memory delivers large globally sharable SRAM for high-bandwidth, low-latency access to model parameters without the need for external memory.
Up to 80 TB/s of on-die memory bandwidth supports the massive concurrency and data parallelism needed for bandwidth-sensitive applications.
Up to 11 RealScale™ chip-to-chip connectors enable near-linear multi-server and multi-rack scalability without the need for external switches.
End-to-end on-chip protection improves uptime and reliability with error-correction code (ECC) protection throughout the entire GroqChip data path.
PCIe Gen4 x16 interface delivers up to 31.5 GB/s of bi-directional bandwidth over an industry-standard interface for fast device and network connections, all with a lightweight open-source driver and no CPU burden.
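The 31.5 GB/s figure matches the theoretical rate of a PCIe Gen4 x16 link in one direction. The following is a minimal sketch of that arithmetic, assuming only the standard Gen4 signaling rate (16 GT/s per lane) and 128b/130b line encoding. It ignores packet and protocol overhead, so sustained throughput in practice will be lower. The function name is illustrative and not part of any Groq software.

```python
# PCIe Gen4 signals at 16 GT/s per lane and uses 128b/130b line encoding.
GT_PER_S_PER_LANE = 16e9
ENCODING_EFFICIENCY = 128 / 130
LANES = 16


def pcie_gen4_bandwidth_gb_s(lanes: int = LANES) -> float:
    """Theoretical one-direction link bandwidth in GB/s (1 GB = 1e9 bytes).

    Protocol overhead (TLP headers, flow control) is deliberately ignored.
    """
    bits_per_s = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e9


if __name__ == "__main__":
    print(f"{pcie_gen4_bandwidth_gb_s():.1f} GB/s")  # prints "31.5 GB/s"
```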
Specifications
Form Factor
Dual-width, full-height, ¾-length PCI Express Gen4 x16 adapter
Performance
Up to 750 TOPS (INT8), 188 TFLOPS (FP16) at 900 MHz
Memory
230 MB SRAM per chip
Up to 80 TB/s on-die memory bandwidth
Chip Scaling
Up to 11 RealScale™ chip-to-chip connectors
Numerics
- MXM: INT8, FP16
- VXM: INT8, INT16, INT32, FP16, FP32
Power
- Max: 375W
- TDP: 275W
- Typical: 240W
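For rough sizing, the specifications above can be turned into a few derived quantities. The sketch below is a back-of-envelope estimate only: it assumes 1 MB = 1e6 bytes, counts model weights alone (no activations, instructions, or other on-chip data), and uses a hypothetical 1-billion-parameter model purely as an illustration. The helper names are not part of any Groq tool.

```python
import math

# Figures taken from the specification list above.
SRAM_BYTES_PER_CHIP = 230_000_000      # 230 MB, assuming 1 MB = 1e6 bytes
MEM_BW_BYTES_PER_S = 80e12             # up to 80 TB/s on-die
CLOCK_HZ = 900e6                       # 900 MHz

BYTES_PER_PARAM = {"INT8": 1, "FP16": 2}


def max_params_per_chip(dtype: str) -> int:
    """Upper bound on parameters that fit in one chip's SRAM (weights only)."""
    return SRAM_BYTES_PER_CHIP // BYTES_PER_PARAM[dtype]


def min_chips_for_model(num_params: int, dtype: str) -> int:
    """Lower bound on chips needed to hold a model's weights on-die."""
    total_bytes = num_params * BYTES_PER_PARAM[dtype]
    return math.ceil(total_bytes / SRAM_BYTES_PER_CHIP)


def peak_bytes_per_cycle() -> float:
    """Peak on-die memory traffic per clock cycle at the quoted frequency."""
    return MEM_BW_BYTES_PER_S / CLOCK_HZ


if __name__ == "__main__":
    print(max_params_per_chip("INT8"))                 # 230000000
    print(max_params_per_chip("FP16"))                 # 115000000
    print(min_chips_for_model(1_000_000_000, "FP16"))  # 9
    print(round(peak_bytes_per_cycle()))               # 88889
```

These are bounds rather than deployment guidance: a real mapping also has to fit activations and program data, so the actual chip count for a given model may be higher.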