Comparison brief
89.3% sustained utilization on transformer attention.
Peak TOPS lies. What matters is sustained utilization on the kernel that dominates inference. The Invotet Unified Engine sustains 89.3% of theoretical peak on transformer attention — measured today on the production FPGA module that ships in every Invotet accelerator. Jetson-class GPUs run the same kernel at 8–12% utilization. The architectural advantage is real, deployable, and shipping — not a tape-out projection.
Test platform
Invotet Unified Engine v1.1 · Kintex UltraScale+ FPGA
Every claim below is measured on hardware that ships.
The Invotet Unified Engine runs on a Kintex UltraScale+ FPGA at 333 MHz, 4.5 W — the same fabric inside every AeroScale V1 and AstroCore S module. Every operator and attention number on this page is measured on that hardware. No chip-down ASIC. No tape-out projection.
| Spec | Value |
|---|---|
| Engine | Invotet Unified Engine |
| Test platform | Kintex UltraScale+ |
| Engine clock | 333 MHz |
| Power | 4.5 W |
| Theoretical peak (BF16) | 42 GFLOPS |
| Sustained on attention | 37.50 GFLOPS |
| Data path | 256 bits |
| Memory interface | DDR4-1333 |
| On-chip SRAM | 1.05 MB |
Operator-level throughput
BF16 sustained throughput on the operators that dominate inference.
All measurements include DMA in/out and run at M = K = N = 1024. Utilization is achieved throughput as a fraction of theoretical peak (42 GFLOPS at 4.5 W).
| Op | Kind | Achieved (GFLOPS) | Utilization | Latency (ms) |
|---|---|---|---|---|
| A·Bᵀ | MatMul | 40.17 | 95.6% | 53.3 |
| A·Bᵀ + C | MatMul + bias | 40.10 | 95.5% | 53.5 |
| GELU(A·Bᵀ) | MatMul + GELU | 40.02 | 95.3% | 53.7 |
| SiLU(A·Bᵀ) | MatMul + SiLU | 40.02 | 95.3% | 53.7 |
| softmax(A·Bᵀ) | MatMul + softmax | 37.76 | 89.9% | 57.0 |
| LayerNorm | Normalization | 5.90 | Memory-bound | 1.24 |
| RMSNorm | Normalization | 4.81 | Memory-bound | 0.87 |
| Memory-efficient attention (seq = 3072, head_dim = 256, bias off) | Fused attention | 37.50 | 89.3% | — |
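As a sanity check on the table above, here is a minimal sketch of how the Utilization and Latency columns relate, assuming the usual 2·M·K·N FLOP count for a matmul and the 42 GFLOPS theoretical peak. The helper names and the compute-only latency model are illustrative, not part of the Invotet reference.

```python
# Sketch of the arithmetic behind the operator table.
# Assumptions: 2*M*K*N FLOPs per matmul, 42 GFLOPS theoretical peak;
# the latency here is compute-only, while the table's latency includes DMA.

PEAK_GFLOPS = 42.0          # theoretical BF16 peak from the test-platform table
M = K = N = 1024            # benchmark shape used for every matmul row

def matmul_flops(m: int, k: int, n: int) -> float:
    """FLOPs for one m x k x n matmul (multiply + add counted as 2 FLOPs)."""
    return 2.0 * m * k * n

def utilization(achieved_gflops: float, peak_gflops: float = PEAK_GFLOPS) -> float:
    """Achieved throughput as a fraction of theoretical peak."""
    return achieved_gflops / peak_gflops

def implied_latency_ms(achieved_gflops: float, m: int = M, k: int = K, n: int = N) -> float:
    """Latency implied by the achieved throughput for one matmul of this shape."""
    return matmul_flops(m, k, n) / (achieved_gflops * 1e9) * 1e3

# A·Bᵀ row: 40.17 GFLOPS -> ~95.6% utilization, ~53.5 ms (the table reports 53.3 ms)
print(f"util = {utilization(40.17):.1%}, latency ≈ {implied_latency_ms(40.17):.1f} ms")
```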
Attention utilization
89.3% of theoretical peak on transformer attention.
Sustained throughput (GFLOPS) on the memory-efficient attention kernel across sequence length and head dimension. Peak measured at seq = 3072, head_dim = 256.
Theoretical peak: 42 GFLOPS at 4.5 W
| Seq len | hd=64 | hd=128 | hd=256 | Util at hd=256 |
|---|---|---|---|---|
| 64 | 11.0 | 14.0 | 16.0 | 38% |
| 128 | 18.0 | 22.0 | 24.5 | 58% |
| 256 | 24.5 | 28.0 | 30.5 | 73% |
| 512 | 30.0 | 32.5 | 34.5 | 82% |
| 1024 | 33.0 | 35.5 | 36.5 | 87% |
| 2048 | 35.0 | 36.0 | 37.0 | 88% |
| 3072 | 36.0 | 36.5 | 37.5 | 89% |
| 4096 | 34.5 | 35.5 | 36.0 | 86% |
| 6144 | 35.0 | 36.0 | 36.5 | 87% |
| 8192 | 34.0 | 35.0 | 35.5 | 85% |
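As a rough cross-check on the scaling above, the sketch below counts attention matmul FLOPs with the standard 4·seq²·head_dim convention (QKᵀ plus the score-times-V product, softmax excluded) and converts the measured sustained throughput into a per-pass time. The FLOP convention and the single-head framing are assumptions for illustration, not the engine's internal accounting.

```python
# Rough FLOP model for one attention head and the per-pass time implied by
# the measured throughput. Assumed: 4 * seq^2 * head_dim FLOPs per head;
# softmax and normalization costs ignored.

PEAK_GFLOPS = 42.0   # theoretical BF16 peak of the Invotet Unified Engine

def attention_flops(seq: int, head_dim: int) -> float:
    """Matmul FLOPs for one attention head: Q·Kᵀ plus scores·V."""
    return 2 * (2.0 * seq * seq * head_dim)

def pass_time_ms(seq: int, head_dim: int, sustained_gflops: float) -> float:
    """Per-head pass time implied by the measured sustained throughput."""
    return attention_flops(seq, head_dim) / (sustained_gflops * 1e9) * 1e3

# Peak point from the table: seq = 3072, head_dim = 256 at 37.5 GFLOPS sustained
print(f"util ≈ {37.5 / PEAK_GFLOPS:.1%}")                        # ~89.3% of peak
print(f"per-head pass ≈ {pass_time_ms(3072, 256, 37.5):.0f} ms")
```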
Side-by-side
NVIDIA Jetson Orin Nano 8GB vs. Invotet Unified Engine.
Same workload, two columns: the Invotet Unified Engine on its production FPGA module against the NVIDIA Jetson Orin Nano 8GB production module (Ampere, 25 W Super profile). The Invotet Unified Engine sustains 89.3% utilization on attention; Jetson sustains 8–12% on the same kernel. Every number on the Invotet side is measured on that FPGA module.
| Metric | Invotet Unified Engine | NVIDIA Jetson Orin Nano 8GB |
|---|---|---|
| Theoretical BF16 peak (useful only as a ceiling; most hardware never reaches it on a real model) | 42 GFLOPS (measured, FPGA module) | ~1,600 GFLOPS (public vendor data) |
| Sustained on attention, BF16 (what each module actually delivers on the kernel that dominates inference) | 37.5 GFLOPS (measured, FPGA module) | 130–190 GFLOPS (public vendor data) |
| Utilization on attention (the architectural advantage, measured today on the production FPGA module) | 89.3% of peak (measured, FPGA module) | 8–12% of peak (public vendor data) |
| Power | 4.5 W (measured, FPGA module) | 15–25 W (public vendor data) |
| Useful FLOPs per watt on attention (sustained attention throughput divided by module power, sketched below; the Invotet figure is measured, no ASIC roadmap required) | 8.3 GFLOPS/W (measured, FPGA module) | ~6 GFLOPS/W (public vendor data) |
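A short sketch of the perf-per-watt arithmetic behind the last row, using only the figures already quoted in this table. The Jetson inputs are the public vendor-derived ranges above rather than measurements, so the result is a bracket, not a point value.

```python
# Perf-per-watt row, reproduced from the figures quoted in the comparison table.
# Jetson inputs are public/vendor-derived ranges, not measurements.

def gflops_per_watt(sustained_gflops: float, watts: float) -> float:
    """Sustained attention throughput divided by module power."""
    return sustained_gflops / watts

# Invotet Unified Engine: 37.5 GFLOPS sustained at 4.5 W (measured on the FPGA module)
print(round(gflops_per_watt(37.5, 4.5), 1))                            # 8.3 GFLOPS/W

# Jetson Orin Nano 8GB: 130–190 GFLOPS sustained at the 25 W Super profile
print([round(gflops_per_watt(g, 25.0), 1) for g in (130.0, 190.0)])    # [5.2, 7.6]
# The table's ~6 GFLOPS/W sits inside this 5.2–7.6 GFLOPS/W bracket.
```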
Methodology
What's measured, and where the comparator data comes from.
1. All Invotet Unified Engine numbers in the operator and attention tables are measured on the production FPGA module: Kintex UltraScale+ at 333 MHz, 4.5 W, BF16. Source: Invotet Unified Engine v1.1 reference (operator benchmarks, M = K = N = 1024, including DMA in/out).
2. Jetson Orin Nano values use the publicly published module specs for the Orin Nano 8GB Super profile, combined with an 8–12% sustained-utilization range on transformer attention (see the sketch after this list). That range is consistent with widely cited llama.cpp and TensorRT-LLM community benchmarks for Ampere-class Jetson on attention-dominated decode workloads.
3. The load-bearing claim is sustained utilization on transformer attention: 89.3% on the Invotet FPGA module vs. 8–12% on Jetson for the same kernel. Every other row in the comparison table is a downstream consequence of that architectural difference, measured on hardware shipping today.
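To make the comparator derivation in item 2 explicit, the sketch below applies the 8–12% utilization range to the public ~1,600 GFLOPS BF16 peak and recovers the 130–190 GFLOPS sustained range used in the side-by-side table. All inputs are the public figures cited above; nothing here is a new measurement.

```python
# Derivation of the Jetson sustained-attention range used in the comparison.
# Inputs: public ~1,600 GFLOPS BF16 peak and the 8–12% utilization range
# from community benchmarks; this is an estimate, not a measurement.

JETSON_PEAK_GFLOPS = 1600.0          # ~BF16 peak, public vendor data
UTIL_RANGE = (0.08, 0.12)            # sustained utilization on attention

sustained = [JETSON_PEAK_GFLOPS * u for u in UTIL_RANGE]
print([round(s) for s in sustained])  # [128, 192] -> quoted as 130–190 GFLOPS above
```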
Need the full reference, an NDA, or a custom comparison run?
Open a sales conversation. We will route the full Invotet Unified Engine v1.1 reference and a tailored comparison against your candidate silicon to your inbox the same day.
