Comparison brief
89.3% sustained utilization on transformer attention.
Peak TOPS lies. What matters is sustained utilization on the kernel that dominates inference. The Invotet Unified Engine sustains 89.3% of theoretical peak on transformer attention — measured today on the production FPGA module that ships in every Invotet accelerator. Jetson-class GPUs run the same kernel at 8–12% utilization. The architectural advantage is real, deployable, and shipping — not a tape-out projection.
Test platform
Invotet Unified Engine v1.1 · Kintex UltraScale+ FPGA
Every claim below is measured on hardware that ships.
The Invotet Unified Engine runs on a Kintex UltraScale+ FPGA at 333 MHz, 4.5 W — the same fabric inside every AeroScale V1 and AstroCore S module. Every operator and attention number on this page is measured on that hardware. No chip-down ASIC. No tape-out projection.
| Spec | Value |
|---|---|
| Engine | Invotet Unified Engine |
| Test platform | Kintex UltraScale+ |
| Engine clock | 333 MHz |
| Power | 4.5 W |
| Theoretical peak (BF16) | 42 GFLOPS |
| Sustained on attention | 37.50 GFLOPS |
| Data path | 256 bits |
| Memory interface | DDR4-1333 |
| On-chip SRAM | 1.05 MB |
Operator-level throughput
BF16 sustained throughput on the operators that dominate inference.
All measurements include DMA in/out and run at M = K = N = 1024. Utilization is achieved throughput as a fraction of theoretical peak (42 GFLOPS at 4.5 W).
| Op | Kind | Achieved (GFLOPS) | Utilization | Latency (ms) |
|---|---|---|---|---|
| A·Bᵀ | MatMul | 40.17 | 95.6% | 53.3 |
| A·Bᵀ + C | MatMul + bias | 40.10 | 95.5% | 53.5 |
| GELU(A·Bᵀ) | MatMul + GELU | 40.02 | 95.3% | 53.7 |
| SiLU(A·Bᵀ) | MatMul + SiLU | 40.02 | 95.3% | 53.7 |
| softmax(A·Bᵀ) | MatMul + softmax | 37.76 | 89.9% | 57.0 |
| LayerNorm | Normalization | 5.90 | Memory-bound | 1.24 |
| RMSNorm | Normalization | 4.81 | Memory-bound | 0.87 |
| Memory-efficient attention (seq = 3072, head_dim = 256, bias off) | Fused attention | 37.50 | 89.3% | — |
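As a sanity check on the table above, here is a minimal sketch of how the Utilization and Latency columns relate, assuming the usual 2·M·K·N FLOP count for a matmul and the 42 GFLOPS theoretical peak. The helper names and the compute-only latency model are illustrative, not part of the Invotet reference.

```python
# Sketch of the arithmetic behind the operator table.
# Assumptions: 2*M*K*N FLOPs per matmul, 42 GFLOPS theoretical peak;
# the latency here is compute-only, while the table's latency includes DMA.

PEAK_GFLOPS = 42.0          # theoretical BF16 peak from the test-platform table
M = K = N = 1024            # benchmark shape used for every matmul row

def matmul_flops(m: int, k: int, n: int) -> float:
    """FLOPs for one m x k x n matmul (multiply + add counted as 2 FLOPs)."""
    return 2.0 * m * k * n

def utilization(achieved_gflops: float, peak_gflops: float = PEAK_GFLOPS) -> float:
    """Achieved throughput as a fraction of theoretical peak."""
    return achieved_gflops / peak_gflops

def implied_latency_ms(achieved_gflops: float, m: int = M, k: int = K, n: int = N) -> float:
    """Latency implied by the achieved throughput for one matmul of this shape."""
    return matmul_flops(m, k, n) / (achieved_gflops * 1e9) * 1e3

# A·Bᵀ row: 40.17 GFLOPS -> ~95.6% utilization, ~53.5 ms (the table reports 53.3 ms)
print(f"util = {utilization(40.17):.1%}, latency ≈ {implied_latency_ms(40.17):.1f} ms")
```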
Attention utilization
89.3% of theoretical peak on transformer attention.
Sustained throughput (GFLOPS) on the memory-efficient attention kernel across sequence length and head dimension. Peak measured at seq = 3072, head_dim = 256.
Theoretical peak: 42 GFLOPS at 4.5 W
| Seq len | hd=64 | hd=128 | hd=256 | Util at hd=256 |
|---|---|---|---|---|
| 64 | 11.0 | 14.0 | 16.0 | 38% |
| 128 | 18.0 | 22.0 | 24.5 | 58% |
| 256 | 24.5 | 28.0 | 30.5 | 73% |
| 512 | 30.0 | 32.5 | 34.5 | 82% |
| 1024 | 33.0 | 35.5 | 36.5 | 87% |
| 2048 | 35.0 | 36.0 | 37.0 | 88% |
| 3072 | 36.0 | 36.5 | 37.5 | 89% |
| 4096 | 34.5 | 35.5 | 36.0 | 86% |
| 6144 | 35.0 | 36.0 | 36.5 | 87% |
| 8192 | 34.0 | 35.0 | 35.5 | 85% |
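As a rough cross-check on the scaling above, the sketch below counts attention matmul FLOPs with the standard 4·seq²·head_dim convention (QKᵀ plus the score-times-V product, softmax excluded) and converts the measured sustained throughput into a per-pass time. The FLOP convention and the single-head framing are assumptions for illustration, not the engine's internal accounting.

```python
# Rough FLOP model for one attention head and the per-pass time implied by
# the measured throughput. Assumed: 4 * seq^2 * head_dim FLOPs per head;
# softmax and normalization costs ignored.

PEAK_GFLOPS = 42.0   # theoretical BF16 peak of the Invotet Unified Engine

def attention_flops(seq: int, head_dim: int) -> float:
    """Matmul FLOPs for one attention head: Q·Kᵀ plus scores·V."""
    return 2 * (2.0 * seq * seq * head_dim)

def pass_time_ms(seq: int, head_dim: int, sustained_gflops: float) -> float:
    """Per-head pass time implied by the measured sustained throughput."""
    return attention_flops(seq, head_dim) / (sustained_gflops * 1e9) * 1e3

# Peak point from the table: seq = 3072, head_dim = 256 at 37.5 GFLOPS sustained
print(f"util ≈ {37.5 / PEAK_GFLOPS:.1%}")                        # ~89.3% of peak
print(f"per-head pass ≈ {pass_time_ms(3072, 256, 37.5):.0f} ms")
```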
Side-by-side
NVIDIA Jetson Orin Nano 8GB vs. Invotet Unified Engine.
Same workload, two columns: the Invotet Unified Engine on its production FPGA module against the NVIDIA Jetson Orin Nano 8GB production module (Ampere, 25 W Super profile). The Invotet Unified Engine sustains 89.3% utilization on attention; Jetson sustains 8–12% on the same kernel. Every number on the Invotet side is measured on that FPGA module.
| Metric | Invotet Unified Engine | NVIDIA Jetson Orin Nano 8GB |
|---|---|---|
| Theoretical BF16 peak (useful only as a ceiling; most hardware never reaches it on a real model) | 42 GFLOPS (measured, FPGA module) | ~1,600 GFLOPS (public vendor data) |
| Sustained on attention, BF16 (what each module actually delivers on the kernel that dominates inference) | 37.5 GFLOPS (measured, FPGA module) | 130–190 GFLOPS (public vendor data) |
| Utilization on attention (the architectural advantage, measured today on the production FPGA module) | 89.3% of peak (measured, FPGA module) | 8–12% of peak (public vendor data) |
| Power | 4.5 W (measured, FPGA module) | 15–25 W (public vendor data) |
| Useful FLOPs per watt on attention (sustained attention throughput divided by module power, sketched below; the Invotet figure is measured, no ASIC roadmap required) | 8.3 GFLOPS/W (measured, FPGA module) | ~6 GFLOPS/W (public vendor data) |
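A short sketch of the perf-per-watt arithmetic behind the last row, using only the figures already quoted in this table. The Jetson inputs are the public vendor-derived ranges above rather than measurements, so the result is a bracket, not a point value.

```python
# Perf-per-watt row, reproduced from the figures quoted in the comparison table.
# Jetson inputs are public/vendor-derived ranges, not measurements.

def gflops_per_watt(sustained_gflops: float, watts: float) -> float:
    """Sustained attention throughput divided by module power."""
    return sustained_gflops / watts

# Invotet Unified Engine: 37.5 GFLOPS sustained at 4.5 W (measured on the FPGA module)
print(round(gflops_per_watt(37.5, 4.5), 1))                            # 8.3 GFLOPS/W

# Jetson Orin Nano 8GB: 130–190 GFLOPS sustained at the 25 W Super profile
print([round(gflops_per_watt(g, 25.0), 1) for g in (130.0, 190.0)])    # [5.2, 7.6]
# The table's ~6 GFLOPS/W sits inside this 5.2–7.6 GFLOPS/W bracket.
```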
Methodology
What's measured, and where the comparator data comes from.
1. All Invotet Unified Engine numbers in the operator and attention tables are measured on the production FPGA module: Kintex UltraScale+ at 333 MHz, 4.5 W, BF16. Source: Invotet Unified Engine v1.1 reference (operator benchmarks, M = K = N = 1024, including DMA in/out).
2. Jetson Orin Nano values use the publicly published module specs for the Orin Nano 8GB Super profile, combined with an 8–12% sustained-utilization range on transformer attention (see the sketch after this list). That range is consistent with widely cited llama.cpp and TensorRT-LLM community benchmarks for Ampere-class Jetson on attention-dominated decode workloads.
3. The load-bearing claim is sustained utilization on transformer attention: 89.3% on the Invotet FPGA module vs. 8–12% on Jetson for the same kernel. Every other row in the comparison table is a downstream consequence of that architectural difference, measured on hardware shipping today.
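To make the comparator derivation in item 2 explicit, the sketch below applies the 8–12% utilization range to the public ~1,600 GFLOPS BF16 peak and recovers the 130–190 GFLOPS sustained range used in the side-by-side table. All inputs are the public figures cited above; nothing here is a new measurement.

```python
# Derivation of the Jetson sustained-attention range used in the comparison.
# Inputs: public ~1,600 GFLOPS BF16 peak and the 8–12% utilization range
# from community benchmarks; this is an estimate, not a measurement.

JETSON_PEAK_GFLOPS = 1600.0          # ~BF16 peak, public vendor data
UTIL_RANGE = (0.08, 0.12)            # sustained utilization on attention

sustained = [JETSON_PEAK_GFLOPS * u for u in UTIL_RANGE]
print([round(s) for s in sustained])  # [128, 192] -> quoted as 130–190 GFLOPS above
```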
Need the full reference, an NDA, or a custom comparison run?
Open a sales conversation. We will route the full Invotet Unified Engine v1.1 reference and a tailored comparison against your candidate silicon to your inbox the same day.
