
Comparison brief

89.3% sustained utilization on transformer attention.

Peak TOPS lies. What matters is sustained utilization on the kernel that dominates inference. The Invotet Unified Engine sustains 89.3% of theoretical peak on transformer attention — measured today on the production FPGA module that ships in every Invotet accelerator. Jetson-class GPUs run the same kernel at 8–12% utilization. The architectural advantage is real, deployable, and shipping — not a tape-out projection.

Test platform

Invotet Unified Engine v1.1 · Kintex UltraScale+ FPGA

Every claim below is measured on hardware that ships.

The Invotet Unified Engine runs on a Kintex UltraScale+ FPGA at 333 MHz, 4.5 W — the same fabric inside every AeroScale V1 and AstroCore S module. Every operator and attention number on this page is measured on that hardware. No chip-down ASIC. No tape-out projection.

Engine: Invotet Unified Engine v1.1
Test platform: Kintex UltraScale+ (production FPGA module)
Engine clock: 333 MHz
Power: 4.5 W
Theoretical peak (BF16): 42 GFLOPS
Sustained on attention: 37.50 GFLOPS (89.3% of peak)
Data path: 256 bits
Memory interface: DDR4-1333, 32-bit
On-chip SRAM: 1.05 MB

Operator-level throughput

BF16 sustained throughput on the operators that dominate inference.

All measurements include DMA in/out and run at M = K = N = 1024. Utilization is achieved throughput as a fraction of theoretical peak (42 GFLOPS at 4.5 W).
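
The columns relate by simple arithmetic. A minimal sketch, assuming the conventional 2·M·K·N FLOP count for an M×K by K×N matmul (our convention and helper names, not taken from the Invotet reference):

```python
# How the Achieved / Utilization / Latency columns relate, assuming the
# conventional 2*M*K*N FLOP count for a matmul. The helper functions are
# illustrative; only the peak and the benchmark shape come from this page.

PEAK_GFLOPS = 42.0   # theoretical BF16 peak (333 MHz, 4.5 W)
M = K = N = 1024     # benchmark shape for every operator row

def matmul_flops(m: int, k: int, n: int) -> float:
    """FLOPs for one m-by-k times k-by-n matmul (multiply + add per MAC)."""
    return 2.0 * m * k * n

def utilization(achieved_gflops: float) -> float:
    """Achieved throughput as a fraction of theoretical peak."""
    return achieved_gflops / PEAK_GFLOPS

def latency_ms(achieved_gflops: float) -> float:
    """Latency implied by the achieved throughput, in milliseconds."""
    return matmul_flops(M, K, N) / (achieved_gflops * 1e9) * 1e3

# Cross-check against the A·Bᵀ row (40.17 GFLOPS):
print(f"{utilization(40.17):.1%}")    # 95.6%, matching the table
print(f"{latency_ms(40.17):.1f} ms")  # ~53.5 ms; the table's 53.3 ms reflects rounding
```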

| Op | Kind | Achieved (GFLOPS) | Utilization | Latency (ms) |
|---|---|---|---|---|
| A·Bᵀ | MatMul | 40.17 | 95.6% | 53.3 |
| A·Bᵀ + C | MatMul + bias | 40.10 | 95.5% | 53.5 |
| GELU(A·Bᵀ) | MatMul + GELU | 40.02 | 95.3% | 53.7 |
| SiLU(A·Bᵀ) | MatMul + SiLU | 40.02 | 95.3% | 53.7 |
| softmax(A·Bᵀ) | MatMul + softmax | 37.76 | 89.9% | 57.0 |
| LayerNorm | Normalization | 5.90 | Memory-bound | 1.24 |
| RMSNorm | Normalization | 4.81 | Memory-bound | 0.87 |
| Memory-efficient attention (seq = 3072, head_dim = 256, bias off) | Fused attention | 37.50 | 89.3% | – |

Attention utilization

89.3% of theoretical peak on transformer attention.

Sustained throughput on the memory-efficient attention kernel across sequence length and head dimension. Peak measured at seq = 3072, head_dim = 256.

Theoretical peak: 42 GFLOPS at 4.5 W

| Seq len | hd = 64 (GFLOPS) | hd = 128 (GFLOPS) | hd = 256 (GFLOPS) | Peak util |
|---|---|---|---|---|
| 64 | 11.0 | 14.0 | 16.0 | 38% |
| 128 | 18.0 | 22.0 | 24.5 | 58% |
| 256 | 24.5 | 28.0 | 30.5 | 73% |
| 512 | 30.0 | 32.5 | 34.5 | 82% |
| 1024 | 33.0 | 35.5 | 36.5 | 87% |
| 2048 | 35.0 | 36.0 | 37.0 | 88% |
| 3072 | 36.0 | 36.5 | 37.5 | 89% |
| 4096 | 34.5 | 35.5 | 36.0 | 86% |
| 6144 | 35.0 | 36.0 | 36.5 | 87% |
| 8192 | 34.0 | 35.0 | 35.5 | 85% |
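
As a consistency check, the Peak util column can be recomputed from the hd = 256 column. The snippet below uses only numbers already on this page:

```python
# Recompute the "Peak util" column from the hd = 256 sustained numbers
# in the table above. All inputs are copied from this page.

PEAK_GFLOPS = 42.0  # theoretical BF16 peak from the spec card

# (seq_len, sustained GFLOPS at head_dim = 256)
hd256 = [
    (64, 16.0), (128, 24.5), (256, 30.5), (512, 34.5), (1024, 36.5),
    (2048, 37.0), (3072, 37.5), (4096, 36.0), (6144, 36.5), (8192, 35.5),
]

for seq, gflops in hd256:
    print(f"seq={seq:5d}  {gflops:4.1f} GFLOPS  {gflops / PEAK_GFLOPS:4.0%}")
# The sweep peaks at seq = 3072: 37.5 / 42 = 89.3% of theoretical peak.
```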

Side-by-side

NVIDIA Jetson Orin Nano 8GB (production module · Ampere · 25 W Super profile) vs. Invotet Unified Engine.

Same workload, two columns. The Invotet Unified Engine sustains 89% utilization on attention; Jetson sustains 8–12% on the same kernel. Every number on the Invotet side is measured on the production FPGA module.

| Metric | Invotet Unified Engine (measured · FPGA module) | NVIDIA Jetson Orin Nano 8GB (public · vendor data) |
|---|---|---|
| Theoretical BF16 peak | 42 GFLOPS | ~1,600 GFLOPS |
| Sustained on attention (BF16) | 37.5 GFLOPS | 130–190 GFLOPS |
| Utilization on attention | 89.3% of peak | 8–12% of peak |
| Power | 4.5 W | 15–25 W |
| Useful FLOPs per watt (attention) | 8.3 GFLOPS/W | ~6 GFLOPS/W |

Notes on the rows:

- Theoretical BF16 peak: useful only as a ceiling; most hardware never reaches it on a real model.
- Sustained on attention (BF16): what each module actually delivers under the kernel that dominates inference.
- Utilization on attention: the architectural advantage, measured today on the production FPGA module.
- Useful FLOPs per watt (attention): sustained throughput on attention divided by module power. The Invotet number is measured; no ASIC roadmap required. Recomputed in the sketch below.
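
The derived rows (utilization and FLOPs per watt) follow mechanically from each module's two base numbers. A minimal sketch, using mid-range picks from the public Jetson ranges (our stand-ins, not vendor figures):

```python
# Derive the comparison rows from two base numbers per module: sustained
# attention throughput and module power. The Jetson inputs are mid-range
# picks from the public ranges quoted above, not measurements.

def util_pct(sustained_gflops: float, peak_gflops: float) -> float:
    """Sustained throughput as a percentage of theoretical peak."""
    return 100.0 * sustained_gflops / peak_gflops

def gflops_per_watt(sustained_gflops: float, watts: float) -> float:
    """Useful FLOPs per watt on the attention kernel."""
    return sustained_gflops / watts

# Invotet Unified Engine (measured on the FPGA module)
print(util_pct(37.5, 42.0))          # ~89.3% of peak
print(gflops_per_watt(37.5, 4.5))    # ~8.3 GFLOPS/W

# Jetson Orin Nano 8GB (public vendor data, mid-range stand-ins)
print(util_pct(160.0, 1600.0))       # 10.0%, inside the quoted 8-12% range
print(gflops_per_watt(160.0, 25.0))  # 6.4 GFLOPS/W at the 25 W profile
```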

Methodology

What's measured, and where the comparator data comes from.

  1. All Invotet Unified Engine numbers in the operator and attention tables are measured on the production FPGA module: Kintex UltraScale+ at 333 MHz, 4.5 W, BF16. Source: Invotet Unified Engine v1.1 reference (operator benchmarks, M = K = N = 1024, including DMA in/out).
  2. Jetson Orin Nano values use the published module specs for the Orin Nano 8GB / Super profile, plus an 8–12% sustained-utilization range on transformer attention. That range is consistent with widely cited llama.cpp and TensorRT-LLM community benchmarks for Ampere-class Jetson on attention-dominated decode workloads.
  3. The load-bearing claim is sustained utilization on transformer attention: 89.3% on the Invotet FPGA module vs. 8–12% on Jetson for the same kernel. Every other row in the comparison table is a downstream consequence of that architectural difference, measured on hardware shipping today.

Need the full reference, an NDA, or a custom comparison run?

Open a sales conversation. We will send the full Invotet Unified Engine v1.1 reference and a tailored comparison against your candidate silicon to your inbox the same day.