Architecture
A compute engine designed from scratch for transformer workloads.
Invotet modules don’t ride a repurposed mobile GPU. The compute engine inside was derived from the full attention kernel (matrix multiplication, quantization, normalization, and data movement) and optimized for the operations that actually run a modern model.
Headline figure
Up to 20× more efficient than NVIDIA Jetson
Per-watt efficiency on representative transformer workloads, vs. comparable Jetson-class modules.
Architecture pillars
Three load-bearing decisions in the compute engine.
- Pillar 01: Unified compute engine
  Systolic arrays and vector processing combined in a single engine, with multiple architectural optimizations. The result is one of the smallest logic-footprint compute engines built to date, delivering extremely high hardware utilization and up to 20× more efficient computation.
- Pillar 02: Logic circuits optimized for GPT
  Matrix-matrix multiplication, softmax, element-wise operations, and the rest of the transformer operator set are executed natively by purpose-built logic rather than emulated through general-purpose paths.
- Pillar 03: Attention-kernel-driven design
  Architecture derived from the operations that dominate real workloads. Optimized data paths maximize utilization and minimize the memory movement that bottlenecks general-purpose accelerators on modern models.
Talk to the engine
The SDK turns the architecture into a one-line deploy.
PyTorch, ONNX, or HuggingFace checkpoints compile directly for the module. No CUDA in the loop.
