Architecture
A compute engine designed from scratch for transformer workloads.
Invotet modules don’t ride a repurposed mobile GPU. The compute engine inside was derived from the full attention kernel (matrix multiplication, quantization, normalization, and data movement) and optimized for the operations that actually run a modern model.
Headline figure
Up to 20× more efficient than NVIDIA Jetson
Per-watt efficiency on representative transformer workloads, vs. comparable Jetson-class modules.
Architecture pillars
Three load-bearing decisions in the compute engine.
- Pillar 01: Unified compute engine
  Systolic arrays and vector processing combined in a single engine, with multiple architectural optimizations. The result is one of the smallest logic-footprint compute engines built to date, delivering extremely high hardware utilization and up to 20× more efficient computation.
- Pillar 02: Logic circuits optimized for GPT
  Matrix-matrix multiplication, softmax, element-wise operations, and the rest of the transformer operator set are executed natively by purpose-built logic rather than emulated through general-purpose paths.
- Pillar 03: Attention-kernel-driven design
  Architecture derived from the operations that dominate real workloads. Optimized data paths maximize utilization and minimize the memory movement that bottlenecks general-purpose accelerators on modern models.
Talk to the engine
The SDK turns the architecture into a one-line deploy.
PyTorch, ONNX, or HuggingFace checkpoints compile directly for the module. No CUDA in the loop.
