Architecture

A compute engine designed from scratch for transformer workloads.

Invotet modules don’t ride a repurposed mobile GPU. The compute engine inside was derived from the full attention kernel — matrix multiplication, quantization, normalization, and data movement — and optimized for the operations that actually run a modern model.
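One of the operations named above, quantization, can be sketched in a few lines. This is an illustrative example of plain symmetric int8 quantization, not Invotet's actual scheme; the function names and the single-scale approach are assumptions for clarity.

```python
# Illustrative sketch: symmetric int8 quantization, one of the transformer
# operations mentioned above. Not Invotet's actual quantization scheme.

def quantize_int8(xs):
    """Map floats to int8 values with a single symmetric scale."""
    scale = max(abs(x) for x in xs) / 127.0 or 1.0
    q = [max(-128, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 values."""
    return [v * scale for v in q]

xs = [0.5, -1.25, 3.0, -0.1]
q, s = quantize_int8(xs)
approx = dequantize(q, s)
```

The round trip loses at most half a scale step per element, which is why low-bit integer math is viable for inference when the hardware executes it natively.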

Headline figure

Up to 20×

more efficient than NVIDIA Jetson

Per-watt efficiency on representative transformer workloads, vs. comparable Jetson-class modules.

Architecture pillars

Three load-bearing decisions in the compute engine.

  1. Pillar 01

    Unified compute engine

    Systolic arrays and vector processing combined in a single engine, with multiple architectural optimizations. The result is one of the smallest logic-footprint compute engines built so far: extremely high hardware utilization and computation up to 20× more efficient per watt.

  2. Pillar 02

    Logic circuits optimized for GPT

    Matrix-matrix multiplication, softmax, element-wise operations, and the rest of the transformer operator set are executed natively by purpose-built logic — not emulated through general-purpose paths.

  3. Pillar 03

    Attention-kernel-driven design

    Architecture derived from the operations that dominate real workloads. Optimized data paths maximize utilization and minimize the memory movement that bottlenecks general-purpose accelerators on modern models.
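The operator set these pillars describe can be made concrete with a minimal attention computation: matrix multiplies, a softmax, and element-wise scaling. This is a plain-Python sketch of the standard scaled dot-product attention kernel, written for readability; it says nothing about how Invotet's hardware actually schedules or fuses these steps.

```python
import math

# Minimal scaled dot-product attention: the operator set a transformer
# accelerator must execute. Plain Python for clarity; real kernels are
# batched, fused, and run on dedicated hardware paths.

def matmul(a, b):
    """Naive matrix multiply: a is m x k, b is k x n."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row."""
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))                      # matrix multiply
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]  # element-wise
    weights = [softmax(row) for row in scaled]            # softmax per row
    return matmul(weights, v)                             # matrix multiply

q = [[1.0, 0.0], [0.0, 1.0]]
out = attention(q, q, q)
```

Every step is a matmul, a softmax, or an element-wise operation, which is the point of pillars 02 and 03: build the data paths around exactly these operators rather than emulating them on general-purpose hardware.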

Talk to the engine

The SDK turns the architecture into a one-line deploy.

PyTorch, ONNX, or HuggingFace checkpoints compile directly for the module. No CUDA in the loop.