
Model Explorer

Frontier checkpoints, on the device that has to think for itself.

A working catalog of LLMs, vision-language models, and perception checkpoints validated against the Invotet SDK and Unified Engine — measured or projected on the production FPGA module that ships in every Invotet accelerator. Each row carries an evidence badge so you know what’s measured, what’s modeled from operator-level data, and what’s still pending a full inference run.

| Model | Family | Params | Precision | Memory | Throughput | Status | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3 8B Instruct | Llama | 8B | INT4 weights · BF16 act. | 4.0 GB | ~1.3 tok/s | Projected | Memory-bandwidth-bound on the FPGA module's DDR4 interface. |
| Qwen 2.5 7B Instruct | Qwen | 7.6B | INT4 weights · BF16 act. | 3.8 GB | TBD tok/s | Pending | |
| Qwen2-VL 7B | Qwen-VL | 7.6B | INT4 weights · BF16 act. | 3.8 GB | TBD tok/s | Pending | |
| Mistral 7B Instruct v0.3 | Mistral | 7.3B | INT4 weights · BF16 act. | 3.7 GB | TBD tok/s | Pending | Compile path validated; throughput benchmark pending. |
| LLaVA 1.6 7B | LLaVA | 7.0B | INT4 weights · BF16 act. | 3.6 GB | TBD tok/s | Pending | Vision tower compiled; end-to-end benchmark pending. |
| Phi-3 Mini 3.8B | Phi | 3.8B | INT4 weights · BF16 act. | 1.9 GB | ~2.7 tok/s | Projected | Projection from operator-level benchmarks; full inference run pending. |
| PaliGemma 3B | Gemma | 2.9B | INT4 weights · BF16 act. | 1.5 GB | ~3.5 tok/s | Projected | Image-conditioned generation; decode-only throughput. Vision-tower latency adds to end-to-end time. |
| Llama 3.2 1B Instruct | Llama | 1.2B | INT4 weights · BF16 act. | 0.6 GB | ~10 tok/s | Projected | Conversational latency on the production FPGA module. |
| SAM ViT-Base | SAM | 91M | INT4 weights · BF16 act. | 46 MB | TBD FPS | Pending | Encoder benchmarked; mask decoder integration in progress. |
| DETR ResNet-50 | DETR | 41M | BF16 | 82 MB | TBD FPS | Pending | |
| YOLOv8 Medium | YOLO | 25.9M | INT8 | 52 MB | TBD FPS | Pending | |
| YOLOv8 Small | YOLO | 11.2M | INT8 | 22 MB | ~50 FPS | Projected | |
| YOLOv8 Nano | YOLO | 3.2M | INT8 | 6 MB | ~120 FPS | Projected | 640×640 input. Projection from operator-level kernels; full pipeline run pending. |

Methodology

How throughput numbers are derived.

1. Status legend. Measured: the number comes from a real inference run on the production FPGA module. Projected: derived from the engine's operator-level achieved GFLOPS plus transformer math (parameter count, KV-cache, memory bandwidth); modeled, not yet measured end-to-end. Pending: the compile path is validated but the throughput benchmark has not yet run.
2. LLM/VLM projections take the FPGA module's memory bandwidth (DDR4-1333 on a 32-bit data path, 5.33 GB/s) as the ceiling for INT4-quantized decode: each generated token streams the full weight set from memory, so projected tok/s is roughly bandwidth divided by weight bytes (see the first sketch after this list). Every number reflects the hardware shipping today, not a tape-out projection.
3. Perception FPS projections apply the engine's 89% sustained utilization on attention-equivalent kernels at the model's input resolution (see the second sketch below). They are upper bounds: actual end-to-end throughput, with pre/post-processing and a single-buffer pipeline, will come in lower.
4. Throughput-pending rows have a working compile path through the SDK; the row exists so sales can confirm support today even if the timing run hasn't landed.
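
To make item 2 concrete, here is a minimal sketch of the bandwidth-ceiling math. The 5.33 GB/s figure and the parameter counts come from this page; the streaming-weights assumption and the helper name `projected_tok_s` are illustrative, and the table's published numbers fold in additional operator-level and KV-cache data, so they differ slightly from this first-order bound.

```python
# Bandwidth-ceiling projection for INT4 decode (methodology item 2).
# Assumes batch-1 decode that streams the full weight set once per
# generated token, so tok/s is capped at bandwidth / weight bytes.

DDR4_BANDWIDTH_GBS = 5.33      # DDR4-1333, 32-bit data path (from this page)
INT4_BYTES_PER_PARAM = 0.5     # 4-bit weights

def projected_tok_s(params_billions: float) -> float:
    """First-order upper bound on decode throughput for an INT4 model."""
    weight_gb = params_billions * INT4_BYTES_PER_PARAM
    return DDR4_BANDWIDTH_GBS / weight_gb

# First-order bounds for a few table rows; the table's numbers also
# account for KV-cache traffic, so they differ slightly.
for name, params in [("Llama 3 8B", 8.0), ("Phi-3 Mini", 3.8), ("Llama 3.2 1B", 1.2)]:
    print(f"{name}: ~{projected_tok_s(params):.1f} tok/s")
```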
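
Item 3's FPS projection follows the same pattern. In the sketch below, only the 89% utilization figure comes from this page; the peak-GFLOPS and per-frame-GFLOPs values are hypothetical placeholders, since neither is published here.

```python
# Compute-ceiling projection for perception FPS (methodology item 3).
# Only the 89% sustained-utilization figure comes from this page;
# peak_gflops and gflops_per_frame below are illustrative placeholders.

SUSTAINED_UTILIZATION = 0.89   # measured on attention-equivalent kernels

def projected_fps(peak_gflops: float, gflops_per_frame: float) -> float:
    """Upper bound on frames/s; real pipelines add pre/post-processing."""
    return peak_gflops * SUSTAINED_UTILIZATION / gflops_per_frame

# Hypothetical: a 1 TFLOPS peak engine running a detector costing
# 8.7 GFLOPs per 640x640 frame (both numbers are placeholders).
print(f"~{projected_fps(1000.0, 8.7):.0f} FPS")
```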

Don't see your model?

The SDK compiles PyTorch, ONNX, and HuggingFace checkpoints — if you can export it, we can probably run it. Tell us your checkpoint family and target module and we will get it on the explorer.
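
As a starting point, here is a minimal export sketch using standard PyTorch tooling. This is generic `torch.onnx.export` usage, not Invotet SDK API; the ResNet-50 model and output file name are placeholders.

```python
# Generic ONNX export with standard PyTorch tooling; this is not
# Invotet SDK API, and the model and file name are placeholders.
import torch
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)        # example input shape

torch.onnx.export(
    model, dummy, "checkpoint.onnx",
    input_names=["image"], output_names=["logits"],
    dynamic_axes={"image": {0: "batch"}},  # allow variable batch size
)
```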