# Model Explorer

Frontier checkpoints, on the device that has to think for itself.
A working catalog of LLMs, vision-language models, and perception checkpoints validated against the Invotet SDK and Unified Engine — measured or projected on the production FPGA module that ships in every Invotet accelerator. Each row carries an evidence badge so you know what’s measured, what’s modeled from operator-level data, and what’s still pending a full inference run.
| Model | Family | Params | Precision | Memory | Throughput | Status | Notes |
|---|---|---|---|---|---|---|---|
| Llama 3 8B Instruct | Llama | 8B | INT4 weights · BF16 act. | 4.0 GB | ~1.3 tok/s | Projected | Memory-bandwidth-bound on the FPGA module's DDR4 interface. |
| Qwen 2.5 7B Instruct | Qwen | 7.6B | INT4 weights · BF16 act. | 3.8 GB | — | Pending | |
| Qwen2-VL 7B | Qwen-VL | 7.6B | INT4 weights · BF16 act. | 3.8 GB | — | Pending | |
| Mistral 7B Instruct v0.3 | Mistral | 7.3B | INT4 weights · BF16 act. | 3.7 GB | — | Pending | Compile path validated; throughput benchmark pending. |
| LLaVA 1.6 7B | LLaVA | 7.0B | INT4 weights · BF16 act. | 3.6 GB | — | Pending | Vision tower compiled; end-to-end benchmark pending. |
| Phi-3 Mini 3.8B | Phi | 3.8B | INT4 weights · BF16 act. | 1.9 GB | ~2.7 tok/s | Projected | Projection derived from operator-level benchmarks; full inference run pending. |
| PaliGemma 3B | Gemma | 2.9B | INT4 weights · BF16 act. | 1.5 GB | ~3.5 tok/s | Projected | Image-conditioned generation; decode-only throughput, so vision-tower latency adds to end-to-end. |
| Llama 3.2 1B Instruct | Llama | 1.2B | INT4 weights · BF16 act. | 0.6 GB | ~10 tok/s | Projected | Conversational latency on the production FPGA module. |
| SAM ViT-Base | SAM | 91M | INT4 weights · BF16 act. | 46 MB | — | Pending | Encoder benchmarked; mask decoder integration in progress. |
| DETR ResNet-50 | DETR | 41M | BF16 | 82 MB | — | Pending | |
| YOLOv8 Medium | YOLO | 25.9M | INT8 | 52 MB | — | Pending | |
| YOLOv8 Small | YOLO | 11.2M | INT8 | 22 MB | ~50 FPS | Projected | |
| YOLOv8 Nano | YOLO | 3.2M | INT8 | 6 MB | ~120 FPS | Projected | 640×640 input; projection from operator-level kernels, full pipeline run pending. |
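For the INT4 LLM rows, the Memory column is essentially the weight footprint at half a byte per parameter. A minimal sketch of that arithmetic (the function name is illustrative, not part of the SDK):

```python
def int4_weight_footprint_gb(params_billions: float) -> float:
    """Approximate INT4 weight footprint: 4 bits = 0.5 bytes per parameter.

    Ignores quantization scales and activation buffers, so real
    footprints can run slightly higher (e.g. LLaVA's vision tower).
    """
    return params_billions * 0.5

# Reproduces the catalog's LLM rows:
print(int4_weight_footprint_gb(8.0))  # Llama 3 8B -> 4.0 GB
print(int4_weight_footprint_gb(3.8))  # Phi-3 Mini -> 1.9 GB
```

The same arithmetic at 2 bytes per parameter recovers the BF16 rows (DETR ResNet-50: 41M × 2 B = 82 MB).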
## Methodology

How throughput numbers are derived.
1. **Status legend.** *Measured*: the number comes from a real inference run on the production FPGA module. *Projected*: modeled from the engine's operator-level achieved GFLOPS plus transformer math (parameter count, KV cache, memory bandwidth), not yet measured end-to-end. *Pending*: the compile path is validated but the throughput benchmark has not yet been run.
2. **LLM/VLM projections** use the FPGA module's memory bandwidth (DDR4-1333 on a 32-bit data path, 5.33 GB/s) as the ceiling for INT4-quantized decode. Every number reflects the hardware shipping today, not a tape-out projection.
3. **Perception FPS projections** assume the engine's 89% sustained utilization on attention-equivalent kernels at the model's input resolution. They are upper bounds on end-to-end throughput, which also includes pre/post-processing, and assume a single-buffer pipeline.
4. **Throughput-pending rows** have a working compile path through the SDK; the row exists so sales can confirm support today even though the timing run hasn't landed.
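The bandwidth-ceiling projection above reduces to simple division: in bandwidth-bound decode, every generated token streams the full quantized weight set from DDR4 once. A hedged sketch of that model (the function is illustrative; the real projections also account for KV-cache traffic):

```python
DDR4_BANDWIDTH_GBPS = 5.33  # DDR4-1333 on a 32-bit data path

def projected_decode_tok_s(weight_footprint_gb: float,
                           bandwidth_gbps: float = DDR4_BANDWIDTH_GBPS) -> float:
    """Upper-bound decode rate when each token reads all weights once.

    Ignores KV-cache and activation traffic, so measured throughput
    lands at or below this ceiling.
    """
    return bandwidth_gbps / weight_footprint_gb

print(round(projected_decode_tok_s(4.0), 1))  # Llama 3 8B (4.0 GB) -> 1.3
```

Phi-3 Mini's 1.9 GB footprint gives a 2.8 tok/s ceiling, consistent with the catalog's ~2.7 tok/s projection once KV-cache reads are added.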
## Don't see your model?
The SDK compiles PyTorch, ONNX, and HuggingFace checkpoints — if you can export it, we can probably run it. Tell us your checkpoint family and target module and we will get it on the explorer.
