# Model Explorer

Frontier checkpoints, on the device that has to think for itself.
A working catalog of LLMs, vision-language models, and perception checkpoints validated against the Invotet SDK and Unified Engine — measured or projected on the production FPGA module that ships in every Invotet accelerator. Each row carries an evidence badge so you know what’s measured, what’s modeled from operator-level data, and what’s still pending a full inference run.
| Model | Family | Params | Precision | Memory | Throughput | Status | Notes |
|---|---|---|---|---|---|---|---|
| Llama 3 8B Instruct | Llama | 8B | INT4 weights · BF16 act. | 4.0 GB | ~1.3 tok/s | Projected | Memory-bandwidth-bound on the FPGA module's DDR4 interface. |
| Qwen 2.5 7B Instruct | Qwen | 7.6B | INT4 weights · BF16 act. | 3.8 GB | — | Pending | |
| Qwen2-VL 7B | Qwen-VL | 7.6B | INT4 weights · BF16 act. | 3.8 GB | — | Pending | |
| Mistral 7B Instruct v0.3 | Mistral | 7.3B | INT4 weights · BF16 act. | 3.7 GB | — | Pending | Compile path validated; throughput benchmark pending. |
| LLaVA 1.6 7B | LLaVA | 7.0B | INT4 weights · BF16 act. | 3.6 GB | — | Pending | Vision tower compiled; end-to-end benchmark pending. |
| Phi-3 Mini 3.8B | Phi | 3.8B | INT4 weights · BF16 act. | 1.9 GB | ~2.7 tok/s | Projected | Projection derived from operator-level benchmarks; full inference run pending. |
| PaliGemma 3B | Gemma | 2.9B | INT4 weights · BF16 act. | 1.5 GB | ~3.5 tok/s | Projected | Image-conditioned generation; decode-only throughput, so vision-tower latency adds to end-to-end. |
| Llama 3.2 1B Instruct | Llama | 1.2B | INT4 weights · BF16 act. | 0.6 GB | ~10 tok/s | Projected | Conversational latency on the production FPGA module. |
| SAM ViT-Base | SAM | 91M | INT4 weights · BF16 act. | 46 MB | — | Pending | Encoder benchmarked; mask decoder integration in progress. |
| DETR ResNet-50 | DETR | 41M | BF16 | 82 MB | — | Pending | |
| YOLOv8 Medium | YOLO | 25.9M | INT8 | 52 MB | — | Pending | |
| YOLOv8 Small | YOLO | 11.2M | INT8 | 22 MB | ~50 FPS | Projected | |
| YOLOv8 Nano | YOLO | 3.2M | INT8 | 6 MB | ~120 FPS | Projected | 640×640 input; projection from operator-level kernels, full pipeline run pending. |
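For the INT4 LLM rows, the Memory column is essentially the weight footprint at half a byte per parameter. A minimal sketch of that arithmetic (the function name is illustrative, not part of the SDK):

```python
def int4_weight_footprint_gb(params_billions: float) -> float:
    """Approximate INT4 weight footprint: 4 bits = 0.5 bytes per parameter.

    Ignores quantization scales and activation buffers, so real
    footprints can run slightly higher (e.g. LLaVA's vision tower).
    """
    return params_billions * 0.5

# Reproduces the catalog's LLM rows:
print(int4_weight_footprint_gb(8.0))  # Llama 3 8B -> 4.0 GB
print(int4_weight_footprint_gb(3.8))  # Phi-3 Mini -> 1.9 GB
```

The same arithmetic at 2 bytes per parameter recovers the BF16 rows (DETR ResNet-50: 41M × 2 B = 82 MB).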
## Methodology

How throughput numbers are derived.
1. **Status legend.** *Measured*: the number comes from a real inference run on the production FPGA module. *Projected*: modeled from the engine's operator-level achieved GFLOPS plus transformer math (parameter count, KV cache, memory bandwidth), not yet measured end-to-end. *Pending*: the compile path is validated but the throughput benchmark has not yet been run.
2. **LLM/VLM projections** use the FPGA module's memory bandwidth (DDR4-1333 on a 32-bit data path, 5.33 GB/s) as the ceiling for INT4-quantized decode. Every number reflects the hardware shipping today, not a tape-out projection.
3. **Perception FPS projections** assume the engine's 89% sustained utilization on attention-equivalent kernels at the model's input resolution. They are upper bounds on end-to-end throughput, which also includes pre/post-processing, and assume a single-buffer pipeline.
4. **Throughput-pending rows** have a working compile path through the SDK; the row exists so sales can confirm support today even though the timing run hasn't landed.
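The bandwidth-ceiling projection above reduces to simple division: in bandwidth-bound decode, every generated token streams the full quantized weight set from DDR4 once. A hedged sketch of that model (the function is illustrative; the real projections also account for KV-cache traffic):

```python
DDR4_BANDWIDTH_GBPS = 5.33  # DDR4-1333 on a 32-bit data path

def projected_decode_tok_s(weight_footprint_gb: float,
                           bandwidth_gbps: float = DDR4_BANDWIDTH_GBPS) -> float:
    """Upper-bound decode rate when each token reads all weights once.

    Ignores KV-cache and activation traffic, so measured throughput
    lands at or below this ceiling.
    """
    return bandwidth_gbps / weight_footprint_gb

print(round(projected_decode_tok_s(4.0), 1))  # Llama 3 8B (4.0 GB) -> 1.3
```

Phi-3 Mini's 1.9 GB footprint gives a 2.8 tok/s ceiling, consistent with the catalog's ~2.7 tok/s projection once KV-cache reads are added.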
## Don't see your model?
The SDK compiles PyTorch, ONNX, and HuggingFace checkpoints — if you can export it, we can probably run it. Tell us your checkpoint family and target module and we will get it on the explorer.
