May 17, 2026

llmfit: Open-Source Tool to Find Which LLM Runs on Your Hardware

llmfit scans hundreds of models and providers to recommend the best LLM for your exact hardware—RAM, VRAM, CPU—all from a single terminal command, with an interactive TUI and real community benchmarks.

Have you ever spent hours downloading a large language model only to find it won't even load on your GPU? llmfit solves that frustration by telling you upfront which models will actually run on your hardware—before you download a single byte.

What is llmfit?

llmfit is an open-source terminal tool that right-sizes LLMs to your system's RAM, CPU, and GPU. It detects your hardware, scores each model across quality, speed, fit, and context dimensions, and tells you which ones will run well on your machine. It ships with an interactive TUI (the default) and a classic CLI mode, and it supports multi-GPU setups, MoE architectures, dynamic quantization selection, speed estimation, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).

Choosing the right LLM for local inference is a guessing game. You see a model on Hugging Face, check its parameter count, and hope your GPU memory matches. But real-world performance also depends on quantization, context length, CPU speed, and runtime backend. llmfit automates that analysis: it reads your hardware, cross-references it against a database of hundreds of models, and gives you a ranked list with estimated tokens per second and VRAM usage.
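To see why parameter count alone is a poor guide, here is a back-of-the-envelope sketch in Python of how context length adds memory on top of the weights. It uses the standard KV-cache size formula for a transformer. The function name is hypothetical and this is not llmfit's own estimator. The Mistral 7B figures (32 layers, 8 KV heads, head dimension 128) are taken from that model's published configuration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size: one key and one value vector per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


# Mistral 7B with an 8,192-token context and 16-bit cache values.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
print(f"{size / 2**30:.2f} GiB")  # 1.00 GiB
```

Doubling the context doubles this figure, which is why the same model can fit at one context length and fail at another.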

How it works

Hardware detection: Probes your CPU model, core count, total RAM, GPU name, VRAM, and available backends. Detection is automatic and takes seconds.
Model database: A curated list of hundreds of popular open-weights models with metadata: architecture, quantization options, required memory, context length, and typical performance.
Scoring engine: Each model is scored on four axes: quality (based on benchmarks), speed (estimated tok/s on your hardware), fit (how well it matches your memory budget), and context (max context length). The composite score determines the rank.
TUI & CLI: The default is a full-screen terminal interface for filtering, sorting, searching, and comparing models. Press b to open the community leaderboard with real tok/s data from other users. CLI mode outputs JSON for scripting.
Installation: Available via Homebrew, Scoop, MacPorts, pip/uv, Docker, or by building from source with Cargo.
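
The scoring step can be pictured as a weighted sum over the four axes. The sketch below is an illustration only: the 0 to 100 scale, the weights, and the names AxisScores, composite_score, and rank are all assumptions, not llmfit's actual formula.

```python
from dataclasses import dataclass


@dataclass
class AxisScores:
    quality: float  # each axis on an assumed 0-100 scale
    speed: float
    fit: float
    context: float


# Hypothetical weights; llmfit's real weighting is not described in this article.
WEIGHTS = {"quality": 0.4, "speed": 0.3, "fit": 0.2, "context": 0.1}


def composite_score(s: AxisScores, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the four axis scores."""
    total = sum(weights.values())
    return sum(getattr(s, axis) * w for axis, w in weights.items()) / total


def rank(models: dict[str, AxisScores]) -> list[str]:
    """Return model names ordered from highest to lowest composite score."""
    return sorted(models, key=lambda name: composite_score(models[name]), reverse=True)


# Made-up scores: a small model that fits comfortably vs. a larger one that barely fits.
candidates = {
    "small": AxisScores(quality=70, speed=95, fit=100, context=60),
    "large": AxisScores(quality=90, speed=40, fit=50, context=80),
}
print(rank(candidates))  # ['small', 'large']
```

With these made-up numbers the smaller model ranks first despite its lower quality score, because speed and fit outweigh the difference.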

Quick start

Install on macOS via Homebrew:

brew install llmfit

Then run the TUI:

llmfit

Figure: llmfit terminal user interface showing model recommendations with scores and memory usage.

To get a JSON recommendation for a specific use case:

llmfit recommend --use-case coding | jq '.models[].name'

This prints the names of the top models for coding tasks based on your hardware.
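
If you would rather consume the output from a script than from jq, the same extraction is a few lines of Python. This sketch assumes only what the jq filter above implies: a top-level models array whose entries have a name field. The function name is hypothetical, the sample payload is made up for illustration, and any other fields llmfit emits are ignored.

```python
import json


def model_names(payload: str) -> list[str]:
    """Return the `name` of every entry in the top-level `models` array."""
    data = json.loads(payload)
    return [m["name"] for m in data.get("models", [])]


# Made-up sample shaped the way the jq filter expects; not real llmfit output.
sample = '{"models": [{"name": "model-a"}, {"name": "model-b"}]}'
print(model_names(sample))  # ['model-a', 'model-b']
```

In practice you would pass in the command's standard output, for example the stdout captured by subprocess.run(["llmfit", "recommend", "--use-case", "coding"], capture_output=True, text=True).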

Real-world example

Suppose you have an NVIDIA RTX 3090 (24 GB VRAM) and want a chat assistant. llmfit shows that Mistral 7B Q4_K_M fits perfectly, while a 70B-parameter model might run only at very aggressive, low-bit quantization. Here's a simulated output:

Model                          Score   Tok/s   VRAM     Fit
-----------------------------  ------  ------  -------  --------
mistralai/Mistral-7B-v0.1      92      45      6.2 GB   Perfect
technium/Llama-3-8B-Instruct   88      38      8.1 GB   Good
meta-llama/Llama-2-13B         78      25      14 GB    Good
codellama/CodeLlama-34B        65      12      22 GB    Marginal
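
The 70B remark follows from simple arithmetic on the weights alone. The sketch below is a rough estimate that ignores the KV cache and runtime overhead, so real requirements are higher. The helper names are hypothetical and this is not llmfit's estimator.

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def max_bits_per_weight(params_billions: float, vram_gb: float) -> float:
    """Largest average bits per weight whose weights alone fit in vram_gb."""
    return vram_gb * 8 / params_billions


print(weight_gb(7, 4))                        # 3.5  -> 7B at 4 bits fits easily in 24 GB
print(weight_gb(70, 4))                       # 35.0 -> 70B at 4 bits exceeds 24 GB
print(round(max_bits_per_weight(70, 24), 2))  # 2.74 -> 70B needs under 3 bits per weight
```

The VRAM column in the simulated table is higher than the weights-only figure because it also has to cover context and runtime overhead.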

Pros, cons, and alternatives

Pros:

Automatic hardware detection removes guesswork.
Database of hundreds of models with detailed metadata.
Interactive TUI with powerful filtering and sorting.
Community leaderboard gives real-world performance data.
Supports multiple runtime backends (Ollama, llama.cpp, MLX, etc.).

Cons:

Speed estimates are approximate and may not match actual inference.
No built-in download manager that covers every provider (some backends handle their own downloads).
Primarily focused on local inference; less useful for cloud-hosted models.

Alternatives:

Ollama: Simpler for running models locally, but less hardware analysis and no cross-model recommendation.
LocalAI: Self-hosted API for local models, focuses on serving rather than selection.
LM Studio: GUI application to discover and run models, but less flexible than a terminal tool.

My verdict — should you use it?

llmfit is perfect for anyone who runs LLMs locally and wants to optimize their hardware investment. If you've ever downloaded a model only to hit an OOM error, you'll appreciate the upfront analysis. If you strictly use cloud APIs or run a single known model, you probably don't need it. For tinkerers and self-hosters, it's a must-have utility.

llmfit is MIT-licensed and built in Rust. Check it out on GitHub.
