own/metal
← LLM Runners tool · /LLAMAC

llama.cpp

The C/C++ LLM inference engine that runs everywhere.

// github

★ 114.4k

last commit · today

moderate CPU only MIT

// readme · what it is

llama.cpp is the upstream that powers Ollama, LM Studio, LocalAI and most consumer LLM apps. Run it directly when you want fine control — custom quantization, exotic hardware targets, or the slimmest possible footprint. Ships a built-in OpenAI-compatible HTTP server.

// deploy notes

Compiling from source unlocks the best perf for your CPU/GPU. Pre-built binaries available.

[ ALTERNATIVE TO ]