← LLM Runners tool · /LLAMAC

llama.cpp

The C/C++ LLM inference engine that runs everywhere.

// github

★ 121.0k

last commit · 1 day ago

moderate CPU only MIT

// readme · what it is

llama.cpp is the upstream that powers Ollama, LM Studio, LocalAI and most consumer LLM apps. Run it directly when you want fine control — custom quantization, exotic hardware targets, or the slimmest possible footprint. Ships a built-in OpenAI-compatible HTTP server.

// deploy notes

Compiling from source unlocks the best perf for your CPU/GPU. Pre-built binaries available.

[ ALTERNATIVE TO ]

OpenAI API

// links

website https://github.com/ggerganov/llama.cpp
github github.com/ggerganov/llama.cpp