Open-source ML.
Ready to run.

Pick any model from the Hugging Face Hub. Install with one click.
Detection, segmentation, VLMs, speech, diffusion, classification, depth, OCR, and more.
Runs on CPU or GPU.

DETR, SAM 2, SigLIP 2, InternVL, Whisper, FLUX, SegFormer, GPT-OSS, Pixtral, SDXL

Download for Windows (latest)

LocalML

⚠ Setup Python runtime → | Update available · v0.3.2 | 5 local models

GPU: 0.01 / 8.00 GB | RAM: 9.08 / 15.71 GB | CPU: 24.00 %

Home

Installed only

Recents

detr-resnet-50 · Object detection · 2m

sam2-hiera-large · Mask generation · 1h

Qwen2.5-VL-3B-Instruct · Image-text-to-text · 3h

segformer-b5-cityscapes · Segmentation · yesterday

TUESDAY, MAY 5

What would you like to run today?

Pick from your library, browse the Hub, or paste a model id.

⌕Search a task, family, or paste an HF id…

All | VLM | Text | Segmentation | SAM | Detection | Classify | Diffusion | Depth | Docs / OCR | ASR | TTS

INSTALLED · 5

Florence-2-base · microsoft · image-text-to-text · 462 MB

whisper-tiny · openai · automatic-speech-recognition · 150 MB

detr-resnet-50 · facebook · object-detection · 165 MB

segformer-b0-ade-512 · nvidia · image-segmentation · 14 MB

Llama-3.2-1B-Instruct · meta-llama · text-generation · 2.5 GB

SUGGESTED FOR YOU

InternVL2_5-1B · OpenGVLab · image-text-to-text · ↓ 1.9 GB

SmolLM3-3B · HuggingFaceTB · text-generation · ↓ 6.0 GB

sam2.1-hiera-tiny · facebook · mask-generation · ↓ 150 MB

stable-diffusion-xl-base-1.0 · stabilityai · text-to-image · ↓ 6.9 GB

bark-small · suno · text-to-speech · ↓ 1.2 GB

See it in action.

Three different models, three different tasks, all running locally on the same machine.

Input: a photo from your disk

Output: bounding boxes and labels

facebook/detr-resnet-50 · object-detection · 167 MB · runs on CPU or GPU

Output: per-pixel masks

facebook/sam-vit-base · mask-generation · 375 MB · runs on CPU or GPU

“Hi, how are you? How’s your day going?”

Prompt: your text

Audio: press play

facebook/mms-tts-eng · text-to-speech · 145 MB · runs on CPU or GPU

Not just LLMs.

Eleven task workspaces. Every major modality.

⊡

Detection

DETR, YOLOS, RT-DETR, D-FINE, Table Transformer. Draws labeled boxes server-side.

◉

Segmentation

SegFormer, Mask2Former, OneFormer, EoMT. Panoptic, instance, semantic. Composited overlays.

✦

Mask generation

SAM v1, SAM 2, SAM 2.1, SAM 3. Auto grid-sampling mode, full multi-region output.

☰

VLMs

Qwen-VL, LLaVA, Florence-2, Moondream, PaliGemma. Ask anything about an image.

♪

Speech

Whisper, Wav2Vec2, MMS for ASR. SpeechT5, Bark, VITS for TTS. Both directions, long-audio aware.

◈

Classification

ViT, ResNet, ConvNeXt, BEiT, SigLIP, CLIP. Image, zero-shot, audio. Confidence-ranked labels.

❋

Diffusion

Stable Diffusion, SDXL, FLUX, Kandinsky, PixArt. Text-to-image, img2img, inpaint.

⌨

Text generation

Llama, Mistral, Qwen, Gemma, Phi, DeepSeek. Chat-template aware, reasoning-model aware.

◐

Depth

DPT, MiDaS, ZoeDepth, Depth Anything v1/v2, Depth Pro. Single image → colorized depth map.

▤

Documents · OCR

TrOCR, Donut, LayoutLMv3, Pix2Struct. Read scanned pages, receipts, forms. Ask questions about them.

Everything in the Hub, ready to run.

200+ model families, each one verified against our architecture whitelist. If it shows up in LocalML, it loads. No broken downloads, no missing packages, no guesswork.
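
As a rough illustration of what a whitelist check like this might look like: Hub models built on the transformers library ship a config.json whose "architectures" field names the model class. The sketch below compares that field against an allow-list. The SUPPORTED_ARCHITECTURES set and the is_supported helper are hypothetical names chosen for this example, not LocalML's actual code or list.

```python
import json

# Hypothetical allow-list for illustration; LocalML's real list is not shown here.
SUPPORTED_ARCHITECTURES = {
    "DetrForObjectDetection",
    "SegformerForSemanticSegmentation",
    "WhisperForConditionalGeneration",
}


def is_supported(config_text: str, allowed: set[str] = SUPPORTED_ARCHITECTURES) -> bool:
    """Return True if any architecture named in a config.json is allow-listed."""
    config = json.loads(config_text)
    # "architectures" is a list of class names; it may be missing or null.
    architectures = config.get("architectures") or []
    return any(name in allowed for name in architectures)


detr_config = '{"architectures": ["DetrForObjectDetection"], "model_type": "detr"}'
unknown_config = '{"model_type": "something-else"}'

print(is_supported(detr_config))     # True
print(is_supported(unknown_config))  # False
```

Checking before download, rather than after a failed load, is one way an app could avoid listing models it cannot run.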

Detection

DETR, YOLOS, RT-DETR, RT-DETRv2, D-FINE, Conditional-DETR, Deformable-DETR, Table-Transformer, OWL-ViT, OWLv2, Grounding-DINO

Segmentation

SegFormer, MaskFormer, Mask2Former, OneFormer, EoMT, UperNet, BEiT, DPT, DETR-panoptic, MobileViT

Mask generation

SAM, SAM 2, SAM 2.1, SAM 3, MedSAM

VLMs

Qwen-VL, Qwen2.5-VL, Qwen3-VL, LLaVA, LLaVA-Next, ViP-LLaVA, Florence-2, Moondream, PaliGemma, Idefics 2/3, SmolVLM, Kosmos-2, InternVL, Pixtral, FastVLM, LFM2-VL, DeepSeek-VL, Janus-Pro, Fuyu, Ovis, Aria, GLM4V, Cohere2-Vision, Emu3

Text generation

Llama 3/4, GPT-OSS, Mistral 3, Qwen 2/3, Gemma 2/3/3n, Phi 3/4, DeepSeek, SmolLM3, OLMo 3, OLMoE, Falcon-H1, Nemotron-H, BitNet, StarCoder 2, Cohere, Granite, MiniMax

ASR · TTS

Whisper, Distil-Whisper, Wav2Vec2, MMS, Moonshine, Parakeet, SpeechT5, Bark, VITS

Diffusion

SD 1.5, SD 2.1, SDXL, SD 3 / 3.5, FLUX.1, Kandinsky, PixArt, Sana, Kolors

Classification

ViT, DeiT, Swin, ConvNeXt, BEiT, ResNet, EfficientNet, MobileNet, CLIP, SigLIP, SigLIP 2

Depth

DPT, GLPN, ZoeDepth, Depth Anything, Depth Anything v2, Depth Pro, MiDaS

Documents · OCR

TrOCR, Donut, LayoutLM, LayoutLMv2, LayoutLMv3, Pix2Struct

Runs everywhere you do.

Native installers for Windows, macOS, and Linux. CUDA · Apple MPS · CPU.

Windows

x64 · CUDA 12.4

macOS

Apple Silicon · Intel

Linux

x64 · CUDA 12.4 · CPU
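
For readers curious how a backend choice like this is typically made, here is a minimal sketch assuming a PyTorch runtime (an assumption; this page does not say which framework LocalML uses). The pick_device helper is a hypothetical name: it prefers CUDA, then Apple MPS, then CPU, and falls back to CPU if PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", in that order of preference."""
    try:
        import torch  # assumed runtime; not confirmed by the page
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend only exists in newer PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The returned string can be passed to calls such as model.to(device) in PyTorch code.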

First launch.

LocalML isn't code-signed yet. Your OS will warn you on first run. Here's what to expect.

Windows

SmartScreen will show a blue "Windows protected your PC" screen. Click More info, then Run anyway.

macOS

Gatekeeper will say "LocalML is damaged" or "cannot be opened". In Terminal, run sudo xattr -dr com.apple.quarantine /Applications/LocalML.app and enter your password. Or right-click the app in Finder, pick Open, then click Open again in the dialog.

Linux

Make the AppImage executable: chmod +x LocalML-*.AppImage, then double-click or run it from your terminal.

Code signing on Windows and macOS costs hundreds per year. We'll add it once the project can sustain it.
