Open-source ML.
Ready to run.

Pick any model from the Hugging Face Hub. Install with one click.
Detection, segmentation, VLMs, speech, diffusion, classification, depth, OCR, and more.
Runs on CPU or GPU.

DETR · SAM 2 · SigLIP 2 · InternVL · Whisper · FLUX · SegFormer · GPT-OSS · Pixtral · SDXL
LocalML
Setup Python runtime | Update available · v0.3.2 | 5 local models
GPU: 0.01 / 8.00 GB | RAM: 9.08 / 15.71 GB | CPU: 24.00 %
Home
Installed only
Recents
detr-resnet-50
Object detection · 2m
sam2-hiera-large
Mask generation · 1h
Qwen2.5-VL-3B-Instruct
Image-text-to-text · 3h
segformer-b5-cityscapes
Segmentation · yesterday
TUESDAY, MAY 5
What would you like to run today?
Pick from your library, browse the Hub, or paste a model id.
All · VLM · Text · Segmentation · SAM · Detection · Classify · Diffusion · Depth · Docs / OCR · ASR · TTS
INSTALLED · 5
  • Florence-2-base
    microsoft · image-text-to-text
    462 MB
  • whisper-tiny
    openai · automatic-speech-recognition
    150 MB
  • detr-resnet-50
    facebook · object-detection
    165 MB
  • segformer-b0-ade-512
    nvidia · image-segmentation
    14 MB
  • Llama-3.2-1B-Instruct
    meta-llama · text-generation
    2.5 GB
SUGGESTED FOR YOU
  • InternVL2_5-1B
    OpenGVLab · image-text-to-text
    ↓ 1.9 GB
  • SmolLM3-3B
    HuggingFaceTB · text-generation
    ↓ 6.0 GB
  • sam2.1-hiera-tiny
    facebook · mask-generation
    ↓ 150 MB
  • stable-diffusion-xl-base-1.0
    stabilityai · text-to-image
    ↓ 6.9 GB
  • bark-small
    suno · text-to-speech
    ↓ 1.2 GB

See it in action.

Three different models, three different tasks, all running locally on the same machine.

Input: a photo from your disk
Output: bounding boxes & labels
facebook/detr-resnet-50 · object-detection · 167 MB · runs on CPU or GPU

Input: the same photo
Output: per-pixel masks
facebook/sam-vit-base · mask-generation · 375 MB · runs on CPU or GPU

Input: your text prompt, e.g. “Hi, how are you? How’s your day going?”
Output: spoken audio
facebook/mms-tts-eng · text-to-speech · 145 MB · runs on CPU or GPU

Not just LLMs.

Eleven task workspaces. Every major modality.

Detection

DETR, YOLOS, RT-DETR, D-FINE, Table Transformer. Draws labeled boxes server-side.

Segmentation

SegFormer, Mask2Former, OneFormer, EoMT. Panoptic, instance, semantic. Composited overlays.

Mask generation

SAM v1, SAM 2, SAM 2.1, SAM 3. Auto grid-sampling mode, full multi-region output.

VLMs

Qwen-VL, LLaVA, Florence-2, Moondream, PaliGemma. Ask anything about an image.

Speech

Whisper, Wav2Vec2, MMS for ASR. SpeechT5, Bark, VITS for TTS. Both directions, long-audio aware.

Classification

ViT, ResNet, ConvNeXt, BEiT, SigLIP, CLIP. Image, zero-shot, audio. Confidence-ranked labels.

Diffusion

Stable Diffusion, SDXL, FLUX, Kandinsky, PixArt. Text-to-image, img2img, inpaint.

Text generation

Llama, Mistral, Qwen, Gemma, Phi, DeepSeek. Chat-template aware, reasoning-model aware.

Depth

DPT, MiDaS, ZoeDepth, Depth Anything v1/v2, Depth Pro. Single image → colorized depth map.

Documents · OCR

TrOCR, Donut, LayoutLMv3, Pix2Struct. Read scanned pages, receipts, forms. Ask questions about them.

Everything in the Hub, ready to run.

200+ model families, each one verified against our architecture whitelist. If it shows up in LocalML, it loads. No broken downloads, no missing packages, no guesswork.

Detection

DETR · YOLOS · RT-DETR · RT-DETRv2 · D-FINE · Conditional-DETR · Deformable-DETR · Table-Transformer · OWL-ViT · OWLv2 · Grounding-DINO

Segmentation

SegFormer · MaskFormer · Mask2Former · OneFormer · EoMT · UperNet · BEiT · DPT · DETR-panoptic · MobileViT

Mask generation

SAM · SAM 2 · SAM 2.1 · SAM 3 · MedSAM

VLMs

Qwen-VL · Qwen2.5-VL · Qwen3-VL · LLaVA · LLaVA-Next · ViP-LLaVA · Florence-2 · Moondream · PaliGemma · Idefics 2/3 · SmolVLM · Kosmos-2 · InternVL · Pixtral · FastVLM · LFM2-VL · DeepSeek-VL · Janus-Pro · Fuyu · Ovis · Aria · GLM4V · Cohere2-Vision · Emu3

Text generation

Llama 3/4 · GPT-OSS · Mistral 3 · Qwen 2/3 · Gemma 2/3/3n · Phi 3/4 · DeepSeek · SmolLM3 · OLMo 3 · OLMoE · Falcon-H1 · Nemotron-H · BitNet · StarCoder 2 · Cohere · Granite · MiniMax

ASR · TTS

Whisper · Distil-Whisper · Wav2Vec2 · MMS · Moonshine · Parakeet · SpeechT5 · Bark · VITS

Diffusion

SD 1.5 · SD 2.1 · SDXL · SD 3 / 3.5 · FLUX.1 · Kandinsky · PixArt · Sana · Kolors

Classification

ViT · DeiT · Swin · ConvNeXt · BEiT · ResNet · EfficientNet · MobileNet · CLIP · SigLIP · SigLIP 2

Depth

DPT · GLPN · ZoeDepth · Depth Anything · Depth Anything v2 · Depth Pro · MiDaS

Documents · OCR

TrOCR · Donut · LayoutLM · LayoutLMv2 · LayoutLMv3 · Pix2Struct
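
Curious what an architecture whitelist check might look like? Here is a minimal sketch, not LocalML's actual implementation. It assumes the check reads the `architectures` field that Hub models declare in their config.json and compares it to a list of known-good classes. The whitelist contents and the `is_supported` helper are hypothetical, made up for this example.

```python
import json

# Hypothetical whitelist for illustration only; LocalML's real list is not shown here.
SUPPORTED_ARCHITECTURES = {
    "DetrForObjectDetection",
    "SegformerForSemanticSegmentation",
    "WhisperForConditionalGeneration",
    "LlamaForCausalLM",
}


def is_supported(config_text: str, whitelist=SUPPORTED_ARCHITECTURES) -> bool:
    """Return True if any architecture named in a config.json is whitelisted."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError:
        return False  # unreadable config: treat as unsupported
    if not isinstance(config, dict):
        return False
    architectures = config.get("architectures") or []
    return any(name in whitelist for name in architectures)


detr_config = '{"architectures": ["DetrForObjectDetection"], "model_type": "detr"}'
unknown_config = '{"architectures": ["SomeBrandNewModel"]}'

print(is_supported(detr_config))     # True
print(is_supported(unknown_config))  # False
```

Checking before download is one way to keep unsupported models out of the picker, so nothing appears that cannot load.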

Runs everywhere you do.

Native installers for Windows, macOS, and Linux. CUDA · Apple MPS · CPU.
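
The CUDA · Apple MPS · CPU order can be read as a fallback chain. The sketch below shows one way to pick a device, assuming a PyTorch-based runtime (an assumption, not something this page confirms). The helper names `pick_device` and `detect_device` are hypothetical.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch if it is installed; otherwise assume CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not ship the MPS backend at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


print(detect_device())
```

Keeping the preference order in a pure function makes it easy to test without any GPU present.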

First launch.

LocalML isn't code-signed yet. Your OS will warn you on first run. Here's what to expect.

Windows

SmartScreen will show a blue "Windows protected your PC" screen. Click More info, then Run anyway.

macOS

Gatekeeper will say "LocalML is damaged" or "cannot be opened". In Terminal, run the command below and enter your password:

    sudo xattr -dr com.apple.quarantine /Applications/LocalML.app

Or right-click the app in Finder, pick Open, then click Open again in the dialog.

Linux

Make the AppImage executable, then double-click it or run it from your terminal:

    chmod +x LocalML-*.AppImage
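
Prefer to script the Linux step? This Python sketch sets the execute bits the same way chmod a+x would. It runs on a throwaway temp file, so the file name is only a placeholder, and `make_executable` is a hypothetical helper, not part of LocalML.

```python
import os
import stat
import tempfile


def make_executable(path: str) -> None:
    """Add execute permission for user, group, and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Demonstrate on an empty placeholder file rather than a real download.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "LocalML-demo.AppImage")
    with open(demo_path, "wb") as f:
        f.write(b"")
    make_executable(demo_path)
    # Check the user-execute bit directly instead of trying to run the file.
    is_exec = bool(os.stat(demo_path).st_mode & stat.S_IXUSR)
    print(is_exec)
```

Point `make_executable` at your downloaded AppImage path to apply it for real.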

Code signing on Windows and macOS costs hundreds per year. We'll add it once the project can sustain it.