Hardware Requirements
Running local models requires significant RAM/VRAM:
- Capable coding models (30B+ parameters) need 24GB+ VRAM or unified memory (Apple Silicon M2 Pro/Max and above)
- Smaller models run on less hardware but may struggle with complex coding tasks
- Systems with limited VRAM: Ollama’s cloud models are an excellent free alternative, since they run on Ollama’s servers with no local GPU needed
Recommended Hardware
| Hardware | Capability | Recommended Models |
|---|---|---|
| Apple Silicon M2 Pro/Max+ (32GB+) | High | qwen3-coder (local), MLX models |
| NVIDIA 3090/4090 (24GB+) | High | qwen3-coder, gpt-oss:20b |
| Mid-range GPU (12-24GB) | Mid | gpt-oss:20b, qwen2.5-coder:14b |
| Low-end GPU (under 12GB) | Low | Use Ollama Cloud models |
| CPU only | Minimal | Use Ollama Cloud (recommended) |
Ollama
Ollama runs models locally (free) or on Ollama’s cloud (no GPU needed).
Installation
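On macOS you can download the installer from ollama.com; on Linux, Ollama’s official install script does the same job. A minimal sketch of the commonly documented commands:

```shell
# Linux: official install script from ollama.com
curl -fsSL https://ollama.com/install.sh | sh

# macOS (Homebrew), as an alternative to the ollama.com installer
brew install ollama

# Verify the install and start the local server
ollama --version
ollama serve
```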
Quick Setup (Recommended)
Ollama 0.15+ can auto-configure Claude Code.
Cloud Models (No GPU Required)
Cloud models run on Ollama’s infrastructure, ideal if your system doesn’t have enough VRAM for local models. Pull the manifest first (tiny download; the model runs remotely):
Available Cloud Models
| Cloud Model | SWE-bench | Params (active) | Best For | License |
|---|---|---|---|---|
| minimax-m2.5:cloud | 80.2% | 230B MoE (10B) | Coding, agentic workflows | MIT |
| glm-5:cloud | 77.8% | 744B MoE (40B) | Reasoning, math, knowledge | MIT |
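Pulling a cloud model looks like an ordinary pull: only the small manifest is downloaded, and inference runs on Ollama’s servers. A sketch using the model names from the table above (cloud access may require signing in to an Ollama account):

```shell
# Downloads only the manifest; the weights stay on Ollama's servers
ollama pull minimax-m2.5:cloud

# Use it exactly like a local model
ollama run minimax-m2.5:cloud "Explain this stack trace"
```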
Local Models (Free, Private)
Local models require sufficient VRAM (24GB+ recommended for capable coding models).
Recommended Local Models
| Model | Size | VRAM Needed | Best For |
|---|---|---|---|
| qwen3-coder | 30B | ~28GB | Coding tasks, large context |
| gpt-oss:20b | 20B | ~16GB | Strong general-purpose |
| qwen2.5-coder:14b | 14B | ~12GB | Mid-range GPUs |
| qwen2.5-coder:7b | 7B | ~8GB | Limited VRAM |
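Pulling a local model from the table above downloads the full weights to disk, so expect a large download for the bigger models:

```shell
# Downloads the full model weights locally
ollama pull qwen3-coder

# Smaller fallback for mid-range GPUs
ollama pull qwen2.5-coder:14b

# Quick smoke test
ollama run qwen3-coder "Write a binary search in Python"
```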
Model Aliases
Create aliases for tools expecting Anthropic model names.
Configuration
Override defaults in ~/.ai-runner/secrets.sh:
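A minimal sketch of what an override might look like. The variable names below are assumptions for illustration, not documented Andi AIRun settings; 11434 is Ollama’s default port:

```shell
# Hypothetical example values; check the Andi AIRun docs for the
# actual variable names supported in ~/.ai-runner/secrets.sh
export OLLAMA_HOST="http://localhost:11434"   # Ollama's default port
export AIRUN_MODEL="qwen3-coder"              # hypothetical setting name
```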
Auto-Download Feature
When you specify a model that isn’t installed locally, Andi AIRun offers a choice between local and cloud.
Usage Examples
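For example, pointing Claude Code at a local Ollama model might look like this. `ANTHROPIC_BASE_URL` is the standard Claude Code override; the assumption here is that Ollama’s Anthropic-compatible endpoint is served on its default port:

```shell
# Point Claude Code at Ollama's Anthropic-compatible endpoint
export ANTHROPIC_BASE_URL="http://localhost:11434"

# Use a locally pulled model by name
claude --model qwen3-coder
```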
Ollama Anthropic API Compatibility
Learn more about Ollama’s Anthropic API compatibility
LM Studio
LM Studio runs local models with Anthropic API compatibility. It is especially powerful on Apple Silicon with MLX models, and requires sufficient RAM/VRAM for the model you choose.
Advantages Over Ollama
- MLX model support (significantly faster on Apple Silicon)
- GGUF + MLX formats supported
- Bring your own models from HuggingFace
Installation
Download from lmstudio.ai
Setup
- Download a model in LM Studio (e.g., from HuggingFace)
- Load the model in LM Studio UI
- Start the server, or start it from the LM Studio app’s local server tab
- Run Andi AIRun
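The server step above can also be done from LM Studio’s bundled `lms` command-line tool. A sketch, assuming the CLI is installed and on your PATH (the model name is an example):

```shell
# Start LM Studio's local API server (default port 1234)
lms server start

# Optionally load a model from the CLI instead of the UI
lms load openai/gpt-oss-20b
```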
Recommended Models
For Claude Code, use models with:
- 25K+ context window (required for Claude Code’s heavy context usage)
- Function calling / tool use support
- openai/gpt-oss-20b: Strong general-purpose
- ibm/granite-4-micro: Fast, efficient
Apple Silicon Optimization
LM Studio supports MLX models, which are significantly faster than GGUF on M1/M2/M3/M4 chips. When downloading models, look for MLX versions for best performance.
Configuration
Override defaults in ~/.ai-runner/secrets.sh:
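A minimal sketch of an LM Studio override. The variable names are assumptions for illustration, not documented Andi AIRun settings; 1234 is LM Studio’s default server port:

```shell
# Hypothetical example values; check the Andi AIRun docs for the
# actual variable names supported in ~/.ai-runner/secrets.sh
export LMSTUDIO_HOST="http://localhost:1234"  # LM Studio's default port
export AIRUN_MODEL="openai/gpt-oss-20b"       # hypothetical setting name
```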
Context Window
Configure context size in LM Studio:
- UI: Settings → Context Length
- Minimum recommended: 25K tokens
- Higher is better for complex coding tasks
Auto-Download Feature
When you specify a model that isn’t available, Andi AIRun will offer to download it.
Usage Examples
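For example, pointing Claude Code at LM Studio’s local server might look like this, assuming LM Studio’s default port and its Anthropic-compatible API:

```shell
# Point Claude Code at LM Studio's local server
export ANTHROPIC_BASE_URL="http://localhost:1234"

# Use a model loaded in LM Studio by its identifier
claude --model openai/gpt-oss-20b
```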
LM Studio Claude Code Guide
Learn more about using LM Studio with Claude Code
Comparison: Ollama vs LM Studio
| Feature | Ollama | LM Studio |
|---|---|---|
| Cloud models | ✅ Yes (free) | ❌ No |
| MLX support | ❌ No | ✅ Yes (faster on Apple Silicon) |
| Model formats | Ollama format | GGUF, MLX |
| Model library | Curated | HuggingFace, custom |
| Setup | Command-line focused | GUI-focused |
| Best for | Quick start, cloud fallback | Apple Silicon, custom models |
Next Steps
Cloud Providers
Configure cloud providers for more powerful models
Switching Providers
Learn to switch between providers seamlessly