Why Local LLMs Matter
Cloud APIs (OpenAI, Anthropic Claude, Google Gemini) are powerful but come with trade-offs:
- Privacy: your data is sent to third-party servers
- Latency: network round-trips can add roughly 500ms to 2s per request
- Cost: high-volume inference becomes expensive
Ollama addresses these by running open-weight LLMs locally on your own server: no external API calls, no network round-trip, and no per-request fees. Throughput is limited only by your hardware.
Installation (Linux/Mac)
One command on Linux (on macOS, the desktop app from ollama.com is the usual route):
curl -fsSL https://ollama.com/install.sh | sh
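Start the service (on Linux the installer usually registers a systemd service, so it may already be running):
ollama serve
Ollama now listens on http://localhost:11434. Pull a model before sending the first request, for example:
ollama pull mistral:7b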
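A quick way to confirm the server is reachable is to ask it which models are installed. The sketch below assumes the default address shown above and uses Ollama's /api/tags endpoint. The helper names (model_names, list_local_models) are illustrative, and only the standard library is used so it runs without extra packages.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address; adjust if yours differs


def model_names(tags_payload):
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_local_models(base_url=OLLAMA_URL, timeout=5):
    """Return the names of locally installed models.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling list_local_models() on a machine where the service is running should return a list of names such as the ones you pulled. An empty list means the server is up but no model has been downloaded yet.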
Popular Models
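- codellama:13b: code generation (Python, JavaScript, Odoo modules). Roughly 7GB download
- mistral:7b: fast general-purpose model, good for summarization. Roughly 4GB download
- llama3:8b: reasoning, explanation, creative tasks. Roughly 5GB download
- neural-chat:7b: conversational AI, lightweight. Roughly 4GB download
Sizes are approximate and refer to the default 4-bit quantized weights. Ollama's README suggests at least 8GB of RAM for 7B models and 16GB for 13B models. Response time depends on your hardware and on how many tokens the model generates.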
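The memory figures in the list can be sanity-checked with a back-of-the-envelope estimate: weight size is roughly the parameter count times the bits per weight, divided by 8. The sketch below assumes 4-bit quantization and ignores quantization overhead, the KV cache, and runtime buffers, so real usage will be higher. The function name is illustrative.

```python
def weight_memory_gb(params_billion, bits_per_weight=4):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Lower bound only: excludes KV cache, activations and format overhead.
    """
    return params_billion * bits_per_weight / 8


for name, size_b in [("mistral:7b", 7), ("llama3:8b", 8), ("codellama:13b", 13)]:
    print(f"{name}: ~{weight_memory_gb(size_b):.1f} GB of weights at 4-bit")
```

By this estimate a 13B model needs about 6.5GB for weights alone, which is why it does not fit comfortably in the same memory budget as a 7B model.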
Using Ollama from Python
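import requests

# Ask the local Ollama server for a single, non-streamed completion.
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'codellama:13b',
        'prompt': 'Write an Odoo model for customer tickets',
        'stream': False,  # return one JSON object instead of a stream of chunks
    },
    timeout=120,  # local generation can be slow on CPU
)
response.raise_for_status()  # fail loudly on HTTP errors such as an unknown model
result = response.json()
print(result['response'])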
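With 'stream' set to True, /api/generate returns newline-delimited JSON objects, each carrying a fragment in 'response' and a final one with 'done' set to true. The sketch below shows one way to consume that stream with the standard library. The helper names and the default model are assumptions for illustration.

```python
import json
import urllib.request


def iter_fragments(lines):
    """Yield text fragments from newline-delimited JSON chunks (str or bytes)."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break  # the final chunk marks the end of the stream


def stream_generate(prompt, model="mistral:7b",
                    base_url="http://localhost:11434", timeout=120):
    """Print fragments as they arrive and return the full text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    parts = []
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        for fragment in iter_fragments(resp):  # the response iterates line by line
            print(fragment, end="", flush=True)
            parts.append(fragment)
    return "".join(parts)
```

Streaming does not make generation faster, but the first words appear sooner, which matters for interactive use.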
Real-World Odoo Use Case
Imagine a CRM that auto-classifies incoming emails:
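- An email arrives through Odoo's incoming mail gateway and is stored as a mail.message (mail.mail holds outgoing mail)
- A server-side hook calls Ollama: "Classify this email: urgent? sales? support?"
- The label comes back with zero API cost, and no data leaves your server
- Odoo assigns the queue, sets the priority, and creates a lead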
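The classification step in the flow above can be sketched outside Odoo as three small functions: build a constrained prompt, call Ollama, and map the free-text answer back to a known label. Everything here is an assumption for illustration: the function names, the default model, and the fallback to "support" when the answer is unclear. Wiring it into Odoo (for example from a mail.thread message_new override) is left out.

```python
import json
import urllib.request

LABELS = ("urgent", "sales", "support")  # categories named in the use case


def build_prompt(subject, body):
    """Ask for exactly one label so the answer is easy to parse."""
    return (
        "Classify this email as exactly one of: " + ", ".join(LABELS) + ".\n"
        "Answer with the single label only.\n\n"
        f"Subject: {subject}\n\n{body}"
    )


def parse_label(model_output, default="support"):
    """Return the first known label found in the output, else the default."""
    text = model_output.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return default  # hypothetical fallback; choose what suits your queues


def classify_email(subject, body, model="mistral:7b",
                   base_url="http://localhost:11434", timeout=60):
    """Send the prompt to a local Ollama server and return a label."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(subject, body),
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        result = json.load(resp)
    return parse_label(result["response"])
```

Keeping parse_label tolerant matters because small models often add punctuation or a short explanation around the label.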
Performance Benchmark
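Latency comparison (request: "summarize this contract in 50 words"). Treat these as indicative figures: the hardware, quantization, and prompt length behind them are not stated, and local latency grows with the number of tokens generated: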
- Claude API: 1200ms (network + inference)
- Ollama mistral:7b: 180ms (local inference)
- Ollama on GPU: 80ms
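To get numbers for your own hardware, you can time requests yourself. A non-streamed /api/generate response includes timing fields in nanoseconds (total_duration, eval_count, eval_duration), and wall-clock time can be measured around any call. The helper names below are illustrative.

```python
import time


def summarize_timing(result):
    """Convert Ollama's nanosecond timing fields into ms and tokens/s."""
    total_ms = result["total_duration"] / 1e6
    eval_seconds = result["eval_duration"] / 1e9
    tokens_per_s = result["eval_count"] / eval_seconds if eval_seconds > 0 else 0.0
    return {"total_ms": total_ms, "tokens_per_s": tokens_per_s}


def timed(fn, *args, **kwargs):
    """Run fn and return (value, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    value = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return value, elapsed_ms
```

Run the same prompt several times and discard the first measurement, since the first request also pays the cost of loading the model into memory.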
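For recurring workloads (for example, classifying 100 emails every day), local inference has no per-request fee: the hardware cost is fixed while an API bill grows with volume, so Ollama can pay for itself over time.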
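Whether local inference pays for itself depends on your own volumes and prices. The arithmetic is simple enough to sketch. All inputs in the example call are hypothetical placeholders, not real prices or measurements, so substitute your provider's rates and your hardware cost.

```python
def monthly_api_cost(requests_per_day, tokens_per_request,
                     usd_per_million_tokens, days=30):
    """Estimated monthly API bill in USD for a steady daily workload."""
    total_tokens = requests_per_day * days * tokens_per_request
    return total_tokens * usd_per_million_tokens / 1_000_000


def breakeven_months(hardware_usd, monthly_api_usd, monthly_power_usd=0.0):
    """Months until the hardware cost is recovered, or None if it never is."""
    saving = monthly_api_usd - monthly_power_usd
    if saving <= 0:
        return None
    return hardware_usd / saving


# Hypothetical inputs: 100 emails/day, 2000 tokens each, 5 USD per million tokens.
api = monthly_api_cost(100, 2000, 5.0)
print(f"API cost per month: {api:.2f} USD")
print(f"Break-even on 600 USD of hardware: {breakeven_months(600, api)} months")
```

With these placeholder numbers the API bill is 30 USD per month and the break-even point is 20 months. Higher volumes or longer prompts shorten it proportionally.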