Author: Julian A. Gonzalez, IBM Champion 2025
Date: June 29, 2025
Welcome to this beginner-friendly guide for running IBM’s Granite 3.2-2B language model on a Raspberry Pi 5! This guide shows you how to set up a local AI assistant using only your Raspberry Pi’s built-in hardware - no external drives or complex configurations needed.
What to Expect:
The Granite 3.2-2B model has 2 billion parameters and runs well on the Raspberry Pi 5’s CPU. Expect responses in 5-15 seconds. Your Pi will get warm during use, so good ventilation is recommended.
Important: This guide uses only the Raspberry Pi 5’s built-in storage and memory. No external drives needed!
Open a terminal and run these commands:
sudo apt update
sudo apt upgrade -y
sudo reboot
After reboot, verify you have the 64-bit system:
uname -m
You should see aarch64 (this means 64-bit ARM).
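If you are scripting the setup, this check can be automated. A minimal sketch (the helper name arch_ok is ours, not standard tooling):

```shell
#!/bin/bash
# arch_ok: print "ok" for 64-bit ARM, "unsupported" otherwise.
# (helper name is ours, for illustration only)
arch_ok() {
  if [ "$1" = "aarch64" ]; then
    echo "ok"
  else
    echo "unsupported"
  fi
}

# Check the running system
arch_ok "$(uname -m)"
```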
Ollama makes it easy to run AI models on your Raspberry Pi. Here’s how to install it:
Run this single command to install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
That’s it! Ollama is now installed and running.
Check if Ollama is working:
ollama --version
You should see version information displayed.
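Beyond the version check, you can confirm both the background service and its HTTP API are up (Ollama listens on port 11434 by default). The ollama_health helper below is our own naming, just a sketch:

```shell
#!/bin/bash
# ollama_health: summarize service + API state (helper name is ours).
ollama_health() {
  local svc="$1" api="$2"
  if [ "$svc" = "active" ] && [ "$api" = "ok" ]; then
    echo "healthy"
  else
    echo "check ollama: service=$svc api=$api"
  fi
}

# Gather live state; fall back to safe values if a tool is unavailable.
svc=$(systemctl is-active ollama 2>/dev/null)
svc=${svc:-unknown}
api=$( { curl -fs http://localhost:11434/api/tags >/dev/null 2>&1 && echo ok; } || echo down )
ollama_health "$svc" "$api"
```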
Let’s make a few simple changes to get the best performance from your Pi’s CPU:
This makes your Pi’s CPU run at full speed:
echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Create a simple service to set performance mode on every boot:
sudo nano /etc/systemd/system/cpu-performance.service
Copy and paste this content:
[Unit]
Description=Set CPU to Performance Mode
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Save the file (Ctrl+X, then Y, then Enter), then enable it:
sudo systemctl enable cpu-performance.service
sudo systemctl start cpu-performance.service
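You can verify the service took effect on every core. The count_non_performance helper is our own sketch; it prints how many cores are not in performance mode (0 means all four are set):

```shell
#!/bin/bash
# count_non_performance: read one governor value per line on stdin and
# print how many are NOT "performance" (helper name is ours).
count_non_performance() {
  grep -cv '^performance$' || true
}

# Feed it the live governor files; 0 means every core is set.
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor 2>/dev/null \
  | count_non_performance
```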
Ollama automatically uses all four of the Pi 5's CPU cores, so no thread tuning is normally required. If you want to pin the thread count explicitly, set it per model with a PARAMETER num_thread 4 line in a Modelfile (the next section creates one). After changing any Ollama configuration, restart the service:
sudo systemctl restart ollama
Now for the exciting part - getting the AI model!
Try this first - it might work directly:
ollama pull granite3.2:2b
With the base model downloaded, wrap it in a Pi-friendly configuration. A Modelfile lets you set generation parameters and a system prompt on top of the model you just pulled:
nano granite-pi.Modelfile
Paste the following:
FROM granite3.2:2b
PARAMETER temperature 0.7
PARAMETER num_ctx 2048
PARAMETER num_thread 4
SYSTEM """
You are a helpful AI assistant running on Raspberry Pi 5 hardware.
Keep responses concise and efficient due to hardware limitations.
"""
Build the custom model and confirm it is listed:
ollama create granite-pi -f granite-pi.Modelfile
ollama list
Let’s make sure everything works!
Start a conversation with your AI:
ollama run granite-pi
Try asking it something simple:
Hello! What can you help me with?
What is a Raspberry Pi?
Tell me a short joke
To exit the conversation, type /bye or press Ctrl+C.
Create a simple temperature monitor to keep an eye on your Pi:
cat > ~/check_temp.sh << 'EOF'
#!/bin/bash
temp=$(vcgencmd measure_temp | cut -d= -f2 | cut -d\' -f1)
echo "CPU Temperature: $temp"
if (( $(echo "$temp > 70" | bc -l) )); then
  echo "⚠️ Getting warm! Consider better cooling."
fi
EOF
chmod +x ~/check_temp.sh
Run it anytime with:
~/check_temp.sh
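You can extend the monitor with a simple classifier. The temp_status helper and its 70/80 thresholds are our own rough assumptions (the firmware begins throttling around 80-85°C):

```shell
#!/bin/bash
# temp_status: classify a Celsius reading (thresholds are our rough
# assumptions; the firmware throttles around 80-85°C).
temp_status() {
  local t=${1%.*}          # drop decimals for integer comparison
  if [ "$t" -ge 80 ]; then
    echo "hot"
  elif [ "$t" -ge 70 ]; then
    echo "warm"
  else
    echo "ok"
  fi
}

# Example: log a timestamped reading and its status every 30 s:
# while true; do
#   t=$(vcgencmd measure_temp | cut -d= -f2 | cut -d\' -f1)
#   echo "$(date '+%H:%M:%S') ${t}C $(temp_status "$t")" >> ~/temp.log
#   sleep 30
# done
temp_status 72.5   # → warm
```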
Here are some easy ways to use your new AI assistant:
# Ask a quick question
echo "Explain what machine learning is in simple terms" | ollama run granite-pi
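If you want machine-readable output you can parse with jq, call Ollama's REST API instead of the CLI: a non-streaming request to /api/generate returns a JSON object whose response field holds the answer (default port 11434; "granite-pi" is the custom model built earlier):

```shell
# Ask via the REST API and extract just the answer text.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "granite-pi", "prompt": "Explain machine learning in one sentence.", "stream": false}' \
  | jq -r '.response'
```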
cat > ~/chat.sh << 'EOF'
#!/bin/bash
echo "🤖 Granite AI Assistant (type 'quit' to exit)"
echo "=================================="
while true; do
  echo -n "You: "
  read -r input
  if [ "$input" = "quit" ]; then
    echo "Goodbye!"
    break
  fi
  echo "AI:"
  echo "$input" | ollama run granite-pi
  echo ""
done
EOF
chmod +x ~/chat.sh
Use it with:
~/chat.sh
cat > ~/ai_status.sh << 'EOF'
#!/bin/bash
echo "🔍 Raspberry Pi AI Status"
echo "========================"
echo "Temperature: $(vcgencmd measure_temp)"
echo "Memory usage: $(free -h | grep Mem | awk '{print $3 "/" $2}')"
echo "CPU governor: $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)"
echo "Ollama status: $(systemctl is-active ollama)"
echo "Available models:"
ollama list
EOF
chmod +x ~/ai_status.sh
Symptoms: Error messages when trying to run the model
Solutions: Check available memory, restart Ollama, then retry:
free -h
sudo systemctl restart ollama
ollama run granite-pi
Inside the session, type /set parameter num_ctx 1024 to reduce the context window and memory use.
Symptoms: Responses take more than 30 seconds
Solutions: Check for thermal throttling and confirm the performance governor is still active:
vcgencmd measure_temp
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Symptoms: System freezes or becomes very slow
Solutions: Lower the context window to reduce memory pressure. Start a session with ollama run granite-pi, then type:
/set parameter num_ctx 1024
Symptoms: Ollama crashes with memory errors
Solutions: Reboot to clear memory, then try again:
sudo reboot
Congratulations! You now have a working AI assistant on your Raspberry Pi. Here are some ideas for what to do next:
Try wiring your assistant into small scripts for recurring tasks:
# Daily briefing script
cat > ~/daily_brief.sh << 'EOF'
#!/bin/bash
echo "Good morning! Give me one short tech tip for the day." | ollama run granite-pi
EOF
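To get the briefing automatically, you could schedule the script with cron (the 08:00 time and log path here are arbitrary choices):

```
# Open your crontab with: crontab -e
# Then add this line to run the briefing every day at 08:00:
0 8 * * * $HOME/daily_brief.sh >> $HOME/brief.log 2>&1
```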
As a rough guide, here is what to expect from different Pi 5 configurations:
| Raspberry Pi Model | Response Time | Best Use Cases |
|--------------------|---------------|----------------|
| Pi 5 (4GB RAM) | 10-20 seconds | Simple questions, learning |
| Pi 5 (8GB RAM) | 5-15 seconds | General use, longer conversations |
You’ve successfully set up a powerful AI assistant using just your Raspberry Pi 5’s built-in hardware! This simple setup proves that you don’t need expensive equipment to experiment with AI technology.
The Granite 3.2-2B model running on your Pi’s CPU provides a great introduction to local AI without any cloud dependencies. You have complete control over your data and can experiment freely.
Remember: this is just the beginning of your AI journey. Enjoy exploring the possibilities!
Julian A. Gonzalez, IBM Champion 2025