Image Caption Generator using BLIP + PiCamera on Raspberry Pi
Turn your Raspberry Pi into a smart image captioning device using BLIP, an AI model that generates natural language descriptions of images.
Overview
In this project, you will:
Capture images using the PiCamera
Use the BLIP (Bootstrapping Language-Image Pre-training) model to generate captions
Display or store the generated captions
Optionally build a simple web interface with Gradio
This project works best on a Raspberry Pi 4 or 5 and requires an internet connection (to download the model on first run).
Requirements
A Raspberry Pi 4 or 5 running Raspberry Pi OS
A camera: the official Camera Module (PiCamera) or a USB webcam
Python 3 with pip
An internet connection for the one-time model download
Step 1: Install System Dependencies
sudo apt update && sudo apt upgrade -y
sudo apt install python3-pip python3-venv libjpeg-dev libopenjp2-7-dev
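If you are using the ribbon-cable Camera Module on an older OS with the legacy camera stack, you may also need to enable the camera interface first (the exact menu name varies by OS release; USB webcams need no setup):
sudo raspi-config   # Interface Options -> (Legacy) Camera -> Enable, then reboot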
Step 2: Set Up Python Environment
mkdir ~/blip-caption
cd ~/blip-caption
python3 -m venv venv
source venv/bin/activate
pip install torch torchvision
Step 3: Install HuggingFace Transformers and BLIP
pip install transformers timm pillow
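A quick sanity check that the environment is ready (the versions printed will vary):
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"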
Step 4: Capture Image with PiCamera
If you're using the legacy picamera library (Raspberry Pi OS Buster and earlier; a Picamera2 variant for newer releases follows the snippet below):
pip install picamera
And use this code to capture an image:
from picamera import PiCamera
from time import sleep

camera = PiCamera()
camera.start_preview()
sleep(2)  # give the sensor time to adjust exposure and white balance
camera.capture('image.jpg')
camera.stop_preview()
camera.close()  # release the camera for other processes
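On Raspberry Pi OS Bullseye and later, the legacy picamera stack is replaced by libcamera, and the Picamera2 library is preinstalled on recent images (inside a venv you may need to create it with --system-site-packages so the system copy is visible). A minimal sketch of the same capture with Picamera2:
from picamera2 import Picamera2
from time import sleep

picam2 = Picamera2()
picam2.start()                    # start the camera pipeline
sleep(2)                          # let exposure and white balance settle
picam2.capture_file("image.jpg")  # save a still to disk
picam2.stop()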
For a USB camera, use OpenCV instead:
pip install opencv-python
import cv2

cap = cv2.VideoCapture(0)  # 0 = first attached camera
ret, frame = cap.read()
if ret:  # only save if a frame was actually captured
    cv2.imwrite('image.jpg', frame)
cap.release()
Step 5: Generate Caption Using BLIP
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

# Download (first run only) and load the BLIP captioning model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Load the captured image and make sure it is RGB
raw_image = Image.open("image.jpg").convert('RGB')

# Preprocess, generate token IDs, and decode them into text
inputs = processor(raw_image, return_tensors="pt")
out = model.generate(**inputs)
caption = processor.decode(out[0], skip_special_tokens=True)
print("Caption:", caption)
Optional: Add Gradio Interface
pip install gradio
import gradio as gr

def caption_image(image):
    # With type="pil" below, Gradio hands this function a PIL.Image
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs)
    return processor.decode(out[0], skip_special_tokens=True)

gr.Interface(fn=caption_image, inputs=gr.Image(type="pil"), outputs="text").launch()
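By default Gradio serves the app locally at http://127.0.0.1:7860; pass server_name="0.0.0.0" to launch() if you want to open it from another device on your network.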
✅ Conclusion
You've built a working image caption generator using Raspberry Pi, PiCamera, and the BLIP AI model. You can now:
Add automatic uploads or email integration
Extend it into a surveillance or accessibility tool (see the sketch after this list)
Use it in photo archiving projects
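As a starting point for those extensions, here is a minimal sketch of a continuous capture-and-caption loop that appends timestamped captions to a log file. It assumes a USB camera read through OpenCV and reuses the model from Step 5; the one-minute interval and the captions.log filename are arbitrary choices, and you can swap in the Picamera2 capture from Step 4 instead:
import time
import cv2
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

cap = cv2.VideoCapture(0)
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break  # camera unplugged or read failure
        # OpenCV delivers BGR arrays; BLIP expects RGB images
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        inputs = processor(image, return_tensors="pt")
        out = model.generate(**inputs)
        caption = processor.decode(out[0], skip_special_tokens=True)
        with open("captions.log", "a") as log:
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')}  {caption}\n")
        time.sleep(60)  # one caption per minute; tune for your use case
finally:
    cap.release()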