Setup Overview #
My setup consists of an NVIDIA GTX 1060 GPU on CachyOS Linux.
Python Prerequisites #
Create Python Venv #
CachyOS uses fish as the default shell; adapt the `activate.fish` script name if you use a different shell (for example, plain `activate` for bash).
# Create virtual environment
python3 -m venv .venv-hf-project-1
# Activate environment
source .venv-hf-project-1/bin/activate.fish
# Upgrade pip
pip install --upgrade pip
# (Deactivate venv)
deactivate
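To confirm the environment is actually active, a quick check from Python itself works in any shell. This is a minimal sketch; `in_venv` is just an illustrative helper name.

```python
import sys

# Inside an activated venv, sys.prefix points at the venv directory,
# while sys.base_prefix still points at the system Python.
def in_venv() -> bool:
    return sys.prefix != sys.base_prefix

print("venv active:", in_venv())
```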
Pip Requirements #
Create Pinned Version List #
Find CUDA version: https://download.pytorch.org/whl/
# List available versions
pip index versions torch \
--index-url https://download.pytorch.org/whl/cu118
# Shell output:
torch (2.7.1+cu118)
Available versions: 2.7.1+cu118, 2.7.0+cu118, 2.6.0+cu118, 2.5.1+cu118, 2.5.0+cu118
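The `+cu118` suffix in these version strings is a PEP 440 local version label marking the CUDA 11.8 build. A minimal sketch of how it splits off the release number (`split_local` is just an illustrative helper, not a pip API):

```python
# "2.7.1+cu118" = release "2.7.1" + local version label "cu118" (PEP 440)
def split_local(version: str) -> tuple[str, str]:
    release, _, local = version.partition("+")
    return release, local

print(split_local("2.7.1+cu118"))  # ('2.7.1', 'cu118')
```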
Create a file with the intended packages:
- requirements.in
--index-url https://download.pytorch.org/whl/cu118
--extra-index-url https://pypi.org/simple
torch==2.7.1+cu118
python-dotenv
pillow
transformers
accelerate
datasets
huggingface_hub
# Install pip tools
pip install pip-tools
# Create a txt file with the exact pinned package and dependencies versions
pip-compile -o requirements.txt requirements.in
The pinned requirements and their dependencies look like this:
- requirements.txt
#
# This file is autogenerated by pip-compile with Python 3.13
# by the following command:
#
# pip-compile --output-file=requirements.txt requirements.in
#
--index-url https://download.pytorch.org/whl/cu118
--extra-index-url https://pypi.org/simple
accelerate==1.12.0
# via -r requirements.in
aiohappyeyeballs==2.6.1
# via aiohttp
aiohttp==3.13.2
# via fsspec
aiosignal==1.4.0
# via aiohttp
anyio==4.12.0
# via httpx
attrs==25.4.0
# via aiohttp
certifi==2025.11.12
# via
# httpcore
# httpx
# requests
charset-normalizer==3.4.4
# via requests
datasets==4.4.1
# via -r requirements.in
dill==0.4.0
# via
# datasets
# multiprocess
filelock==3.20.0
# via
# datasets
# huggingface-hub
# torch
# transformers
frozenlist==1.8.0
# via
# aiohttp
# aiosignal
fsspec[http]==2025.10.0
# via
# datasets
# huggingface-hub
# torch
h11==0.16.0
# via httpcore
hf-xet==1.2.0
# via huggingface-hub
httpcore==1.0.9
# via httpx
httpx==0.28.1
# via datasets
huggingface-hub==0.36.0
# via
# -r requirements.in
# accelerate
# datasets
# tokenizers
# transformers
idna==3.11
# via
# anyio
# httpx
# requests
# yarl
jinja2==3.1.6
# via torch
markupsafe==3.0.3
# via jinja2
mpmath==1.3.0
# via sympy
multidict==6.7.0
# via
# aiohttp
# yarl
multiprocess==0.70.18
# via datasets
networkx==3.6.1
# via torch
numpy==2.3.5
# via
# accelerate
# datasets
# pandas
# transformers
nvidia-cublas-cu11==11.11.3.6
# via
# nvidia-cudnn-cu11
# nvidia-cusolver-cu11
# torch
nvidia-cuda-cupti-cu11==11.8.87
# via torch
nvidia-cuda-nvrtc-cu11==11.8.89
# via torch
nvidia-cuda-runtime-cu11==11.8.89
# via torch
nvidia-cudnn-cu11==9.1.0.70
# via torch
nvidia-cufft-cu11==10.9.0.58
# via torch
nvidia-curand-cu11==10.3.0.86
# via torch
nvidia-cusolver-cu11==11.4.1.48
# via torch
nvidia-cusparse-cu11==11.7.5.86
# via torch
nvidia-nccl-cu11==2.21.5
# via torch
nvidia-nvtx-cu11==11.8.86
# via torch
packaging==25.0
# via
# accelerate
# datasets
# huggingface-hub
# transformers
pandas==2.3.3
# via datasets
pillow==12.0.0
# via -r requirements.in
propcache==0.4.1
# via
# aiohttp
# yarl
psutil==7.1.3
# via accelerate
pyarrow==22.0.0
# via datasets
python-dateutil==2.9.0.post0
# via pandas
python-dotenv==1.2.1
# via -r requirements.in
pytz==2025.2
# via pandas
pyyaml==6.0.3
# via
# accelerate
# datasets
# huggingface-hub
# transformers
regex==2025.11.3
# via transformers
requests==2.32.5
# via
# datasets
# huggingface-hub
# transformers
safetensors==0.7.0
# via
# accelerate
# transformers
six==1.17.0
# via python-dateutil
sympy==1.14.0
# via torch
tokenizers==0.22.1
# via transformers
torch==2.7.1+cu118
# via
# -r requirements.in
# accelerate
tqdm==4.67.1
# via
# datasets
# huggingface-hub
# transformers
transformers==4.57.3
# via -r requirements.in
triton==3.3.1
# via torch
typing-extensions==4.15.0
# via
# huggingface-hub
# torch
tzdata==2025.2
# via pandas
urllib3==2.6.2
# via requests
xxhash==3.6.0
# via datasets
yarl==1.22.0
# via aiohttp
# The following packages are considered to be unsafe in a requirements file:
# setuptools
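As a sanity check, a pinned file like the one above can be parsed with a few lines of Python. This is a rough sketch: `parse_pins` is a hypothetical helper that only handles plain `name==version` lines and skips comments and `--index-url` options.

```python
def parse_pins(text: str) -> dict[str, str]:
    # Collect "name==version" lines; skip comments and option lines.
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-")):
            continue
        name, sep, version = line.partition("==")
        if sep:
            pins[name] = version
    return pins

sample = "torch==2.7.1+cu118\n# a comment\n--extra-index-url https://pypi.org/simple"
print(parse_pins(sample))  # {'torch': '2.7.1+cu118'}
```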
Install Requirements #
# Install pinned requirements and dependencies
pip install -r requirements.txt
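After installing, the pins can be verified against the live environment with the standard library's `importlib.metadata`. A minimal sketch, where `check_pin` is a hypothetical helper:

```python
from importlib.metadata import version, PackageNotFoundError

# Does the installed version of a package match the pinned version?
def check_pin(name: str, pinned: str) -> bool:
    try:
        return version(name) == pinned
    except PackageNotFoundError:
        return False  # package not installed at all

print(check_pin("torch", "2.7.1+cu118"))
```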
Python App #
Hugging Face Model #
Model Link: https://huggingface.co/Salesforce/blip-image-captioning-base
File and Folder Structure #
The file and folder structure of the project looks like this:
hf-project-1
├── app.py
├── Dockerfile
├── .env
├── image1.jpg
├── image2.jpg
├── image3.jpg
├── requirements.in
└── requirements.txt
.env File & HF API Token #
Create a Hugging Face API token:

- Go to: “Settings” > “Access Tokens” > “Create new token”
- Select “Read”
- Click “Create token”
Add the token to the .env file:
- .env
# Hugging Face read token
HUGGINGFACEHUB_API_TOKEN=mysecuretoken
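Under the hood, `python-dotenv` turns each `KEY=VALUE` line into an environment variable. A minimal sketch of that idea for a single plain line (`parse_env_line` is illustrative only; the real library also handles quoting, comments, and multiline values):

```python
import os

# Illustrative sketch of what python-dotenv does with one KEY=VALUE line
def parse_env_line(line: str) -> tuple[str, str]:
    key, _, value = line.strip().partition("=")
    return key, value

key, value = parse_env_line("HUGGINGFACEHUB_API_TOKEN=mysecuretoken")
os.environ.setdefault(key, value)  # don't overwrite an existing value
print(key)  # HUGGINGFACEHUB_API_TOKEN
```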
Python app.py #
from dotenv import find_dotenv, load_dotenv
from transformers import pipeline
import sys
import os

# Load variables from .env file
load_dotenv(find_dotenv())

# Function to load image2text model
def load_model_img_to_text():
    return pipeline(
        "image-to-text",
        model="Salesforce/blip-image-captioning-base",
        device="cuda:0",
    )  # Omit device to let it auto-detect

# Call function for image2text model > returns a pipeline object
model_img_to_text = load_model_img_to_text()

# Function to use image2text model
def img_to_text(image_path):
    textresult = model_img_to_text(image_path)
    print(textresult)  # Print output to the terminal
    return textresult  # Return the result if further processing is needed

# Run model
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python app.py <image_path>")
        sys.exit(1)
    image_path = sys.argv[1]
    if not os.path.isfile(image_path):
        print(f"File not found: {image_path}")
        sys.exit(1)
    # Run image2text function
    img_to_text(image_path)
- Command
python app.py foo.jpg  # sys.argv: ["app.py", "foo.jpg"]
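The argument handling can be tried on its own, without loading the model. A small sketch using a stand-in list in place of the real `sys.argv`:

```python
import sys

# For "python app.py foo.jpg", Python sets:
#   sys.argv[0] -> "app.py"  (script name)
#   sys.argv[1] -> "foo.jpg" (first user argument)
argv = ["app.py", "foo.jpg"]  # stand-in for sys.argv

if len(argv) < 2:
    sys.exit(1)
print(argv[1])  # foo.jpg
```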
Run App #
# Run app: Define image
python app.py image2.jpg
# Shell output:
[{'generated_text': 'a man feeding seaguls on a boat'}]
Hugging Face Cache #
The model is downloaded to the local Hugging Face cache:
# Verify model cache
ls ~/.cache/huggingface/hub/models--Salesforce--blip-image-captioning-base
# Shell output:
drwxr-xr-x - cachyos 6 Dec 18:14 .no_exist
drwxr-xr-x - cachyos 6 Dec 18:15 blobs
drwxr-xr-x - cachyos 6 Dec 18:14 refs
drwxr-xr-x - cachyos 6 Dec 18:14 snapshots
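The same cache directory can be located from Python. A sketch assuming the default cache location (`huggingface_hub` honors `HF_HOME` / `HF_HUB_CACHE` overrides, which this ignores):

```python
from pathlib import Path

# Default Hugging Face hub cache; model folders are named
# "models--<org>--<model>"
cache = Path.home() / ".cache" / "huggingface" / "hub"
model_dir = cache / "models--Salesforce--blip-image-captioning-base"

if model_dir.exists():
    for entry in sorted(model_dir.iterdir()):
        print(entry.name)
else:
    print("model not cached yet:", model_dir)
```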