Run Local Model: Image-to-Text Model with Streamlit-based UI, Docker Container Version


Setup Overview

My setup consists of an NVIDIA GTX 1060 GPU on a host running CachyOS Linux.


Docker Setup

# Install NVIDIA Container Toolkit
sudo pacman -S nvidia-container-toolkit
# Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker

# Verify Docker configuration
sudo cat /etc/docker/daemon.json

# Shell output:
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
# Restart Docker daemon
sudo systemctl restart docker

# Check Docker status
sudo systemctl status docker
# Verify the container can access the GPU
docker run --rm --gpus all nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi

# Shell output:
==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Thu Dec 11 15:47:20 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:09:00.0  On |                  N/A |
|  0%   32C    P8              6W /  156W |    1459MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+



Python Prerequisites

Create Python Venv

CachyOS uses fish as the default shell; adapt the activate.fish step below if you use a different shell.

# Create virtual environment
python3 -m venv .venv-hf-project-1-sl

# Activate environment
source .venv-hf-project-1-sl/bin/activate.fish

# Upgrade pip
pip install --upgrade pip

# (Deactivate venv)
deactivate

Pip Requirements

Create Pinned Version List

Find the available CUDA versions on the PyTorch wheel index: https://download.pytorch.org/whl/

# List available versions
pip index versions torch \
  --index-url https://download.pytorch.org/whl/cu118

# Shell output:
torch (2.7.1+cu118)
Available versions: 2.7.1+cu118, 2.7.0+cu118, 2.6.0+cu118, 2.5.1+cu118, 2.5.0+cu118

Create a file with the intended packages:

  • requirements.in
--index-url https://download.pytorch.org/whl/cu118
--extra-index-url https://pypi.org/simple

torch==2.7.1+cu118

python-dotenv
pillow
streamlit

transformers
accelerate
datasets
huggingface_hub
# Install pip tools
pip install pip-tools

# Create a txt file with the exact pinned package and dependency versions
pip-compile -o requirements.txt requirements.in

The pinned requirements and their dependencies look like this:

  • requirements.txt
#
# This file is autogenerated by pip-compile with Python 3.13
# by the following command:
#
#    pip-compile --output-file=requirements.txt requirements.in
#
--index-url https://download.pytorch.org/whl/cu118
--extra-index-url https://pypi.org/simple

accelerate==1.12.0
    # via -r requirements.in
aiohappyeyeballs==2.6.1
    # via aiohttp
aiohttp==3.13.2
    # via fsspec
aiosignal==1.4.0
    # via aiohttp
altair==6.0.0
    # via streamlit
anyio==4.12.0
    # via httpx
attrs==25.4.0
    # via
    #   aiohttp
    #   jsonschema
    #   referencing
blinker==1.9.0
    # via streamlit
cachetools==6.2.2
    # via streamlit
certifi==2025.11.12
    # via
    #   httpcore
    #   httpx
    #   requests
charset-normalizer==3.4.4
    # via requests
click==8.3.1
    # via streamlit
datasets==4.4.1
    # via -r requirements.in
dill==0.4.0
    # via
    #   datasets
    #   multiprocess
filelock==3.20.0
    # via
    #   datasets
    #   huggingface-hub
    #   torch
    #   transformers
frozenlist==1.8.0
    # via
    #   aiohttp
    #   aiosignal
fsspec[http]==2025.10.0
    # via
    #   datasets
    #   huggingface-hub
    #   torch
gitdb==4.0.12
    # via gitpython
gitpython==3.1.45
    # via streamlit
h11==0.16.0
    # via httpcore
hf-xet==1.2.0
    # via huggingface-hub
httpcore==1.0.9
    # via httpx
httpx==0.28.1
    # via datasets
huggingface-hub==0.36.0
    # via
    #   -r requirements.in
    #   accelerate
    #   datasets
    #   tokenizers
    #   transformers
idna==3.11
    # via
    #   anyio
    #   httpx
    #   requests
    #   yarl
jinja2==3.1.6
    # via
    #   altair
    #   pydeck
    #   torch
jsonschema==4.25.1
    # via altair
jsonschema-specifications==2025.9.1
    # via jsonschema
markupsafe==3.0.3
    # via jinja2
mpmath==1.3.0
    # via sympy
multidict==6.7.0
    # via
    #   aiohttp
    #   yarl
multiprocess==0.70.18
    # via datasets
narwhals==2.13.0
    # via altair
networkx==3.6.1
    # via torch
numpy==2.3.5
    # via
    #   accelerate
    #   datasets
    #   pandas
    #   pydeck
    #   streamlit
    #   transformers
nvidia-cublas-cu11==11.11.3.6
    # via
    #   nvidia-cudnn-cu11
    #   nvidia-cusolver-cu11
    #   torch
nvidia-cuda-cupti-cu11==11.8.87
    # via torch
nvidia-cuda-nvrtc-cu11==11.8.89
    # via torch
nvidia-cuda-runtime-cu11==11.8.89
    # via torch
nvidia-cudnn-cu11==9.1.0.70
    # via torch
nvidia-cufft-cu11==10.9.0.58
    # via torch
nvidia-curand-cu11==10.3.0.86
    # via torch
nvidia-cusolver-cu11==11.4.1.48
    # via torch
nvidia-cusparse-cu11==11.7.5.86
    # via torch
nvidia-nccl-cu11==2.21.5
    # via torch
nvidia-nvtx-cu11==11.8.86
    # via torch
packaging==25.0
    # via
    #   accelerate
    #   altair
    #   datasets
    #   huggingface-hub
    #   streamlit
    #   transformers
pandas==2.3.3
    # via
    #   datasets
    #   streamlit
pillow==12.0.0
    # via
    #   -r requirements.in
    #   streamlit
propcache==0.4.1
    # via
    #   aiohttp
    #   yarl
protobuf==6.33.2
    # via streamlit
psutil==7.1.3
    # via accelerate
pyarrow==22.0.0
    # via
    #   datasets
    #   streamlit
pydeck==0.9.1
    # via streamlit
python-dateutil==2.9.0.post0
    # via pandas
python-dotenv==1.2.1
    # via -r requirements.in
pytz==2025.2
    # via pandas
pyyaml==6.0.3
    # via
    #   accelerate
    #   datasets
    #   huggingface-hub
    #   transformers
referencing==0.37.0
    # via
    #   jsonschema
    #   jsonschema-specifications
regex==2025.11.3
    # via transformers
requests==2.32.5
    # via
    #   datasets
    #   huggingface-hub
    #   streamlit
    #   transformers
rpds-py==0.30.0
    # via
    #   jsonschema
    #   referencing
safetensors==0.7.0
    # via
    #   accelerate
    #   transformers
six==1.17.0
    # via python-dateutil
smmap==5.0.2
    # via gitdb
streamlit==1.52.1
    # via -r requirements.in
sympy==1.14.0
    # via torch
tenacity==9.1.2
    # via streamlit
tokenizers==0.22.1
    # via transformers
toml==0.10.2
    # via streamlit
torch==2.7.1+cu118
    # via
    #   -r requirements.in
    #   accelerate
tornado==6.5.3
    # via streamlit
tqdm==4.67.1
    # via
    #   datasets
    #   huggingface-hub
    #   transformers
transformers==4.57.3
    # via -r requirements.in
triton==3.3.1
    # via torch
typing-extensions==4.15.0
    # via
    #   altair
    #   huggingface-hub
    #   streamlit
    #   torch
tzdata==2025.2
    # via pandas
urllib3==2.6.2
    # via requests
watchdog==6.0.0
    # via streamlit
xxhash==3.6.0
    # via datasets
yarl==1.22.0
    # via aiohttp

# The following packages are considered to be unsafe in a requirements file:
# setuptools



Python App

Hugging Face Model

Model Link: https://huggingface.co/Salesforce/blip-image-captioning-base
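To sanity-check the model before wiring it into the Streamlit UI, here is a minimal standalone sketch based on the same pipeline call used in app.py below; test.jpg is a placeholder path for any local image:

from transformers import pipeline
from PIL import Image

# Load the BLIP captioning model (omit device to let it auto-detect)
captioner = pipeline(
    "image-to-text",
    model="Salesforce/blip-image-captioning-base",
    device="cuda:0"
)

# "test.jpg" is a placeholder path, use any local image
image = Image.open("test.jpg").convert("RGB")

# The pipeline returns a list of dicts with a "generated_text" key
print(captioner(image)[0]["generated_text"])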


File and Folder Structure

The file and folder structure of the project looks like this:

hf-project-1-sl
├── app.py
├── Dockerfile
├── .env
├── requirements.in
└── requirements.txt

.env File & HF API Token

Create Hugging Face API token:

  • Go to: “Settings” > “Access Tokens” > “Create new token”

  • Select “Read”

  • Click “Create token”


Add the token to the .env file:

  • .env
# Hugging Face read token
HUGGINGFACEHUB_API_TOKEN=mysecuretoken
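The app below loads this file with python-dotenv. As a minimal sketch of how the token can be read and, if needed, passed to the huggingface_hub client (the BLIP model is public, so the explicit login is only required for gated or private repos):

import os
from dotenv import find_dotenv, load_dotenv
from huggingface_hub import login

# Load HUGGINGFACEHUB_API_TOKEN from the .env file
load_dotenv(find_dotenv())
token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

# Optional: authenticate explicitly (only needed for gated/private models)
if token:
    login(token=token)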

Python app.py

from dotenv import find_dotenv, load_dotenv
from transformers import pipeline
from PIL import Image
import streamlit as st

# Load variables from .env file
load_dotenv(find_dotenv())

# Function: load model
@st.cache_resource  # Cache the model
def load_model_img_to_text():
    return pipeline(
        "image-to-text",
        model="Salesforce/blip-image-captioning-base",
        device="cuda:0"
    )  # Omit device to let it auto-detect

# Call the function for the image2text model; returns the cached pipeline object
model_img_to_text = load_model_img_to_text()

# ---- Streamlit UI ----
st.title("Image to Text Demo")
st.write("Upload an image and output the description")

uploaded_file = st.file_uploader(
    "Upload image:",
    type=["jpg", "jpeg", "png"]
)

if uploaded_file is not None:
    # Open and show the image
    image = Image.open(uploaded_file).convert("RGB")
    st.image(image, caption="Uploaded image", width=700)

    if st.button("Generate caption"):
        with st.spinner("Running model..."):
            img_to_text = model_img_to_text(image)
            caption = img_to_text[0]["generated_text"]

        st.success("Caption:")
        st.write(caption)

Dockerfile

  • Dockerfile
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive

# Install Python
RUN apt-get update && apt-get install -y --no-install-recommends \
      ca-certificates curl gnupg \
      software-properties-common \
    && add-apt-repository ppa:deadsnakes/ppa \
    && apt-get update \
    && apt-get install -y --no-install-recommends \
      python3.13 python3.13-venv python3.13-dev \
    && rm -rf /var/lib/apt/lists/*

# Create venv, update pip
RUN python3.13 -m venv /opt/venv \
    && /opt/venv/bin/python -m pip install --no-cache-dir --upgrade pip setuptools wheel

ENV PATH="/opt/venv/bin:$PATH"

# Set working directory
WORKDIR /app

# Copy requirement files
COPY requirements.txt ./

# Install requirements
RUN pip install --no-cache-dir -r requirements.txt

# Copy Python app
COPY app.py ./
COPY .env ./

# Create non-root user
RUN useradd -m appuser && chown -R appuser /app
USER appuser

# Streamlit configuration
ENV STREAMLIT_SERVER_HEADLESS=true \
    STREAMLIT_BROWSER_GATHER_USAGE_STATS=false

# Expose Streamlit port
EXPOSE 8501

# Run Python app
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

This container is intended for testing purposes; the Hugging Face model is downloaded at runtime rather than baked into the image.
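If the runtime download is not wanted, one possible alternative (an assumption, not part of this setup) is to bake the model into the image at build time with a small helper script, sketched here as a hypothetical download_model.py that would be copied into the image and run in an extra Dockerfile step (RUN python download_model.py):

# download_model.py - hypothetical build-time helper
from huggingface_hub import snapshot_download

# Cache the model weights into an image layer at build time
snapshot_download(repo_id="Salesforce/blip-image-captioning-base")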


Build Docker Container Image

# Build container image
docker build -t hf-image2text:0.1.0 .
# Verify the image
docker images

# Shell output:
IMAGE                                    ID             DISK USAGE   CONTENT SIZE   EXTRA
hf-image2text:0.1.0                      eb1ef3f4ac88       13.9GB         4.69GB    U
nvidia/cuda:11.8.0-runtime-ubuntu22.04   eaaccb3528ce       4.13GB         1.47GB

Run Docker Container

# Run the Docker container
docker run -d \
  --gpus all \
  --name hf-image2text \
  -p 8501:8501 \
  --env-file .env \
  hf-image2text:0.1.0
# Verify the running container
docker ps

# Shell output:
CONTAINER ID   IMAGE                 COMMAND                  CREATED          STATUS          PORTS                                         NAMES
0ef7882d58dc   hf-image2text:0.1.0   "/opt/nvidia/nvidia_…"   14 minutes ago   Up 14 minutes   0.0.0.0:8501->8501/tcp, [::]:8501->8501/tcp   hf-image2text

Open App in Browser

# Open app in browser
http://192.168.70.21:8501

The Python Streamlit UI looks like this:


Verify GPU Usage

# Watch NVIDIA GPU usage
watch -n 1 nvidia-smi

# Shell output:
Fri Dec 12 15:54:34 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:09:00.0  On |                  N/A |
|  0%   35C    P2             28W /  156W |    1467MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1102      G   /usr/bin/kwin_wayland                     2MiB |
|    0   N/A  N/A            1115      G   /usr/bin/sddm-greeter-qt6               292MiB |
|    0   N/A  N/A           15446      C   /opt/venv/bin/python3.13               1070MiB |
+-----------------------------------------------------------------------------------------+

The /opt/venv/bin/python3.13 process is the app running from the Python venv inside the container.
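For an additional check from inside the container, a short torch sketch (run it for example via docker exec -it hf-image2text python):

import torch

# Should print True when the container sees the GPU
print(torch.cuda.is_available())

# Should print the GPU model, e.g. "NVIDIA GeForce GTX 1060 6GB"
print(torch.cuda.get_device_name(0))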