Production Config and Gunicorn

A single Uvicorn process is fine for development. In production, Gunicorn acts as the process manager: it runs multiple Uvicorn workers, restarts workers that crash or hang, and supports graceful reloads, which keeps the service stable under load.

Learning Focus

By the end of this lesson you will be able to calculate a starting number of Gunicorn workers, configure timeouts and logging, write a gunicorn.conf.py, and set up graceful reloads for zero-downtime deploys.

Gunicorn with UvicornWorker

production-start.sh
gunicorn app.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 60 \
  --graceful-timeout 30 \
  --keep-alive 5 \
  --access-logfile - \
  --error-logfile - \
  --log-level info

Worker Count Formula

workers = (2 × CPU cores) + 1

Server	CPUs	Recommended workers
t3.micro	2	5
t3.medium	2	5
c5.xlarge	4	9
c5.2xlarge	8	17

note

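For I/O-bound FastAPI apps with async endpoints, each worker serves many concurrent requests on its event loop, so fewer workers are often enough than for sync, CPU-bound apps. Treat (2 × CPU cores) + 1 as a starting point, check that the server has enough memory for that many copies of the app, and tune with load testing.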
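The formula is easy to express as a small helper. The sketch below is illustrative only: the function name `recommended_workers` and the optional `max_workers` cap are assumptions added here, not part of Gunicorn.

```python
import os
from typing import Optional


def recommended_workers(cpu_count: Optional[int] = None,
                        max_workers: Optional[int] = None) -> int:
    """Return (2 x CPU cores) + 1, optionally capped at max_workers."""
    if cpu_count is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        cpu_count = os.cpu_count() or 1
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    workers = 2 * cpu_count + 1
    if max_workers is not None:
        # Hypothetical cap, e.g. for memory-constrained servers.
        workers = min(workers, max_workers)
    return workers


# Reproduce the table above for 2, 4, and 8 cores.
for cores in (2, 4, 8):
    print(f"{cores} cores -> {recommended_workers(cores)} workers")
```

A cap like `max_workers` can be useful on small instances where memory, not CPU, is the limiting factor.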

`gunicorn.conf.py`

gunicorn.conf.py
import multiprocessing

# Worker count
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000  # Per the Gunicorn docs, only affects gthread, eventlet, and gevent workers

# Binding
bind = "0.0.0.0:8000"

# Timeouts
timeout = 60             # Worker is killed and restarted if it stays silent for 60s
graceful_timeout = 30    # Seconds workers get to finish in-flight requests on restart
keepalive = 5            # Seconds to wait for the next request on a keep-alive connection

# Logging
accesslog = "-"          # stdout
errorlog = "-"           # stderr
loglevel = "info"
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Restart each worker after this many requests (limits the impact of memory leaks)
max_requests = 1000
max_requests_jitter = 50

# Preload app (faster startup, but shares state)
preload_app = False  # Set True if your app is read-only after init

start-with-config.sh
gunicorn app.main:app --config gunicorn.conf.py
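To see why `max_requests_jitter` is set alongside `max_requests`, it helps to look at the restart thresholds it produces. According to the Gunicorn documentation, each worker's limit is randomized by adding a value between 0 and `max_requests_jitter`, so that workers do not all restart at the same moment. The sketch below only simulates that idea; `restart_thresholds` is a hypothetical helper, not Gunicorn's implementation.

```python
import random


def restart_thresholds(num_workers, max_requests, max_requests_jitter, seed=None):
    """Simulate the per-worker request count at which each worker is recycled."""
    rng = random.Random(seed)
    return [
        # Each worker gets its own random offset in [0, max_requests_jitter].
        max_requests + rng.randint(0, max_requests_jitter)
        for _ in range(num_workers)
    ]


# With jitter, five workers get staggered limits between 1000 and 1050.
print(restart_thresholds(num_workers=5, max_requests=1000, max_requests_jitter=50))

# Without jitter, every worker would hit its limit at the same request count.
print(restart_thresholds(num_workers=5, max_requests=1000, max_requests_jitter=0))
```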

Systemd Service

/etc/systemd/system/fastapi.service
[Unit]
Description=FastAPI Application
After=network.target postgresql.service
Requires=postgresql.service

[Service]
Type=notify
User=appuser
Group=appuser
WorkingDirectory=/var/app
EnvironmentFile=/var/app/.env
ExecStart=/var/app/.venv/bin/gunicorn app.main:app --config /var/app/gunicorn.conf.py
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=fastapi

[Install]
WantedBy=multi-user.target

manage-service.sh
sudo systemctl enable fastapi
sudo systemctl start fastapi
sudo systemctl status fastapi
sudo systemctl reload fastapi   # Zero-downtime reload
sudo journalctl -u fastapi -f   # Tail logs

Zero-Downtime Deployment

deploy.sh
#!/bin/bash
set -euo pipefail

APP_DIR="/var/app"
VENV="$APP_DIR/.venv"

echo "Pulling latest code..."
# Separate commands: a failed `cd` inside an `&&` list would not trigger `set -e`
cd "$APP_DIR"
git pull

echo "Installing dependencies..."
"$VENV/bin/pip" install -r requirements.txt --quiet

echo "Running migrations..."
"$VENV/bin/alembic" upgrade head

echo "Reloading Gunicorn workers gracefully..."
sudo systemctl reload fastapi

echo "Deployment complete."

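`systemctl reload` runs the unit's `ExecReload` command, which sends SIGHUP to the Gunicorn master process. The master starts new workers and gracefully shuts down the old ones once they finish their in-flight requests. The new workers only pick up the updated code when `preload_app` is `False`; with a preloaded app, SIGHUP alone does not load the new version.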
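A reload is only "zero-downtime" if the new workers actually come up, so a deploy script may want to verify that before reporting success. The following is a minimal sketch, assuming the app exposes the `/health` endpoint used in the load test later in this lesson; `fetch_status` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_healthy(url, attempts=10, delay=1.0,
                       fetch=fetch_status, sleep=time.sleep):
    """Poll url until it returns 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

After `systemctl reload fastapi`, the deploy could call `wait_until_healthy("http://localhost:8000/health")` and exit with a non-zero status if it returns `False`. The `fetch` and `sleep` parameters exist only to make the function easy to test without a running server.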

Environment-Based Settings

app/core/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    DATABASE_URL: str
    SECRET_KEY: str
    DEBUG: bool = False
    ALLOWED_ORIGINS: list[str] = []
    LOG_LEVEL: str = "info"
    WORKERS: int = 4

settings = Settings()
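The `WORKERS` and `LOG_LEVEL` settings above can also drive Gunicorn itself, since gunicorn.conf.py is plain Python. The sketch below assumes those variables are exported into the process environment (for example through the `EnvironmentFile` line in the systemd unit); `workers_from_env` is a hypothetical helper, and the fallback to the formula is a design choice made here.

```python
import multiprocessing
import os


def workers_from_env(environ=os.environ):
    """Use WORKERS if it is a positive integer, else (2 x CPU cores) + 1."""
    default = multiprocessing.cpu_count() * 2 + 1
    raw = environ.get("WORKERS", "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Ignore values such as "auto" and fall back to the formula.
        return default
    return value if value > 0 else default


# Gunicorn reads module-level names like these from the config file.
workers = workers_from_env()
loglevel = os.environ.get("LOG_LEVEL", "info").lower()
```

Reading the environment directly, rather than importing `Settings`, keeps the Gunicorn config free of application imports, so a missing `DATABASE_URL` cannot stop the config file from loading.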

Common Pitfalls

Pitfall	Cause / Symptom	Fix
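Worker timeout killing slow requests	Worker silent for longer than `timeout`, typically because blocking code stalls the event loop during a long query	Move blocking work off the event loop or into background tasks; increase `timeout` if needed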
Memory growing indefinitely	Long-running workers accumulating state	Set `max_requests` to recycle workers
`preload_app=True` with async engine	Engine created in parent process, workers inherit broken state	Keep `preload_app=False` for async apps
Gunicorn using sync workers	Forgot `--worker-class uvicorn.workers.UvicornWorker`	Always specify UvicornWorker for FastAPI
SIGTERM kills requests	Workers stopped before in-flight requests finish	Set `graceful_timeout` higher than the duration of your slowest endpoint
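Several of these pitfalls can be caught before deploying with a quick check of the settings. This is a rough sketch, not a Gunicorn feature: `check_gunicorn_settings` is a hypothetical helper that takes the settings as a plain dict, and the fallback values mirror Gunicorn's documented defaults (sync worker, `max_requests` of 0, `graceful_timeout` of 30).

```python
def check_gunicorn_settings(settings, slowest_endpoint_seconds):
    """Return a list of warnings for the pitfalls listed in the table above."""
    warnings = []
    if settings.get("worker_class", "sync") != "uvicorn.workers.UvicornWorker":
        warnings.append("worker_class: use uvicorn.workers.UvicornWorker for FastAPI")
    if settings.get("max_requests", 0) <= 0:
        warnings.append("max_requests: set a positive value to recycle workers")
    if settings.get("preload_app", False):
        warnings.append("preload_app: keep False if the app creates an async engine at import")
    if settings.get("graceful_timeout", 30) <= slowest_endpoint_seconds:
        warnings.append("graceful_timeout: should exceed the slowest endpoint")
    return warnings


# Values from this lesson's gunicorn.conf.py, with an assumed 10s slowest endpoint.
lesson_settings = {
    "worker_class": "uvicorn.workers.UvicornWorker",
    "max_requests": 1000,
    "preload_app": False,
    "graceful_timeout": 30,
}
for warning in check_gunicorn_settings(lesson_settings, slowest_endpoint_seconds=10):
    print(warning)
```

With the lesson's values the loop prints nothing, because no pitfall is triggered.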

Hands-On Practice

load-test.sh
# Install hey (HTTP load tester); requires Go
go install github.com/rakyll/hey@latest

# Test with 4 Gunicorn workers, 200 concurrent users, 10000 requests
gunicorn app.main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 &
GUNICORN_PID=$!
sleep 3   # Give the workers time to boot before sending traffic

hey -n 10000 -c 200 http://localhost:8000/health
kill "$GUNICORN_PID"

# Compare with single Uvicorn process
uvicorn app.main:app --port 8001 &
UVICORN_PID=$!
sleep 3

hey -n 10000 -c 200 http://localhost:8001/health
kill "$UVICORN_PID"
