Production Config and Gunicorn

Uvicorn alone (single process) is fine for development. For production, Gunicorn manages multiple Uvicorn workers, handles graceful restarts, and improves stability under load.

Learning Focus

By the end of this lesson you can: calculate the right number of Gunicorn workers, configure timeouts and logging, write a gunicorn.conf.py, and set up graceful shutdown for zero-downtime deploys.

Gunicorn with UvicornWorker

production-start.sh
gunicorn app.main:app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--timeout 60 \
--graceful-timeout 30 \
--keep-alive 5 \
--access-logfile - \
--error-logfile - \
--log-level info

Worker Count Formula

workers = (2 × CPU cores) + 1
Server | CPUs | Recommended workers
t3.micro | 2 | 5
t3.medium | 2 | 5
c5.xlarge | 4 | 9
c5.2xlarge | 8 | 17
Note

For I/O-bound FastAPI apps with async endpoints, each worker serves many concurrent requests on its event loop, so fewer workers are needed than for sync CPU-bound apps. Treat (2 × CPU cores) + 1 as a starting point and tune it with load testing.
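The formula can be sketched as a small helper. The function name `recommended_workers` and the optional `max_workers` cap are assumptions added for illustration (a cap can be useful on memory-constrained instances); neither is part of Gunicorn.

```python
import os
from typing import Optional


def recommended_workers(cpu_count: int, max_workers: Optional[int] = None) -> int:
    """Return (2 x CPU cores) + 1, optionally capped at max_workers."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    workers = 2 * cpu_count + 1
    if max_workers is not None:
        # Hypothetical cap, e.g. when RAM rather than CPU is the limit
        workers = min(workers, max_workers)
    return workers


cores = os.cpu_count() or 1  # os.cpu_count() may return None
print(f"{cores} cores -> {recommended_workers(cores)} workers")
```

For the instances in the table, `recommended_workers(2)` gives 5 and `recommended_workers(8)` gives 17.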

gunicorn.conf.py

gunicorn.conf.py
import multiprocessing

# Worker count
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000

# Binding
bind = "0.0.0.0:8000"

# Timeouts
timeout = 60 # Worker is killed and restarted if it is silent for 60s (e.g. a blocked event loop)
graceful_timeout = 30 # Time for workers to finish after SIGTERM
keepalive = 5 # Keep-alive connection timeout

# Logging
accesslog = "-" # stdout
errorlog = "-" # stderr
loglevel = "info"
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Restart workers after this many requests (limits the impact of memory leaks)
max_requests = 1000
max_requests_jitter = 50

# Preload app (faster startup, but shares state)
preload_app = False # Set True if your app is read-only after init

start-with-config.sh
gunicorn app.main:app --config gunicorn.conf.py
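To see why `max_requests_jitter` is set alongside `max_requests`, here is a rough model of how the per-worker restart threshold is derived: the base limit plus a random offset between 0 and the jitter, so workers do not all recycle at the same moment. The function `effective_max_requests` is a hypothetical name for this sketch, not a Gunicorn API, and returning 0 to mean "disabled" is a modeling choice.

```python
import random


def effective_max_requests(max_requests: int, jitter: int, rng: random.Random) -> int:
    """Per-worker restart threshold: base limit plus a random offset in [0, jitter]."""
    if max_requests <= 0:
        return 0  # In this sketch, 0 means automatic restarts are disabled
    return max_requests + rng.randint(0, jitter)


rng = random.Random()
# Four workers with max_requests = 1000 and max_requests_jitter = 50
limits = [effective_max_requests(1000, 50, rng) for _ in range(4)]
print(limits)  # Each value falls between 1000 and 1050
```

With the values from the config above, each worker restarts somewhere between its 1000th and 1050th request.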

Systemd Service

/etc/systemd/system/fastapi.service
[Unit]
Description=FastAPI Application
After=network.target postgresql.service
Requires=postgresql.service

[Service]
Type=notify
User=appuser
Group=appuser
WorkingDirectory=/var/app
EnvironmentFile=/var/app/.env
ExecStart=/var/app/.venv/bin/gunicorn app.main:app --config /var/app/gunicorn.conf.py
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=fastapi

[Install]
WantedBy=multi-user.target

manage-service.sh
sudo systemctl enable fastapi
sudo systemctl start fastapi
sudo systemctl status fastapi
sudo systemctl reload fastapi # Zero-downtime reload
sudo journalctl -u fastapi -f # Tail logs

Zero-Downtime Deployment

deploy.sh
#!/bin/bash
# Abort on errors, unset variables, and failed pipeline stages
set -euo pipefail

APP_DIR="/var/app"
VENV="$APP_DIR/.venv"

echo "Pulling latest code..."
cd "$APP_DIR"
git pull

echo "Installing dependencies..."
"$VENV/bin/pip" install -r requirements.txt --quiet

echo "Running migrations..."
"$VENV/bin/alembic" upgrade head

echo "Reloading Gunicorn workers gracefully..."
sudo systemctl reload fastapi

echo "Deployment complete."

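systemctl reload runs the unit's ExecReload command, which sends SIGHUP to the Gunicorn master process. Gunicorn reloads its configuration, starts new workers, and gracefully shuts down the old workers once they finish their in-flight requests. New workers load the updated application code only when the app is not preloaded (preload_app = False, as in the config above); with preload_app = True a full restart is needed to pick up code changes.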
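A deploy script can confirm that the reloaded workers are actually serving traffic before reporting success. The following is a minimal sketch using only the standard library; the helper name `wait_until_healthy` is hypothetical, and it assumes the app exposes a `/health` endpoint that returns HTTP 200, as the load test later in this lesson does.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll url until it returns HTTP 200 or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Not reachable or not healthy yet; retry after a short pause
        time.sleep(interval)
    return False
```

One way to use it after `systemctl reload` is to call `wait_until_healthy("http://localhost:8000/health")` and exit with a non-zero status if it returns False, so a failed reload stops the deployment.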

Environment-Based Settings

app/core/config.py
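from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Values are read from the process environment, then from .env
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    DATABASE_URL: str  # Required: no default
    SECRET_KEY: str  # Required: no default
    DEBUG: bool = False
    # List values must be JSON in the environment, e.g. '["https://example.com"]'
    ALLOWED_ORIGINS: list[str] = []
    LOG_LEVEL: str = "info"
    WORKERS: int = 4


settings = Settings()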
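The same environment variables can also drive the Gunicorn config file, since the systemd unit loads `/var/app/.env` into the process environment before Gunicorn starts. The sketch below shows one way a `gunicorn.conf.py` might read `WORKERS` and `LOG_LEVEL`; the helper `workers_from_env` is a hypothetical name, and falling back to the CPU formula when `WORKERS` is unset is an assumption, not something the original config does.

```python
import multiprocessing
import os
from collections.abc import Mapping


def workers_from_env(environ: Mapping[str, str] = os.environ) -> int:
    """Use WORKERS if set; otherwise fall back to (2 x CPU cores) + 1."""
    raw = environ.get("WORKERS")
    if raw:
        return max(1, int(raw))  # Never configure fewer than one worker
    return multiprocessing.cpu_count() * 2 + 1


# Gunicorn reads these module-level names from the config file
workers = workers_from_env()
loglevel = os.environ.get("LOG_LEVEL", "info").lower()
```

Keeping the override in the environment lets the same config file run unchanged on differently sized servers.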

Common Pitfalls

Pitfall | Cause / Symptom | Fix
Worker timeout killing slow requests | timeout too short for long queries | Increase timeout or offload to background tasks
Memory growing indefinitely | Long-running workers accumulating state | Set max_requests to recycle workers
preload_app=True with async engine | Engine created in parent process, workers inherit broken state | Keep preload_app=False for async apps
Gunicorn using sync workers | Forgot --worker-class uvicorn.workers.UvicornWorker | Always specify UvicornWorker for FastAPI
SIGTERM kills requests | Not waiting for in-flight requests | Set graceful_timeout to a value > your slowest endpoint
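For the first pitfall, a related cause with async workers is a blocking call that stalls the event loop, which leaves the worker unable to respond while the call runs. The sketch below uses `asyncio.to_thread` from the standard library (Python 3.9+) to move such a call to a thread. This is a different technique from FastAPI background tasks, shown here only to illustrate the idea of offloading; `slow_blocking_query` and `handler` are hypothetical stand-ins.

```python
import asyncio
import time


def slow_blocking_query(seconds: float) -> str:
    """Hypothetical stand-in for a blocking call such as a sync DB driver."""
    time.sleep(seconds)
    return "done"


async def handler(seconds: float) -> str:
    # Run the blocking call in a thread so the event loop stays responsive
    return await asyncio.to_thread(slow_blocking_query, seconds)


async def main() -> tuple[str, int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the event loop gets a chance to run other work
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await handler(0.2)
    beat.cancel()
    return result, ticks
```

Running `asyncio.run(main())` returns the query result together with a non-zero tick count, showing that other coroutines kept running while the blocking call was in progress.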

Hands-On Practice

load-test.sh
# Install hey (HTTP load tester)
go install github.com/rakyll/hey@latest

# Test with 4 Gunicorn workers, 200 concurrent connections, 10000 requests
gunicorn app.main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 &
GUNICORN_PID=$!
sleep 3 # Give the workers time to boot

hey -n 10000 -c 200 http://localhost:8000/health
kill "$GUNICORN_PID" # Stop Gunicorn so the two runs do not compete for CPU

# Compare with a single Uvicorn process
uvicorn app.main:app --port 8001 &
UVICORN_PID=$!
sleep 3

hey -n 10000 -c 200 http://localhost:8001/health
kill "$UVICORN_PID"

What's Next