Production Config and Gunicorn
Uvicorn alone (single process) is fine for development. For production, Gunicorn manages multiple Uvicorn workers, handles graceful restarts, and improves stability under load.
Learning Focus
By the end of this lesson you will be able to calculate the right number of Gunicorn workers, configure timeouts and logging, write a gunicorn.conf.py, and set up graceful shutdown for zero-downtime deploys.
Gunicorn with UvicornWorker
production-start.sh
gunicorn app.main:app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--timeout 60 \
--graceful-timeout 30 \
--keep-alive 5 \
--access-logfile - \
--error-logfile - \
--log-level info
Worker Count Formula
workers = (2 × CPU cores) + 1
| Server | CPUs | Recommended workers |
|---|---|---|
| t3.micro | 2 | 5 |
| t3.medium | 2 | 5 |
| c5.xlarge | 4 | 9 |
| c5.2xlarge | 8 | 17 |
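Note: The formula is a starting point, not a rule. I/O-bound FastAPI apps with async endpoints often need fewer workers than sync, CPU-bound apps, because each worker already handles many concurrent requests on its event loop. Start with (2 × CPU cores) + 1, then adjust based on load testing and the memory available per worker.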
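The formula and the table above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `recommended_workers` and the optional `max_workers` cap are illustrative assumptions, not part of Gunicorn.

```python
import multiprocessing
from typing import Optional


def recommended_workers(cpu_cores: int, max_workers: Optional[int] = None) -> int:
    """Return (2 x CPU cores) + 1, optionally capped at max_workers."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    workers = 2 * cpu_cores + 1
    if max_workers is not None:
        # Hypothetical cap, e.g. for hosts with many cores but little memory.
        workers = min(workers, max_workers)
    return workers


# Reproduce the table rows for 2, 4, and 8 CPUs.
for cores in (2, 4, 8):
    print(f"{cores} CPUs -> {recommended_workers(cores)} workers")

# Value for the machine running this script.
print("this machine:", recommended_workers(multiprocessing.cpu_count()))
```

A cap is useful when memory is the limiting resource: each worker is a separate process holding its own copy of the application.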
gunicorn.conf.py
gunicorn.conf.py
import multiprocessing
# Worker count
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000 # Per the Gunicorn docs, only affects gthread, eventlet, and gevent workers
# Binding
bind = "0.0.0.0:8000"
# Timeouts
timeout = 60 # Worker is killed and restarted if it stays silent for 60s
graceful_timeout = 30 # Seconds workers get to finish in-flight requests after SIGTERM
keepalive = 5 # Seconds to wait for the next request on a keep-alive connection
# Logging
accesslog = "-" # stdout
errorlog = "-" # stderr
loglevel = "info"
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'
# Restart workers after this many requests (limits the impact of memory leaks)
max_requests = 1000
max_requests_jitter = 50
# Preload app (loads the app in the master before forking: faster startup, but workers inherit state created at import)
preload_app = False # With True, a SIGHUP reload does not pick up new application code
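To see why `max_requests_jitter` matters, the sketch below models the restart threshold each worker ends up with. It is a simplified illustration based on the Gunicorn documentation, which describes the jitter as `randint(0, max_requests_jitter)` added per worker. The function name `restart_threshold` is hypothetical.

```python
import random


def restart_threshold(max_requests: int, max_requests_jitter: int, rng: random.Random) -> int:
    """Requests a worker serves before recycling: max_requests plus random jitter."""
    if max_requests <= 0:
        return 0  # 0 disables automatic restarts
    return max_requests + rng.randint(0, max_requests_jitter)


# Simulate five workers using the values from gunicorn.conf.py.
rng = random.Random()
thresholds = [restart_threshold(1000, 50, rng) for _ in range(5)]
print(thresholds)  # five values between 1000 and 1050
```

Because each worker gets a slightly different threshold, the workers do not all recycle at the same moment, so some capacity stays available while others restart.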
start-with-config.sh
gunicorn app.main:app --config gunicorn.conf.py
Systemd Service
/etc/systemd/system/fastapi.service
[Unit]
Description=FastAPI Application
After=network.target postgresql.service
Requires=postgresql.service
[Service]
Type=notify
User=appuser
Group=appuser
WorkingDirectory=/var/app
EnvironmentFile=/var/app/.env
ExecStart=/var/app/.venv/bin/gunicorn app.main:app --config /var/app/gunicorn.conf.py
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=fastapi
[Install]
WantedBy=multi-user.target
manage-service.sh
sudo systemctl enable fastapi
sudo systemctl start fastapi
sudo systemctl status fastapi
sudo systemctl reload fastapi # Zero-downtime reload
sudo journalctl -u fastapi -f # Tail logs
Zero-Downtime Deployment
deploy.sh
#!/bin/bash
set -euo pipefail
APP_DIR="/var/app"
VENV="$APP_DIR/.venv"
echo "Pulling latest code..."
cd "$APP_DIR"
git pull
echo "Installing dependencies..."
"$VENV/bin/pip" install -r requirements.txt --quiet
echo "Running migrations..."
"$VENV/bin/alembic" upgrade head
echo "Reloading Gunicorn workers gracefully..."
sudo systemctl reload fastapi
echo "Deployment complete."
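systemctl reload runs the unit's ExecReload command, which sends SIGHUP to the Gunicorn master process. Gunicorn then reloads its configuration, starts new workers that import the updated code, and gracefully shuts down the old workers once they finish their in-flight requests. New code is only picked up this way when preload_app is False; with a preloaded app, a full restart is needed.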
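A reload can succeed at the systemd level while the new workers fail to boot, so a deploy script may want to confirm the app answers before reporting success. The following is a minimal sketch using only the standard library. The helper names `http_probe` and `wait_until_healthy` are hypothetical, and the `/health` URL is assumed from the load-testing example in this lesson.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between failures."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example usage after `systemctl reload fastapi` (assumed URL):
#   ok = wait_until_healthy(lambda: http_probe("http://localhost:8000/health"))
#   raise SystemExit(0 if ok else 1)
```

Passing the probe in as a callable keeps the retry loop independent of HTTP, which makes it easy to exercise without a running server.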
Environment-Based Settings
app/core/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    DATABASE_URL: str
    SECRET_KEY: str
    DEBUG: bool = False
    ALLOWED_ORIGINS: list[str] = []
    LOG_LEVEL: str = "info"
    WORKERS: int = 4

settings = Settings()
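The Settings class defines WORKERS and LOG_LEVEL, but gunicorn.conf.py as shown earlier computes its own values. One way to keep the two in sync is to let gunicorn.conf.py read the same environment variables, which systemd exports to the process through EnvironmentFile. This is a sketch under that assumption; the helper `env_int` is hypothetical.

```python
import multiprocessing
import os


def env_int(name: str, default: int) -> int:
    """Read an integer from the environment, falling back to default when unset or empty."""
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default


# Assumption: WORKERS and LOG_LEVEL are the same variables that Settings reads.
workers = env_int("WORKERS", multiprocessing.cpu_count() * 2 + 1)
loglevel = os.environ.get("LOG_LEVEL", "info").lower()
```

With this in place, changing WORKERS in the .env file and reloading the service adjusts the worker count without editing the config file.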
Common Pitfalls
| Pitfall | Cause / Symptom | Fix |
|---|---|---|
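| Worker timeout killing slow requests | Blocking work stalls the worker for longer than timeout (with async workers, timeout tracks worker liveness, not the length of a single request) | Move blocking work off the event loop or into background tasks, or increase timeout |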
| Memory growing indefinitely | Long-running workers accumulating state | Set max_requests to recycle workers |
| preload_app=True with async engine | Engine created in parent process, workers inherit broken state | Keep preload_app=False for async apps |
| Gunicorn using sync workers | Forgot --worker-class uvicorn.workers.UvicornWorker | Always specify UvicornWorker for FastAPI |
| SIGTERM kills requests | Not waiting for in-flight requests | Set graceful_timeout to a value > your slowest endpoint |
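Several of these pitfalls can be caught before deploying with a small check over the config values. The sketch below is illustrative only: `find_pitfalls` is a hypothetical helper, the config is passed as a plain dict, and the slowest endpoint duration is a number you would have to measure yourself. The fallback values mirror Gunicorn's documented defaults (max_requests 0, graceful_timeout 30).

```python
from typing import Any


def find_pitfalls(config: dict[str, Any], slowest_endpoint_seconds: float) -> list[str]:
    """Return one warning for each pitfall from the table that the config triggers."""
    warnings = []
    if config.get("worker_class") != "uvicorn.workers.UvicornWorker":
        warnings.append("worker_class is not UvicornWorker")
    if config.get("max_requests", 0) <= 0:
        warnings.append("max_requests is unset, so workers are never recycled")
    if config.get("preload_app", False):
        warnings.append("preload_app=True can break async engines created at import time")
    if config.get("graceful_timeout", 30) <= slowest_endpoint_seconds:
        warnings.append("graceful_timeout does not exceed the slowest endpoint")
    return warnings


example = {
    "worker_class": "uvicorn.workers.UvicornWorker",
    "max_requests": 1000,
    "preload_app": False,
    "graceful_timeout": 30,
}
print(find_pitfalls(example, slowest_endpoint_seconds=10))  # prints []
```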
Hands-On Practice
load-test.sh
# Install hey (HTTP load tester)
go install github.com/rakyll/hey@latest
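# Test with 4 Gunicorn workers, 200 concurrent users, 10000 requests
gunicorn app.main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 &
GUNICORN_PID=$!
sleep 3 # Give the workers time to boot before sending load
hey -n 10000 -c 200 http://localhost:8000/health
kill "$GUNICORN_PID" # Stop Gunicorn so it does not compete for CPU in the next run
# Compare with single Uvicorn process
uvicorn app.main:app --port 8001 &
UVICORN_PID=$!
sleep 3
hey -n 10000 -c 200 http://localhost:8001/health
kill "$UVICORN_PID"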