FastAPI Hosting for ML APIs

A FastAPI ML API has a specific shape: long-lived container, model weights cached in memory, persistent volume for the weights so they do not redownload, async streaming for token-by-token responses. Hostim runs this shape natively. No cold starts, no per-request billing, predictable monthly cost.

# docker-compose.yml
services:
  api:
    image: my-fastapi-app
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    volumes:
      - models:/models        # persistent volume for model weights
  db:
    image: postgres:16
volumes:
  models:

  • 🇪🇺 Hosted in Germany, GDPR by default
  • 🐳 Run Docker apps (Compose supported)
  • 🗄️ Built-in MySQL, Postgres, Redis & volumes
  • 🔐 HTTPS, metrics, and isolation per project
  • 💳 Per-project cost tracking · from €2.5/month

Why FastAPI on Hostim for ML APIs

Serverless platforms are a bad fit for ML inference: cold starts reload the model, per-request billing punishes long generation, and function timeouts cap long completions. Hostim runs your FastAPI container as a normal long-lived process. Mount a persistent volume at /models and load the weights once at startup; they survive every redeploy. Async streaming responses (SSE, token-by-token) work because the container stays connected for the duration of the request. CPU and RAM scale per app; GPU support is on the roadmap.
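
A minimal sketch of that startup pattern, assuming the volume is mounted at /models and the weights are a scikit-learn-style model saved with joblib; swap the loader for torch.load, transformers, or whatever your stack uses:

# main.py: load weights once at startup and reuse them for every request.
# MODEL_PATH points into the persistent volume; joblib stands in for your loader.
import os
from contextlib import asynccontextmanager

import joblib
from fastapi import FastAPI

MODEL_PATH = os.getenv("MODEL_PATH", "/models/model.joblib")
state = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    state["model"] = joblib.load(MODEL_PATH)   # runs once per container, not per request
    yield
    state.clear()                              # release memory on shutdown

app = FastAPI(lifespan=lifespan)

@app.post("/predict")
async def predict(payload: dict):
    # The model is already in memory; no cold-start reload here.
    return {"result": state["model"].predict(payload["inputs"]).tolist()}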

What ML APIs need from a host

Teams running model inference behind a FastAPI or Flask endpoint, typically with model weights cached on disk, need:

  • Long-lived containers (no cold starts that reload models)
  • Persistent volumes for model weights, several GB each
  • GPU access (if the model needs it) — or fast CPUs with enough RAM
  • Streaming responses for token-by-token output
  • Predictable cost per inference, not a per-request bill

Hostim runs containers as long-lived processes. Model weights stay in memory. Persistent volumes hold the weights, so a redeploy does not redownload 5 GB. CPU and RAM scale per app; GPU support is on the roadmap.

How Hostim runs FastAPI

FastAPI hosting means running a Uvicorn process (or Gunicorn with Uvicorn workers) and exposing it over HTTPS. The framework is async by default, so the host has to support long-lived connections: websockets, server-sent events, streaming responses.
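
Either entry point works; two typical invocations, using the same main:app placeholder as the compose file above:

# Single Uvicorn process, as in the compose example
uvicorn main:app --host 0.0.0.0 --port 8000

# Gunicorn supervising Uvicorn workers; note that each worker loads its own copy of the model
gunicorn main:app -k uvicorn.workers.UvicornWorker -w 2 --bind 0.0.0.0:8000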

Deployment model

Hostim runs your FastAPI Docker image as a normal container. Long-lived connections work. Managed PostgreSQL is attached at runtime. If you serve an ML model, mount a persistent volume for the weights so they do not redownload on every deploy.
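
If the weights come from a model hub, a small prefetch script can fill the volume on the first deploy and become a no-op afterwards. A sketch, assuming huggingface_hub is installed and using a placeholder repo id:

# prefetch.py: populate the persistent volume once; later deploys find it filled.
import os
from pathlib import Path

from huggingface_hub import snapshot_download

MODEL_DIR = Path(os.getenv("MODEL_PATH", "/models/my-model"))

def ensure_weights() -> Path:
    # Only download when the volume is still empty (first deploy).
    if not MODEL_DIR.exists() or not any(MODEL_DIR.iterdir()):
        snapshot_download(repo_id="my-org/my-model", local_dir=MODEL_DIR)
    return MODEL_DIR

if __name__ == "__main__":
    print(f"weights ready at {ensure_weights()}")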

Common pitfalls

Cold starts make serverless platforms a poor fit for ML inference workloads. Hostim runs a permanent container, so model weights stay in memory across requests.

Typical env vars

DATABASE_URL, OPENAI_API_KEY, MODEL_PATH, LOG_LEVEL
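
Read them once at startup; DATABASE_URL is injected when the managed database is attached, the rest are your own. A minimal sketch:

# settings.py: read the typical env vars once at import time.
import os

DATABASE_URL = os.environ["DATABASE_URL"]                     # injected when a database is attached
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")              # only if the API calls OpenAI
MODEL_PATH = os.getenv("MODEL_PATH", "/models/model.joblib")  # points into the persistent volume
LOG_LEVEL = os.getenv("LOG_LEVEL", "info")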

FAQ

How do I keep model weights across deploys?

Mount a persistent volume at /models. Download or build weights into the volume once; subsequent deploys mount the same volume — no redownload.

Are streaming token responses supported?

Yes. FastAPI on Hostim runs as a long-lived process behind HTTPS. SSE and token-by-token streaming work without extra config.
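
A minimal SSE sketch; the generator below is a stand-in for your model's streaming generate loop:

# streaming.py: token-by-token output over server-sent events.
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_stream(prompt: str):
    # Stand-in generator; replace with your model's streaming API.
    for token in prompt.split():
        yield f"data: {token}\n\n"          # SSE frame format
        await asyncio.sleep(0.05)
    yield "data: [DONE]\n\n"

@app.get("/generate")
async def generate(prompt: str):
    return StreamingResponse(token_stream(prompt), media_type="text/event-stream")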

Is there GPU support?

Not yet. CPU inference with enough RAM works for many smaller models (embeddings, classical ML). GPU is on the roadmap; ask if you need it for evaluation.

How is per-request billing avoided?

Hostim charges by reserved CPU, RAM and storage — flat per month. Inference cost does not scale with the number of requests; the bill is predictable.

Ready to deploy FastAPI?

Spin up an app in minutes. Managed database on the free tier, custom domain included.