Machine Learning Model as a Dockerized API using FastAPI

Deploying machine learning models as APIs is a powerful way to make your models accessible to other applications and users. This tutorial will guide you through the process of deploying a machine learning model using FastAPI and Docker, making it accessible via a RESTful API.


Before we begin, ensure you have the following:

  • Python and pip installed on your system.
  • Basic understanding of machine learning and Python.
  • Docker installed on your system.

Step 1: Setting Up the FastAPI Application

First, we need to create a FastAPI application. FastAPI is a modern, high-performance web framework for building APIs with Python 3.7+, based on standard Python type hints.

Install FastAPI and Uvicorn

pip install fastapi uvicorn

Create the FastAPI App

Create a file named app.py and add the following code:

from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()

# Load the pre-trained model
model = joblib.load("model.joblib")

class Iris(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
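    petal_width: float

@app.post("/predict/")
def predict(iris: Iris):
    # scikit-learn expects a 2D array: one row per sample
    data = [[iris.sepal_length, iris.sepal_width, iris.petal_length, iris.petal_width]]
    prediction = model.predict(data)
    # Convert the NumPy integer to a plain int so it can be serialized to JSON
    return {"prediction": int(prediction[0])}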

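In this script, we define a FastAPI app and a /predict/ endpoint that takes in iris flower measurements and returns a prediction from a pre-trained scikit-learn model. The script loads model.joblib from the working directory at startup, so that file must exist before the app runs; Step 2 shows how to create it.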
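To make the request handling concrete, here is a minimal sketch of what happens to the JSON body before the model sees it. It reuses the Iris schema from the script above; the helper name to_feature_row and the sample payloads are illustrative assumptions, not part of the app. FastAPI runs this validation for you and answers an invalid body with a 422 response.

```python
from pydantic import BaseModel, ValidationError


class Iris(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float


def to_feature_row(payload: dict) -> list:
    """Validate a request payload and return it as a one-row 2D list.

    Hypothetical helper for illustration; raises ValidationError on bad input.
    """
    iris = Iris(**payload)
    return [[iris.sepal_length, iris.sepal_width, iris.petal_length, iris.petal_width]]


good = {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}
print(to_feature_row(good))

# A non-numeric value and two missing fields: validation fails
bad = {"sepal_length": 5.1, "sepal_width": "wide"}
try:
    to_feature_row(bad)
except ValidationError as exc:
    print(f"rejected: {len(exc.errors())} validation error(s)")
```

The feature order in the row must match the column order the model was trained on, which for the iris dataset is sepal length, sepal width, petal length, petal width.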

Step 2: Train and Save Your Model

For this tutorial, we’ll use a simple scikit-learn model to classify iris flowers. Run the following script once, in the same directory as the app, so that model.joblib exists before you start the app or build the Docker image:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import joblib

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Save the model
joblib.dump(model, "model.joblib")
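The training script creates X_test and y_test but never uses them. Below is a short sketch of how the held-out split could be used to sanity-check the model before saving it. The use of accuracy_score and the fixed random_state on the classifier are additions for illustration, not part of the original script.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# random_state on the classifier is an assumption added for repeatable results
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Score the model on the 20% of samples it has not seen during training
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.3f}")
```

A low score here would be a signal to revisit the model before packaging it into an image.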

Step 3: Create a Dockerfile

To containerize our FastAPI app, we’ll create a Dockerfile. Docker is a containerization platform that packages the application with its dependencies into a lightweight, portable unit.

Create a file named Dockerfile and add the following content:

# Use the official Python image from the Docker Hub
FROM python:3.9

# Set the working directory (/app is a conventional choice, assumed here)
WORKDIR /app

# Copy the requirements file
COPY requirements.txt .

# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the FastAPI app code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Command to run the app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

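Create a requirements.txt file listing the packages that the app and the saved model depend on. Pin versions as needed; the scikit-learn version should ideally match the one used for training:

fastapi
uvicorn
scikit-learn
joblib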

Step 4: Build and Run the Docker Container

Build the Docker image using the following command:

docker build -t fastapi-ml-model .

Run the Docker container:

docker run -d -p 8000:8000 fastapi-ml-model

Your FastAPI app is now running in a Docker container. You can test it by making POST requests to the /predict/ endpoint.
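Because docker run -d returns immediately, the server may need a moment before it accepts requests. The following is a minimal sketch of a readiness check using only the standard library. The helper name wait_until_ready and its timeout values are hypothetical; it polls the interactive /docs page, which FastAPI serves by default unless it has been disabled.

```python
import time
import urllib.request


def wait_until_ready(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll url until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: not ready yet
            pass
        time.sleep(interval)
    return False


# Example usage, assuming the container from the command above is running:
# wait_until_ready("http://localhost:8000/docs")
```

If the check never succeeds, docker ps and docker logs on the container are the usual next places to look.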

Step 5: Testing the API

You can test the API using curl or any API testing tool like Postman. Here is an example using curl:

curl -X POST "http://localhost:8000/predict/" -H "Content-Type: application/json" -d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'

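This should return a JSON response containing the predicted class index, for example {"prediction": 0}. In scikit-learn’s iris dataset, class 0 is setosa, 1 is versicolor, and 2 is virginica.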
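The same request can be sent from Python. This is a minimal standard-library sketch, assuming the container is reachable at http://localhost:8000 and that the endpoint responds with a JSON object whose "prediction" field is an integer class index. The function names build_request and predict_species are illustrative, and the SPECIES list follows the order of target_names in scikit-learn’s iris dataset.

```python
import json
import urllib.request

# Class index to species name, in the order used by load_iris().target_names
SPECIES = ["setosa", "versicolor", "virginica"]


def build_request(measurements: dict, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for the /predict/ endpoint without sending it."""
    body = json.dumps(measurements).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_species(measurements: dict, base_url: str = "http://localhost:8000") -> str:
    """Send the measurements to the API and return the species name."""
    request = build_request(measurements, base_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        result = json.load(response)
    return SPECIES[int(result["prediction"])]


sample = {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}
# With the container running:
# print(predict_species(sample))
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server.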


You’ve deployed a machine learning model with FastAPI and Docker, exposing it through a REST endpoint that any client able to reach the container can call.

The same pattern applies to other models: save the trained model to a file, load it when the app starts, define a request schema for its inputs, and package the app and its dependencies in a container image.
