- Chaba
- Best practices
- April 4, 2022
Machine Learning Model as a Dockerized API using FastAPI
Deploying machine learning models as APIs is a powerful way to make your models accessible to other applications and users. This tutorial will guide you through the process of deploying a machine learning model using FastAPI and Docker, making it accessible via a RESTful API.
Prerequisites
Before we begin, ensure you have the following:
- Python and pip installed on your system.
- Basic understanding of machine learning and Python.
- Docker installed on your system.
Step 1: Setting Up the FastAPI Application
First, we need to create a FastAPI application. FastAPI is a modern, high-performance web framework for building APIs with Python 3.7+ based on standard Python type hints.
Install FastAPI and Uvicorn
pip install fastapi uvicorn
Create the FastAPI App
Create a file named app.py and add the following code:
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()

# Load the pre-trained model
model = joblib.load("model.joblib")

class Iris(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

@app.post("/predict/")
def predict(iris: Iris):
    data = [[iris.sepal_length, iris.sepal_width, iris.petal_length, iris.petal_width]]
    prediction = model.predict(data)
    # Cast to int: NumPy integers are not JSON-serializable by default
    return {"prediction": int(prediction[0])}
In this script, we define a FastAPI app and a /predict/ endpoint that takes in iris flower measurements and returns a prediction from a pre-trained scikit-learn model.
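If you want to try the app before containerizing it, you can run it locally with Uvicorn (this assumes the model.joblib file from Step 2 already sits in the same directory):

uvicorn app:app --reload --port 8000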
Step 2: Train and Save Your Model
For this tutorial, we’ll use a simple scikit-learn model to classify iris flowers. Train and save your model using the following code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import joblib
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Save the model
joblib.dump(model, "model.joblib")
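Since the script sets aside a test split, it is worth checking that the saved model performs sensibly before serving it. A quick, optional sanity check, continuing from the variables defined above:

# Reload the saved model and score it on the held-out test set
loaded_model = joblib.load("model.joblib")
print(f"Test accuracy: {loaded_model.score(X_test, y_test):.3f}")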
Step 3: Create a Dockerfile
To containerize our FastAPI app, we’ll create a Dockerfile. Docker is a containerization platform that packages the application with its dependencies into a lightweight, portable unit.
Create a file named Dockerfile and add the following content:
# Use the official Python image from the Docker Hub
FROM python:3.9
# Set the working directory
WORKDIR /app
# Copy the requirements file
COPY requirements.txt .
# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the FastAPI app code
COPY . .
# Expose the port the app runs on
EXPOSE 8000
# Command to run the app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
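Note that COPY . . copies the entire build context into the image, so app.py and model.joblib must sit next to the Dockerfile. Optionally, a .dockerignore file keeps unneeded files out of the image; a minimal example:

__pycache__/
*.pyc
.git/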
Create a requirements.txt file with the following content:
fastapi
uvicorn
scikit-learn
joblib
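For reproducible builds, you may want to pin exact versions in requirements.txt. In particular, the scikit-learn version inside the container should match the one used to train and save model.joblib, since joblib artifacts are not guaranteed to load across library versions. An illustrative (not prescriptive) pinned file might look like:

fastapi==0.110.0
uvicorn==0.29.0
scikit-learn==1.4.2
joblib==1.4.2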
Step 4: Build and Run the Docker Container
Build the Docker image using the following command:
docker build -t fastapi-ml-model .
Run the Docker container:
docker run -d -p 8000:8000 fastapi-ml-model
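If the API does not respond, you can confirm the container is running and inspect its logs (the container ID comes from the first command):

docker ps
docker logs <container-id>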
Your FastAPI app is now running in a Docker container. You can test it by making POST requests to the /predict/ endpoint.
Step 5: Testing the API
You can test the API using curl or any API testing tool like Postman. Here is an example using curl:
curl -X POST "http://localhost:8000/predict/" -H "Content-Type: application/json" -d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
This should return a JSON response with the predicted class index, for example {"prediction": 0}, where 0 corresponds to Iris setosa.
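Equivalently, here is a short Python client using the requests library (it assumes the container from Step 4 is listening on localhost:8000):

import requests

# The same iris measurements as the curl example above
payload = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}

response = requests.post("http://localhost:8000/predict/", json=payload)
print(response.json())  # e.g. {"prediction": 0}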
Conclusion
You’ve successfully deployed a machine learning model using FastAPI and Docker, creating a RESTful API that can be accessed from anywhere. This approach leverages the efficiency and simplicity of FastAPI and Docker to make your machine learning models readily available for real-world applications.
By following the same steps, you can deploy any machine learning model as a Dockerized API with FastAPI, making it accessible and scalable for a variety of applications.