
How to build Data Science API using FastAPI

This tutorial will teach you how to build a Data Science API using FastAPI. It does not cover everything, but we will build a functional API, and if you want to learn more about FastAPI afterwards, you can check out their documentation here.

Step 1: Choose your Virtual Environment for the Project

For this tutorial, you can use a Docker environment. If you need to learn how to set one up, check out this tutorial on how to use Docker for data science projects. You could also use a Pipenv environment; in this tutorial, I show you how to set up a Pipenv virtual environment for your data science project. Use whichever you are comfortable with, but for this project I used Docker for the environment.

Step 2: Download a template for this tutorial

To make this tutorial easier for you and me, I have gone ahead and created a template that you can download here. This template has everything you need to get started with building a Data Science API using FastAPI. Here is the link to the template again –> https://github.com/EvidenceN/FastAPI-tutorial

There are 3 ways for you to get this Template Repo

  1. You can just click the link that says “Use This Template” to create your own repo using the template. Then, clone the new repo to your local computer.
  2. You can fork the repo, then clone your fork to your local computer. A reminder on how to clone a repo (FORK the repo BEFORE cloning it):

git clone https://github.com/YOUR-GITHUB-USERNAME/YOUR-REPO-NAME.git
cd YOUR-REPO-NAME

  3. As an alternative to cloning the repo, you can also just download the zip file containing the template.

Now that you have a copy of the API template on your local computer, let’s begin by exploring the file structure, since this will come in handy for understanding the code we are writing.

Step 3: Study the File Structure

This is the file structure for the “project” folder

project
├── Dockerfile
├── requirements.txt
└── app
    ├── __init__.py
    ├── main.py
    └── api
        ├── __init__.py
        ├── airBnB_model_v3.pkl
        ├── airbnb_predict.py
        ├── gas_model.pkl
        └── gas_price_prediction.py

What the Files Mean

requirements.txt – where you list the packages needed for the project. When you add new packages to your requirements.txt file, you need to run docker-compose build again to re-build your Docker image and incorporate the new packages. If you are using Pipenv and you install the new packages using Pipenv, you don’t need to re-build any image.
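
For reference, a minimal requirements.txt for an API like this might look something like the listing below. This is just an illustrative sketch (it assumes your pickled models were built with scikit-learn); check the actual requirements.txt in the template for the exact packages and versions it uses.

fastapi
uvicorn
pandas
scikit-learn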

app/main.py is where you edit the app’s title and description. To learn more about titles & descriptions in FastAPI, check out this documentation. Edit this file and add your own title and description (I will show you how to do that below).

The main.py file is also where you configure the Cross-Origin Resource Sharing (CORS). If you don’t know what that means and you want to learn more about CORS, then check out the documentation for it. You shouldn’t have to edit this section, but by all means edit it if you need to for whatever reason.

Under the api folder, the files that end in .pkl are the pickled files from the data science models. Just replace those .pkl files with your own pickled files. In the future, I will create a detailed tutorial on how to pickle different machine learning models using different libraries, but for now I found this YouTube tutorial on model pickling to be effective if you need to learn how to pickle a data science model. Also, below under “Step 7: Pickle Your Data Science Model”, I give a brief overview of how to pickle your model; just scroll down to get that information.

airbnb_predict.py defines the machine learning endpoint for the Airbnb model. It accepts a POST request and responds with Airbnb price predictions. Replace this prediction file with your own prediction file as you see fit. The predictions in this endpoint come from the airBnB_model_v3.pkl file.

The gas_price_prediction.py file is similar to the airbnb_predict.py file. Even though the code inside them is slightly different, they are both machine learning endpoints that serve different predictions.

Step 4: Launch your API using Docker or Pipenv

Once you have a clone of the app on your local computer, you want to launch it on localhost so that you can preview changes to the API as they happen.

To get your app running using docker, navigate to the folder where the template repo is located and follow the instructions I outline in this how to use docker for data science projects blog post.

But, here is a quick refresher to get you started. Type in these commands

To build your docker image

docker-compose build

To launch the web application

docker-compose up

If you are using Pipenv, you can follow the instructions I laid out in this blog on using Pipenv for data science projects. But here is the command you need to launch your app:

uvicorn app.main:app --reload

When you launch your app, you should see something that looks like this

FastAPI data science API example

Now that you have launched your app and can preview changes as you make them, let’s get to actually viewing & editing the content.

Step 5: How to use the app you just launched

The video embedded here shows a preview of how the app is supposed to work.

Once you launch the app, you can

  • Click on “Post”
  • Click on “Try it out”
  • Input some values or use the default values
  • Click on “Execute” to get a prediction
  • If you enter an incorrect value or there is another problem, you will get an error message and an explanation of why the error is occurring. The error will also be printed on your command line. (A sketch of making the same request from Python follows this list.)
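
If you prefer to call the endpoint from code instead of the interactive docs page, here is a quick sketch using the requests library. It assumes the app is running locally on port 8000 (adjust the URL to match your setup) and uses the same /airbnb_predict endpoint and fields you will see later in this tutorial.

import requests

# The same fields the airbnb_predict endpoint expects
payload = {
    "room_type": "Entire home/apt",
    "latitude": 42.0,
    "longitude": -42.0,
}

# Adjust the host/port to wherever your app is running
response = requests.post("http://localhost:8000/airbnb_predict", json=payload)

print(response.status_code)  # 200 for a valid request, 422 if validation fails
print(response.json())       # e.g. {"AirBnB Price Prediction": ...}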

Now, let’s edit what you are seeing.

Step 6: Edit the Main File

This section will be focused on teaching you how to edit your main.py file. I am going to take the code in the main.py file and break it down into different sections, so that I can explain what each block of code is doing.

Import FastAPI and other Packages

Here, we are just importing the libraries we need for the app.

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
import uvicorn

Import Machine Learning Endpoints

The code below is what you use to import the files where you do your model predictions. This is how the app homepage communicates with the endpoints.

from app.api import gas_price_prediction, airbnb_predict

Establish App Basics

In this section, you just define what the title, description, version, and documentation URL of the app should be.

app = FastAPI(
    title='Evidence.N FastAPI Tutorial',
    description='Creating Amazing Data Science Content',
    version='0.1',
    docs_url='/',
)

This is what the above code looks like in the API page.

FastAPI Title and description example

Establish the connection between the app and the machine learning endpoint files

The code below just directs what happens when you go to the machine learning prediction endpoints we imported earlier. The router is basically saying that the app should include a link to those predictions, and when you go to the predictions link, you will see a page for the predictions.

In the Airbnb code below, you will see a line that looks like this: @router.post('/airbnb_predict'). That decorator basically says that you will find the predictions at the page “homepage/airbnb_predict”. The code below is saying to include the page “airbnb_predict” in the app. This is where we establish the connection between the app and your machine learning prediction endpoint files.

app.include_router(gas_price_prediction.router)
app.include_router(airbnb_predict.router)

Other code to make the app run as expected.

This code basically configures the Cross-Origin Resource Sharing (CORS) for the app. This is what allows the app in the web browser to communicate with the app on your local machine. It just gives localhost, or wherever you are hosting your app, permission to send requests and receive responses.

app.add_middleware(
    CORSMiddleware,
    allow_origins=['*'],
    allow_credentials=True,
    allow_methods=['*'],
    allow_headers=['*'],
)

The code below lets the app run when you execute the file directly (for example, python app/main.py). When you type uvicorn app.main:app --reload in your terminal, uvicorn imports the app object and serves it for you.

if __name__ == '__main__':
    uvicorn.run(app)

Now that you understand how the main file works, let’s move onto working with your machine learning endpoints.

Step 7: Pickle Your Data Science Model

Before you can use your model in your app, you first need to pickle it in your notebook to get a file that stores your model in a saved format. Pickling your model is basically the process of saving your model and all its dependencies so that you can use it for predictions, inside or outside your notebook, without re-training it.

Follow the instructions below to pickle your model inside your notebook. Pickle is a built-in Python library.

import pickle
with open('model_pickle', 'wb') as f:
	pickle.dump(model, f)

model_pickle is the name of the file

‘wb’ means open the file for writing in binary mode —> More on the Python write files method

f is the file object that represents the model_pickle file

pickle.dump(model, f) saves the “model” into the file f, and remember that f is actually model_pickle.

When you go to your files folder where this notebook is located, you should see a new file called “model_pickle”

To load your pickled model,

import pickle
with open('model_pickle', 'rb') as f:
	mod = pickle.load(f)
# Then you can use your loaded model for prediction like this. 
mod.predict([Prediction_values])

Another method for saving and pickling your model is joblib, which I consider an easier process just because it is less code. Make sure you install joblib first using this link.

For more documentation on using joblib dump and load method, go here to read the documentation.

Use the instructions below to use joblib for pickling/saving data science models

import joblib

# To save your model using joblib
joblib.dump(model, "model_joblib.pkl")
# To load the saved model. 
new_model = joblib.load("model_joblib.pkl")
# Then you can use your saved model for prediction. 
new_model.predict([Prediction_values])

If you need more information on model pickling, I suggest you follow this video tutorial.

Step 8: Edit the Airbnb predict file

The template has 2 prediction files: one for gas price predictions and another for Airbnb price predictions. For this tutorial, I will use the Airbnb price predictions for the lessons. But what you learn here is also applicable to the gas prediction file.

I encourage you to look at the gas prediction file and see if you can understand the code using what you learn here. You can also apply these concepts to any code you want to write for your app.

Once again like I did for the main.py file, I am going to break down each code block and explain what it is doing.

Import Packages

Importing the packages and libraries we will be using.

import logging
import os
import pickle
from fastapi import APIRouter
import pandas as pd
from pydantic import BaseModel, Field, validator

App Route

Setting up the logger and the router for this endpoint using the code below.

log = logging.getLogger(__name__)
router = APIRouter()

Type Validation

The code below is just using Pydantic to create type validation. I am going to focus on what the code is doing, but feel free to follow the links below to learn more about how FastAPI processes the request body and how type validation works.

Go here to learn how FastAPI processes the request body and how it uses Pydantic’s BaseModel

Go here to learn how Pydantic Validators work

Here is a short video tutorial by FastAPI teaching you about type validation.

Basically, type validation is just making sure that somebody enters the correct values the model is expecting for predictions. If the endpoint is expecting a negative number and you give it a positive number, it will throw an error. If it is expecting a number and you give it a text value, it will not function properly.

In a nutshell, type validation is there to make sure you are providing the correct value to the model and if you don’t provide data in the expected format, the request will be rejected with an error message of why your request was rejected.

class Item(BaseModel):
    """Use this data model to parse the request body JSON."""

    room_type: str = Field(..., example="Entire home/apt")
    latitude: float = Field(..., example=42.0)    # must be positive
    longitude: float = Field(..., example=-42.0)  # must be negative

    def to_df(self):
        """Convert pydantic object to pandas dataframe with 1 row."""
        return pd.DataFrame([dict(self)])

    @validator('latitude')
    def latitude_must_be_positive(cls, value):
        """Validate that latitude is a positive number."""
        assert value > 0, f'latitude == {value}, must be > 0'
        return value

    @validator('longitude')
    def longitude_must_be_negative(cls, value):
        """Validate that longitude is a negative number."""
        assert value < 0, f'longitude == {value}, must be < 0'
        return value

The result of the code above can be found at the bottom of the docs page, in the “Item” section, and this is what it’s supposed to look like.

Pydantic BaseModel Type Validation in FastAPI Data Science APP
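
If you want to see the type validation in action before touching the API page, here is a small sketch you can run in a notebook. It assumes the Item class above is already defined and shows that an invalid latitude is rejected with a ValidationError.

from pydantic import ValidationError

# A valid request body: positive latitude, negative longitude
item = Item(room_type="Entire home/apt", latitude=42.0, longitude=-42.0)
print(item.to_df())  # one-row pandas DataFrame

# An invalid request body: the latitude validator rejects negative values
try:
    Item(room_type="Private room", latitude=-5.0, longitude=-42.0)
except ValidationError as error:
    print(error)  # explains that latitude must be > 0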

AirBnB Predictions Endpoint Description

This is where you define the endpoint description: what each parameter does and how people should use the app. Think of it as the documentation of the endpoint, because that’s what it is.

# airbnb_predict is the link/page for this prediction. 
@router.post('/airbnb_predict')
async def predict(item: Item):
    """
    Make AirBnB price predictions using room type, longitude, and latitude
    On the web dev backend side, longitude and latitude information is converted into city. 
    On the front-end, user selects a city and roomtype, then web dev converts that city into longitude and latitude on the back end. Then the model receives room type, longitude, and latitude information as input, this input is then used to get a model prediction. 
    ### Request Body
    - `room type`: string
    - `latitude`: positive integer or float
    - `longitude`: negative integer or float
    ### Response
    - `prediction`: airbnb price
    ### RoomType Options:
    * Entire home/apt 
    * Private room 
    * Shared room 
    * Hotel room 
    ### Longitude and Latitude
    Longitude has to be negative numbers. Can be integer or float. This type is enforced.\\n 
    Latitude has to be positive numbers. Can be integer or float. This type is enforced.
    """

This is what the result of the code above looks like.

API endpoint descriptions in FastAPI data science APP

AirBnB Predictions Endpoint

This is where you load your saved, pickled data science model and use it for machine learning predictions.

@router.post('/airbnb_predict')
async def predict(item: Item):
    """
    Make AirBnB price predictions using room type, longitude, and latitude
    On the web dev backend side, longitude and latitude information is converted into city. 
    On the front-end, user selects a city and roomtype, then web dev converts that city into longitude and latitude on the back end. Then the model receives room type, longitude, and latitude information as input, this input is then used to get a model prediction. 
    ### Request Body
    - `room type`: string
    - `latitude`: positive integer or float
    - `longitude`: negative integer or float
    ### Response
    - `prediction`: airbnb price
    ### RoomType Options:
    * Entire home/apt 
    * Private room 
    * Shared room 
    * Hotel room 
    ### Longitude and Latitude
    Longitude has to be negative numbers. Can be integer or float. This type is enforced.\\n 
    Latitude has to be positive numbers. Can be integer or float. This type is enforced.
    """
    data = item.to_df()
    THIS_FOLDER = os.path.dirname(os.path.abspath(__file__))
    my_file = os.path.join(THIS_FOLDER, 'airBnB_model_v3.pkl')
    with open(my_file, "rb") as f:
        model = pickle.load(f)
    prediction = round(model.predict(data)[0])
    return {
        'AirBnB Price Prediction': prediction
    }

This is what the result of the code above should look like.

Data science API prediction example using FastAPI

Step 9: Test your code

If you want to learn how to write tests for your data science API, as well as learn additional concepts about FastAPI, I suggest you check out this video series on the FastAPI website that teaches you the basics of FastAPI as well as testing with FastAPI.
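
To give you a starting point, here is a minimal test sketch using FastAPI's TestClient. It is not part of the template; it assumes the app object and the /airbnb_predict endpoint described above, and that you run it with pytest from the project root.

from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_airbnb_predict_returns_a_price():
    payload = {
        "room_type": "Entire home/apt",
        "latitude": 42.0,
        "longitude": -42.0,
    }
    response = client.post("/airbnb_predict", json=payload)
    assert response.status_code == 200
    assert "AirBnB Price Prediction" in response.json()

def test_airbnb_predict_rejects_positive_longitude():
    # Longitude must be negative, so validation should fail with a 422
    payload = {
        "room_type": "Private room",
        "latitude": 42.0,
        "longitude": 42.0,
    }
    response = client.post("/airbnb_predict", json=payload)
    assert response.status_code == 422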

At this point, we have built a simple Data Science API using FastAPI. You can keep exploring, add your own code, and build cool APIs using FastAPI. I hope this gives you a good head start with building Data Science APIs using FastAPI.

Deploy your API to AWS

After building your model, you may want to deploy it online using Amazon Web Services (AWS). Head to this blog on how to deploy a Data Science app to AWS using Docker.
