How to Use Docker for a Data Science Project

Step 1: How to install Docker on Windows, Mac, & Linux

Before you start using Docker as the environment for your data science project, you need to install it.

How you install Docker depends on your operating system.

If you are on Mac or Linux, follow these links to install Docker.

If you are on Windows 10 Pro, your computer needs to meet specific requirements before you can install Docker, and you need to enable some settings such as Hyper-V. Don’t worry, I’ve got you covered. Follow these instructions on how to install Docker on Windows 10 Pro.

If you are using Windows 10 Home, the requirements are different from those for Windows 10 Pro. You need to install Windows Subsystem for Linux (WSL2) and enable some Windows settings, such as the Virtual Machine Platform feature, before you can download and use Docker. Once again, don’t worry, I’ve got you covered. Go here to see my tutorial on how to install Docker on Windows 10 Home.

Once Docker is installed, you are ready to start using it on your local computer for data science projects.

The first time you open the Docker app, it shows some introductory instructions from the Docker documentation. Feel free to use them to help you get started.
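
If you want to confirm the install from your terminal, one option is to ask each tool for its version. This is a minimal sketch: check_tool is a helper name I made up for illustration (it is not a Docker command), and it assumes the tool accepts a --version flag, which both docker and docker-compose do.

```shell
# check_tool is a hypothetical helper (not part of Docker): it reports
# whether a program is on your PATH and, if so, prints its version line.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --version
    else
        echo "$1 was not found on your PATH" >&2
        return 1
    fi
}

# Usage, after installing Docker:
#   check_tool docker
#   check_tool docker-compose
```

If both commands print a version, the command line tools are installed. It does not tell you whether the Docker app itself is running.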

Step 2: Clone an Existing GitHub Repo

To make this process easy for you and me, I created a template repo. It contains the Docker files you need to get started. As you build, you can replace the APP folder with your own app. Here is the template repo link again: https://github.com/EvidenceN/docker-tutorial

There are 3 ways to get this template repo:

  1. Click “Use This Template” to create your own repo from the template, then clone the new repo to your local computer.
  2. Fork the repo, then clone your fork to your local computer. As a reminder, here is how to clone a repo (fork the repo BEFORE cloning it):
git clone https://github.com/YOUR-GITHUB-USERNAME/YOUR-REPO-NAME.git

cd YOUR-REPO-NAME
  3. As an alternative to cloning, download the zip file containing the template.

After getting your copy of the template repo, you are ready to start building your first docker image.

Step 3: Build a Docker image using Docker Compose

IMPORTANT: Before you run any Docker command in your terminal, make sure you open the Docker app and wait for the Docker icon to stabilize. If you run any of the commands below without the Docker app running, they will fail with an error.
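
If you are not sure whether the Docker app has finished starting, you can check from the terminal. The sketch below relies on docker info, which has to reach the Docker daemon and so fails when the app is not running. The function name docker_ready is my own, chosen only for illustration.

```shell
# docker_ready is a hypothetical helper: `docker info` talks to the Docker
# daemon, so it only succeeds when the Docker app is actually running.
docker_ready() {
    docker info >/dev/null 2>&1
}

# Usage:
#   if docker_ready; then
#       echo "Docker is running"
#   else
#       echo "Open the Docker app and wait for it to start"
#   fi
```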

Make sure you navigate to the folder where your repo is located before running the commands below.

Build your Docker image from the Dockerfile in the template repo with the following command:

docker-compose build

You won’t need to run this command again unless you update the requirements.txt file or the Dockerfile.

As I said earlier, Docker provides some introductory lessons after you install it; you may want to take advantage of them.

After building your docker image, you are ready to run your app for the first time.

Step 4: Run the Docker image

When you clone the repo, you will see that I already have an APP built in. This is just a starter app. Feel free to delete it and add in your own app.

Running the Docker image is how you get the app up and running on a localhost port.

To run the Docker image, type:

docker-compose up

Then go to localhost:8000. If it’s not showing up there, try plain localhost instead. You should see the starter app running.
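
The container can take a few seconds to start, so the page may not load right away. As a rough sketch, you could poll the address from a second terminal until it answers. This assumes curl is installed; wait_for_app is a hypothetical helper name, and the default of 10 attempts is an arbitrary choice.

```shell
# wait_for_app is a hypothetical helper: it requests a URL with curl until
# the server answers, or gives up after a number of attempts.
# $1 = URL, $2 = maximum attempts (default 10), one second apart.
wait_for_app() {
    url=$1
    tries=${2:-10}
    i=1
    while [ "$i" -le "$tries" ]; do
        # -s silent, -f fail on HTTP errors, -o discard the page body
        if curl -sf -o /dev/null "$url"; then
            echo "App is responding at $url"
            return 0
        fi
        # Pause between attempts, but not after the last one
        [ "$i" -lt "$tries" ] && sleep 1
        i=$((i + 1))
    done
    echo "No response from $url after $tries attempts" >&2
    return 1
}

# Usage:
#   wait_for_app http://localhost:8000
```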

Step 5: How to add new libraries to the Docker environment

That was just the starter tutorial to get you up and running with Docker. Now, what if you want to add new Python packages/libraries to your Docker environment? How do you do that?

To add new packages like scipy, scikit-learn, or numpy, go to the requirements.txt file in the template repo.

Add one line per library. You can follow the library name with the version you want (for example, numpy==1.26.4), or leave the version off and list only the library name.
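
You can edit requirements.txt in any text editor. If you prefer the terminal, here is a small sketch that appends a library only when it is not already listed. add_requirement is a hypothetical helper I wrote for this example, and the version numbers in the usage lines are illustrative, not the ones the template repo uses.

```shell
# add_requirement is a hypothetical helper, not part of pip or Docker.
# $1 = path to requirements.txt, $2 = a line such as "numpy==1.26.4"
add_requirement() {
    file=$1
    spec=$2
    # Strip the version part ("==1.26.4", ">=1.0", ...) to get the bare name
    name=${spec%%[=<>~!]*}
    # Compare the bare name against the bare name on every existing line
    if awk -v n="$name" '
        { line = $0; sub(/[=<>~! ].*$/, "", line) }
        tolower(line) == tolower(n) { found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        echo "$name is already listed in $file"
    else
        echo "$spec" >> "$file"
        echo "added $spec to $file"
    fi
}

# Usage (versions are examples only):
#   add_requirement requirements.txt "scikit-learn==1.4.2"
#   add_requirement requirements.txt "scipy"
```

The check is deliberately simple: it matches on the library name and does not try to reconcile two different versions of the same library.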

After adding new libraries to your requirements.txt file, you need to rebuild your image for the changes to take effect. Run:

docker-compose build

Then

docker-compose up
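
If you find yourself running these two commands together often, docker-compose up also accepts a --build flag that rebuilds the image before starting the containers. A minimal sketch follows; rebuild_and_run and the COMPOSE variable are names I chose for illustration.

```shell
# COMPOSE lets you swap in a different Compose command. This is an
# assumption of the sketch: some newer installs use "docker compose"
# (with a space) instead of "docker-compose".
COMPOSE=${COMPOSE:-docker-compose}

# rebuild_and_run is a hypothetical helper: `up --build` rebuilds the image
# and then starts the containers, combining the two commands above.
rebuild_and_run() {
    $COMPOSE up --build "$@"
}

# Usage:
#   rebuild_and_run
```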

Step 6: Build Your APP & Deploy to Amazon Web Services (AWS)

After you have your Docker environment up and running, you can focus your time on building your data science APP. As always, I won’t leave you hanging. Follow these instructions to learn how to build a data science API using FastAPI.

After building your app, or even before you start building it, you may want to deploy it to AWS to host it online. I’ve got you covered there too. Follow these instructions to learn how to deploy to AWS Elastic Beanstalk using Docker.

Step 7: Docker commands to accomplish different tasks.

If you want to improve your docker skills, these are a few docker commands that will help you out tremendously.

To clean up your Docker containers and images, follow the instructions in this blog post on how to prune unused Docker objects.

General Docker commands and their meaning. To use these commands, simply type docker COMMAND, where COMMAND is the command you want executed.

Docker Commands:

Command   Meaning
attach    Attach local standard input, output, and error streams to a running container
build     Build an image from a Dockerfile
commit    Create a new image from a container’s changes
cp        Copy files/folders between a container and the local filesystem
create    Create a new container
diff      Inspect changes to files or directories on a container’s filesystem
events    Get real time events from the server
exec      Run a command in a running container
export    Export a container’s filesystem as a tar archive
history   Show the history of an image
images    List images
import    Import the contents from a tarball to create a filesystem image
info      Display system-wide information
inspect   Return low-level information on Docker objects
kill      Kill one or more running containers
load      Load an image from a tar archive or STDIN
login     Log in to a Docker registry
logout    Log out from a Docker registry
logs      Fetch the logs of a container
pause     Pause all processes within one or more containers
port      List port mappings or a specific mapping for the container
ps        List containers
pull      Pull an image or a repository from a registry
push      Push an image or a repository to a registry
rename    Rename a container
restart   Restart one or more containers
rm        Remove one or more containers
rmi       Remove one or more images
run       Run a command in a new container
save      Save one or more images to a tar archive (streamed to STDOUT by default)
search    Search the Docker Hub for images
start     Start one or more stopped containers
stats     Display a live stream of container(s) resource usage statistics
stop      Stop one or more running containers
tag       Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
top       Display the running processes of a container
unpause   Unpause all processes within one or more containers
update    Update configuration of one or more containers
version   Show the Docker version information
wait      Block until one or more containers stop, then print their exit codes

Docker commands and their meaning

To get the list above in your command line, just type docker help.
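
To give a feel for how a few of these commands look with their options, here is a short sketch. The three function names are hypothetical helpers of my own; the flags they use (-q, --tail, -it) are standard Docker options. The last helper assumes the image includes sh.

```shell
# Hypothetical helpers built from commands in the list above.

# List only the IDs of running containers (-q prints IDs and nothing else).
running_containers() {
    docker ps -q
}

# Show the last N log lines of a container.
# $1 = container name or ID, $2 = number of lines (default 50)
recent_logs() {
    docker logs --tail "${2:-50}" "$1"
}

# Open an interactive shell inside a running container.
# $1 = container name or ID. Assumes the image ships with `sh`.
shell_into() {
    docker exec -it "$1" sh
}

# Usage (replace CONTAINER with a name or ID from `docker ps`):
#   recent_logs CONTAINER 20
#   shell_into CONTAINER
```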

In the above tutorial, we used docker compose quite a bit.

The commands below are a few additional docker-compose commands. Again, to use them, simply type docker-compose COMMAND, where COMMAND is the command you want executed.

Docker-compose Commands:

Command   What the command means
build     Build or rebuild services
config    Validate and view the Compose file
create    Create services
down      Stop and remove containers, networks, images, and volumes
events    Receive real time events from containers
exec      Execute a command in a running container
help      Get help on a command
images    List images
kill      Kill containers
logs      View output from containers
pause     Pause services
port      Print the public port for a port binding
ps        List containers
pull      Pull service images
push      Push service images
restart   Restart services
rm        Remove stopped containers
run       Run a one-off command
scale     Set number of containers for a service
start     Start services
stop      Stop services
top       Display the running processes
unpause   Unpause services
up        Create and start containers
version   Show version information and quit

docker-compose commands

To get more docker-compose help, type docker-compose --help and you will get the same list as above in your command line.
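
As an illustration of a typical day-to-day cycle with these commands, the sketch below starts the services in the background, follows their logs, and shuts them down. The function names and the COMPOSE variable are hypothetical, introduced only for this example.

```shell
# COMPOSE is an assumption of this sketch; change it if your install uses
# "docker compose" (with a space) instead of "docker-compose".
COMPOSE=${COMPOSE:-docker-compose}

# Start the services in the background (-d means detached).
start_detached() {
    $COMPOSE up -d
}

# Follow the combined log output; press Ctrl+C to stop following.
follow_logs() {
    $COMPOSE logs -f
}

# Stop the services and remove their containers.
shut_down() {
    $COMPOSE down
}
```

Run without extra flags, down removes the containers and the network Compose created. Images and volumes are only removed if you ask for that with the --rmi and -v options.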

I hope you liked this tutorial and it helped you.
