
How to create a virtual environment for data science projects.

NOTE: This tutorial assumes you already know how to use GitHub and the command line for basic tasks. If you don’t know how to use GitHub, check out the blog tutorial and YouTube video I created to teach you how to get started with GitHub.

Knowing how to use GitHub is not a necessity for creating virtual environments. I mention it only because Steps 1 & 2 of this tutorial use it. If you skip those steps, you can still create a virtual environment without knowing GitHub.

What is the reason for creating a virtual environment?

Virtual environments are a way for you to isolate the project you are working on. For example, let’s say you are working on a project that requires python 3.7, pandas 2.0, numpy 1.0, and seaborn 6.5. (These version numbers are made up for illustration.)

Then you start working on another project that requires python 2.0, pandas 3.0, numpy 2.0, and seaborn 2.5. If you try to work on both projects from one environment, you will run into conflicts, because the projects require different versions of the same packages and/or different packages altogether.

With virtual environments, you can create an environment for the first project that has just the packages it needs at the correct versions (python 3.7, pandas 2.0, numpy 1.0, and seaborn 6.5).

Then, for the second project, you can create ANOTHER environment that has the packages and versions specific to that project (python 2.0, pandas 3.0, numpy 2.0, and seaborn 2.5).

In general, virtual environments are a way to isolate your projects and avoid package and version conflicts between them.
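
To see this isolation for yourself, here is a small Python sketch (not part of the original workflow) that prints which interpreter is running and which versions of a few packages it can see. The helper name installed_versions is just something I made up for this example. If you run it from two different environments, you should get two different answers.

```python
from importlib import metadata
import sys


def installed_versions(package_names):
    """Return {name: version or None} for the environment running this script."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in *this* environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    for name, version in installed_versions(["pandas", "numpy", "seaborn"]).items():
        print(f"{name}: {version or 'not installed'}")
```

The package names in the example call are only illustrations. Swap in whatever your project uses.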

Step by Step Guide on how to create virtual environments.

With that said, let’s create a virtual environment together. Remember, you can always watch the video above which does a GREAT job of explaining everything below.

Step 1 – Create a GitHub Repo: Whenever you start a new project, it is best practice to create a GitHub repo for it. A GitHub repo will allow you to keep an online version of the project in case your computer crashes and burns. If you don’t know how to create a GitHub repo, check out this blog and video.

NOTE: Initialize your GitHub repo with a README file and a .gitignore. It is helpful but not a necessity.

Step 2 – Clone the project repo to your local desktop: Cloning the GitHub repo allows you to have a local version of the project that is linked to the online version of the project.

Steps 1 & 2 are just preliminaries. Now let’s get to the real deal.

Step 3 – Change into the folder that has your project README file: From your command line, after cloning the GitHub repo, navigate into the folder you just cloned. THIS IS WHERE YOU WILL CREATE YOUR VIRTUAL ENVIRONMENT.

When you navigate into the folder cloned from GitHub, you should see a README, a .gitignore, and a license file (if you initialized your GitHub repo with those things). If you didn’t initialize your repo with those files, then you will just see an empty folder.

WARNING: Make sure the folder level where you are creating the virtual environment is the folder level where you have your README file. This ensures that you are creating the environment at the right level.

And most importantly, this will ensure that you don’t end up creating a virtual environment inside another virtual environment. Creating a virtual environment inside another one is the easiest way to corrupt a virtual environment.
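
If you want a quick sanity check before creating the environment, the rule above can be sketched in Python like this. The function name check_project_root and the wording of the warnings are my own invention, and the check is only a rough heuristic: it looks for a README in the folder and for a Pipfile in any parent folder, which would suggest another pipenv project already lives above it.

```python
from pathlib import Path


def check_project_root(folder):
    """Return a list of warnings about creating a pipenv environment in `folder`."""
    folder = Path(folder).resolve()
    warnings = []

    # The tutorial's rule of thumb: the README should sit at this level.
    has_readme = any(p.name.lower().startswith("readme") for p in folder.iterdir())
    if not has_readme:
        warnings.append(f"No README found in {folder}")

    # A Pipfile in any parent folder suggests another pipenv project
    # already lives above this one.
    for parent in folder.parents:
        if (parent / "Pipfile").is_file():
            warnings.append(f"Found a Pipfile in parent folder {parent}")
            break

    return warnings
```

Calling check_project_root(".") from the folder you just changed into should return an empty list when everything looks fine.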

Step 4 – How to create the virtual environment: There are a couple of ways to actually create your virtual environment. I will focus first on the most fundamental way. Then, further down (Step 8), I will show you another way to create a virtual environment.

  1. You can type in pipenv shell
    • This will create a virtual environment for the folder where the command is called (by default, pipenv stores the environment itself outside the project folder). During the creation of the virtual environment, a Pipfile will be created in the folder. This Pipfile will hold all the packages and package versions that you are using for this project.
    • After the virtual environment is created, you will find yourself inside the virtual environment automatically. When you do dir or ls (depending on your operating system), you should see the Pipfile.
  2. To exit the virtual environment, just type in exit (exit() is for leaving the Python interpreter, not the pipenv shell). To get back into the virtual environment, just type in pipenv shell again.
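
If you are ever unsure whether you are inside or outside the environment, here is one way to check from Python. This is a sketch, not an official pipenv API: the first function uses the standard sys.prefix / sys.base_prefix comparison, and the second assumes that pipenv shell sets the PIPENV_ACTIVE environment variable to "1", which is my understanding of how pipenv behaves but worth verifying on your version.

```python
import os
import sys


def in_virtualenv():
    """True when this interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment
    # while sys.base_prefix points at the Python it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def in_pipenv_shell():
    """True when PIPENV_ACTIVE is set (assumed to be set by `pipenv shell`)."""
    return os.environ.get("PIPENV_ACTIVE") == "1"


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("Inside a pipenv shell:", in_pipenv_shell())
```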

Step 5: Install the packages you want to use in the environment

How to know when you are outside or inside a virtual environment
  1. To install packages for the project inside the virtual environment, just type pipenv install followed by the package name and, optionally, a version.
    • For example, pipenv install pandas installs the current version of pandas, and pipenv install pandas==1.5.3 installs that specific version. So, you can install the current version of the package or install a specific version of the package.
    • The Python version itself is not installed with pipenv install. You pick it when you create the environment, for example with pipenv --python 3.7.
    • You can also install multiple packages at the same time by doing pipenv install numpy seaborn scikit-learn (this will install all the packages listed after the install statement).
    • NOTE: When you are installing things on your local machine, you use pip install package. But when you are installing things in a virtual environment, you use pipenv install package.
    • When you install packages in a virtual environment, you can do so while inside or outside the virtual environment. To see what packages you have installed in your virtual environment, just open your Pipfile using any code editor like VS Code, and look at what packages have been installed. Like I said earlier, your Pipfile lists the packages you have installed.
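
If you keep your package list in a script, the install command above can be assembled in Python. This is just a sketch: pipenv_install_args is a helper I made up, and the package names and version in the usage note are illustrative.

```python
def pipenv_install_args(packages):
    """Build the argument list for `pipenv install`.

    `packages` maps a package name to an exact version string, or to None
    to take the current release.
    """
    args = ["pipenv", "install"]
    for name, version in packages.items():
        # "name==version" pins a specific version; a bare name takes the latest.
        args.append(f"{name}=={version}" if version else name)
    return args
```

For example, pipenv_install_args({"pandas": None, "numpy": "1.26.4"}) gives ["pipenv", "install", "pandas", "numpy==1.26.4"]. You could hand that list to subprocess.run(..., check=True) from the project folder to actually run it.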

WARNING: If your virtual environment doesn’t lock after you install packages, your code files may not run properly when executed.
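
A quick way to check whether the packages you installed are actually importable is sketched below. missing_modules is a made-up helper name. Run it with the environment’s Python (for example through pipenv run python), and keep in mind that the import name can differ from the package name (scikit-learn is imported as sklearn).

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from `module_names` that this environment cannot import."""
    # find_spec returns None when a top-level module cannot be found.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["pandas", "numpy", "seaborn"])
    print("Missing modules:", missing or "none")
```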

Step 6: Lock your virtual environment. If your environment doesn’t lock after installing packages, you have to lock your environment manually.

  1. When you do pipenv install, pipenv installs the package and CREATES A Pipfile.lock file. If there is an existing Pipfile.lock file, then it just updates it.
    • Sometimes, when you install packages, you will get a message saying that locking failed. In this situation, just type in pipenv lock. Usually, this will lock your virtual environment.
    • If typing in pipenv lock doesn’t lock your virtual environment, there is a good chance your virtual environment is corrupted.
    • Some of the reasons why a virtual environment gets corrupted are
      • Creating a virtual environment in a sub-folder of a project that already has one. So, basically trying to create a virtual environment inside another virtual environment.
      • Trying to install a package that doesn’t exist. This can happen easily when you mistype the name of a package: the mistyped name can end up in your Pipfile, and locking will keep failing until you remove it. Make sure your virtual environment locks.
    • WARNING: If your virtual environment doesn’t lock, your files may not run properly when executed. When you try to import packages, Python might throw an error saying that the module is not found (ModuleNotFoundError).
    • Your virtual environment has to lock for it to work properly.
  2. WHY DO YOU NEED THE Pipfile.lock FILE? The Pipfile.lock file records the exact version and hashes of every package installed in your environment, so it kinda acts like an integrity key. You can open the Pipfile.lock file and look at it, but whatever you do DON’T EVER EDIT THE Pipfile.lock BY HAND.

    Don’t even add a space, comma, or anything to it. pipenv generates this file for you, and if you edit anything in it, there is a good chance you will corrupt your virtual environment. Like I said earlier, virtual environments are easy to corrupt.
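
If you want to look at the lock file without any risk of changing it, you can read it from Python, since Pipfile.lock is a JSON file. The sketch below only reads. locked_versions is a name I made up, and I am assuming the usual layout where packages sit under the "default" and "develop" sections, each with a "version" entry.

```python
import json
from pathlib import Path


def locked_versions(lockfile_path="Pipfile.lock"):
    """Read Pipfile.lock (without modifying it) and return {package: pinned version}."""
    data = json.loads(Path(lockfile_path).read_text(encoding="utf-8"))
    pinned = {}
    # "default" holds regular packages, "develop" holds --dev packages.
    for section in ("default", "develop"):
        for name, info in data.get(section, {}).items():
            # Some entries (for example editable installs) may have no version.
            pinned[name] = info.get("version", "unpinned")
    return pinned
```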

Step 7: Dealing with corrupted virtual environments. The easiest way I have found to deal with a corrupted virtual environment is just to copy all the files you need (like the files and sub-folders that have the code you have written for this project). Copy JUST THE FILES for the project into a new folder and create a new virtual environment there.

DON’T copy your Pipfile, your Pipfile.lock, or your git files into the new folder. If the problem came from a bad entry in one of those files (for example, a mistyped package name in the Pipfile), copying them brings the problem along with them. Just link your new folder to your GitHub repo, or delete the previous repo you created and create another one with the exact same name.
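
Here is a small Python sketch of that copy step, assuming "git files" means the .git folder. copy_project_files is a made-up helper name, and you may want to adjust the skip patterns for your own project.

```python
import shutil


def copy_project_files(old_folder, new_folder):
    """Copy a project into `new_folder`, leaving pipenv and git files behind."""
    # Names matching these patterns are skipped at every folder level.
    skip = shutil.ignore_patterns("Pipfile", "Pipfile.lock", ".git")
    # copytree creates new_folder and raises an error if it already exists.
    shutil.copytree(old_folder, new_folder, ignore=skip)
```

After the copy, change into the new folder and create the virtual environment there as in Step 4.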

Another way to deal with a corrupted virtual environment is just to remove it using pipenv --rm and then create it again.

Exit the environment once you are done with it by typing exit.

Step 8: Another way to create a virtual environment is to just type in pipenv install followed by the packages you want. That will automatically create the virtual environment and install the packages you identified.

For example, if you do pipenv install pandas numpy seaborn, it will create a virtual environment for the folder you are currently in and install the packages specified (if you don’t already have a virtual environment). If you already have a virtual environment, then it won’t create a new one, it will just install the specified packages.

If you have a virtual environment that is already created, you can install packages inside it without being inside the virtual environment (without typing pipenv shell to enter the environment). To do so, just navigate to the folder where the Pipfile is and do pipenv install followed by the packages. That will install the specified packages inside the virtual environment even though you are not inside it.
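
To find out whether a folder already has a virtual environment before you run pipenv install, you can ask pipenv itself with pipenv --venv. The Python sketch below wraps that call. project_venv_path is a made-up helper name, and I am assuming pipenv exits with a non-zero status when no environment exists yet, so treat the result as a hint and not a guarantee.

```python
import shutil
import subprocess


def project_venv_path(project_folder="."):
    """Return the virtual environment path pipenv reports for a folder, or None."""
    if shutil.which("pipenv") is None:
        return None  # pipenv is not installed or not on PATH
    result = subprocess.run(
        ["pipenv", "--venv"],
        cwd=project_folder,
        capture_output=True,
        text=True,
    )
    # Assumption: a non-zero exit status means no environment exists yet.
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```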

Step 9: Here is a list of helpful virtual environment commands. If you need more help with your virtual environment, you can get it from your command line. Just type pipenv --help (you don’t need to be inside the virtual environment). This will show you a list of pipenv commands and what they do.

Code to type in command line | What it does
pipenv --python 3.7 | Create a new project using Python 3.7, specifically
pipenv --rm | Remove project virtualenv (inferred from current directory)
pipenv install --dev | Install all dependencies for a project (including dev)
pipenv lock --pre | Create a lockfile containing pre-releases
pipenv graph | Show a graph of your currently installed dependencies
pipenv check | Check your installed dependencies for security vulnerabilities and against PEP 508 markers provided in Pipfile
pipenv install -e . | Install a local setup.py into your virtual environment/Pipfile
pipenv run pip freeze | Use a lower-level pip command. Prints all the packages in the current environment and their versions; redirect the output (pipenv run pip freeze > requirements.txt) to save it as a requirements.txt file
pipenv install | Installs provided packages and adds them to Pipfile, or (if no packages are given), installs all packages from Pipfile
pipenv lock | Generates Pipfile.lock
pipenv run | Spawns a command installed into the virtualenv
pipenv shell | Spawns a shell within the virtualenv
pipenv sync | Installs all packages specified in Pipfile.lock
pipenv uninstall | Un-installs a provided package and removes it from Pipfile
pipenv update | Runs lock, then sync
pipenv clean | Uninstalls all packages not specified in Pipfile.lock
pipenv open | View a given module in your editor

List of commands and resources that will help you with virtual environments

I hope this tutorial helped you to create a virtual environment.
