How to Engineer a New Feature in Python Using Pandas

This tutorial will teach you how to create a new column using pandas and Python. In data science, creating a new column is called feature engineering. We call it feature engineering because a lot of thought and work has to go into creating the new column.

You can’t just go and start adding a bunch of columns to your dataset just because you feel like it. NO! Doing that will most likely not help your dataset or model.

Before you engineer a new feature, you need to ask yourself questions like: What is the purpose of this new feature? Why am I adding it to my dataset? What am I trying to accomplish with it? What experiment am I running? How can I test whether this new feature improved my dataset and model? How do I know if it is useful, useless, or makes no difference?

The purpose of engineering a new feature is to improve your dataset and model. You do that by using existing data to create new features that add useful information. If the new feature doesn’t add value to your dataset, then there is no point in engineering it or adding it to your dataset.

It is okay to experiment: engineer new features and then see what impact they have. Many times, I have engineered new features only to find out that they didn’t improve my model and didn’t add any value to the dataset. That is perfectly okay. I just removed the newly engineered features and continued with the original dataset.
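Removing a feature that did not pan out takes one line in pandas. Here is a minimal sketch, assuming a small made-up DataFrame; the column name "experimental_feature" is hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical toy data, not the dataset used in this tutorial
df = pd.DataFrame({"Beer": [10, 20], "Wine": [1, 2]})

# An engineered feature we want to try out (hypothetical name)
df["experimental_feature"] = df["Beer"] + df["Wine"]

# The experiment did not help, so drop the column and keep the original data
df = df.drop(columns=["experimental_feature"])
print(df.columns.tolist())
```

Because drop returns a new DataFrame, the result is assigned back to df.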

That being said, this is how you engineer a new feature…

Step 1: Import the pandas library.
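The import usually looks like the sketch below. The alias "pd" is a widely used convention, not a requirement.

```python
import pandas as pd  # "pd" is the conventional short alias

# Printing the version is a quick way to confirm the import worked
print(pd.__version__)
```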

Step 2: Read the dataset you are using with the pandas read_csv function.
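Here is a minimal sketch of this step. Since the original file is not available here, it reads from an in-memory string that stands in for a CSV file. The rows, the "Country" and "Spirit" columns, and the file name "drinks.csv" in the comment are all made up for illustration; only "Beer" and "Wine" come from this tutorial.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (hypothetical data)
csv_text = """Country,Beer,Wine,Spirit
A,89,54,132
B,25,14,0
C,245,312,138
"""

# With a real file you would pass its path instead,
# e.g. pd.read_csv("drinks.csv") (hypothetical file name)
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```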

Step 3: Verify that the data loaded correctly, for example by looking at the first few rows.
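A few quick checks are usually enough here. The sketch below assumes a small made-up DataFrame in place of the real dataset.

```python
import pandas as pd

# Hypothetical data standing in for the loaded dataset
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Beer": [89, 25, 245],
    "Wine": [54, 14, 312],
})

print(df.head())   # first rows (up to five)
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
```

If a numeric column shows up with the type object, that is often a sign that something in the file did not parse as a number.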

Step 4: In this feature engineering tutorial, we will do a simple addition between two features by adding two columns together using pandas.

The code on the left side of the equals sign is how you define a new column. The new column is called “new_feature”.

The code on the right side of the equals sign is how you define the value(s) for the new column. The new column will be the sum of the other two columns, “Beer” and “Wine”.
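The addition described above can be sketched like this. The numbers are made up for illustration; the column names "Beer", "Wine", and "new_feature" are the ones used in this tutorial.

```python
import pandas as pd

# Hypothetical data; only the column names come from the tutorial
df = pd.DataFrame({
    "Beer": [89, 25, 245],
    "Wine": [54, 14, 312],
})

# Left side: the name of the new column
# Right side: its values, computed row by row from the existing columns
df["new_feature"] = df["Beer"] + df["Wine"]
print(df)
```

pandas performs the addition for every row at once, so no loop is needed.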

Step 5: Check your dataset and verify that the new feature was added and that its data points are what you expected. Pull out your calculator and add up the “Beer” and “Wine” columns; you will see that they add up to the “new_feature” column.
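If you would rather let pandas do the calculator work, one possible check is sketched below, again on made-up numbers.

```python
import pandas as pd

# Hypothetical data with the engineered column already added
df = pd.DataFrame({
    "Beer": [89, 25, 245],
    "Wine": [54, 14, 312],
})
df["new_feature"] = df["Beer"] + df["Wine"]

# Check 1: the column exists
column_added = "new_feature" in df.columns

# Check 2: every row equals Beer + Wine
all_rows_match = (df["new_feature"] == df["Beer"] + df["Wine"]).all()

print(column_added, all_rows_match)
print(df.head())
```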

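Step 6: Let’s engineer a new feature where we subtract multiple columns. Subtracting columns with pandas works the same way as adding them, with a minus sign in place of the plus sign, and we will store the result in a new column called “Subtraction”.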
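Here is a minimal sketch of a subtraction across several columns. The tutorial does not say which columns were subtracted, so the choice of "Beer", "Wine", and a hypothetical third column "Spirit" is an assumption, as are the numbers.

```python
import pandas as pd

# Hypothetical data; "Spirit" is an assumed column for illustration
df = pd.DataFrame({
    "Beer": [89, 25, 245],
    "Wine": [54, 14, 312],
    "Spirit": [132, 0, 138],
})

# Subtract Wine and Spirit from Beer, row by row
df["Subtraction"] = df["Beer"] - df["Wine"] - df["Spirit"]
print(df)
```

The result can be negative, as in the first and third rows of this example, so think about whether that makes sense for your data.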

Step 7: Verify that your feature engineering worked as expected. Again, you can pull out your calculator, do the subtraction by hand, and you will see that it matches the new “Subtraction” column.
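One way to do this check in code is to spot-check a single row and then look for any rows that do not match. This is a sketch on made-up data; "Spirit" is a hypothetical column used for illustration.

```python
import pandas as pd

# Hypothetical data with the engineered column already added
df = pd.DataFrame({
    "Beer": [89, 25, 245],
    "Wine": [54, 14, 312],
    "Spirit": [132, 0, 138],
})
df["Subtraction"] = df["Beer"] - df["Wine"] - df["Spirit"]

# Spot-check the first row, the same way you would with a calculator
row = df.loc[0]
by_hand = row["Beer"] - row["Wine"] - row["Spirit"]
print(by_hand, row["Subtraction"])

# Collect any rows where the stored value differs from the recomputed one
recomputed = df["Beer"] - df["Wine"] - df["Spirit"]
mismatches = df[df["Subtraction"] != recomputed]
print(len(mismatches))
```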

That’s it, those are the fundamentals of feature engineering. Of course, the features we engineered here are for experimental purposes only. But they should give you a foundational knowledge of what feature engineering is all about.

Now, go ahead and be creative and engineer new features. Your imagination and knowledge are the limit. While you are having fun hacking away at your computer, make sure whatever feature you are engineering adds value to your dataset and model.
