Cross-Functional Data Science Team Project Experience
Project: Data-Driven Cross-Country Journey
Resfeber Team Inspiration – Desired Product & Why
The project I worked on was Resfeber, a travel research product for planning trips. The main problem it set out to solve was creating a one-stop product for planning an entire trip. Within a single platform, users can create a profile, research and pin destinations, check the weather at their destination, build a travel itinerary, add restaurant, excursion, and housing reservation information to that itinerary, view historical and current housing prices, see gas prices along their route, and plan the route itself. My main concern going into the project was that we would not have access to the data we needed to build the desired product.
High-Level Overview Down to the Grassroots – Breaking Down the Product into Tasks
We broke down the product roadmap by looking at the deliverables for each release. One deliverable was “Users should be able to estimate a road trip’s cost.” We asked what costs are associated with a road trip, and the primary one is gas. So we decided to find data showing gas prices by location. The next tasks were cleaning that data and predicting gas prices by location from the historical gas price data. After building the prediction model, the next objective was to build an API that serves gas price predictions so that the web development and iOS teams could incorporate them into their apps.
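The cost estimate behind this deliverable can be sketched in a few lines. This is a minimal sketch, not the team’s model: it assumes a hypothetical list of historical gas prices for one location, fits a simple least-squares linear trend to forecast the next price, and converts miles driven into fuel cost using the vehicle’s miles per gallon. The function names and all numbers are illustrative assumptions.

```python
def fit_linear_trend(prices):
    """Ordinary least-squares fit of price against time index 0..n-1.

    Returns (slope, intercept). `prices` is a hypothetical list of
    historical gas prices for one location, oldest first.
    """
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two historical prices")
    mean_x = (n - 1) / 2
    mean_y = sum(prices) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(prices))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_price(prices, steps_ahead=1):
    """Extrapolate the linear trend `steps_ahead` periods past the last observation."""
    slope, intercept = fit_linear_trend(prices)
    return intercept + slope * (len(prices) - 1 + steps_ahead)


def estimate_trip_fuel_cost(legs, mpg):
    """Sum fuel cost over route legs.

    `legs` is a list of (miles, price_per_gallon) pairs, one per leg, so each
    leg can use the predicted price for the area it passes through.
    """
    return sum(miles / mpg * price for miles, price in legs)


# Illustrative numbers only (not real data).
history = [3.00, 3.10, 3.20, 3.30]
next_price = forecast_price(history)  # about 3.40 on this toy series
cost = estimate_trip_fuel_cost([(300, next_price), (150, 3.00)], mpg=30)
print(f"next price about {next_price:.2f}, trip fuel cost about ${cost:.2f}")
```

A straight-line trend is about the simplest forecaster possible; a production model would likely use more history, seasonality, and location features.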
Another deliverable was predicting Airbnb prices in advance based on location, date, room type, and other factors. To accomplish this, we first had to find Airbnb price data. Once we had it, the next step was predicting housing prices from location, room type, and the other factors available in the data. With price predictions working, we created an API for Airbnb prices and delivered it to the web development and iOS teams to add to their apps.
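As a rough illustration of predicting Airbnb prices from location, date, and room type, here is a minimal baseline sketch. It is not the model the team shipped; it simply averages historical nightly prices for each (neighbourhood, room type, month) group and falls back to the overall average for combinations it has never seen. The field names `neighbourhood`, `room_type`, `month`, and `price` are assumptions about the dataset’s columns.

```python
from collections import defaultdict
from statistics import mean


def fit_group_means(listings):
    """Average nightly price per (neighbourhood, room_type, month) group.

    `listings` is an iterable of dicts with hypothetical keys
    'neighbourhood', 'room_type', 'month' (1-12), and 'price'.
    Returns (group_means, overall_mean).
    """
    groups = defaultdict(list)
    for row in listings:
        key = (row["neighbourhood"], row["room_type"], row["month"])
        groups[key].append(row["price"])
    if not groups:
        raise ValueError("no listings to fit")
    all_prices = [p for prices in groups.values() for p in prices]
    return {key: mean(prices) for key, prices in groups.items()}, mean(all_prices)


def predict_price(model, neighbourhood, room_type, month):
    """Look up the group average, falling back to the overall average."""
    group_means, overall_mean = model
    return group_means.get((neighbourhood, room_type, month), overall_mean)


# Toy listings, made up for illustration.
listings = [
    {"neighbourhood": "Downtown", "room_type": "Private room", "month": 6, "price": 80},
    {"neighbourhood": "Downtown", "room_type": "Private room", "month": 6, "price": 100},
    {"neighbourhood": "Downtown", "room_type": "Entire home/apt", "month": 6, "price": 200},
]
model = fit_group_means(listings)
print(predict_price(model, "Downtown", "Private room", 6))  # average of 80 and 100
print(predict_price(model, "Uptown", "Shared room", 1))     # unseen group: overall mean
```

A baseline like this is useful as a yardstick: a learned model such as a Random Forest should beat it before it is worth deploying.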
Here is an example Trello card we made that I worked on. I built the API that serves the Airbnb and gas price predictions, so the web and iOS teams can show lodging and gas prices to the user.
Data-Driven Features for Traveling
The main feature I built was the Data Science API. The first task was to install Docker and create a Docker environment for the project. Next, I deployed the API to AWS using Elastic Beanstalk. After that, I set up the site’s domain with AWS Route 53 and enabled SSL/HTTPS. With the basic template API deployed, I focused on learning how to build an API with the FastAPI framework, and then on learning how to turn data science models into an API.
Here is the link to the API, where you can get Airbnb housing price predictions and gas price predictions. Below is a demo of how the API works. The API’s main purpose is to predict Airbnb housing prices and gas prices. It also has built-in data type checking to make sure the input data has a valid type. If the input is not valid, the API returns an error that tells you which data type the model requires.
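FastAPI performs this kind of check automatically: it validates the request body against a Pydantic model and, on failure, returns an HTTP 422 response describing which field failed and why. To show the idea without third-party dependencies, here is a minimal stdlib-only sketch of a similar check. The field names and types in `AIRBNB_FIELDS` are hypothetical, not the actual API’s schema.

```python
class PayloadError(ValueError):
    """Raised when a request payload is missing fields or has wrong types."""


# Hypothetical input schema: field name -> required Python type.
AIRBNB_FIELDS = {
    "neighbourhood": str,
    "room_type": str,
    "accommodates": int,
    "minimum_nights": int,
}


def validate_payload(payload, schema):
    """Check that every schema field is present with the right type.

    Collects all problems before raising, so the caller learns about every
    invalid field at once instead of fixing them one by one.
    """
    errors = []
    for field, expected in schema.items():
        if field not in payload:
            errors.append(f"{field}: field required (expected {expected.__name__})")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    if errors:
        raise PayloadError("; ".join(errors))
    return payload


good = {"neighbourhood": "Downtown", "room_type": "Private room",
        "accommodates": 2, "minimum_nights": 3}
validate_payload(good, AIRBNB_FIELDS)  # passes and returns the payload

try:
    validate_payload({**good, "accommodates": "two"}, AIRBNB_FIELDS)
except PayloadError as err:
    print(err)  # accommodates: expected int, got str
```

This sketch is strict; by default Pydantic is more lenient and will coerce a numeric string such as "2" into an integer.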
Challenges of Data-Driven Features
The primary challenge I faced was pickling the model. Pickling is the process I used to package the models built by other data scientists on the team into something that can serve predictions through an API. I ran into a problem specifically with the XGBoost model and Docker. For the XGBoost library to work in Docker on a Windows machine, I needed libgomp (the GNU OpenMP runtime library) in the Docker environment, along with other dependencies. After hours of searching, I found instructions for installing a package that includes libgomp inside a Docker environment on a Mac, but I could not find a way to install libgomp in Docker on a Windows machine.
I solved this problem by telling the other data scientists that we needed to change the model used for Airbnb price predictions. We switched from an XGBoost model to a Random Forest model and didn’t run into any further problems pickling the housing price model. Here is the link to the API code on GitHub if you are interested in checking it out.
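The pickling step itself can be illustrated with Python’s built-in `pickle` module. In the real project the pickled object would be the trained model; to keep this sketch stdlib-only, it pickles a plain dictionary of hypothetical model parameters to a file named `model.pkl` (an assumed name) and loads it back the way an API would at startup.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical fitted parameters standing in for a trained model object.
trained_model = {
    "intercept": 40.0,
    "coef": {"accommodates": 25.0, "minimum_nights": -2.0},
}


def predict(model, features):
    """Apply the stand-in linear model to a dict of numeric features."""
    return model["intercept"] + sum(
        weight * features[name] for name, weight in model["coef"].items()
    )


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"

    # Training side: serialize the fitted model once.
    with path.open("wb") as f:
        pickle.dump(trained_model, f)

    # API side: load it at startup and reuse it for every request.
    with path.open("rb") as f:
        loaded_model = pickle.load(f)

print(predict(loaded_model, {"accommodates": 2, "minimum_nights": 3}))  # 84.0
```

Unpickling a real model object imports the library that defines it, which is why the Docker image must contain that library and its native dependencies (for XGBoost, that included libgomp). Only unpickle files you trust, because loading a pickle can execute arbitrary code.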
Another problem my team faced was figuring out where to find the data for the project. As data scientists, we can’t do much without data. We had ideas of what we wanted to build, but not the data to build it. We solved this by reaching out to company CEOs to ask for access to their API data, and by asking government organizations for access to their APIs that contained the data we wanted. We also went on a scavenger hunt across the internet for relevant datasets. Finally, we combined data from multiple sources into one large dataset containing all the information we needed. In the end, we got the data we needed for the project.
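Combining several sources into one dataset usually means joining records on a shared key. Here is a minimal stdlib sketch, assuming each source is a list of dicts that share a hypothetical `city` field; the sources and values below are made up for illustration.

```python
def combine_sources(*sources, key="city"):
    """Merge records from several sources that share a join key.

    Later sources add (or overwrite) fields on the record for the same key,
    like an outer join: keys present in only one source keep only that
    source's fields. Input records are copied, not mutated.
    """
    combined = {}
    for source in sources:
        for record in source:
            combined.setdefault(record[key], {}).update(record)
    return combined


gas_prices = [{"city": "Denver", "gas_price": 3.10}, {"city": "Reno", "gas_price": 3.90}]
lodging = [{"city": "Denver", "avg_nightly_price": 120.0}]

merged = combine_sources(gas_prices, lodging)
print(merged["Denver"])  # {'city': 'Denver', 'gas_price': 3.1, 'avg_nightly_price': 120.0}
print(merged["Reno"])    # {'city': 'Reno', 'gas_price': 3.9}
```

At scale, a library such as pandas (`DataFrame.merge`) does the same job, and the join key usually needs cleaning first so that, for example, city names are spelled consistently across sources.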
Results and Potential Future Improvements to the Data-Driven Journey App
As of now, the API is functional and has been incorporated into the web and iOS applications. In the future, we might improve the data science models, add visualizations to the API for the web and iOS front ends, or build other models that give users different travel-related predictions, such as average food cost by location. We might also provide information about Tesla charging stations where users can recharge their cars and rest at the same time, as well as weather data for their desired destination and route. There are several more ways we could improve the product.
The main technical challenge I foresee if we continue this project is getting access to data. We would need food price data by location, Tesla charging station data, a weather forecast API, and other data that is not readily available to the public. If we can get access to that data, everything else will be doable.
Two or More Heads Are Better Than One – Benefits of Teamwork
The primary benefit of working with others was being able to plan and execute the project as a team. We came up with ideas, bounced them off each other, got on Zoom calls to help each other individually, did pair programming, and got feedback from our peers on how to improve our code. Whenever any of us had questions about our piece of the project, somebody was always ready to help; we could just message each other and jump on a video call to work through the problem. We broke the project into smaller sections and put different people in charge of different parts. This way, we all contributed equally, working both alone and in groups toward a common goal.
This project will help further my career because, going into it, I didn’t know how to build a data science API. That was my main focus for this project, and I learned how to build one. So this project helped further my career by giving me the opportunity to learn a skill set I wanted to acquire.