Data science as a role means different things at different companies. As my friend Sergio Martinez-Ortuno put it, “imagine if we called all the different people who write software computer scientists. That’s where we’re at with data science.” The role has been through the full hype cycle over the last 8-10 years, and we don’t seem to have reached a stable equilibrium yet. As Director of Data Science at KeepTruckin, I thought I’d weigh in on how we’ve aligned the team, with a couple of examples of the kinds of projects we tackle. Hopefully this helps prospective employees set expectations.
Data scientists are responsible for taking large amounts of raw data and building models that provide customer or business value. For context, KeepTruckin is a modern fleet management platform – we help commercial vehicle fleets coordinate their operations, improve the safety of their drivers, and reduce their operating costs. Our data comes from hundreds of thousands of connected vehicles that beam up high-dimensional, high-frequency sensor data. Data science creates customer value from that raw data by building models that provide insights and enable automation. It also feeds a flywheel: our models naturally get better as we attract more customers, and in turn our data products help differentiate us in the market and aid sales. The following products are a couple of recent examples from my team:
Facility Insights
Our goal was to provide customers with useful data about facilities, such as when they are open or closed, whether overnight parking is available, and how efficient their operations are.
We started with raw GPS pings from hundreds of thousands of connected vehicles. From these, we built a model to identify when vehicles stopped to load or unload (as opposed to stopping for red lights, fuel, food, rest, etc.). Next, we built a model to infer where the most-visited facilities were and, after some annotation to build training data, another model to geofence them. This let us infer a record of visits to each facility from the raw GPS pings, with entry and exit times for each visit. From the inferred visit records, we could build a final model to predict the expected dwell time (the total in/out time) of future visits. This became a product we offer to customers to help them plan their operations.
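To make the shape of that pipeline concrete, here’s a minimal sketch of the middle steps: clustering already-classified stops into candidate facilities and predicting dwell from visit history. Everything in it is illustrative (the column names, the DBSCAN choice, the thresholds) – it’s a toy stand-in for the data flow, not our production system.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000

def discover_facilities(stops: pd.DataFrame, eps_m: float = 150.0) -> pd.DataFrame:
    """Cluster classified load/unload stops into candidate facilities.

    Assumes `stops` has one row per stop already classified as a
    load/unload event, with columns lat, lon, entry_time, exit_time.
    """
    # DBSCAN's haversine metric wants coordinates in radians and eps in
    # radians, so convert the meter threshold using the earth's radius.
    coords = np.radians(stops[["lat", "lon"]].to_numpy())
    labels = DBSCAN(
        eps=eps_m / EARTH_RADIUS_M, min_samples=10, metric="haversine"
    ).fit_predict(coords)
    # Label -1 is DBSCAN's noise bucket: stops that belong to no facility.
    return stops.assign(facility_id=labels).query("facility_id >= 0")

def expected_dwell(visits: pd.DataFrame) -> pd.Series:
    """A deliberately crude dwell predictor: each facility's historical
    median in/out time. A real model would condition on far more context."""
    dwell = visits["exit_time"] - visits["entry_time"]
    return dwell.groupby(visits["facility_id"]).transform("median")
```

The real versions of these steps are trained models rather than heuristics, but the data flow is the same: each stage consumes the previous stage’s inferences.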

DRIVE Safety Score
Our goal was to infer how safely drivers are operating their vehicles, and to provide them with actionable insights into how to improve.
Again starting from raw GPS pings, alongside telematics data coming off the vehicle’s engine, we partitioned the entire North American road geometry into discrete segments. We then built a model to define an embedding for each segment – a numeric characterization of what typical driving behavior looks like there. We used Bayesian methods to clamp down on variance in the segment estimates where data was sparse, giving us a hyper-local baseline for driving behavior. Next, we built a model to compare each driver’s individual performance against that baseline across the tens of thousands of segments they drive every day, giving us rich context for interpreting their behavior. Finally, we fit a model to estimate safety incident propensity (e.g., the rate of collisions and near misses) from the contextual driver data. This model lets us provide a numeric score to each driver every week with industry-leading accuracy.
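The shrinkage step is the part most worth illustrating. Below is a minimal empirical-Bayes sketch, assuming a single scalar behavior (a hard-braking rate standing in for the full segment embedding); the column names, the n / (n + k) weighting, and the prior strength k are all illustrative choices, not our production model.

```python
import pandas as pd

def shrunk_segment_baseline(events: pd.DataFrame, k: float = 50.0) -> pd.DataFrame:
    """Per-segment baseline for one behavior (here, a hard-braking rate),
    shrunk toward the global mean where a segment has little data.

    Assumes `events` has one row per segment traversal, with columns
    driver_id, segment_id, and hard_brake (0/1). A segment with n
    traversals keeps weight n / (n + k) on its own observed rate; k acts
    as a pseudo-count expressing how strongly we trust the global prior.
    """
    global_rate = events["hard_brake"].mean()
    per_seg = events.groupby("segment_id")["hard_brake"].agg(["mean", "count"])
    w = per_seg["count"] / (per_seg["count"] + k)
    per_seg["baseline"] = w * per_seg["mean"] + (1 - w) * global_rate
    return per_seg

def driver_excess(events: pd.DataFrame, baseline: pd.DataFrame) -> pd.Series:
    """Each driver's deviation from the local baseline, averaged over the
    segments they actually drove: the kind of contextual signal that
    would feed a downstream incident-propensity model."""
    joined = events.join(baseline["baseline"], on="segment_id")
    return (joined["hard_brake"] - joined["baseline"]).groupby(
        joined["driver_id"]
    ).mean()
```

The effect is that a segment seen only a handful of times inherits most of its baseline from the global prior, while a heavily trafficked segment is allowed to speak for itself.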

Conclusion
The mandate of our data science organization is to deliver core technology to power novel data products. This mandate has kept the team focused on work that drives the company’s long-term value. I’ll get into how to enable a data science team to be successful (and, by inversion, how to avoid the most common failure modes) in another post. For now, I hope this was a useful overview of how we are set up. I’m always looking to refine my views, though, so please comment and/or share!