Like Mining Gold, Extracting Data Value Takes Effort

Sam Abuelsamid
Oct 04, 2018


Data is the new gold. It’s the mantra you often hear these days. Or sometimes it’s the new oil. Data is undoubtedly a valuable resource, and once extracted it can become the raw material of other products. But as with gold or oil, locating the resource is just the beginning. Getting to the heart of the find takes a lot more effort.

Most gold in the earth is mixed with minerals such as quartz, and separating and purifying it takes substantial effort. Similarly, much of the oil produced outside the Middle East comes from shale or tar sands deposits, which take far more work to extract than conventional reservoirs.

Sifting Data

Much the same holds true when it comes to data. One major use of data is to train the deep neural networks at the heart of most of the automated driving systems under development. Training a neural network requires feeding in large quantities of data that enable the network to begin understanding the patterns it is expected to recognize.

Gathering virtual mountains of data is actually quite straightforward. Take a vehicle equipped with sensors that record the world around it, drive it in a variety of environments, and record everything to racks of hard drives in the cargo area. A typical automated driving test vehicle can generate 4 TB or more of raw data per day, a virtual mountain if ever there was one.
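
To put 4 TB per day in perspective, the arithmetic below converts it into the sustained write rate the vehicle's storage must handle. This is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and assumes the data is recorded either over a full 24 hours or over a hypothetical 8-hour test shift. The article does not say which.

```python
# Back-of-the-envelope write rates for a test vehicle logging 4 TB per day.
# Assumptions (not from the article): decimal units, and either 24 h or a
# hypothetical 8 h shift of recording.
TB = 10**12
MB = 10**6

def sustained_rate_mb_per_s(total_bytes: float, hours: float) -> float:
    """Average write rate in MB/s needed to record total_bytes over `hours` hours."""
    return total_bytes / (hours * 3600) / MB

daily_bytes = 4 * TB
for hours in (24, 8):
    rate = sustained_rate_mb_per_s(daily_bytes, hours)
    print(f"{hours:>2} h of recording: {rate:.1f} MB/s")
```

Under these assumptions the vehicle writes roughly 46 MB/s averaged over a full day, or nearly 140 MB/s if all of it is captured in one shift. A single vehicle fills a petabyte in under a year, before anyone has labeled a single frame.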

However, just as gold is mixed in with quartz, silver, or other materials, the terabytes of images and other sensor data must be made sense of before they can be fed to a neural network. The data must be curated, labeled, and annotated to be useful in the training process. Someone must go through each frame, identifying pedestrians, cyclists, lane markings, signs, vehicles, and anything else of interest. The network must be shown what a pedestrian looks like from many different angles and in many different conditions so that it can build a mathematical model of that pattern.
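As a rough illustration of what "labeled and annotated" means for a single frame, the sketch below records each object as a class label plus a bounding box and checks the annotations for obvious errors before the frame is used for training. The label set, field names, and frame ID are hypothetical. Real projects define their own taxonomy and annotation format, and objects such as lane markings are often annotated with shapes other than boxes, which this sketch leaves out.

```python
from dataclasses import dataclass, field

# Hypothetical label vocabulary; a real project defines its own taxonomy.
ALLOWED_LABELS = {"pedestrian", "cyclist", "vehicle", "sign"}

@dataclass
class BoxAnnotation:
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

@dataclass
class AnnotatedFrame:
    frame_id: str
    width: int
    height: int
    boxes: list[BoxAnnotation] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return issues that should keep this frame out of the training set."""
        issues = []
        for i, b in enumerate(self.boxes):
            if b.label not in ALLOWED_LABELS:
                issues.append(f"box {i}: unknown label {b.label!r}")
            inside = (0 <= b.x_min < b.x_max <= self.width
                      and 0 <= b.y_min < b.y_max <= self.height)
            if not inside:
                issues.append(f"box {i}: coordinates outside the {self.width}x{self.height} image")
        return issues

# Example frame with one valid box and two flawed ones.
frame = AnnotatedFrame(
    frame_id="cam_front_000123",  # hypothetical ID
    width=1920,
    height=1080,
    boxes=[
        BoxAnnotation("pedestrian", 100, 200, 160, 380),
        BoxAnnotation("lamp post", 900, 100, 940, 600),
        BoxAnnotation("vehicle", 1800, 500, 2000, 700),
    ],
)
for issue in frame.problems():
    print(issue)
```

Automated checks like these catch only mechanical mistakes. Whether the box marked "pedestrian" really contains a pedestrian still takes human review.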

Some Developers Rely on Data Prep Companies for Support

Depending on the scale and complexity of the project, developers may opt to do the data processing in house. Alternatively, they can farm it out to a company such as Mighty AI that specializes in preparing raw data for use in training and validation of machine learning systems. Outsourcing the labeling and annotation work can offer advantages in both speed and quality.

Quality assurance is critical to success, since inaccurately labeled data will lead to incorrect pattern recognition and neural networks that misidentify road users. If your photo app mischaracterizes a dog as a wagon, the consequences are relatively trivial. If a pedestrian is misidentified as a lamp post, an automated driving system may assume the person isn’t going to suddenly move and fail to react when that person steps into the road.
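One generic way to check label quality is to have a reviewer independently relabel a sample of objects and compare the two sets of labels. The sketch below shows that idea. It is a hypothetical illustration, not a description of Mighty AI's or any other company's process. The object IDs, the 98% acceptance threshold, and the rule that rejects any batch with a mislabeled pedestrian are all assumptions chosen for the example.

```python
from collections import Counter

def audit(annotator: dict[str, str], reviewer: dict[str, str]) -> tuple[float, Counter]:
    """Compare labels for objects both parties labeled, keyed by object ID.

    Returns the agreement rate and a Counter of (reviewer_label, annotator_label)
    pairs for every object where the two disagree.
    """
    shared = annotator.keys() & reviewer.keys()
    if not shared:
        raise ValueError("no overlapping objects to compare")
    mismatches = Counter(
        (reviewer[k], annotator[k]) for k in shared if annotator[k] != reviewer[k]
    )
    agreement = 1 - sum(mismatches.values()) / len(shared)
    return agreement, mismatches

ACCEPT_THRESHOLD = 0.98  # hypothetical acceptance bar for a labeled batch

annotator = {"obj1": "pedestrian", "obj2": "lamp post", "obj3": "vehicle", "obj4": "cyclist"}
reviewer = {"obj1": "pedestrian", "obj2": "pedestrian", "obj3": "vehicle", "obj4": "cyclist"}

agreement, mismatches = audit(annotator, reviewer)
print(f"agreement: {agreement:.0%}")
for (truth, given), n in mismatches.items():
    print(f"{n} x reviewer says {truth!r}, annotator said {given!r}")

# Reject the batch if overall agreement is low or any pedestrian was mislabeled.
if agreement < ACCEPT_THRESHOLD or any(truth == "pedestrian" for truth, _ in mismatches):
    print("batch sent back for relabeling")
```

The example treats errors unequally on purpose. A single pedestrian labeled as a lamp post is enough to reject the batch, because that is exactly the kind of mistake that could lead an automated driving system to ignore a person about to step into the road.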

Learn More
More information on how both captured and synthetic data is used to train and validate machine learning systems for automated driving is available in the free sponsored Guidehouse Insights white paper The Data Driving Automated Vehicles. Mighty AI CEO Daryn Nakhuda, Danny Shapiro, senior director of automotive at Nvidia, and Ryan Eustice, senior vice president of automated driving at Toyota Research Institute, also discuss the needs and uses of data in a Guidehouse Insights Webinar.