Designing Effective Supervised Machine Learning Systems

Dec 9, 2019

A few months ago, I read a wonderful blog post simply titled “Advantage Flywheels” about how competitive advantages in business can usually be drawn as one or more positive feedback loops. The post resonated with me, partly because it’s a useful framework for thinking about how great businesses work, but also because I talk a lot about feedback loops in my job consulting with companies that are setting up machine learning systems. I think feedback loops are the most important characteristic of any well-designed machine learning system, and I’ve taken a crack at drawing what I think most supervised ML systems look like in practice (or aspire to look like):

The most important part of any ML system is the core feedback loop. Manual feature generation is the prime mover for the entire system, but once data is flowing through it, model predictions will eventually supplant manual features as the principal driver of improvements to the system’s output. This is mostly a matter of economics: predictions can be generated far more quickly than manual features. In a production ML system, then, you should expect to put most of your resources into manual feature generation until your model predictions are on par with your manual features. At that point, shift your focus to validation, which will likely become the new bottleneck. You can play with percentages: not every model prediction needs to be validated, but a meaningful share should be, or else the value of the system will erode over time rather than compound (the world changes over time, and so should your models).
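To make the validation trade-off concrete, here is a minimal Python sketch of one turn of the core loop. It assumes a setup the post doesn't specify: annotations are simple records tagged with their source, a hypothetical `review_rate` parameter controls what fraction of model predictions is routed to human validation, and only validated items become training data. All names (`Annotation`, `sample_for_validation`, `run_loop_iteration`) are illustrative, not part of any real library.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Annotation:
    item: str    # hypothetical ID of a piece of test data, e.g. an image
    label: str   # the feature/label attached to that item
    source: str  # "human" for manual features, "model" for predictions


# A reviewer returns the (possibly edited) annotation, or None to reject it.
Reviewer = Callable[[Annotation], Optional[Annotation]]


def sample_for_validation(
    predictions: List[Annotation], review_rate: float, rng: random.Random
) -> Tuple[List[Annotation], List[Annotation]]:
    """Route roughly `review_rate` of model predictions to human review."""
    if not 0.0 <= review_rate <= 1.0:
        raise ValueError("review_rate must be between 0 and 1")
    to_review: List[Annotation] = []
    unreviewed: List[Annotation] = []
    for pred in predictions:
        # rng.random() is in [0, 1), so 0.0 reviews nothing and 1.0 reviews everything
        if rng.random() < review_rate:
            to_review.append(pred)
        else:
            unreviewed.append(pred)
    return to_review, unreviewed


def run_loop_iteration(
    manual: List[Annotation],
    predictions: List[Annotation],
    review: Reviewer,
    review_rate: float,
    rng: random.Random,
) -> Tuple[List[Annotation], List[Annotation]]:
    """One turn of the loop: only validated annotations become training data.

    Assumption: unreviewed predictions may still serve the application,
    but they are not fed back into training.
    """
    to_review, unreviewed = sample_for_validation(predictions, review_rate, rng)
    training_data: List[Annotation] = []
    for ann in manual + to_review:  # manual features are always validated here
        verdict = review(ann)
        if verdict is not None:
            training_data.append(verdict)
    return training_data, unreviewed


# Usage: one manual annotation plus ten model predictions, about 30% sampled for review.
rng = random.Random(0)
manual = [Annotation("img-1", "car", "human")]
preds = [Annotation(f"img-{i}", "car", "model") for i in range(2, 12)]
training, unreviewed = run_loop_iteration(manual, preds, lambda a: a, 0.3, rng)
print(len(training) + len(unreviewed))  # 11: every item is either validated or set aside
```

Raising `review_rate` yields more training data per iteration at the cost of reviewer time; lowering it lets predictions reach the application faster but, by the argument above, lets the system drift as the world changes.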

If you’re new to machine learning and some of the terms in the diagram above look like utter nonsense, check out the glossary of terms at the end of this post for a quick primer.

Diagramming Great Companies

To illustrate how this diagram could be applied in the real world, I’ve filled it out for two (very different) ML-powered companies that I admire.

Textio

Textio is an “augmented writing” platform that helps people write more thoughtfully. Their flagship product, Textio Hire, helps HR professionals write better job postings that are simultaneously less biased and more likely to convert candidates:

Textio’s CEO, Kieran Snyder, has written eloquently on the idea of feedback loops in ML-focused companies, which she calls “learning loops.” Here’s what Textio’s diagram might look like:

Notice that in my generic diagram there’s a dotted line between Application and Test Data. That’s because not every ML system powers an application that also produces test data. In Textio’s case, however, customers use Textio Hire to create job postings, so the product itself is a source of additional test data (alongside the data Textio acquires or scrapes from the internet). This is the ideal design pattern: every additional engaged customer produces data that makes the product better for every existing customer. In other words, Textio enjoys one of the principal characteristics of a network effect.
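One way to reason about the dotted line is to treat the generic diagram as a directed graph and ask whether customer usage of the application can reach model training. The sketch below is a hypothetical Python encoding: the edge list is a reconstruction from the prose of this post (test data feeds feature generation and model predictions, both feed validation, validated data becomes training data, and so on), not a copy of the figure, and the function names are illustrative.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical reconstruction of the generic diagram from the post's prose;
# the original figure may contain edges that are not listed here.
CORE_EDGES: Dict[str, List[str]] = {
    "test data": ["feature generation", "model predictions"],
    "feature generation": ["validation"],
    "model predictions": ["validation", "application"],
    "validation": ["training data"],
    "training data": ["model training"],
    "model training": ["model predictions"],
    "application": [],
}


def build_graph(app_produces_test_data: bool) -> Dict[str, List[str]]:
    """Copy the core edges, optionally adding the dotted Application -> Test Data edge."""
    graph = {node: list(targets) for node, targets in CORE_EDGES.items()}
    if app_produces_test_data:
        graph["application"].append("test data")
    return graph


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search: every node reachable from `start` via one or more edges."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def usage_improves_model(graph: Dict[str, List[str]]) -> bool:
    """True if data produced by using the application can flow back into model training."""
    return "model training" in reachable(graph, "application")


print(usage_improves_model(build_graph(app_produces_test_data=True)))   # True (Textio-style)
print(usage_improves_model(build_graph(app_produces_test_data=False)))  # False
```

Without the dotted edge, the loop still improves the model through validation and retraining, but growth in usage alone adds no new test data; that has to come from another source.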

CARMERA

CARMERA has an interesting business: they collect street-level imagery using cameras attached to a fleet of service vehicles in cities across the U.S. and turn it into products such as “real-time HD maps” that autonomous vehicles can use to navigate.

CARMERA’s feedback loop looks something like this:

Notice that, unlike Textio, the application CARMERA powers does not itself generate test data (as far as I know, CARMERA’s HD map customers do not contribute street-level imagery back to the platform). But they do have a proprietary source of test data: the distributed fleet of service vehicles equipped with their cameras. They’ve even gone so far as to build fleet monitoring software to entice companies to install their cameras on existing service fleets.

Thinking through an Effective ML System

If you’re in charge of establishing a machine learning practice at your company, or you’re considering starting a company whose first product will be powered by an ML process, it’s worth thinking through each of the stages in the diagram above and having at least an answer to each of the following questions, which cover the most common pitfalls:

What is the ultimate application we are powering, and whom is it helping?
How will we acquire test data?
What specific training or expertise will be required for feature generation?
Who will be in charge of each stage of the process?
How will data move from one step to the next?
What tools will we use at each step?

Glossary of Terms

Supervised machine learning is the process of training a model to mimic human intuition by instructing it with correct examples of the predictions you would like it to make. The alternative is unsupervised ML, where you allow the model to find connections/patterns in data without instruction. There’s also semi-supervised ML, which combines a small amount of labeled data with a larger pool of unlabeled data.
Test data is the fuel that makes the engine run. It can be unstructured (e.g. images, text) or structured (e.g. stock trading activity, software usage patterns). Either way, test data will eventually be processed into a prediction or turned into structured training data.
Feature generation is the process of taking test data and augmenting it so that it’s machine-readable for the model. “Features” are relationships in the data that a model can learn (e.g. the relationship between the shape of a car in aerial imagery and the classification “car”). With unstructured data, feature generation means hand-labeling test data (e.g. drawing bounding boxes on an image) whereas with structured data it means doing something called feature engineering.
Validation is just what it sounds like: a system for verifying the correctness of either human- or model-generated features. In its simplest form, validation is just a single trained reviewer approving, rejecting, or editing each model prediction or human annotation.
Training data is structured, validated data that can be used to train a model.
Model training is the process of teaching a supervised machine learning model to make inferences that match the judgment represented in the training data as closely as possible.
Model predictions are the judgments that a model makes when presented with test data.
Application refers to a software application in this case. Usually, ML systems support an application that aids users in making a decision of some kind and often attempt to automate part or all of that decision.
NLP stands for Natural Language Processing, the field of ML focused on interpretation of text.
CV stands for Computer Vision, the field of ML focused on interpretation of images.
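The simplest form of validation described in the glossary, a single trained reviewer approving, rejecting, or editing each item, can be sketched as follows. This is a hypothetical Python illustration: the `Decision`, `Annotation`, and `Review` types and the `to_training_data` function are assumptions made for the example, not part of any particular labeling tool.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Iterable, List, Optional


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    EDIT = "edit"


@dataclass(frozen=True)
class Annotation:
    item: str    # hypothetical ID of a piece of test data
    label: str   # the feature/label being validated
    source: str  # "human" annotation or "model" prediction


@dataclass(frozen=True)
class Review:
    annotation: Annotation
    decision: Decision
    corrected_label: Optional[str] = None  # only used with Decision.EDIT


def to_training_data(reviews: Iterable[Review]) -> List[Annotation]:
    """Keep approved items, apply the reviewer's edits, and drop rejections."""
    training: List[Annotation] = []
    for r in reviews:
        if r.decision is Decision.APPROVE:
            training.append(r.annotation)
        elif r.decision is Decision.EDIT:
            if r.corrected_label is None:
                raise ValueError("an EDIT review needs a corrected_label")
            # dataclasses.replace returns a new frozen instance with the fixed label
            training.append(replace(r.annotation, label=r.corrected_label))
        # Decision.REJECT: the item does not become training data
    return training


reviews = [
    Review(Annotation("img-1", "car", "model"), Decision.APPROVE),
    Review(Annotation("img-2", "car", "model"), Decision.EDIT, corrected_label="truck"),
    Review(Annotation("img-3", "car", "human"), Decision.REJECT),
]
print([a.label for a in to_training_data(reviews)])  # ['car', 'truck']
```

The same reviewer workflow handles both human annotations and model predictions, which is why validation can absorb the extra volume once model predictions start to outpace manual features.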