5. Building the machine learning interface#

Until this point, we have focused on the theoretical aspects of machine learning without spending time building a modular interface.

Linear and logistic regression are so simple that we can easily get by with a few Python functions, but the code complexity rapidly shoots up with model complexity. Defining and training a neural network in the good old procedural style is a nightmare.

Python allows us to manage complexity via classes and methods. Let’s see an example. Read the following snippet line by line.

import numpy as np

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))
    
    def grad(self, x):
        return self(x) * (1 - self(x))

The class Sigmoid is an object-oriented representation of the sigmoid function \( \sigma(x) = (1 + e^{-x})^{-1} \), encapsulating the essential sigmoid-related functionality: evaluating the function and its derivative. Nothing more, nothing less.

sigma = Sigmoid()

f"The value of the sigmoid at 0 is {sigma(0)}, while its derivative is {sigma.grad(0)}."
'The value of the sigmoid at 0 is 0.5, while its derivative is 0.25.'
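We can also sanity-check grad with a central-difference approximation. This is a quick numerical test of our own, not part of the text; the class is repeated so the snippet runs standalone.

```python
import numpy as np

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def grad(self, x):
        return self(x) * (1 - self(x))

sigma = Sigmoid()

# central-difference approximation of the derivative at x = 0
h = 1e-6
numeric = (sigma(h) - sigma(-h)) / (2 * h)

print(abs(numeric - sigma.grad(0)))  # should be tiny
```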

Trust me when I say this: going from the procedural style to object-oriented is like firing up an FTL jump drive. As a math graduate student, learning object-oriented programming (OOP) supercharged my knowledge and performance when I started in machine learning.

A good design can go a long way, and in this chapter, we’ll set the framework for all of the machine learning models, with PyTorch and scikit-learn as our inspirations. (scikit-learn for classical models, and PyTorch for neural networks.)

5.1. Linear regression in OOP style#

Let’s move from concrete to abstract, ramping up the difficulty one step at a time. The textbook example of machine learning is linear regression, so we’ll start with that.

From a user standpoint, we only need two functionalities: fit the model to the data and use it for prediction — implementation first, discussions later. (Check out the original implementation if needed.) Here we go:

class LinearRegressor:
    def __init__(self, a=0, b=0):
        self.a = a
        self.b = b

    def __call__(self, x):
        return self.a * x + self.b
    
    def _grad_L(self, X, Y):
        n = len(X)
        da = sum([2*x*(self(x) - y) for x, y in zip(X, Y)])/n
        db = sum([2*(self(x) - y) for x, y in zip(X, Y)])/n
        return da, db
    
    def fit(self, X, Y, lr=0.01, n_steps=1000):
        for _ in range(n_steps):
            da, db = self._grad_L(X, Y)
            self.a, self.b = self.a - lr*da, self.b - lr*db

(According to software engineering best practices, the proper name of the class would be LinearRegressorWithMeanSquaredErrorOptimizedByGradientDescent, but that doesn’t exactly roll off the tongue. So, let’s stick with the basic yet descriptive LinearRegressor. Apologies to all enterprise developers here.)

In OOP terminology, the methods of a class form its interface. Let’s take a look at them one by one.

The __init__ method, present in all Python objects, is responsible for initializing the object. In this case, initializing means storing the two model parameters, \( a \) and \( b \).

model = LinearRegressor(a=0.5, b=-0.1)

The mathematical function defined by the model parameters is accessible via the __call__ magic method. (In Python, methods with leading and trailing double underscores are called magic methods.) We can access this by simply calling the object.

model(2.19)
0.995

Let’s fit the model with the good old mean-squared error + gradient descent combo! Although a practicing data scientist never has to call the gradient of the loss explicitly, it’s best to add it as a separate method for convenience. This is the purpose of the _grad_L method.

Since we don’t intend the gradient to be called externally, we prefix its name with an underscore to hint at this. This is like giving admin rights to every user in your company and politely asking them not to abuse it: it could work in a small company with a few trusted employees, but it fails in large organizations. (Prefixing a class attribute or method with a double underscore triggers name mangling, which makes external access considerably harder, though still not impossible.)
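A tiny demonstration of both conventions (the class and attribute names here are made up for the demo):

```python
class Config:
    def __init__(self):
        self._hint = "please don't touch"  # single underscore: convention only
        self.__secret = "mangled"          # double underscore: stored as _Config__secret

c = Config()
print(c._hint)                 # works; the underscore is just a polite request
print(hasattr(c, "__secret"))  # False: the attribute name was mangled
print(c._Config__secret)       # name mangling obscures, but doesn't truly protect
```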

The model fitting is done by the fit method, requiring

  • the ground truth, that is, the feature X and the target Y,

  • the learning rate lr (defaulting to lr=0.01),

  • and the number of iterations n_steps (defaulting to n_steps=1000).

Let’s generate some data and do a test run.

n_train = 100
X_train = np.random.rand(n_train)
Y_train = 0.8*X_train + 1.2 + np.random.normal(scale=0.3, size=n_train)
model.fit(X=X_train, Y=Y_train, lr=0.1, n_steps=1000)
f"The true parameters are 0.8 and 1.2, while the fitted ones are {model.a} and {model.b}."
'The true parameters are 0.8 and 1.2, while the fitted ones are 0.7199183355249991 and 1.2378444847685972.'
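To put a number on the quality of the fit, we can compute the mean-squared error ourselves. The mse helper below is ours, not part of the class, and the snippet regenerates similar data so it runs standalone:

```python
import numpy as np

def mse(predict, X, Y):
    # mean-squared error of a callable predictor on the dataset (X, Y)
    return np.mean((predict(X) - Y) ** 2)

rng = np.random.default_rng(42)
X = rng.random(100)
Y = 0.8 * X + 1.2 + rng.normal(scale=0.3, size=100)

# even the true parameters cannot beat the noise floor of 0.3**2 = 0.09
print(mse(lambda x: 0.8 * x + 1.2, X, Y))
```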

Pretty good. Think about this: roughly five chapters’ worth of knowledge is wrapped inside the fifteen lines (not counting the empty ones) that make up the implementation of LinearRegressor. The LinearRegressor class is user-friendly, clean, modular, and reusable.

Can we improve reusability? Yes, we can! Meet inheritance, the foundation of object-oriented programming.

5.1.1. Linear regression in OOP style, take two#

Although we’ve mainly used linear regression to get our feet wet with gradient descent, recall that there is an analytic solution to optimize the mean-squared error, requiring nothing but linear algebra.
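Concretely, writing the data as \( (x_1, y_1), \dots, (x_n, y_n) \), setting the gradient of the mean-squared error to zero yields the closed-form minimizer

\[ a = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left( \sum_i x_i \right)^2}, \qquad b = \frac{\sum_i x_i^2 \sum_i y_i - \sum_i x_i \sum_i x_i y_i}{n \sum_i x_i^2 - \left( \sum_i x_i \right)^2}, \]

which is exactly what the dot products in the implementation compute.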

Here it is, in its new object-oriented clothes.

class LinearLeastSquares:
    def __init__(self, a=0, b=0):
        self.a = a
        self.b = b

    def __call__(self, x):
        return self.a * x + self.b

    def fit(self, X, Y):
        n = len(X)

        X_dot_Y = np.dot(X, Y)
        X_dot_1 = np.dot(X, np.ones_like(X))
        Y_dot_1 = np.dot(Y, np.ones_like(Y))
        X_dot_X = np.dot(X, X)

        a_new = (n * X_dot_Y - X_dot_1 * Y_dot_1) / (n * X_dot_X - X_dot_1**2)
        b_new = (X_dot_X * Y_dot_1 - X_dot_1 * X_dot_Y) / (n * X_dot_X - X_dot_1**2)

        self.a = a_new
        self.b = b_new

The first thing that comes to mind is that the __init__ and __call__ methods are the same as for LinearRegressor. Duplication is a big yikes in coding circles. We’ll fix this faster than it would take Stack Overflow to downvote this snippet into oblivion.

What’s common in LinearRegressor and LinearLeastSquares? The underlying model. Let’s isolate it!

class LinearModel:
    def __init__(self, a=0, b=0):
        self.a = a
        self.b = b

    def __call__(self, x):
        return self.a * x + self.b

LinearModel purely represents a parametric linear function \( h(x) = ax + b \), without any way to fit the model to the data. These are added by the children classes.

class LinearLeastSquares(LinearModel):
    def fit(self, X, Y):
        n = len(X)

        X_dot_Y = np.dot(X, Y)
        X_dot_1 = np.dot(X, np.ones_like(X))
        Y_dot_1 = np.dot(Y, np.ones_like(Y))
        X_dot_X = np.dot(X, X)

        a_new = (n * X_dot_Y - X_dot_1 * Y_dot_1) / (n * X_dot_X - X_dot_1**2)
        b_new = (X_dot_X * Y_dot_1 - X_dot_1 * X_dot_Y) / (n * X_dot_X - X_dot_1**2)

        self.a = a_new
        self.b = b_new

The new version of LinearLeastSquares is a fully functional machine learning model, with the __init__ and __call__ methods inherited from the parent class LinearModel.

Checking it out:

lls = LinearLeastSquares()
lls.fit(X=X_train, Y=Y_train)

f"The parameters of our fitted linear least squares model are {lls.a} and {lls.b}."
'The parameters of our fitted linear least squares model are 0.7199176876380874 and 1.2378448393645893.'
f"By the magic of inheritance, we are also able to call our model: {lls(0.5)}."
'By the magic of inheritance, we are also able to call our model: 1.597803683183633.'

Here’s the new version of LinearRegressor, a child class of LinearModel.

class LinearRegressor(LinearModel):
    def _grad_L(self, X, Y):
        n = len(X)
        da = sum([2*x*(self(x) - y) for x, y in zip(X, Y)])/n
        db = sum([2*(self(x) - y) for x, y in zip(X, Y)])/n
        return da, db
    
    def fit(self, X, Y, lr=0.01, n_steps=1000):
        for _ in range(n_steps):
            da, db = self._grad_L(X, Y)
            self.a, self.b = self.a - lr*da, self.b - lr*db
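As a sanity check, gradient descent and the analytic solution should land on essentially the same parameters. The standalone script below is our own addition, repeating the class definitions so it runs on its own:

```python
import numpy as np

class LinearModel:
    def __init__(self, a=0, b=0):
        self.a = a
        self.b = b

    def __call__(self, x):
        return self.a * x + self.b

class LinearLeastSquares(LinearModel):
    def fit(self, X, Y):
        n = len(X)
        X_dot_Y = np.dot(X, Y)
        X_dot_1 = np.dot(X, np.ones_like(X))
        Y_dot_1 = np.dot(Y, np.ones_like(Y))
        X_dot_X = np.dot(X, X)
        self.a = (n * X_dot_Y - X_dot_1 * Y_dot_1) / (n * X_dot_X - X_dot_1**2)
        self.b = (X_dot_X * Y_dot_1 - X_dot_1 * X_dot_Y) / (n * X_dot_X - X_dot_1**2)

class LinearRegressor(LinearModel):
    def _grad_L(self, X, Y):
        n = len(X)
        da = sum([2*x*(self(x) - y) for x, y in zip(X, Y)])/n
        db = sum([2*(self(x) - y) for x, y in zip(X, Y)])/n
        return da, db

    def fit(self, X, Y, lr=0.01, n_steps=1000):
        for _ in range(n_steps):
            da, db = self._grad_L(X, Y)
            self.a, self.b = self.a - lr*da, self.b - lr*db

rng = np.random.default_rng(0)
X = rng.random(100)
Y = 0.8*X + 1.2 + rng.normal(scale=0.3, size=100)

lls, gd = LinearLeastSquares(), LinearRegressor()
lls.fit(X, Y)
gd.fit(X, Y, lr=0.1, n_steps=2000)

print(abs(gd.a - lls.a) < 1e-3, abs(gd.b - lls.b) < 1e-3)  # should print: True True
```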

Can we abstract away all the details of machine learning models to obtain a parent class for all machine learning models? We’ll deal with this problem next.

5.2. The base class#

Our journey will be long, and we’ll take inheritance pretty far. (Especially when dealing with neural networks.)

For now, the goal is to construct a convenient interface for machine learning algorithms. Unfortunately, there is no one-size-fits-all solution: the more general we aim, the fewer commonalities we find. OOP is a slippery slope, and we don’t want to overdo it. Let’s stick with three interface methods:

  • __init__ for initializing the parameters,

  • __call__ for model evaluation,

  • and fit to fit the model to the data.

Thus, our interface is born:

class Model:
    def __init__(self):
        """
        Initializes the model parameters.
        """
        pass

    def __call__(self, x):
        """
        Evaluates the model at x.
        """
        pass

    def fit(self, X, Y):
        """
        Fits the model to the data.
        """
        pass

The Model class provides no implementation details, only the interface. This is the price we pay for generality. Think of it as a mental blueprint to guide our development.
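As an aside: we’ll keep Model a plain class, but if we wanted Python to actually enforce this interface, the standard library’s abc module offers one way. The Constant subclass below is a toy example of our own:

```python
from abc import ABC, abstractmethod

class Model(ABC):
    @abstractmethod
    def __call__(self, x):
        """Evaluates the model at x."""

    @abstractmethod
    def fit(self, X, Y):
        """Fits the model to the data."""

class Constant(Model):
    # a toy model predicting the same value everywhere
    def __init__(self, c=0):
        self.c = c

    def __call__(self, x):
        return self.c

    def fit(self, X, Y):
        self.c = sum(Y) / len(Y)  # the mean minimizes the mean-squared error

# instantiating the abstract Model itself now raises a TypeError
try:
    Model()
except TypeError as e:
    print("abstract:", e)

m = Constant()
m.fit([1, 2, 3], [2, 4, 6])
print(m(0))  # 4.0
```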

So, here is the final version of our LinearRegressor.

class LinearRegressor(Model):
    def __init__(self, a=0, b=0):
        self.a = a
        self.b = b

    def __call__(self, x):
        return self.a * x + self.b
    
    def _grad_L(self, X, Y):
        n = len(X)
        da = sum([2*x*(self(x) - y) for x, y in zip(X, Y)])/n
        db = sum([2*(self(x) - y) for x, y in zip(X, Y)])/n
        return da, db
    
    def fit(self, X, Y, lr=0.01, n_steps=1000):
        for _ in range(n_steps):
            da, db = self._grad_L(X, Y)
            self.a, self.b = self.a - lr*da, self.b - lr*db

With the interface under our belt, we are ready to turn complexity up a notch! The next stop: linear regression in multiple variables.

5.3. Problems#

Problem 1. Implement the logistic regression model using the interface of the Model base class.

5.4. Solutions#

Problem 1. First, we define the utility function sigmoid, then we implement the logistic regression model.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class LogisticRegressor(Model):
    def __init__(self, a=0, b=0):
        self.a = a
        self.b = b
    
    def __call__(self, x):
        return sigmoid(self.a * x + self.b)
    
    def _grad_L(self, X, Y):
        n = len(X)
        da = sum([x*(self(x) - y) for x, y in zip(X, Y)])/n
        db = sum([self(x) - y for x, y in zip(X, Y)])/n
        return da, db

    def fit(self, X, Y, lr=0.01, n_steps=1000):
        for _ in range(n_steps):
            da, db = self._grad_L(X, Y)
            self.a, self.b = self.a - lr*da, self.b - lr*db
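A quick smoke test on synthetic binary data, fitting labels that flip from 0 to 1 around \( x = 0 \). The snippet repeats the definitions (with the Model base class omitted) so it runs standalone; the 0.5 classification threshold is our choice:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class LogisticRegressor:
    def __init__(self, a=0, b=0):
        self.a = a
        self.b = b

    def __call__(self, x):
        return sigmoid(self.a * x + self.b)

    def _grad_L(self, X, Y):
        n = len(X)
        da = sum([x*(self(x) - y) for x, y in zip(X, Y)])/n
        db = sum([self(x) - y for x, y in zip(X, Y)])/n
        return da, db

    def fit(self, X, Y, lr=0.01, n_steps=1000):
        for _ in range(n_steps):
            da, db = self._grad_L(X, Y)
            self.a, self.b = self.a - lr*da, self.b - lr*db

# labels flip from 0 to 1 at x = 0, a pattern logistic regression can learn
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=200)
Y = (X > 0).astype(float)

clf = LogisticRegressor()
clf.fit(X, Y, lr=1.0, n_steps=2000)
accuracy = np.mean((clf(X) > 0.5) == Y)
print(f"training accuracy: {accuracy}")
```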