The progression from a simple linear equation to its matrix form and then to a hypothesis function is designed to make the relationship between inputs and outputs scalable and efficient. A basic linear equation describes how a single input influences an output, but in real-world problems, we often deal with multiple inputs and data points. Representing the data in matrix form allows us to handle large datasets and multiple variables simultaneously, leveraging the power of linear algebra for computations. The hypothesis function further generalizes this idea by encapsulating the relationship between inputs and outputs into a concise, reusable formula, making it suitable for machine learning models that predict outcomes based on patterns in the data. This progression simplifies the process of learning from and making predictions on complex datasets.
Start with the equation of a line in its simplest form (slope-intercept form):
\[ y = mx + b \]
For n data points, it becomes:
\[ y_i = m x_i + b, \quad i = 1, \dots, n \]
Use matrix notation for multiple points
Multiple values of y can be represented by an n x 1 matrix. The same applies to multiple values of x, but let's also add a column of 1's, so that when X is multiplied by \( \boldsymbol{\theta} \), each row yields mx + b.
\[ \mathbf{y} = \mathbf{X} \boldsymbol{\theta} \]
Expanded form:
\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} b \\ m \end{bmatrix}
\]
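The matrix form above can be checked numerically with a short NumPy sketch (the sample values here are made up for illustration, not taken from the post):

```python
import numpy as np

# Illustrative inputs and parameters (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0])   # n input values
m, b = 2.0, 0.5                       # slope and intercept

# Design matrix X: a column of 1's followed by the x values,
# so that X @ theta computes b + m*x for every row at once
X = np.column_stack([np.ones_like(x), x])
theta = np.array([b, m])              # parameters stacked as [intercept, slope]

y = X @ theta                         # matrix form: y = X theta
print(y)                              # matches m*x + b element-wise
```

The single matrix product replaces an explicit loop over the n data points, which is the scalability benefit mentioned at the start.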
Let’s introduce a popular notation for predicted values: \( h_\theta(x) \) instead of y. This \( h_\theta(x) \) is called the hypothesis function; it outputs predicted values for an input x, given the slope and intercept.
Where the feature vector is
\[ \mathbf{x} = \begin{bmatrix} 1 \\ x \end{bmatrix} \]
and the parameter vector is
\[ \boldsymbol{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix} \]
so that \( h_\theta(x) = \boldsymbol{\theta}^T \mathbf{x} = \theta_0 + \theta_1 x \), with \( \theta_0 = b \) and \( \theta_1 = m \).

For multiple data points, stack the feature vectors as the rows of \( \mathbf{X} \). Predicted outputs:
\[ h_\theta(\mathbf{X}) = \mathbf{X} \boldsymbol{\theta} \]
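The hypothesis function is simple enough to sketch directly (again with NumPy and made-up example values; the function name `h` is just an illustrative choice):

```python
import numpy as np

def h(theta, X):
    """Hypothesis function: predicted outputs for every row of X.

    X is the n x 2 design matrix (a column of 1's plus the inputs) and
    theta = [theta_0, theta_1] holds the intercept and slope.
    """
    return X @ theta

# Illustrative example: intercept 1, slope 3
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
theta = np.array([1.0, 3.0])
print(h(theta, X))   # [1. 4. 7.]
```

Note that nothing about `h` is specific to two parameters: the same matrix product works unchanged if X gains more feature columns and theta more entries.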
This notation is more popular in the data science field. \( \boldsymbol{\theta} \) represents the model parameters; there can be many of them, and they need to be determined in order to predict values of y (the output of the hypothesis function).
In a subsequent post we will find the closed form for the hypothesis function. By that we mean a closed-form equation in matrix form that can be used to calculate the vector \( \boldsymbol{\theta} \) directly.
The closed form for \( \boldsymbol{\theta} \) refers to the mathematical solution used to directly compute the parameters of a linear regression model. In the context of machine learning, it provides a formula to find the optimal \( \boldsymbol{\theta} \) that minimizes the error between the predicted and actual values. This is achieved by solving for \( \boldsymbol{\theta} \) using matrix operations, without the need for iterative optimization.
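As a numerical preview of that idea (the derivation is left for the subsequent post), here is a hedged sketch using the standard normal-equation solution on synthetic data generated from a known line:

```python
import numpy as np

# Synthetic noisy data from a known line y = 2x + 1 (illustration only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, size=x.size)

X = np.column_stack([np.ones_like(x), x])   # design matrix with a 1's column

# Normal equation: theta = (X^T X)^{-1} X^T y,
# computed with a linear solver rather than an explicit inverse
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # close to [1.0, 2.0] (intercept, slope)
```

A single matrix solve recovers the intercept and slope in one step, with no iterative optimization involved.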