*supervise*: we train a model using data that is properly labeled and separated (we know what result we should get when we give a specific input). Some examples of supervised learning are house price prediction, handwritten text recognition, and spam email detection. There are various algorithms in supervised learning, but we'll start with Linear Regression.

Our model's prediction is *ŷ = wx + b*, where

*w* is called a weight/parameter,

*x* is the input/feature, and

*b* is the bias.
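The linear model above can be sketched in a few lines of code. The values chosen for *w*, *x*, and *b* below are invented purely for illustration.

```python
# A minimal sketch of the linear model y_hat = w * x + b.
# The numbers for w, x, and b are made up for illustration.

def predict(x, w, b):
    """Return the linear prediction w * x + b."""
    return w * x + b

w = 2.0   # weight: how strongly the feature affects the prediction
b = 0.5   # bias: the prediction when the feature is zero

print(predict(3.0, w, b))  # 2.0 * 3.0 + 0.5 = 6.5
```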

#### Parameters

**Parameters** are the values that control the behavior of the system. In machine learning, they are mostly called weights because their values determine how much each feature affects the prediction: increasing a component of the weight vector increases the effect of that part on the prediction, and decreasing it decreases that effect. When a weight's magnitude is large, it has a great effect on the prediction; if it is zero, there is no effect.

#### Inputs/Features

**Inputs**, or as they are generally called, **features**, are the values represented by *x* in our equation. Features are the values that directly affect our function. For example, if we would like to predict house prices, our features could be the number of rooms, the size of the house, and its age. We'll predict the price of a house by looking at those features, with help from our parameters.
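The house-price example above can be sketched with one weight per feature, the prediction being the weighted sum of the features plus the bias. All feature values and weights below are invented for illustration.

```python
# Hypothetical house-price sketch: each feature gets its own weight,
# and the prediction is the weighted sum of features plus the bias.
# The feature values and weights below are invented for illustration.

features = [3, 120.0, 10]              # rooms, size in m^2, age in years
weights = [5_000.0, 800.0, -1_000.0]   # effect of each feature on the price
bias = 20_000.0                        # base price when all features are zero

price = sum(w * x for w, x in zip(weights, features)) + bias
print(price)  # 15000 + 96000 - 10000 + 20000 = 121000.0
```

Note the negative weight on age: a larger weight magnitude means a stronger effect on the prediction, and the sign decides whether the feature pushes the price up or down.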

#### Bias

**Bias** is generally included in the definition of parameters. It is another value we would like to estimate; we can think of it as our prediction when all features are equal to zero. In ML, we would like to learn the weights and the bias, the so-called parameters of our model (for now, the linear regression function).

The red dots in the graph above represent our data set. Let's guess some parameters and a bias and draw our linear line through the data set, as above. This seems like a pretty good guess when you consider the whole data set, but we will still have errors, shown as gray vertical lines in the graph. Those errors are our cost of not being able to fit all the data. So, every approach comes with a cost, and we'll measure our prediction's performance according to that cost.
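The "gray vertical lines" idea can be sketched directly: for a guessed line, each data point's error is the vertical gap between the line and the point. The data points and guessed parameters below are invented for illustration.

```python
# Sketch of the vertical errors for a guessed line y_hat = w * x + b.
# The (x, y) points and the guessed parameters are made up for illustration.

points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2)]  # the "red dots" (x, y)
w, b = 1.0, 0.0                                # our guessed line: y_hat = x

# Signed vertical distance between the line and each point.
errors = [w * x + b - y for x, y in points]
print(errors)
```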

## Cost/Error Function (measuring the performance)

As we said, we use the cost function to measure the performance of the model. But how do we calculate the error in general? As shown by the gray vertical lines in the graph, the difference between the actual value and the predicted value gives the error. By calculating the error, we can measure the accuracy of our prediction function. After finding the error, our main goal changes: we'll try to choose the parameter (w) and bias (b) values so that the cost is minimized. Calculating the error comes from a simple mathematical idea, measuring the distance between two points. Therefore, we'll use the mean squared error.

The mean squared error over *m* data points is

*MSE = (1/m) · Σᵢ (ŷᵢ − yᵢ)²*

We can see in the formula that we subtract the actual value from the predicted value, and we take the square to get rid of negativity. Lastly, we take the average of the error over the whole data set. So, this measures the error. When the predicted value *ŷ* and the actual value *y* are equal, the error will be 0. As the Euclidean distance between the predicted value and the actual value increases, the error increases as well.
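The mean squared error described above is short enough to write out directly. The toy prediction and target values below are invented for illustration.

```python
# Mean squared error: the average of the squared differences between
# predicted and actual values. Toy numbers for illustration.

def mse(predicted, actual):
    """Mean of (y_hat - y)^2 over the data set."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

y_hat = [2.5, 0.0, 2.0]
y = [3.0, -0.5, 2.0]

print(mse(y_hat, y))  # ((-0.5)^2 + 0.5^2 + 0^2) / 3
```

Note that the third pair is identical, so it contributes zero error, matching the point that the error is 0 when prediction and actual value agree.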

#### Last Words

Our goal is to choose *w* and *b* to decrease the cost. Machine learning and deep learning are based on minimizing the cost with different approaches. We hear terms like curve fitting and global minimum of the curve many times; they come from the error function. As we saw above, the error function takes squares to get rid of negativity, so if we draw the graph of the error function, it will have a curve shape (in the simple linear regression case, a parabola). Finding the minimum point of that curve is the main idea.
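The parabola idea can be checked numerically: holding *b* fixed at zero, the MSE is a function of *w* alone, and sweeping a few candidate values of *w* finds the bottom of the curve. The data set and candidate values below are invented for illustration (real methods like gradient descent find the minimum far more efficiently than a sweep).

```python
# Sketch of the "minimum of the curve" idea: with b fixed at 0, the MSE
# is a parabola in w, so a simple sweep over candidates finds its bottom.
# Data and candidate values are invented for illustration.

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated by y = 2x, so the best w is 2 (b = 0)

def cost(w):
    """MSE of the line y_hat = w * x over the toy data set."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
best_w = min(candidates, key=cost)
print(best_w, cost(best_w))  # the bottom of the parabola sits at w = 2
```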

Featured Image: Photo by Alex Perez on Unsplash