I am working on a deep learning project and have been following Andrew Ng's machine learning course on YouTube. I am having difficulty understanding how the cost function works. Given the equation below:
$$J(\theta) = \min_\theta \frac{1}{2} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
where $m$ is the number of training examples, let's say 20.
This cost function measures the prediction error caused by the parameters of the hypothesis
$$h_\theta(x) = \theta_0 + \theta_1 x_1$$
where $x_1$ is, let's say, the number of bedrooms in a house, and we want to predict the price of the house. My first question is: why is $x_0 = 1$ here?
Secondly, what is the initial value of $\theta$ here? Is it random at first, just like in gradient descent?
What I understand is this: suppose $h_\theta(x)$ predicts the price of a house. It makes predictions using different values of $\theta$ (keeping the input $x$ the same) until the cost function, evaluated on the hypothesis output, reaches its minimum. The cost function takes $h_\theta(x^{(i)})$, subtracts the actual $y^{(i)}$ from it, and sums the squared differences over all 20 training examples, meaning that it computes the difference for every training example.
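To make the description above concrete, here is a minimal sketch of how I think the cost is computed for one fixed choice of $\theta$. The data values, the function names `hypothesis` and `cost`, and the two parameter pairs are all made up for illustration; they are not from the course.

```python
# Hypothetical toy training set (invented numbers, only 4 examples instead of 20):
# x1 = number of bedrooms, y = house price.
bedrooms = [1.0, 2.0, 3.0, 4.0]
prices = [100.0, 150.0, 200.0, 250.0]


def hypothesis(theta0, theta1, x1):
    """h_theta(x) = theta0 + theta1 * x1 for a single example."""
    return theta0 + theta1 * x1


def cost(theta0, theta1, xs, ys):
    """Half the sum of squared errors over every training example."""
    total = 0.0
    for x1, y in zip(xs, ys):
        error = hypothesis(theta0, theta1, x1) - y  # prediction minus actual
        total += error ** 2
    return 0.5 * total


# The same training set is used both times; only theta changes.
cost_bad = cost(0.0, 0.0, bedrooms, prices)     # a poor guess for theta
cost_good = cost(50.0, 50.0, bedrooms, prices)  # fits this toy data exactly
print(cost_bad, cost_good)
```

If this sketch matches the formula, then each call to `cost` loops over all training examples once, and a different $\theta$ simply gives a different total.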
So is $y^{(i)}$ here the training set (all the house prices), and is a different $y^{(i)}$ used at each step until all 20 training examples have been checked, or until the minimum value of the cost function is reached?
In short, what I am really confused about is this: do we calculate the cost by comparing the hypothesis with every training example? And do we calculate the hypothesis for changing values of $\theta$ and then plug it into the cost function to find the minimum prediction error?
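Here is a rough sketch of what I imagine "changing $\theta$ until the cost is minimal" looks like when done with gradient descent. The starting point of zero, the learning rate `alpha`, the number of steps, and the toy data are my own assumptions, not values from the course.

```python
# Hypothetical toy data (invented): bedrooms and prices.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [100.0, 150.0, 200.0, 250.0]


def cost(theta0, theta1):
    """Half the sum of squared errors over the whole training set."""
    return 0.5 * sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))


def gradient_descent(alpha=0.01, steps=5000):
    """Repeatedly nudge theta against the gradient of the cost."""
    theta0, theta1 = 0.0, 0.0  # assumed starting point; could also be random
    for _ in range(steps):
        # Errors for ALL training examples under the current theta.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors)                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs))  # dJ/dtheta1
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


theta0, theta1 = gradient_descent()
print(theta0, theta1, cost(theta0, theta1))
```

In this sketch the training data never changes; every step uses all examples, and only $\theta$ is updated.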
Please let me know whether my understanding is correct, and correct me if I am wrong.