Section 8.7 of the Deep Learning Book says: "Suppose our cost function has put a gradient of 1 on $\hat y$ …" Does this mean that the derivative of the cost function with respect to $\hat y$ is equal to 1?
For example, suppose my model has 2 hidden layers and 1 output layer, with no biases or activation functions, so that
$\hat y=xw_1w_2w_3$
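To make the example concrete, here is a small Python sketch. It assumes $x$ and $w_1, w_2, w_3$ are plain scalars (one unit per layer), and the helper names `forward` and `grad_yhat_wrt_weights` are illustrative, not from the book. Because $\hat y$ is a product, $\partial \hat y / \partial w_i$ is $x$ times the product of the other two weights.

```python
# Sketch: a deep *linear* network with one unit per layer, y_hat = x * w1 * w2 * w3.
# Assumption: x and the weights are plain scalars (no biases, no activations).


def forward(x, w1, w2, w3):
    """Return y_hat = x * w1 * w2 * w3."""
    return x * w1 * w2 * w3


def grad_yhat_wrt_weights(x, w1, w2, w3):
    """Partial derivatives of y_hat w.r.t. each weight.

    Since y_hat is a product, d(y_hat)/d(w_i) is x times
    the product of the *other* weights.
    """
    return (x * w2 * w3,   # d y_hat / d w1
            x * w1 * w3,   # d y_hat / d w2
            x * w1 * w2)   # d y_hat / d w3


x, w1, w2, w3 = 2.0, 0.5, 3.0, -1.0
print(forward(x, w1, w2, w3))                # -3.0
print(grad_yhat_wrt_weights(x, w1, w2, w3))  # (-6.0, -1.0, 3.0)
```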
Suppose the model uses mean squared error as its loss function. The loss over $n$ data points is then defined as
$\text{loss}(\hat y, y) = \dfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat y_i)^2$
where $\hat y$ is the vector of predictions output by the model and $y$ is the vector of corresponding ground-truth labels.
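For reference, differentiating the MSE above term by term gives $\partial\,\text{loss}/\partial \hat y_i = -\tfrac{2}{n}(y_i - \hat y_i)$. The sketch below, assuming $\hat y$ and $y$ are plain Python lists of equal length, computes the loss and this gradient and checks it against central finite differences; the function names are illustrative only.

```python
# Sketch: MSE over n points and its gradient w.r.t. each prediction y_hat_i.
# Assumption: y_hat and y are plain Python lists of floats with the same length n.


def mse(y_hat, y):
    """loss = (1/n) * sum_i (y_i - y_hat_i)^2"""
    n = len(y)
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / n


def mse_grad(y_hat, y):
    """d loss / d y_hat_i = -(2/n) * (y_i - y_hat_i), one entry per data point."""
    n = len(y)
    return [-2.0 / n * (yi - yhi) for yi, yhi in zip(y, y_hat)]


def numerical_grad(f, y_hat, y, eps=1e-6):
    """Central finite differences, used here only to sanity-check mse_grad."""
    grads = []
    for i in range(len(y_hat)):
        plus = list(y_hat)
        plus[i] += eps
        minus = list(y_hat)
        minus[i] -= eps
        grads.append((f(plus, y) - f(minus, y)) / (2 * eps))
    return grads


y_hat = [1.0, 2.0, 4.0]
y = [1.5, 1.0, 3.0]
print(mse(y_hat, y))                  # 0.75
print(mse_grad(y_hat, y))             # approximately [-0.333, 0.667, 0.667]
print(numerical_grad(mse, y_hat, y))  # should agree with mse_grad to ~1e-6
```

For this data none of the entries equals 1; in general the gradient of the MSE with respect to $\hat y$ is a vector with one component per prediction, and its values depend on the residuals.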
Does "put a gradient of 1 on $\hat y$" mean the following?
$\dfrac{\partial\, \text{loss}(\hat y, y)}{\partial \hat y} = 1$
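In back-propagation, the quantity $\partial\,\text{loss}/\partial \hat y$ plays the role of the seed (upstream gradient) that is pushed backward through the network. The sketch below, assuming scalar $x$ and weights and a hand-written backward pass (the function `backward` and its argument `g_yhat` are hypothetical names for this example), propagates an arbitrary seed through $\hat y = xw_1w_2w_3$. Seeding with 1 returns $\nabla_w \hat y$; seeding with the derivative of a single-point MSE returns the gradient of that loss instead.

```python
# Sketch: reverse-mode backprop through y_hat = x * w1 * w2 * w3 for scalar x,
# starting from an upstream gradient g_yhat = d(cost)/d(y_hat).
# Assumption: backward() is a hypothetical, hand-written helper for this example only.


def backward(x, w1, w2, w3, g_yhat):
    """Return (dcost/dw1, dcost/dw2, dcost/dw3) given dcost/dy_hat = g_yhat."""
    # Forward pass, keeping the intermediate activations.
    h1 = x * w1
    h2 = h1 * w2
    # y_hat = h2 * w3 (its value is not needed for the gradients below)

    # Backward pass: apply the chain rule one layer at a time.
    g_w3 = g_yhat * h2  # y_hat = h2 * w3
    g_h2 = g_yhat * w3
    g_w2 = g_h2 * h1    # h2 = h1 * w2
    g_h1 = g_h2 * w2
    g_w1 = g_h1 * x     # h1 = x * w1
    return g_w1, g_w2, g_w3


x, w1, w2, w3 = 2.0, 0.5, 3.0, -1.0

# Seeding with g_yhat = 1 yields the gradient of y_hat itself w.r.t. the weights.
print(backward(x, w1, w2, w3, 1.0))  # (-6.0, -1.0, 3.0)

# For a single-point MSE, (y - y_hat)^2, the seed is instead -2 * (y - y_hat).
y = 1.0
y_hat = x * w1 * w2 * w3                            # -3.0
print(backward(x, w1, w2, w3, -2.0 * (y - y_hat)))  # (48.0, 8.0, -24.0)
```

The second result is the first one scaled by the seed value $-8$, which is the chain rule at work: $\partial\,\text{loss}/\partial w_i = (\partial\,\text{loss}/\partial \hat y)\,(\partial \hat y/\partial w_i)$.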
The quoted passage is on page 314 of the book.
