I think an explanation based on standard ridge regression might help in understanding what's going on.
We need to learn a linear function over the $d$ features, $f(x) = \sum_{j=1}^d x_j w_j = \langle x,w\rangle=x^Tw$,
and with a ridge penalty our minimisation problem becomes
$w= \arg\min_w \|y-Xw \|^2 + \lambda \|w\|^2 $
Taking the derivative with respect to $w$ and setting it to zero gives the solution
$w=(X^TX + \lambda I)^{-1} X^Ty$
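As a quick sanity check of the closed form, here is a minimal NumPy sketch. The data, the true weights and the value of `lam` are made up for illustration; the only thing it checks is that the gradient of the ridge objective vanishes at the computed $w$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # 50 samples, 3 features (made-up data)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam = 0.1                      # ridge penalty, arbitrary choice

# Solve (X^T X + lam I) w = X^T y instead of forming the inverse explicitly
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Gradient of ||y - Xw||^2 + lam ||w||^2; it should be zero at the minimiser
grad = -2 * X.T @ (y - X @ w) + 2 * lam * w
print(np.allclose(grad, 0))
```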
Through some simple rearranging we can arrive at the form $w=\sum_i \alpha_i x_i$, where $x_i$ is the $i$-th training point (the $i$-th row of $X$). Here's how:
$w=(X^TX + \lambda I)^{-1} X^Ty$
$X^TXw + \lambda w = X^Ty$
$w = \frac{1}{\lambda } ( X^Ty - X^TXw) = X^T \frac{1}{\lambda } (y-Xw) $
Let's set $ \alpha= \frac{1}{\lambda } (y-Xw)$
and therefore $w = X^T\alpha = \sum_i \alpha_i x_i $
Now if we substitute $w = X^T\alpha$ back into the definition of $\alpha$ we get $\lambda\alpha = y - XX^T\alpha$, and solving for $\alpha$ gives what is known as the dual solution (as opposed to the original primal solution)
$\alpha = (XX^T + \lambda I)^{-1} y$
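The equivalence of the primal and dual solutions can be checked numerically. This is only a sketch on random data (the shapes and `lam` are arbitrary assumptions): it solves both systems and confirms that $X^T\alpha$ recovers the primal $w$, and that $\alpha = \frac{1}{\lambda}(y - Xw)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 5                    # n samples, d features (arbitrary sizes)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
lam = 0.5

# Primal: a d x d system in w
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual: an n x n system in alpha, then map back with w = X^T alpha
alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))
print(np.allclose(alpha, (y - X @ w_primal) / lam))
```

Note that the primal system is $d \times d$ while the dual one is $n \times n$, so which is cheaper depends on whether you have more features or more samples.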
OK, so if we go back to our regression function $x^T w$ we can write this as
$x^T w = x^T (X^T\alpha) = \langle x, \sum_i \alpha_i x_i \rangle = \sum_i \alpha_i \langle x, x_i \rangle$
So now we have arrived at a form that involves only inner products. And so in a simplistic sense we can replace $\langle x, x_i \rangle$ with $k(x,x_i)$. Obviously there are many important details, but this is just some intuition about what's going on.
So the steps now are:
1. Compute $\alpha = (K + \lambda I)^{-1} y$, where $K$ is your kernel/Gram matrix with entries $K_{ij} = k(x_i, x_j)$.
2. Evaluate a new point $x$ by $\sum_i \alpha_i k(x, x_i)$.
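The two steps can be sketched in a few lines of NumPy. Everything named here (`rbf_kernel`, `linear_kernel`, `fit`, `predict`, the toy sine data, `gamma` and `lam`) is a hypothetical choice for illustration, not a reference implementation; the RBF kernel is just one common option for $k$.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows of A and B
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def linear_kernel(A, B):
    # Plain inner products; with this kernel we are back to ordinary ridge
    return A @ B.T

def fit(X, y, lam, kernel):
    # Step 1: alpha = (K + lam I)^{-1} y
    K = kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, kernel):
    # Step 2: f(x) = sum_i alpha_i k(x, x_i)
    return kernel(X_new, X_train) @ alpha

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(40, 1))            # toy 1-D inputs
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)
X_new = np.array([[0.0], [1.5]])

alpha = fit(X, y, lam=1e-2, kernel=rbf_kernel)
print(predict(X, alpha, X_new, rbf_kernel))
```

Passing `linear_kernel` instead of `rbf_kernel` gives the same predictions as the primal ridge solution, which is a handy way to test the code.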