This is quite an interesting question.
From the point where you are stuck, the main insight is that separate continuity lets one
construct a new sequence which then contradicts the condition that $f(K)$ is compact
whenever $K$ is.
Let us dive into the details.
Step 1:
I claim that if $Q \subset \mathbb{R}^2$ is a rectangle (with sides parallel to the coordinate axes), then $f(Q)$ is an interval.
This follows more or less easily by noting that in a rectangle, you can always connect two
points by a path which consists of two parts: in the first part, only the first coordinate
changes, and in the second part only the second coordinate changes.
One then uses the separate continuity of $f$ and the intermediate value theorem.
For completeness, here is a formal proof.
By definition of an interval, what we need to show is that if $y = f(\xi), z = f(\eta) \in f(Q)$,
and if $y < t < z$, then also $t \in f(Q)$.
To see this, write $\xi = (\xi_1,\xi_2), \eta = (\eta_1,\eta_2)$, and consider the paths
$\varphi : [0,1] \to Q, s \mapsto \big( (1-s) \xi_1 + s \, \eta_1, \xi_2 \big)$ and
$\psi : [0,1] \to Q, s \mapsto \big( \eta_1, (1-s) \xi_2 + s \, \eta_2 \big)$,
which are both well-defined (they really take values in $Q$) since $Q$ is a rectangle.
Now, since $f$ is separately continuous, it follows that $f \circ \varphi$ and $f \circ \psi$
are continuous.
By the intermediate value theorem, it follows that $f(\varphi([0,1]))$
and $f(\psi([0,1]))$ are both intervals, which intersect since $\varphi(1) = \psi(0)$.
Therefore, their union is again an interval.
Since $f(\xi) = f(\varphi(0))$ and $f(\eta) = f(\psi(1))$, we have thus shown $y, z \in I := f(\varphi([0,1])) \cup f(\psi([0,1])) \subset f(Q)$,
from which we get $t \in I \subset f(Q)$, since $I$ is an interval and $y < t < z$.
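To make the two-leg path concrete, here is a small worked example. The function is not taken from the question; it is the standard example of a separately continuous function that fails to be continuous at the origin, and I use it purely as an illustration. Take $f(x,y) = \frac{xy}{x^2+y^2}$ for $(x,y) \neq (0,0)$ and $f(0,0) = 0$, the rectangle $Q = [0,1]^2$, and the points $\xi = (0,0)$, $\eta = (1,1)$. Writing $s$ for the path parameter, the two legs and the values of $f$ along them are
$$
\varphi(s) = (s, 0), \qquad \psi(s) = (1, s), \qquad
f(\varphi(s)) = 0, \qquad f(\psi(s)) = \frac{s}{1+s^2}, \qquad s \in [0,1].
$$
The first leg has image $\{0\}$, while $s \mapsto \frac{s}{1+s^2}$ is increasing on $[0,1]$ with image $[0, \frac{1}{2}]$. Hence $I = \{0\} \cup [0, \frac{1}{2}] = [0, \frac{1}{2}]$, which contains every value strictly between $f(\xi) = 0$ and $f(\eta) = \frac{1}{2}$.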
Step 2:
Assume that $f$ is not continuous at $x_0 \in \mathbb{R}^2$, so that there is $\epsilon_0 > 0$
and a sequence $(x_n)_{n \in \mathbb{N}}$ such that $x_n \to x_0$, but $|f(x_n) - f(x_0)| > \epsilon_0$
for all $n \in \mathbb{N}$.
By taking a subsequence, we can assume that either $f(x_n) \geq f(x_0) + \epsilon_0$ for all $n$,
or that $f(x_n) \leq f(x_0) - \epsilon_0$ for all $n$.
For simplicity, I only consider the first case in what follows; the second case follows by applying the same argument to $-f$.
Note that $x_n \neq x_0$ (since $f(x_n) \neq f(x_0)$) and hence $\epsilon_n := \| x_n - x_0 \|_{\ell^\infty} > 0$ for all $n$, with $\epsilon_n \to 0$ because $x_n \to x_0$.
Define $Q_n := x_0 + [-\epsilon_n, \epsilon_n]^2$ for all $n \in \mathbb{N}$, noting that $Q_n$ is a rectangle
with $x_n, x_0 \in Q_n$.
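Since $f(Q_n)$ is an interval by Step 1, and since it contains both $f(x_0)$ and $f(x_n) \geq f(x_0) + \epsilon_0$,
this implies that $f(Q_n) \supset [f(x_0), f(x_0) + \epsilon_0]$ for all $n \in \mathbb{N}$.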
Fix $n_0 \in \mathbb{N}$ with $n_0^{-1} < \epsilon_0 / 2$.
For each $n \geq n_0$, we can therefore choose $y_n \in Q_n$ satisfying $f(y_n) = f(x_0) + \frac{\epsilon_0}{2} + \frac{1}{n}$.
Because of $y_n \in Q_n = x_0 + [-\epsilon_n, \epsilon_n]^2$ and $\epsilon_n \to 0$,
we have $y_n \to x_0$, so that $K := \{ y_n \colon n \geq n_0 \} \cup \{x_0\}$
is compact.
But by construction, we have
$$
f(K)
= \Big\{ f(x_0) + \frac{\epsilon_0}{2} + \frac{1}{n} \colon n \geq n_0 \Big\} \cup \{ f(x_0) \},
$$
which is not closed: the point $f(x_0) + \frac{\epsilon_0}{2}$ is the limit of the sequence $(f(y_n))_{n \geq n_0}$,
but it does not belong to $f(K)$.
Hence $f(K)$ is not compact.
This is the desired contradiction.
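As an illustration of the construction in Step 2, one can run it on a concrete function. This is only a sketch under assumptions that are not in the question: I take the standard separately continuous but discontinuous function $f(x,y) = \frac{xy}{x^2+y^2}$ for $(x,y) \neq (0,0)$, $f(0,0) = 0$, together with $x_0 = (0,0)$, $x_n = (\frac{1}{n}, \frac{1}{n})$ and $\epsilon_0 = \frac{1}{4}$. Then $f(x_n) = \frac{1}{2} \geq f(x_0) + \epsilon_0$, $\epsilon_n = \frac{1}{n}$ and $Q_n = [-\frac{1}{n}, \frac{1}{n}]^2$. For $n \geq 9$ (so that $n^{-1} < \epsilon_0 / 2 = \frac{1}{8}$), set $c_n := \frac{1}{8} + \frac{1}{n} \in (0, \frac{1}{4})$ and
$$
a_n := \frac{1 - \sqrt{1 - 4 c_n^2}}{2 c_n} \in (0,1),
\qquad
y_n := \Big( \frac{1}{n}, \frac{a_n}{n} \Big) \in Q_n .
$$
Since $a_n$ solves $c_n a^2 - a + c_n = 0$, we get
$$
f(y_n) = \frac{a_n}{1 + a_n^2} = c_n = f(x_0) + \frac{\epsilon_0}{2} + \frac{1}{n} .
$$
Thus $K = \{ y_n \colon n \geq 9 \} \cup \{ (0,0) \}$ is compact, while $f(K) = \{ \frac{1}{8} + \frac{1}{n} \colon n \geq 9 \} \cup \{ 0 \}$ does not contain its limit point $\frac{1}{8}$. This particular $f$ therefore violates the compactness condition, as the argument predicts for any separately continuous function that is not continuous.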