In one of the steps for the proof of Theorem 2.3.20, Grafakos shows that you can pass a continuous, linear functional through an integral by considering the convergence (in the topology of the Schwartz space on $\mathbb{R}^n$) of the Riemann sum of the inner integral. In particular, Grafakos shows that
$$ \sum_{m=1}^{(2N^2)^n} x^{\alpha} \partial_x^{\beta} \widetilde{\varphi}(x - y_m) \psi(y_m) |Q_m| \to \int_{\mathbb{R}^n} x^{\alpha} \partial_x^{\beta} \widetilde{\varphi}(x - y) \psi(y) dy $$
as $N \to \infty$ (in $L^{\infty}$), where we consider a partition on $[-N, N]^n$ into $(2N^2)^n$ cubes $Q_m$ of side length $1/N$ and with $y_m$ as the center of each cube $Q_m$.
Now, for the actual question. In the proof, Grafakos states that
$$ x^{\alpha} \partial_x^{\beta} \widetilde{\varphi}(x - y_m) \psi(y_m) |Q_m| - \int_{Q_m} x^{\alpha} \partial_x^{\beta} \widetilde{\varphi}(x - y) \psi(y) dy = \int_{Q_m} x^{\alpha} (y - y_m) \nabla(\partial_x^{\beta} \widetilde{\varphi}(x - \cdot) \psi)(\xi) dy $$
for some $\xi = y + \theta(y_m - y)$, where $\theta \in [0, 1]$.
I'm stuck on Grafakos' derivation of the estimation above. In particular, I am not sure on where the gradient operator comes from (I suspect it's an integration by parts argument, but then I'm not sure where $\xi$ comes from).
If anyone could shed some light on the thought process behind the estimation, I would greatly appreciate it.