An example of this arises when varying the Lagrangian density $\mathcal{L}(\phi(x^{\mu}),\partial_{\mu}\phi)$:
$$ \delta\mathcal{L}=\frac{\partial\mathcal{L}}{\partial\phi}\delta\phi+\frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\phi)}\delta(\partial_{\mu}\phi). $$
My question is: when and why is it appropriate to say that $\delta(\partial_{\mu}\phi)=\partial_{\mu}(\delta\phi)$, and is there a proof of this?
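
To make the question concrete, here is a sketch of the situation in which the identity looks immediate. It rests on an assumption not stated above: that $\delta$ denotes the difference between two field configurations $\phi'$ and $\phi$ evaluated at the same point $x$, with the coordinates themselves left unvaried (the symbol $\phi'$ is introduced here only for illustration).

$$ \delta\phi(x)\equiv\phi'(x)-\phi(x) \quad\Longrightarrow\quad \delta(\partial_{\mu}\phi)(x)\equiv\partial_{\mu}\phi'(x)-\partial_{\mu}\phi(x)=\partial_{\mu}\big[\phi'(x)-\phi(x)\big]=\partial_{\mu}(\delta\phi)(x). $$

Under that assumption the result is just linearity of the derivative. If the variation also shifts the coordinates, so that one compares $\phi'(x')$ with $\phi(x)$, this argument no longer applies as written, and that is the case I am unsure about.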