This answer is a work in progress. I am posting this version to get feedback, and I will improve it based on the comments and on the examples and arguments I am still developing. I would like to have good examples that do not share the shortcomings of those found in lecture notes and the like.
This is a very interesting question, and much of it comes down to terminology. The term "causality" is not necessarily well chosen here. The issue is really the realizability of a transfer function, that is, its implementation in terms of passive and active network elements (i.e. resistors, capacitors, inductors, and amplifiers). In fact, a necessary and sufficient condition for a rational transfer function to be realizable in state-space form is that it is proper. In many textbooks, causality is actually an axiom; see e.g. Hinrichsen and Pritchard, "Mathematical Systems Theory I".
Non-causal systems are still interesting, as they can serve as benchmarks when they are the optimal solutions of certain control, filtering, or estimation problems and need to be approximated.
Causality for a dynamical system means that the output at a given time $t$ depends only on the present and the past of the input; that is, the consequence must follow the cause. In this regard, if you consider the system $y(t)=\dot{u}(t)$ and take $u$ to be a continuously differentiable function, then this system/operator is causal on that space because we have that
$$\dot{u}(t)=\lim_{h\to 0^+}\dfrac{u(t+h)-u(t)}{h}=\lim_{h\to0^+}\dfrac{u(t)-u(t-h)}{h}.$$
So, by virtue of the last expression, the future is not needed to compute $\dot{u}(t)$. Note that this ceases to be true if the signal is not differentiable at some points (e.g. it has a kink or a discontinuity), since the left and right difference quotients then have different limits. Another example is the system
$$y(t)=\dot{u}(t-\tau),\ \tau>0$$
which is obviously causal since the present output only depends on the past input. Yet its transfer function, $H(s)=se^{-s\tau}$, has a zero at $s=0$ just like $H(s)=s$; the difference is that it is not rational. Note, however, that despite being causal, this operator is still difficult to implement using analog components.
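The one-sided difference quotients used above can be checked numerically. The following is only a minimal sketch: the helper names `backward_difference` and `forward_difference`, the step `h`, the delay `tau`, and the test signals are illustrative assumptions, not part of the original argument.

```python
import math

def backward_difference(u, t, h):
    """Approximate u'(t) from the present sample u(t) and a past sample u(t - h)."""
    return (u(t) - u(t - h)) / h

def forward_difference(u, t, h):
    """Approximate u'(t) from the present sample u(t) and a future sample u(t + h)."""
    return (u(t + h) - u(t)) / h

t, h, tau = 1.0, 1e-6, 0.25

# Smooth input: both quotients agree with the exact derivative cos(t),
# so the backward (causal) one is enough.
smooth_back = backward_difference(math.sin, t, h)
smooth_fwd = forward_difference(math.sin, t, h)
exact = math.cos(t)

# Input with a kink, u(t) = |t| at t = 0: the one-sided quotients disagree,
# so the past alone no longer determines "the" derivative.
kink_back = backward_difference(abs, 0.0, h)
kink_fwd = forward_difference(abs, 0.0, h)

# Delayed derivative y(t) = u'(t - tau): only samples at t - tau and
# t - tau - h are used, both strictly in the past of t.
delayed = backward_difference(math.sin, t - tau, h)

print(smooth_back, smooth_fwd, exact)
print(kink_back, kink_fwd)
print(delayed, math.cos(t - tau))
```

For the smooth input, the two quotients agree up to the discretization error; at the kink they return the left and right slopes, which differ.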
Argument 1. This is not fully correct. If we start with a transfer function of the form
$$H(s)=\dfrac{b_ns^n+\ldots+b_0}{s^n+\ldots+a_0},$$
where the poles and zeros are distinct, then the corresponding controllable canonical form is given by $(A,B,C,D)$ where
$$A=\begin{bmatrix} 0 & 1 & 0 & \ldots & 0\\
0 & 0 & 1 & \ldots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \ldots & 1\\
-a_0 & -a_1 & -a_2 & \ldots & -a_{n-1}
\end{bmatrix},\ B=\begin{bmatrix}0\\ 0 \\\vdots\\0 \\1\end{bmatrix}$$
$$C=\begin{bmatrix}
b_0-a_0b_n & b_1-a_1b_{n} & b_2-a_2b_{n} & \ldots & b_{n-1}-a_{n-1}b_n
\end{bmatrix},\ D=b_n.$$
Clearly, this representation shows that we do not need to differentiate the input to get the output. We just integrate the input and then select the right linear combination of the states in order to get the output from the state and the input. If you take the transfer function $H(s)=s$, or any non-proper transfer function, then you can see that this procedure is not possible.
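As a sanity check of the canonical form, one can build $(A,B,C,D)$ from the coefficients and compare $C(sI-A)^{-1}B+D$ with the polynomial ratio at a test point. The sketch below uses only the Python standard library; the function names, the example coefficients, and the test point `s0` are assumptions chosen for illustration.

```python
def controllable_canonical_form(b, a):
    """b = [b_0, ..., b_n], a = [a_0, ..., a_{n-1}] (monic denominator)."""
    n = len(a)
    assert len(b) == n + 1
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0              # superdiagonal of ones
    A[n - 1] = [-ai for ai in a]       # last row: -a_0, ..., -a_{n-1}
    B = [0.0] * (n - 1) + [1.0]
    C = [b[i] - a[i] * b[n] for i in range(n)]
    D = b[n]
    return A, B, C, D

def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x

def tf_from_state_space(A, B, C, D, s):
    """Evaluate C (sI - A)^{-1} B + D at the complex point s."""
    n = len(B)
    M = [[(s if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    x = solve(M, B)
    return sum(ci * xi for ci, xi in zip(C, x)) + D

def tf_from_polynomials(b, a, s):
    """Evaluate (b_n s^n + ... + b_0) / (s^n + a_{n-1} s^{n-1} + ... + a_0)."""
    n = len(a)
    num = sum(bk * s**k for k, bk in enumerate(b))
    den = s**n + sum(ak * s**k for k, ak in enumerate(a))
    return num / den

b = [1.0, 2.0, 3.0, 4.0]   # hypothetical b_0..b_3
a = [6.0, 11.0, 6.0]       # hypothetical a_0..a_2, i.e. (s+1)(s+2)(s+3)
s0 = 0.5 + 2.0j            # arbitrary test point away from the poles

A, B, C, D = controllable_canonical_form(b, a)
h_ss = tf_from_state_space(A, B, C, D, s0)
h_poly = tf_from_polynomials(b, a, s0)
print(C, D)
print(abs(h_ss - h_poly))
```

Note that `controllable_canonical_form` requires exactly $n+1$ numerator coefficients for a degree-$n$ denominator: a numerator of higher degree, as in $H(s)=s$, has no slot in this construction.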
Another issue is that a non-proper transfer function cannot be BIBO stable due to the presence of a pole at $\infty$. Just take the derivative operator with transfer function $H(s)=s$ and consider the input $u(t)=\sin(\omega t)$, for which $||u||_{L_\infty}=1$. Clearly, we have that $y(t)=\dot{u}(t)=\omega \cos(\omega t)$, which satisfies $||y||_{L_\infty}=\omega$. Since $\omega$ can be taken arbitrarily large, the system has no finite $L_\infty$-gain. Even worse, a bounded input that converges exponentially to zero can produce an unbounded output. Take, for instance, $$u(t)=e^{-t}\sin(e^{2t}),$$ which satisfies $|u(t)|\le e^{-t}\to0$ as $t\to\infty$. Its derivative is given by
$$\dot{u}(t)=-e^{-t}\sin(e^{2t})+2e^t\cos(e^{2t}),$$
which is clearly unbounded, since the second term oscillates with an amplitude $2e^t$ that grows without bound. This illustrates the well-known fact that non-proper systems cannot be BIBO stable due to the presence of poles at $\infty$. So, in the end, non-proper systems are not very convenient to work with (causal or not) because they are not as well behaved as proper ones.
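The unboundedness of $\dot{u}$ for $u(t)=e^{-t}\sin(e^{2t})$ can be illustrated numerically. This is a rough sketch, assuming we sample at the times $t_k$ where $e^{2t_k}=2\pi k$ (so that the cosine equals one); the helper `peak_time` and the chosen values of `k` are illustrative only.

```python
import math

def u(t):
    """Bounded input that decays to zero: |u(t)| <= exp(-t)."""
    return math.exp(-t) * math.sin(math.exp(2 * t))

def du(t):
    """Derivative of u, as given in the formula above."""
    e2t = math.exp(2 * t)
    return -math.exp(-t) * math.sin(e2t) + 2 * math.exp(t) * math.cos(e2t)

def peak_time(k):
    """Time t_k with exp(2 t_k) = 2 pi k, where cos(.) = 1 and sin(.) = 0."""
    return 0.5 * math.log(2 * math.pi * k)

samples = []
for k in (1, 10, 100, 1000):
    t = peak_time(k)
    samples.append((t, u(t), du(t)))
    print(f"t = {t:.3f}  |u| = {abs(u(t)):.2e}  du = {du(t):.3f}")
```

At these sample times the input is essentially zero while the output equals $2e^{t_k}=2\sqrt{2\pi k}$, which increases with $k$.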