Indeed $\mathbf{S}$ is typeable:
- $ x : \alpha \rightarrow \beta \rightarrow \gamma $.
- $ y : \alpha \rightarrow \beta $.
- $ z : \alpha $.
- $ (x\ z) : \beta \rightarrow \gamma\ $ (from 1,3).
- $ (y\ z) : \beta\ $ (from 2,3).
- $ (x\ z)(y\ z) : \gamma\ $ (from 4,5).
- $
\lambda x. \lambda y . \lambda z . (x\ z)(y\ z)
\ : \
(\alpha \rightarrow \beta \rightarrow \gamma)
\rightarrow
(\alpha \rightarrow \beta)
\rightarrow
\alpha
\rightarrow
\gamma
\ $ (from 1,2,3,6).
If you start from variable occurrences and propagate the information you infer from their usage (it is applied to something, then it must be an arrow of something; or it is the argument of something, then it must have the same type of the leftmost type of the something's arrow type), then you either find conflicting constraints, or conclude the process, thus having a counterexample or a type.
Roughly, it is the idea behind Hindley–Milner W algorithm for type inference.
Let us see how to do it by considering $(\mathbf{S I})$ and generalising our previous argument.
- Let $ \lambda w. w : \delta \rightarrow \delta $.
- Then we need to unify our previous type of $x$ in 1 with those of 8,
i.e. $\alpha \rightarrow (\beta \rightarrow \gamma)$
with $\delta \rightarrow \delta$.
This means that $ \alpha = \delta $
and $\beta\rightarrow\gamma\ =\ \delta $.
- Hence we found the right type unifier;
let us substitute $ \beta \rightarrow \gamma $
for $ \alpha $ in 7, and also for $ \delta $ in 8.
We obtain
$$
\mathbf{S}
\ : \
((\beta \rightarrow \gamma) \rightarrow \beta \rightarrow \gamma)
\rightarrow
((\beta \rightarrow \gamma) \rightarrow \beta)
\rightarrow
(\beta \rightarrow \gamma)
\rightarrow
\gamma
$$
and
$$
\mathbf{I} \ :\ (\beta \rightarrow \gamma) \rightarrow \beta \rightarrow \gamma
\text{.}
$$
We conclude:
$$
(\mathbf{SI})
\ :\
((\beta \rightarrow \gamma) \rightarrow \beta)
\rightarrow
(\beta \rightarrow \gamma)
\rightarrow
\gamma
\text.
$$