Rules for converting lambda calculus expressions to SKI combinator calculus expression? Which rule(s) is/are incorrect?

Question

learnxinyminutes.com defines $I$, $K$, and $S$ as follows:

I x = x
K x y = x
S x y z = x z (y z)

Then they give the following correspondences to aid in the conversion between lambda calculus and SKI combinator calculus:

λx.x = I
EDIT: λx.c = Kc provided that x does not occur free in c (see Mark's answer)
λx.(y z) = S (λx.y) (λx.z)

I tried to expand on these correspondences as follows:

3'. λx.(y z) = S (λx.y) (λx.z) = S (Ky) (Kz) (apply 2. to λx.y and to λx.y)

λx.(y z) = K(y z) (apply 2. with c = yz)

But that lead to incorrect results when I used 3'. and 4. to convert the number $2$ from its lambda calculus representation $λf.λx.f(f x)$ to its SKI combinator calculus representation (see my previous question for the incorrect result). So that means that 3'. and/or 4. are incorrect. Which step or steps in 3.' and/or 4. are incorrect?

Mark Saving · Accepted Answer · 2021-11-12T19:41:43.843

The rules are phrased imprecisely.

To be precise, consider the language of the $\lambda$ calculus extended with $S, K,$ and $I$. The rules provide an algorithm for taking a term $\lambda x . M$ in this language, where $M$ has no $\lambda$ abstractions in it, and outputting an equivalent term with no abstractions at all in it, where the free variables in the resulting term all occur free in $\lambda x . M$.

Rule 2 says that when $x$ and $c$ are distinct variables or when $c$ is either $S, K$ or $I$, we should rewrite $\lambda x . c$ as $Kc$. The equality in rule 2 only holds when $x$ does not occur free in $c$, so applying this rule out of context can lead to an incorrect result.

To simplify $2 \equiv \lambda f . \lambda x . f(fx)$, we would first simplify $\lambda x . f(fx)$. We would write this according to rule 3 as $S (\lambda x . f) (\lambda x . fx)$, which we would then rewrite as $S (Kf) (S (\lambda x . f)(\lambda x . x))$, which we would then rewrite as $S (Kf)(S (Kf) I)$. Note that we have eliminated the variable $x$ and all instances of $\lambda$ abstraction.

We then apply the rules to $\lambda f . S (Kf) (S (Kf) I)$, which would produce a lengthy but correct term. We would convert to $S (\lambda f . S (Kf)) (\lambda f . S (Kf) I)$. Note that $\lambda f . S (Kf)$ becomes $S (\lambda f . S) (\lambda f . Kf)$, which becomes $S (K S) (S (KK) I)$. And $\lambda f . S (Kf) I$ becomes $S (\lambda f . S (Kf)) (\lambda f . I) = S (S (K S) (S (KK) I)) (KI)$. So the final result should be $S (S (K S) (S (KK) I)) (S (S (K S) (S (KK) I)) (KI))$ unless I've messed something up (which is very plausible).

Edit: this is definitely not the most efficient algorithm for taking a closed term in the $\lambda$ calculus and producing an equivalent term in the SKI calculus, either in running time or in the size of the output. One optimisation which reduces output size is looking for opportunities to use $\eta$-reduction, such as with the term $\lambda x . fx$, which can be rewritten as $f$. Our algorithm instead rewrites it as $S (Kf) I$. But this technique is limited.

If you want to use SKI as a step in compiling a language which will be used in real-world software, it's critical that the translation not blow up the size of the code super-linearly. So it's a good thing that there is a linear time and linear space algorithm for doing this translation.

The trick is to add new combinators. The idea is this: First, extend the SKI combinators to a new, larger finite set $J$ of combinators. Then, come up with a linear time and linear space algorithm for taking $\lambda$ closed terms and turning them into $J$ closed terms. Then, substitute each term in $J$ for an equivalent term in the SKI calculus. Since there are only finitely many terms in $J$, the substitution step is also linear time and linear space.

In fact, this technique means that once we have found one complete set of combinators $J$ and a good technique for writing $\lambda$-terms as $J$-terms, we can use this technique to get the representation in terms of any other finite, complete set of combinators.

Thanks! That makes sense. Is .. another representation of 2? (I was working with ..().) If .. is not a representation of 2, you can just edit out the $2 ≡$. Re: "We then simplify .≡(.)(.)≡()", that's one place where an reduction could have been used, right? To simplify . to instead of (). — joseville, Nov 12 '21 at 17:12
Should rule 2 be "λx.c = Kc provided that c is free of x"? Or "λx.c = Kc provided that x does not occur free in c? The latter would allow c to rebind x as in c = λx.x s.t. λx.λx.x could be written as K (λx.x) via rule 2.? — joseville, Nov 12 '21 at 17:19
@joseville You are correct that I accidentally used the wrong term for $2$. I've rewritten it. — Mark Saving, Nov 12 '21 at 17:21
@joseville There are two ways to view rule 2. One is as a statement of equality. In this case, "$\lambda x . c = Kc$ provided that $x$ does not occur free in $c$" is the best statement. The other is as part of a pattern match in an algorithm. In this case, rule 1 covers the case where $c$ is $x$ and rule 3 covers the case where $c = yz$, so the only case left is the one where $c$ is either $S, K, I$, or another variable. — Mark Saving, Nov 12 '21 at 17:22
gotcha re: rule 2. Also, I double checked your new edit and the steps seem correct to me. Indeed a lengthy term. Thanks for churning through it. You should edit back your note about eta reduction. It was illuminating: "Note that we can optimise this algorithm by using reductions, which gives us .≡. Using reductions when possible can give us much shorter resulting terms. But keep in mind that there is no algorithm to produce the shortest possible term which is equivalent to the original, since this would allow us to solve the halting problem." — joseville, Nov 12 '21 at 17:31

NinjaDarth · Answer 2 · 2024-03-07T19:15:27.093

The method I used on Combo, which I recently put up on GitHub, is the following: $$ λx.x = I,\quad λx.a = Ka,\quad λx.xx = D,\quad λx.xb = Tb,\quad λx.ax = a,\\ λx.xv = Ub,\quad λx.ux = Wa,\quad λx.av = Bab,\quad λx.ub = Cab,\quad λx.uv = Sab, $$ where $a$ and $b$ are terms containing no free occurrences of $x$ in them, and where $u$ and $v$ are terms containing free occurrences of $x$ in them in which $a = λx.u$ and $b = λx.v$ (thus $u = ax$ and $v = bx$ under the β-rule). This is equivalent to taking the simplified abstraction rule $$λx.x = SKK,\quad λx.y = Ky,\quad λx.uv = Sab,$$ where, this time, $u$ and $v$ can be any terms, but $y$ is restricted to being just a variable other than $x$ ... and applying the optimization rules: $$ I = SKK,\quad K(ab) = S(Ka)(Kb),\quad SII = D,\quad SI(Kb) = Tb,\quad S(Ka)I = a,\\ SIb = Ub,\quad SaI = Wa,\quad S(Ka)b = Bab,\quad Sa(Kb) = Cab. $$ If you run Combo on $S\ (K\ a)\ (K\ b)$ it will say it's in normal form already. However, if you run Combo on it, with the extensionality axiom (i.e. the η-rule: $λx.ax = a$ where $a$ has no free occurrences of $x$ in it), then it will reduce it to $K\ (a\ b)$.

The weak normal forms are obtained by applying reductions with only the β-rule, while strong normal forms also use the η-rule. To reduce a term $N$ in weak normal form to strong normal form, tack on an extra variable $x$ that does not occur freely in $N$, reduce $Nx$ to (strong) normal form - which I'll denote $[Nx]$ - and then abstract out the $x$: $λx.[Nx]$.

A full set of axioms to embody the η-rule could be obtained by taking the above-mentioned optimization rules, along with the results obtained by abstracting them with respect to $a$ and $b$. For instance, from $SaI = Wa$, one also obtains $λa.(SaI) = λa.Wa$, i.e. $CSI = W$. For $T$, one would get $λb.SI(Kb) = Tb$, i.e. $B(SI)K = T$. For $U$, one gets $SI = U$. Similarly, for $B$, one gets $S(Ka) = Ba$, and then $BSK = B$ ... or $S(KS)K = B$ if applying the simplified abstraction rule. Likewise, for $C$, one gets $B(Sa)K = Ca$ and $C(BBS)K = C$, or $S(S(KB)S)(KK) = C$ if applying the simplified abstraction rule, which yields the equation $C = S(BBS)(KK)$, if also applying the rule $S(Ka) = Ba$ to $S(KB)$ to get $BB$.

Abstracting out the optimization rule for $K$ - which is what your question was about - would get $BKa = B(S(Ka))K$ and then $BK = C(BB(BSK))K$.

I think these all suffice, together, to capture the η-rule.

You can't apply the abstraction rule for $K$ consistently without the qualifiers I listed, because of a conflict with the $S$ rule on the term $a b$ - unless you adopt the rule $K(ab) = S(Ka)(Kb)$ as an extra axiom - i.e. apply an instance of the η-rule. So, the η-rule is needed to get the different weak normal forms to match up and yield the same strong normal form.

Rules for converting lambda calculus expressions to SKI combinator calculus expression? Which rule(s) is/are incorrect?

2 Answers2

Linked