A regular expression for the words that don't contain the sequence $ab$ over $\{a,b,c\}$

Question

The following is an exercise in a book I am reading:

Let $\Sigma=\{a,b,c\}$, define $L$ to be the language of all words over $\Sigma$ that do not contain $ab$ as a sub-word. Find a regular expression for $L$.

I am unable to solve this problem, I would like to know how to tackle this type of question, what is the solution and what is the thought process to get to it

You have a group generated by these letters and with the relation $ab=1$, since that you cancel out all the subwords $ab$.
For example, a word would be $w=a(ab)c^2a^3=ac^2a^3$. — Sigur, Aug 21 '12 at 00:09
@Sigur - this is a nice way to look at this, but how does this help ? — Belgi, Aug 21 '12 at 00:27
I am not sure if it is possible to obtain a regular expression, since this group is infinite. Maybe you can try to read about finitely generated groups and its quotient by the subgroup generated by $ab$. — Sigur, Aug 21 '12 at 00:33
@GerryMyerson, yes. I saw the keywords. Sorry, but I only know the algebra. — Sigur, Aug 21 '12 at 00:34

Brian M. Scott · Accepted Answer · 2012-08-22T02:10:40.640

One easy if tedious approach is to design a finite-state automaton that recognizes $L$ and then apply the algorithm that converts such an automaton to a corresponding regular expression.

Going about it directly, I observe that if $w\in L$, every $a$ in $w$ must be immediately followed by another $a$ or a $c$ or be at the end of the word; otherwise there are no restrictions. Similarly, every $b$ must be immediately preceded by another $b$ or a $c$ or be at the beginning of the word. Assume for the moment that $w$ does not begin with $b$ or end with $a$. If $w$ begins with $a$, it must begin $aa^*c(b\mid c)^*$. This pattern may be repeated any number of times: $\big(aa^*c(b\mid c)^*\big)^*$. To allow it to end with $a$, just tack on $a^*$: $\big(aa^*c(b\mid c)^*\big)^*a^*$. To allow it to begin with $b$, prefix $b^*$: $b^*\big(aa^*c(b\mid c)^*\big)^*a^*$. In fact, a little thought reveals that we can instead prefix $(b\mid c)^*$ to cover all cases:

$$(b\mid c)^*\big(aa^*c(b\mid c)^*\big)^*a^*\tag{1}$$

Added: Gerry Myerson’s approach in the comments is more elegant, though I’d carry it out a little differently. $\Sigma^*=(a\mid b\mid c)^*$, and we want to keep all of it that doesn’t have an $a$ followed immediately by a $b$. Thus, except at the end of a word we want to replace the selection $a$ by one of $ac,aac,aaac$, etc. If we allowed infinite disjunctions, this would give us $$(b\mid c\mid ac\mid aac\mid aaac\mid\dots)^*a^*\tag{2}$$ for $L$, where the last $a^*$ is to allow the word to end with $a$. We don’t allow such expressions, but $ac\mid aac\mid aaac\mid\dots$ can be written legitimately as $aa^*c$, and $(2)$ can then be written legitimately as $$(aa^*c\mid b\mid c)^*a^*\;.\tag{3}$$

It’s not hard to see that the languages described by $(1)$ and $(3)$ are subsets of $L$: neither regular expression permits $ab$. To see that these languages include all of $L$, note first $\lambda$, the empty word, is in both. Now let $w=x_1^{n_1}\dots x_m^{n_m}\in L$ be non-empty, where $x_k\in\Sigma$ and $n_k\in\Bbb Z^+$ for $k=1,\dots,m$, and $x_k\ne x_{k+1}$ for $k=1,\dots,m-1$. Assume further that $w$ is minimal in length among all words of $L$ not matching one of the regular expressions $(1)$ and $(3)$.

If $a$ does not occur in $w$, $w$ clearly matches both $(1)$ and $(3)$, so let $i$ be the first index such that $x_i=a$, and let $u=a^{n_i}x_{i+1}^{n_{i+1}}\dots x_m^{n_m}$. To show that $w$ matches $(1)$, it suffices to show that $u$ matches $\big(aa^*c(b\mid c)^*\big)^*a^*$; to show that $w$ matches $(3)$, it suffices to show that $u$ matches $(3)$. Both of these are clear if $u=a^{n_i}$, since in that case $u$ matches the final $a^*$ of each regular expression. Otherwise, $x_{i+1}=c$, and $a^{n_i}c$ matches $aa^*c$ in both regular expressions. Now let $v$ be what’s left of $u$ after the initial $a^{n_i}c$ is removed. Then $w$ matches $(1)$ iff $v$ matches $(1)$, and $w$ matches $(3)$ iff $v$ matches $(3)$. But $v\in L$, and $|v|<|w|$, so by hypothesis $v$ matches both $(1)$ and $(3)$, and hence so does $w$. Thus, each of these regular expressions represents $L$.

Can you please explain the sentence: If $w$ begins with $a$, it must begin $aa^c(b\mid c)^$ ? — Belgi, Aug 21 '12 at 00:23
In essence, I changed the alphabet to ${{b, c, ac, aac, aaac, \dots}}$. But now I see it's not quite right since it doesn't include words that end in $a$. So maybe $(a^c+b^+c^)^a^*$. — Gerry Myerson, Aug 21 '12 at 00:32
@GerryMyerson - what do you mean "changed the alphabet" ? did you just wrote words in $L$ and tried to fit a regular expression the 'catch' them all ? — Belgi, Aug 21 '12 at 00:41
I mean I noticed that you couldn't have an $a$ in a word without having it followed by (possibly some more copies of $a$ and then) a $c$, so that's the same as replacing the $a$ in the alphabet with all the ways $a$ could occur - that's the $ac,aac,aaac,\dots$. — Gerry Myerson, Aug 21 '12 at 02:06
@Belgi: Suppose that $w$ begins with $a$ and does not end with $a$. Certainly it can start with any positive number of $a$’s; that gives me $aa^$. Since it doesn’t end with $a$, the next letter must be $c$, and fter that I can have any number of $b$’s and $c$’s in any order: $aa^c(b\mid c)^$. And that’s all that can happen unless* there’s another $a$ later in the word. — Brian M. Scott, Aug 21 '12 at 17:11
In your addition, is anything missed if you use $a^c$ instead of $aa^c$? — Gerry Myerson, Aug 22 '12 at 01:55
@Gerry: Not that I can see, but I was deliberately sticking as close to $(2)$ as I legally could in order to simplify the explanation. — Brian M. Scott, Aug 22 '12 at 02:10

A regular expression for the words that don't contain the sequence $ab$ over $\{a,b,c\}$

1 Answers1

Linked