2

Let B be the language over alphabet {a, ... z} consisting of those words occuring in the Bible. Thus, B = {in,the,beginning,god,created,...}.

Describe an NFA whose language is B. Describe what happens when you apply subset construction to this NFA to get a DFA, ignoring unreachable states. Roughly, how do the number of states in each compare? How does each number compare to the # of distinct words in the Bible?


Of course, this is homework so I'm not asking for answers so much as a bit of a shove in the right direction. I'm pretty thoroughly stumped here.

I'm picturing the NFA as a series of letters that go down an ordered path in the same order that the letters show up in the Bible. Which not only seems really impractical but a little too stupid of an answer for it to be correct. What would an NFA of this even look like? Would it really be huge?

Thanks for any help.

Rich
  • 21
  • To make an automaton that recognize one word is easy. If you merge the initial state of two of this automaton you get one that recognize 2 words. sot to get your automaton you can just merge all the initial state of all the automaton for all word. It will give you a big star shaped automaton. Not that big if you have $n$ word of size $l$ you will have $n.l+1$ states – wece Apr 22 '15 at 19:43

1 Answers1

2

According to this page, there are 788,258 words in the Bible but only 14,565 distinct words. Now, it is shown in [1] that a full Scrabble lexicon with 94,240 entries can be represented be a DFA with 19,853 states. So you can expect the DFA of the Bible to have roughly $19,853 \times \frac{14,565}{94,240} \approx 3068$ states.

[1] A. W. Appel et G. J. Jacobson, The world fastest scrabble program, Commun. ACM 31 (1988), 572–585.

J.-E. Pin
  • 40,163