I am analysing a corpus of text and am currently investigating how frequently words appear in it.
What I am looking for is a function that penalises very large and very small values so that, instead of a graph whose values keep decreasing as words become rarer, I end up with an approximately bell-shaped curve.
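A minimal sketch of one candidate transformation follows. It assumes plain whitespace tokenisation and uses a logarithm as the "penalising" function; neither choice is specified above, and the helper names (`word_frequencies`, `log_counts`, `bucket`) are hypothetical. A log compresses large counts far more than small ones. Whether the log-scaled counts actually come out bell-shaped depends on the corpus, so treat this as a starting point rather than a guaranteed fix.

```python
import math
from collections import Counter

def word_frequencies(text):
    """Count occurrences of each token (lower-cased, split on whitespace).

    Whitespace tokenisation is an assumption; a real corpus likely needs
    punctuation handling as well.
    """
    return Counter(text.lower().split())

def log_counts(freqs):
    """Replace each raw count c (always >= 1) with its natural log.

    Large counts are compressed much more than small ones, e.g.
    log(1000) is about 6.9 while log(10) is about 2.3.
    """
    return {word: math.log(c) for word, c in freqs.items()}

def bucket(values, width=1.0):
    """Histogram: how many words fall into each log-count bucket of the given width."""
    return Counter(math.floor(v / width) for v in values)

text = "the cat the dog the"
print(bucket(log_counts(word_frequencies(text)).values()))  # Counter({0: 2, 1: 1})
```

Plotting the bucket counts from a real corpus shows how the distribution looks after the transform. If it is still skewed, a parametric transform such as Box-Cox is a common next step.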
Any help would be greatly appreciated.
Patrick
