4

Is it possible to take a Fourier transform of text, and if so, how? That is, what domain does text exist in? Any help would be great.

NOTE: I do not mean text as an image. I understand its value, but I'm wondering whether it is possible to map text to some domain and transform it on the basis of its letters. The hope is to perform frequency filtering on the text.

eatonphil
  • 151
  • 4
    You can treat the text body as a discrete-time, discrete-valued signal (e.g. the ASCII values of the letters, or some other char -> int mapping). A discrete Fourier transform can follow. I'm not sure what kind of meaningful filtering can be applied, though. – karakfa Jun 17 '13 at 17:31
  • I'm curious if there is any transform that would result in some kind of useful representation... Could you figure out the language of the text or something maybe? – Hobblin Oct 18 '13 at 08:28
  • I just stumbled upon the idea of using Fourier transforms to compare text. For instance, if we give a discrete value to each word (I think that would be an easy mapping) and then apply a Fourier transform, we get a result that can be easily compared (as in a "similarity" or "distance" calculation) to the transforms of other text paragraphs. Do you know of any forum where I could take this idea and get help implementing it? Is this Stack Exchange a good place to ask a question like this? – Esteban Jul 28 '16 at 17:23

3 Answers

1

You could take the text as a 2-D image and use a 2-D Fourier transform. This could be useful e.g. to find the orientation of the text and subsequently - if necessary - apply an appropriate rotation, which makes it easier for text recognition methods to give satisfactory results.

Matt L.
  • 10,636
  • Nope, I don't mean as a picture. I'd like to perform frequency filtering on text. Actual text. I imagine I'd have to pseudo map the text characters linearly as a function of time and transform that? – eatonphil Jun 17 '13 at 17:17
  • What do you mean by 'frequency filtering on text'? What type of information do you want to derive from the text? – Matt L. Jun 17 '13 at 17:19
  • I'm not entirely sure, but it was a question posed in class and I'm trying to find an answer. Similar to how a low pass frequency filter cuts out noise in an image or recording, I'd like to see the effect of that on text... If it's possible. – eatonphil Jun 17 '13 at 17:20
  • 3
    Sorry, but I can't seem to make sense out of this. It's important to know what type of information one is interested in, otherwise you just get some random results. – Matt L. Jun 17 '13 at 17:30
1

I had a similar idea last night when I was trying to explain the concept of FFTs for fundamental analysis and synthesis of sounds to someone, and the analogy that popped into my head was of the prevalence of lowercase letters, uppercase letters, and punctuation in a page of text corresponding to signals that occur with high, medium and low frequency.

I haven't tried this yet, but I was thinking of converting the symbols to numbers (using their ASCII values might be enough) and feeding the resulting sequence into an FFT analysis, to see whether a paragraph of text could be decomposed into the sum of a reasonably small number of sine waves, such that the list of coefficients would be smaller than the original text.

I don't think it would have any meaning as such; it certainly wouldn't be useful to count word frequencies or to synthesize texts, but it's a very interesting question!
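
For anyone who wants to try the experiment described above, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function names are made up, the mapping is simply each character's code point, the transform is a naive O(n^2) DFT rather than a real FFT, and the "low-pass" step just zeroes the higher-frequency bins before transforming back.

```python
import cmath

def text_to_signal(text):
    """Map each character to its code point (the ASCII value for ASCII text)."""
    return [float(ord(ch)) for ch in text]

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a list of numbers."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def inverse_dft(spectrum):
    """Naive inverse DFT; returns complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def low_pass(spectrum, keep):
    """Keep bin 0 plus the `keep` lowest positive and negative frequencies."""
    n = len(spectrum)
    return [c if (k <= keep or k >= n - keep) else 0j for k, c in enumerate(spectrum)]

def signal_to_text(signal):
    """Round samples back to code points, clamped to printable ASCII (32..126)."""
    return "".join(chr(min(126, max(32, round(x.real)))) for x in signal)

text = "the quick brown fox jumps over the lazy dog"
spectrum = dft(text_to_signal(text))

# With all coefficients kept, the inverse transform recovers the text.
round_trip = signal_to_text(inverse_dft(spectrum))

# With only the low-frequency bins kept, the result is a "smoothed" string.
filtered = signal_to_text(inverse_dft(low_pass(spectrum, keep=5)))

print(round_trip)
print(filtered)
```

The full spectrum reproduces the text exactly after rounding. Dropping coefficients smooths the code-point sequence, and since neighbouring code points have no linguistic relationship, the filtered string should not be expected to resemble language.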

0

Not quite in the frequency domain, but there is a way to look for periodic structure in text: the Index of Coincidence (IoC). For normal text, the IoC computed at successive shifts will be pretty much flat. But for text encrypted with, say, an 8-letter key and the Vigenère cipher, the IoC will show a pattern of 7 low values and a spike at every 8th shift.

That tells you to take every 8th character and look for the key letter for that position, then move on to the next position. This also works with XOR and other repeating-key ciphers.
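
A rough sketch of the idea in Python, under some assumptions: the function names and the sample plaintext are made up, and the quantity computed is the fraction of positions where the ciphertext matches a shifted copy of itself, which is one common way to get the kind of per-shift plot described above.

```python
def vigenere_encrypt(plaintext, key):
    """Vigenere-encrypt the letters a-z; all other characters are dropped.

    Assumes `key` is a non-empty string of lowercase letters.
    """
    letters = [c for c in plaintext.lower() if "a" <= c <= "z"]
    return "".join(
        chr((ord(c) - 97 + ord(key[i % len(key)]) - 97) % 26 + 97)
        for i, c in enumerate(letters)
    )

def coincidence_rate(text, shift):
    """Fraction of positions i where text[i] == text[i + shift]."""
    pairs = len(text) - shift
    if shift <= 0 or pairs <= 0:
        raise ValueError("shift must be positive and shorter than the text")
    matches = sum(1 for i in range(pairs) if text[i] == text[i + shift])
    return matches / pairs

# Hypothetical sample text and an 8-letter key, chosen only for illustration.
plaintext = (
    "a fourier transform describes a signal as a sum of sinusoids and "
    "a filter then keeps some of those components while discarding others "
    "whether that idea carries over to written language is less obvious "
    "because letters are symbols rather than samples of a physical quantity"
)
ciphertext = vigenere_encrypt(plaintext, "keywords")

# Print the coincidence rate for each shift from 1 to 24.
for shift in range(1, 25):
    print(shift, round(coincidence_rate(ciphertext, shift), 3))
```

On a sufficiently long ciphertext, shifts that are multiples of the key length tend to stand out, because characters that far apart were encrypted with the same key letter. A sample this short is noisy, so the spikes may not be clean.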

Seth
  • 130