I want to write a program to analyse sounds – voice, music, etc. I have a good understanding of how the Fourier transform can break a sound into frequencies, amplitudes and phases, but I don't know what a good sample size (the number of samples fed to each transform) would be. I believe one of the tricks to the Fast Fourier Transform is picking sample sizes that are exact powers of two.
The Wikipedia article on human hearing says that the range of hearing is normally stated as 20 Hz to 20 000 Hz, although it may extend as low as 12 Hz. If I record sound at 44 100 samples per second (the CD sample rate), one period of a tone lasts 2205 samples at 20 Hz and 3675 samples at 12 Hz. Does this mean that 4096 ($2^{12}$), the smallest power of two that holds a full period of either tone, would be a good sample size to analyse voice and music?
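To make the arithmetic concrete, here is a minimal sketch of how I arrived at those numbers. The helper names `period_in_samples` and `next_power_of_two` are my own, made up for illustration, and the only inputs are the sample rate and the two frequencies quoted above.

```python
SAMPLE_RATE = 44_100  # CD sample rate, in samples per second


def period_in_samples(freq_hz, sample_rate=SAMPLE_RATE):
    """Number of samples spanned by one period of a tone at freq_hz."""
    return sample_rate / freq_hz


def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    whole = int(n) if n == int(n) else int(n) + 1  # round up to a whole sample
    return 1 << (whole - 1).bit_length()


if __name__ == "__main__":
    for freq in (20, 12):
        period = period_in_samples(freq)
        print(f"{freq} Hz: period = {period:g} samples, "
              f"next power of two = {next_power_of_two(period)}")
```

Both frequencies land on the same power of two, which is where the 4096 figure comes from.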
On the other hand, this article on speech perception states that we can hear up to 30 phonemes per second, although some may be in parallel. I remember reading at some point what the duration of the shortest phoneme is, and that it is supposed to relate to the lowest sound we can hear, but I can't find this information at the moment. I imagine it would be useful to have a sample size small enough to fit at least twice into any phoneme.
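The two constraints seem to pull in opposite directions, which the rough sketch below tries to show. It assumes, purely for illustration, that phonemes are evenly spaced at 30 per second (real phonemes vary in length and can overlap), and the helper name `largest_power_of_two_at_most` is hypothetical.

```python
SAMPLE_RATE = 44_100        # CD sample rate, in samples per second
PHONEMES_PER_SECOND = 30    # upper figure from the speech perception article


def largest_power_of_two_at_most(n):
    """Largest power of two that is less than or equal to n (n >= 1)."""
    return 1 << (int(n).bit_length() - 1)


def window_duration_ms(window_size, sample_rate=SAMPLE_RATE):
    """How long a window of window_size samples lasts, in milliseconds."""
    return 1000.0 * window_size / sample_rate


# Assumption: evenly spaced phonemes, so each one lasts 1/30 of a second.
samples_per_phoneme = SAMPLE_RATE / PHONEMES_PER_SECOND   # 1470 samples
half_phoneme = samples_per_phoneme / 2                    # 735 samples
speech_window = largest_power_of_two_at_most(half_phoneme)

if __name__ == "__main__":
    print(f"samples per phoneme: {samples_per_phoneme:g}")
    print(f"window that fits twice into a phoneme: {speech_window} samples "
          f"({window_duration_ms(speech_window):.1f} ms)")
    print(f"4096-sample window lasts {window_duration_ms(4096):.1f} ms")
```

Under that assumption the largest power of two that fits twice into a phoneme is 512 samples, while a 4096-sample window lasts longer than a whole phoneme.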