I want to write a program to analyse sounds – voice, music, etc. I have a good understanding of how the Fourier transform can break a sound into frequencies, amplitudes and phases but I don't know what a good sample size would be. I believe one of the tricks to the Fast Fourier Transform is picking sample sizes that are exact powers of two.

The Wikipedia article on human hearing says that the range of hearing is normally stated as 20 Hz to 20 000 Hz, although it may go as low as 12 Hz. If I record sound at 44 100 samples per second (CD sample rate), one full cycle spans 2205 samples at 20 Hz and 3675 samples at 12 Hz. Does this mean that 4096 ($2^{12}$) would be a good sample size to analyse voice and music?
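To make the arithmetic above explicit, here is a minimal Python sketch. The helper names (`period_in_samples`, `next_power_of_two`) are my own, chosen only for illustration. It checks the cycle lengths and also shows what a 4096-sample window would imply for frequency resolution and duration.

```python
import math

SAMPLE_RATE = 44_100  # CD sample rate, in samples per second


def period_in_samples(freq_hz, sample_rate=SAMPLE_RATE):
    """Number of samples spanned by one full cycle at freq_hz."""
    return sample_rate / freq_hz


def next_power_of_two(n):
    """Smallest power of two that is >= n (hypothetical helper)."""
    return 1 << (math.ceil(n) - 1).bit_length()


for freq in (20, 12):
    cycle = period_in_samples(freq)
    window = next_power_of_two(cycle)
    print(f"{freq} Hz: one cycle = {cycle:g} samples, window = {window}")

window = next_power_of_two(period_in_samples(12))
bin_width_hz = SAMPLE_RATE / window       # spacing between FFT bins
window_seconds = window / SAMPLE_RATE     # time covered by one window
print(f"bin width = {bin_width_hz:.2f} Hz, duration = {window_seconds * 1000:.1f} ms")
```

So a 4096-sample window holds just over one cycle at 12 Hz, has bins about 10.8 Hz apart, and covers roughly 93 ms of audio.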

On the other hand, this article on speech perception states that we can hear up to 30 phonemes per second, although some may be in parallel. I remember reading at some point what the duration of the shortest phoneme is, and that it is supposed to relate to the lowest sound we can hear, but I can't find this information at the moment. I imagine it would be useful to have a sample size small enough to fit at least twice into any phoneme.
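As a rough check of that constraint, the sketch below assumes the 30 phonemes per second are evenly spaced and strictly sequential. That is a simplification, since some may overlap. The helper name is hypothetical.

```python
SAMPLE_RATE = 44_100      # samples per second
PHONEMES_PER_SECOND = 30  # upper figure quoted above; assumed evenly spaced


def largest_power_of_two_at_most(n):
    """Largest power of two that is <= n (hypothetical helper)."""
    return 1 << (int(n).bit_length() - 1)


samples_per_phoneme = SAMPLE_RATE / PHONEMES_PER_SECOND  # 1470 samples
half_phoneme = samples_per_phoneme / 2                   # window must fit twice
window = largest_power_of_two_at_most(half_phoneme)

print(f"samples per phoneme: {samples_per_phoneme:g}")
print(f"largest power-of-two window fitting twice: {window}")
print(f"window duration: {window / SAMPLE_RATE * 1000:.1f} ms")
print(f"bin width: {SAMPLE_RATE / window:.1f} Hz")
```

Under these assumptions the window would be 512 samples (about 11.6 ms), with bins roughly 86 Hz apart. That is too coarse to resolve 12 Hz or 20 Hz, so the two requirements pull in opposite directions.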

CJ Dennis
  • Do you know the wavelet transform and, more generally, the constant-Q transform? It is what graphic equalizers typically do: higher bitrate and bandwidth in the highest frequency bands. When computing the STFT for a 44 kHz .wav signal, we typically choose a window size of 2^10, 2^11, 2^12 or 2^13 (Hann or Hamming). The STFT, wavelet and constant-Q transforms are all special cases of "filter-bank transforms". – reuns Mar 01 '16 at 11:01
  • @user1952009 I'm not familiar with the wavelet transform or the constant-Q transform. I've never implemented a Fast Fourier Transform before, although I've had good success with a very slow method. Are you saying that you use different sample windows for different frequencies? So a large window for low frequencies and a small window for high frequencies? – CJ Dennis Mar 01 '16 at 11:11
  • No, but run your own experiments and you'll see. I'm just saying that the STFT (I wasn't talking about the FFT) is a filter bank whose output has been downsampled by a factor that still allows perfect reconstruction. – reuns Mar 01 '16 at 11:22
  • And if you really want to write such a program, you have to download and play with MATLAB ($\ge$ 2007). – reuns Mar 01 '16 at 11:26
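
For readers who want to experiment with the windowed transform (STFT) mentioned in the comments without MATLAB, here is a minimal pure-Python sketch. It deliberately uses a slow, naive DFT in place of an FFT. The small window (64 samples), the 8 kHz sample rate and the function names are all assumptions picked to keep the example fast; they are not values recommended in the thread.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Naive O(n^2) DFT; returns magnitudes for bins 0 .. n/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def stft(signal, window_size, hop):
    """Cut the signal into overlapping frames, window and transform each."""
    w = hann(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + i] * w[i] for i in range(window_size)]
        frames.append(dft_magnitudes(frame))
    return frames


SAMPLE_RATE = 8_000  # assumed toy sample rate, not the 44.1 kHz discussed above
WINDOW = 64          # assumed toy window; the comment suggests 2^10 to 2^13
TONE_HZ = 1_000      # test tone that falls exactly on bin 8

signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(256)]
spectrogram = stft(signal, WINDOW, hop=WINDOW // 2)

first = spectrogram[0]
peak_bin = max(range(len(first)), key=lambda k: first[k])
peak_hz = peak_bin * SAMPLE_RATE / WINDOW
print(f"{len(spectrogram)} frames, peak at bin {peak_bin} = {peak_hz:g} Hz")
```

Each row of `spectrogram` is the spectrum of one windowed frame, so varying `WINDOW` shows the trade-off directly: longer windows give narrower bins but fewer, coarser time steps.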
