Speech Decomposition

The idea is to approximate the audio signal in a .wav file with a sum of sine waves. The fitted sine parameters (amplitude, frequency, and phase of each term) then serve as features to describe and recognize speech contained in new .wav files.
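
Before anything can be fitted, the samples have to come out of the .wav file. Here is a rough sketch of that step, assuming audioread and audiowrite are available in your Octave or MATLAB install. The temporary file, the two test tones, and the 200-sample frame are made up for illustration and stand in for a real recording.

```octave
% Sketch: turn a short frame of a .wav file into (xdata, ydata) samples.
% The file path, tone frequencies, and frame length are illustrative assumptions.
fs  = 8000;                                  % sample rate in Hz (assumed)
t   = (0:fs-1)' / fs;                        % one second of time stamps
sig = 0.5*sin(2*pi*220*t) + 0.3*sin(2*pi*440*t + 0.4);  % stand-in for speech

wavfile = [tempname() '.wav'];               % hypothetical temporary file
audiowrite(wavfile, sig, fs);                % write a test file to read back

[y, fs_in] = audioread(wavfile);             % y is samples-by-channels
y = y(:, 1);                                 % keep the first channel only

frame_len = 200;                             % 25 ms at 8 kHz
ydata = y(1:frame_len)';                     % row vector of samples
xdata = (0:frame_len-1) / fs_in;             % matching time axis in seconds
delete(wavfile);                             % clean up the temporary file
```

The resulting xdata and ydata are row vectors, the same shape the fitting script expects, so a frame like this could take the place of synthetic data.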

Math summary as a MATLAB script (I’ll be using Octave):

clear;clc
xdata=1:0.1:10; % X or independent data
ydata=sin(xdata+0.2)+0.5*sin(0.3*xdata+0.3)+ 2*sin( 0.2*xdata+23 )+...
    0.7*sin( 0.34*xdata+12 )+.76*sin( .23*xdata+.3 )+.98*sin(.76 *xdata+.56 )+...
    .34*sin( .87*xdata+.123 )+.234*sin(.234 *xdata+23 ); % Y or dependent data
x0 = randn(36,1);  % Initial Guess
fun = @(x,xdata)x(1)*sin(x(2)*xdata+x(3))+... 
                x(4)*sin(x(5)*xdata+x(6))+...
                x(7)*sin(x(8)*xdata+x(9))+...
              x(10)*sin(x(11)*xdata+x(12))+...
              x(13)*sin(x(14)*xdata+x(15))+...
              x(16)*sin(x(17)*xdata+x(18))+...
              x(19)*sin(x(20)*xdata+x(21))+...
              x(22)*sin(x(23)*xdata+x(24))+...
              x(25)*sin(x(26)*xdata+x(27))+...
              x(28)*sin(x(29)*xdata+x(30))+...
              x(31)*sin(x(32)*xdata+x(33))+...
              x(34)*sin(x(35)*xdata+x(36)); % Model function: sum of 12 sines
options = optimoptions('lsqcurvefit','Algorithm','trust-region-reflective'); % Options for fitting (MATLAB; Octave may need optimset instead)
x=lsqcurvefit(fun,x0,xdata,ydata,[],[],options) % the main instruction; [] means no lower/upper bounds
times = linspace(xdata(1),xdata(end));
plot(xdata,ydata,'ko',times,fun(x,times),'r-')
legend('Data','Fitted Sum of 12 Sines')
title('Data and Fitted Curve')

To run this in Octave, you need the octave-dev package and the Optimization (optim) package. Load the latter with "pkg load optim" before running the script.
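
Writing out all 12 terms by hand gets tedious if you want to change the number of sines. Here is a sketch of a more compact version of the same model, assuming the parameters stay packed as [amplitude; frequency; phase] triples in a column vector and xdata is a row vector. It relies on automatic broadcasting, so it needs a reasonably recent Octave or MATLAB. The names sines and nsines are mine, not part of any library.

```octave
% Sketch: sum-of-sines model for any number of terms.
% x is a column vector [a1; w1; p1; a2; w2; p2; ...]; xdata is a row vector.
% Each row of the broadcast matrix is one sine term; sum(..., 1) adds them up.
sines = @(x, xdata) sum(x(1:3:end) .* sin(x(2:3:end) .* xdata + x(3:3:end)), 1);

nsines = 12;                      % number of sine terms (same as the script)
x0 = randn(3*nsines, 1);          % initial guess, three parameters per term

% Sanity check against two hand-written terms.
xt = [1; 1; 0.2; 0.5; 0.3; 0.3];
xd = 1:0.1:10;
expected = sin(xd + 0.2) + 0.5*sin(0.3*xd + 0.3);
err = max(abs(sines(xt, xd) - expected));
```

With this, sines can be passed to lsqcurvefit in place of fun, and changing nsines is enough to try a different number of terms.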