understanding the convolution in signals and systems

Question

Hi: I've been reading introductions to signals and systems, but my background is probability and statistics. In probability, the concept of convolution makes perfect sense to me. If $T = X + Y$, where $X$ and $Y$ are independent random variables with densities $f$ and $h$, then the density of $T$ is $g(t) = \int f(\tau)h(t-\tau) d \tau$. This is because the integral accounts for all the various ways that $x$ and $y$ can add to become the value $t$. A similar argument holds for the discrete case.

But when I'm reading the various "intros to signals and systems" texts, they describe a step-by-step process where you flip the $h$ function over the y-axis, so that it's $h(-\tau)$, add $t$, and then keep moving it over to the right step by step, calculating the area (i.e., the value of the integral at each time $t$) covered by the overlapped values of $f$ and $h$. This doesn't make sense to me. In other words, is there some analog in signals and systems to the "various ways that $x$ and $y$ can add to become the value $t$" in probability? Clearly, in the signals and systems framework, $t$ is not a random variable, so the interpretation has to be totally different. But I'm wondering if there is some way of REALLY understanding why this is done. Maybe there is some physical reason due to what the function $h$ actually represents. They keep referring to it as the impulse response function, so maybe my lack of understanding is because I'm not understanding what the impulse response function really represents? I just don't get why you do the flip over the y-axis thing and then move it slowly to the right step by step and keep figuring out the overlapped areas.

My take is that it's really essential to understand convolution in signals and systems, or else you cannot go any further. So I stopped and decided to ask here, because every book seems to give the same step-by-step overlap explanation and I'm continuously stumped by it. I realize it's a lot to ask for an explanation, but maybe someone knows of a text that explains the WHY part of the process or possibly relates it to the convolution in probability. Thank you very much for any help, wisdom, references, links, etc.

Mark

copper.hat · Accepted Answer · 2014-03-28T16:11:33.267

2

It might help to look at a discrete time system.

Suppose you have a linear time-invariant system with 'impulse' response $t \mapsto h_t$, that is, $h$ is the output when the input is $u = 1_{\{0\}}$ (one for $t = 0$ and zero everywhere else).

By linearity, if the input is $u = \sum u_k 1_{\{k\} }$ (that is, $u=(u_0,u_1,...)$), then the output will have the combined responses from each separate $ u_k 1_{\{k\} }$, appropriately delayed.

At time $t$, the input $ u_0 1_{\{0\} }$ will contribute $u_0 h_{t-0}$.

At time $t$, the input $ u_1 1_{\{1\} }$ will contribute $u_1 h_{t-1}$.

At time $t$, the input $ u_k 1_{\{k\} }$ will contribute $u_k h_{t-k}$.

Etc, etc.

Combining gives the response $y_t = \sum h_{t-k} u_k$.
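As a small sanity check of the argument above, here is a minimal Python sketch. It assumes finite sequences that are zero outside the listed samples; the function names and the sample values are made up for illustration. It builds the output once by stacking delayed, scaled copies of $h$ (the superposition picture) and once by evaluating $y_t = \sum_k h_{t-k} u_k$ directly.

```python
def convolve_by_superposition(u, h):
    """Add one delayed, scaled copy of h for every input sample u_k."""
    y = [0.0] * (len(u) + len(h) - 1)
    for k, u_k in enumerate(u):        # the impulse u_k * 1_{k} arrives at time k
        for j, h_j in enumerate(h):    # and contributes u_k * h_j at time k + j
            y[k + j] += u_k * h_j
    return y


def convolve_by_formula(u, h):
    """Evaluate y_t = sum_k h_{t-k} u_k for each output time t."""
    y = []
    for t in range(len(u) + len(h) - 1):
        total = 0.0
        for k in range(len(u)):
            if 0 <= t - k < len(h):    # h is taken to be zero outside its samples
                total += h[t - k] * u[k]
        y.append(total)
    return y


u = [1.0, 2.0, 3.0]      # illustrative input samples u_0, u_1, u_2
h = [1.0, 0.5, 0.25]     # illustrative impulse response h_0, h_1, h_2

y_superposition = convolve_by_superposition(u, h)
y_formula = convolve_by_formula(u, h)
print(y_superposition)   # [1.0, 2.5, 4.25, 2.0, 0.75]
print(y_formula)         # same values
```

Both routes give the same numbers, which is the whole point: the "flip and slide" recipe is just bookkeeping for the index $t-k$, i.e. how long ago the sample $u_k$ entered the system.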

For continuous systems, we can informally think of $u(t) = \int u(\tau) \delta(t-\tau) d \tau$. For a fixed $\tau$, the 'input' $t \mapsto u(\tau) \delta(t-\tau)$ results in a contribution $t \mapsto u(\tau) h(t-\tau)$, hence the total combined response is $y(t) =\int u(\tau) h(t-\tau) d \tau$.
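To make the continuous formula concrete, here is a hedged numerical sketch. The choices $h(t) = e^{-t}$ for $t \ge 0$ and a unit step input are assumptions picked only because the integral then has the closed form $y(t) = 1 - e^{-t}$; the integral is approximated with a midpoint Riemann sum over $[0, t]$, which is enough because both functions vanish for negative arguments.

```python
import math


def h(t):
    """Illustrative impulse response: exponential decay, zero before t = 0."""
    return math.exp(-t) if t >= 0 else 0.0


def u(t):
    """Illustrative input: unit step switched on at t = 0."""
    return 1.0 if t >= 0 else 0.0


def response(t, dt=1e-3):
    """Approximate y(t) = int u(tau) h(t - tau) d tau by a midpoint sum on [0, t]."""
    n = int(round(t / dt))
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) * dt               # midpoint of the k-th slice
        total += u(tau) * h(t - tau) * dt  # input at tau, weighted by elapsed time t - tau
    return total


for t in (0.5, 1.0, 2.0):
    print(t, response(t), 1.0 - math.exp(-t))
```

Each slice of the input at time $\tau$ is weighted by $h(t-\tau)$, the impulse response evaluated at the time elapsed since that slice entered the system.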

edited Mar 28 '14 at 16:11

answered Mar 28 '14 at 16:02

copper.hat


Thank you, but could you explain for a dummy what $u = \sum u_k 1_{\{k\}}$ (that is, $u = (u_0, u_1, \ldots)$) actually means physically? I think it might be the physical interpretation of $u$ that's confusing my brain. I'm sorry for being dense. Also, is $u$ here denoted in the textbooks as $x$, because I usually see $\sum x_k h_{t-k}$? Maybe it's the various letters and their meaning that gets me. Thanks. – mark leeds Mar 28 '14 at 16:38
$u$ is the input function, so at time $k=0$ the input is $u_0$, at time $k=1$ the input is $u_1$ ,etc. Is this what you are asking? – copper.hat Mar 28 '14 at 16:41
I think so. I think what's going on is that I don't understand physically what's going on. You have an original signal starting at time $t = 0$. So $u_0$ is the value of the signal at time $t = 0$, $u_1$ is the value of the input signal at time $t=1$, $u_2$ is the value of the signal at time $t=2$. But if we froze everything at time $t=2$ and looked at a picture of the time axis, then the signal has 3 values. Is the value $u_2$ put on the x-axis at $t=0$ or $t=2$? I think this is where I get confused. Thanks for your patience. You must think I'm a real idiot. – mark leeds Mar 28 '14 at 16:54
Think of a simple discrete time system that consists of a one cycle delay (think of a queue, or similar) of a sequence of real valued inputs. If the input at time $t$ is $u_t$ then the output at time $t+1$ will be $u_t$. In this case, the input is the sequence of input values and the output is the same sequence delayed by one cycle. So if the input is $(1,2,3,4,...)$, the output will be $(\ast,1,2,3,4,...)$, where the $\ast$ represents an unknown initial value. Is this what you are asking? – copper.hat Mar 28 '14 at 17:02
that's a great explanation and makes perfect sense. in the above example, the delay is one time unit. so, in your original derivation, of the convolution formula , what was the delay there ? I think we're getting somewhere. If you could just stick with me, it's really appreciated. and if you send me your email address, I will send you something as thanks. I've been fighting with this for weeks. – mark leeds Mar 28 '14 at 17:22
I am glad to help while I have time (my email is in my bio). Well, in the example two comments above, you have $h_k = \begin{cases} 1, & k=1 \\ 0, & \text{otherwise}\end{cases}$. The unit delay is a simple example. The underlying idea is that $t \mapsto h(t)$ describes the response to an impulse ($(1,0,0,...)$ in discrete time, $\delta$ in continuous time). Think of an input to the system as a combination (sum for discrete, integral for continuous) of impulses delayed by different times (for example in the discrete case, $(u_0,u_1,...) = u_0(1,0,0,...)+ u_1(0,1,0,0,...)+\cdots$). – copper.hat Mar 28 '14 at 18:51
Then the response will be the sum (or integral) of the (appropriately delayed) impulse response multiplied by the appropriate value of the input at that time. So, $u_0(1,0,0,...)$ will generate a response $u_0(h_0,h_1,h_2,...)$, $u_1(0,1,0,0,...)$ will generate a response $u_1(0,h_0,h_1,...)$, etc, etc, so the eventual response will be $(u_0 h_0, u_0 h_1+u_1 h_0,...)$. (Note that I am assuming that the system is causal in that $h_k = 0 $ for $k<0$. This applies to many systems and simplifies the discussion.) – copper.hat Mar 28 '14 at 18:52
thanks copper.hat. that was beautiful. I think I'm getting it. – mark leeds Mar 28 '14 at 19:50
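The unit-delay example from the comments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sequences are finite and zero outside the listed samples, the system starts at rest (so the unknown initial value comes out as 0), and the helper name `convolve` and all sample values are made up here.

```python
def convolve(u, h):
    """Sum of delayed impulse responses: y_t = sum_k u_k * h_{t-k}."""
    y = [0] * (len(u) + len(h) - 1)
    for k, u_k in enumerate(u):
        for j, h_j in enumerate(h):
            y[k + j] += u_k * h_j
    return y


# Unit delay: h_1 = 1 and h_k = 0 otherwise.
unit_delay = [0, 1]
delayed = convolve([1, 2, 3, 4], unit_delay)
print(delayed)   # [0, 1, 2, 3, 4]

# A generic causal response: the first outputs are u_0 h_0 and u_0 h_1 + u_1 h_0.
u = [2, 3]
h = [5, 7, 11]
y = convolve(u, h)
print(y)         # [10, 29, 43, 33]
```

With the unit delay the output is just the input moved one step to the right, and with a general $h$ the first two outputs match the expressions $u_0 h_0$ and $u_0 h_1 + u_1 h_0$ given in the last comment.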

score 0 · Answer 2 · edited Apr 13 '17 at 12:20

$\def\nR{\mathbb{R}}\def\rmC{{\mathrm C}}\def\rmu{{\rm u}}\def\sS{{\Sigma}}\def\sF{{\mathcal{F}}}\def\l{\left}\def\r{\right}\def\di{d}\def\sinc{\operatorname{sinc}}\def\ltag#1{\tag{#1}\label{#1}}$ We speak of signals as time functions $x:\nR\rightarrow\nR$ (defined on the whole time axis). Actually, they are not just any real functions but must have some nice properties (such as square integrability; otherwise we would have difficulties with convolution). We do not go deeper into that here. We just collect all admissible signals into a signal set $\sS$.

A system $S$ maps each input signal $x$ to a unique output signal $y$; we write $y=S(x)$.

We concentrate here on the special class of time-invariant linear systems.

Linear means not only that for two signals $x_1,x_2\in \sS$ and a scalar $\lambda\in\nR$ we have $S(x_1+\lambda x_2)=S(x_1)+\lambda S(x_2)$, but also that for a parameterized family of signals $\l(x_\tau\r)_{\tau\in\nR}\subset\sS$ and a coefficient function $\tau\mapsto\lambda_\tau$ we have $$ S\l(\int_\nR \lambda_\tau x_\tau \di\tau\r) = \int_\nR \lambda_\tau S(x_\tau)\di\tau $$

For time invariance we need the notion of shifted time signals. For a given time-shift $\tau\in\nR$ and a signal $x\in\sS$ we denote with $x(\bullet-\tau)$ the shifted signal defined by $x(\bullet-\tau)(t)=x(t-\tau)$ for all $t\in\nR$.

Time invariant systems respond to a shifted input signal with a shifted output signal, i.e., $S(x(\bullet-\tau)) = S(x)(\bullet-\tau)$.

Besides linearity, this time invariance is the special property of models for physical systems that makes convolution such a useful tool. And the set of linear time-invariant systems is large. All systems that can be described through linear ordinary and partial differential equations with constant coefficients belong to this set.
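For readers who prefer to see the two defining properties in action, here is a small discrete-time sketch in Python. Everything in it is an assumption made for illustration: signals are finite lists starting at time 0 and are zero elsewhere, `moving_average` stands in for a linear time-invariant system, and `ramp_gain` (which multiplies the sample at time $t$ by $t$) is a linear system that is not time-invariant.

```python
def shift(x, n):
    """Delay a signal by n steps (zeros are moved in at the front)."""
    return [0.0] * n + list(x)


def moving_average(x):
    """y_t = 0.5 * x_t + 0.5 * x_{t-1}: linear and time-invariant."""
    padded = [0.0] + list(x) + [0.0]
    return [0.5 * padded[t + 1] + 0.5 * padded[t] for t in range(len(x) + 1)]


def ramp_gain(x):
    """y_t = t * x_t: linear, but the gain depends on absolute time."""
    return [t * x_t for t, x_t in enumerate(x)]


x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 0.0, -2.0]
lam = 2.0

# Linearity: S(x1 + lam * x2) == S(x1) + lam * S(x2)
combined = [a + lam * b for a, b in zip(x1, x2)]
lhs = moving_average(combined)
rhs = [a + lam * b for a, b in zip(moving_average(x1), moving_average(x2))]
print(lhs == rhs)                                                # True

# Time invariance: shifting the input just shifts the output.
print(moving_average(shift(x1, 2)) == shift(moving_average(x1), 2))  # True
print(ramp_gain(shift(x1, 2)) == shift(ramp_gain(x1), 2))            # False
```

Only for systems like the first one does a single response $g = S(\delta)$ describe the behaviour for every shift, which is what the derivation below relies on.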

Now, suppose we have a family of base functions $\l(\delta(\bullet-\tau)\r)_{\tau\in\nR}$ for our signal space $\sS$ which just differ by a shift.

Just to recap: Base functions of $\sS$ means that we can represent every signal $x\in\sS$ as linear combination \begin{align} \ltag{base} x(t) &= \int_{-\infty}^\infty \lambda_\tau \delta(t-\tau) d\tau \end{align} of the base functions $\delta(\bullet-\tau)$ with a corresponding family of coefficients $\lambda_\tau$ which represent the signal $x$ uniquely.

The nice thing about base functions which just differ by a shift is that, because of the time invariance of the considered physical systems $S$, the system responses to these base functions also just differ by a shift: $$ S(\delta(\bullet-\tau))= S(\delta)(\bullet-\tau) $$ Let us denote the system response to our unshifted base function $\delta$ as $g:=S(\delta)$.

To calculate the answer of the linear operator we just need the mapped base functions (just like in linear algebra). \begin{align} y=S(x) &= S\l(\int_\nR \lambda_\tau \delta(\bullet-\tau) d\tau \r)\\ &= \int_\nR\lambda_\tau S(\delta(\bullet-\tau)) d\tau&&\text{ just using linearity}\\ &= \int_\nR\lambda_\tau S(\delta)(\bullet-\tau) d\tau&&\text{ using time invariance}\\ &= \int_\nR\lambda_\tau g(\bullet-\tau) d\tau \end{align} The system response at a certain time $t$ can be computed as \begin{align} \ltag{impulseResponse} y(t) &= \int_\nR \lambda_\tau g(t-\tau)d\tau. \end{align} Now, the only problem remains to find the appropriate base functions. Hm..., we would have to extend our signal space $\sS$ to a space of distributions for that purpose and we would have to define the linear system $S$ on this extended signal space.

We do not want to go into these details here.

Instead we switch to Riemann-Stieltjes integrals in \eqref{base}. That means we use a family of base functions $\bar h(\bullet-\tau)$ (left-continuous, with bounded variation) that only differ by a shift such that we can represent our signal as \begin{align} \ltag{baseStieltjes} x(t) &= \int_{\tau=-\infty}^\infty \lambda_\tau d \bar h(t-\tau) \end{align} with a family of coefficients $\lambda_\tau$ continuously depending on $\tau\in\nR$. Riemann-Stieltjes integrals are known from distribution functions in statistics. In particular, discontinuities in the integrator function $\bar h$ result in sum terms \begin{align} x(t) &= \sum_{\scriptsize\begin{matrix}\tau\in\nR\\\bar h(t-(\tau-))\\\neq \bar h(t-(\tau+))\end{matrix}} \lambda_\tau\cdot \bigl(\bar h(t-(\tau+))-\bar h(t-(\tau-))\bigr)\\ \ltag{sumStieltjes} &\phantom{= \sum_{\scriptsize\begin{matrix}\tau\in\nR\\\bar h(t-(\tau-))\\\neq \bar h(t-(\tau+))\end{matrix}}\lambda_\tau\cdot \bar h(t-(\tau+)) }+ \int_{\tau=-\infty}^\infty \lambda_\tau (-\bar h'_\rmC(t-\tau))d\tau, \end{align} where $\bar h_\rmC$ is the continuous part of $\bar h$, i.e., \begin{align} \bar h_\rmC(t)&=\bar h(t)-\sum_{\scriptsize\begin{matrix}\tau\leq t\\\bar h(\tau-)\\\neq \bar h(\tau+)\end{matrix}}\bigl(\bar h(\tau+)-\bar h(\tau-)\bigr). \end{align} Note that the differentiation of $\bar h_\rmC$ in \eqref{sumStieltjes} is carried out w.r.t. $\tau$. This explains the negative sign.

With the help of \eqref{sumStieltjes} it is easy to construct nice base functions. We just choose $\bar h'_\rmC=0$, i.e. a piecewise constant function, and let $\bar h$ have one jump of height -1 from 0 to -1 for $\tau=t$. Such a signal would be the negative Heaviside function \begin{align} \bar h(t) &=\l\{\begin{matrix} 0&\text{ for }t<0\\ -1&\text{ for }t\geq 0 \end{matrix} \r\} = -h(t) \end{align} with the Heaviside function $h$.

This way we get just $\lambda_\tau = x(\tau)$ and the representation \eqref{baseStieltjes} reads as \begin{align} x(t) =\int_{\tau=-\infty}^\infty x(\tau)\cdot d\bar h(t-\tau). \end{align} Applying the linear time invariant system to this signal means \begin{align} y(t) &= S\l(\int_{\tau=-\infty}^{\infty} x(\tau)d\bar h(\bullet-\tau)\r)(t)\\ &=\int_{\tau=-\infty}^{\infty} x(\tau)d S\l(\bar h(\bullet-\tau)\r)(t)&&\text{ using linearity}\\ &=\int_{\tau=-\infty}^{\infty} x(\tau)d S\l(\bar h\r)(t-\tau)&&\text{ using time invariance} \end{align} We have again one signal $\bar G:= S(\bar h)$ that characterizes the system completely.

Often the system has smoothing properties such that $\bar G$ is continuous even with the discontinuous input $\bar h$.

In that case one can differentiate the integrator function in the Riemann-Stieltjes integral w.r.t. $\tau$ and obtain \begin{align} y(t) &= \int_{\tau=-\infty}^\infty x(\tau) (-\bar G'(t-\tau))d\tau. \end{align} This is the connection to \eqref{impulseResponse}. We can choose \begin{align} \lambda_\tau &= x(\tau)\\ g(t) &=-\bar G'(t)\\ &= - S(-h)'(t) = S(h)'(t) \end{align} and write \begin{align} y(t) &= \int_{\tau=-\infty}^\infty x(\tau)g(t-\tau)d\tau. \end{align}
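The relation $g = S(h)'$ says that the impulse response is the derivative of the step response. A discrete-time analogue, where the derivative becomes a first difference, can be checked with a short Python sketch. This is only an illustration under assumptions: the system is given by a made-up finite impulse response `g`, and the step input is truncated after `n` samples, which does not affect the first `n` outputs of a causal system.

```python
def convolve(x, g):
    """y_t = sum_k x_k * g_{t-k} for finite sequences."""
    y = [0.0] * (len(x) + len(g) - 1)
    for k, x_k in enumerate(x):
        for j, g_j in enumerate(g):
            y[k + j] += x_k * g_j
    return y


g = [1.0, 0.5, 0.25]       # illustrative impulse response
n = 6
step = [1.0] * n           # discrete Heaviside step, truncated after n samples

# Step response S(h): keep only the first n outputs, which the truncation cannot affect.
step_response = convolve(step, g)[:n]
print(step_response)       # [1.0, 1.5, 1.75, 1.75, 1.75, 1.75]

# First difference of the step response recovers the impulse response.
recovered = [step_response[0]] + [
    step_response[t] - step_response[t - 1] for t in range(1, n)
]
print(recovered)           # [1.0, 0.5, 0.25, 0.0, 0.0, 0.0]
```

So measuring how a system reacts to switching the input on is enough to obtain $g$, without ever feeding it an idealized impulse.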

Mark Leeds asked some good questions in the comments to this Answer which I want to address here:

1. At the point where you introduce the Stieltjes integral, what was the integral called before that point? I know Riemann and Lebesgue but that's it.
2. If you had to give intuition for why the second term is $g(t-\tau)$ rather than just $g(t)$, what would it be?

Answer to 1.: One could at first try Lebesgue integrals. But, unfortunately, there is no family of Lebesgue integrable base functions $\delta(\bullet-\tau)$ $(\tau\in\nR)$ which only differ by shifts. The problem is that the convolution integral $y(t)=\int_{-\infty}^{+\infty}x(\tau)\delta(t-\tau)d\tau$ "smears" the support of the integrands. This leads relatively quickly to the conclusion that the base functions $\delta(\bullet-\tau)$ should have a single point as support to satisfy $y(t)=x(t)$. This in turn would mean that $\delta$ would be the zero function in the Lebesgue sense. Therefore, there are no such base functions in $L^2$. I have shown one relatively simple way around this difficulty with Riemann-Stieltjes integrals.

Another general way would be to start with the space of piecewise smooth $L^2$-functions, which form a ring with the convolution product as multiplication. The algebraic structure of this set is like that of the ring of integers. This ring can be extended to a field (this method is known from the extension of the ring of integers to the field of rational numbers). The unit element in that field is called the Dirac delta distribution. You find this approach in the book Differential-Algebraic Equations by Kunkel and Mehrmann.

A further alternative would be to reduce the signal space to the space of frequency-limited signals. This way even works with classical integrals. I think you know the Fourier transform $X(\omega) = \sF(x)(\omega) = \int_{-\infty}^\infty x(t) e^{-i\omega t} dt$ from statistics or at least from the video. Furthermore, you know that convolution in the time domain $(x*y)(t)=\int_{\tau=-\infty}^\infty x(\tau)y(t-\tau)d\tau$ translates to multiplication $\sF(x*y)(\omega) = X(\omega)\cdot Y(\omega)$ in the frequency domain.
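The convolution theorem has a discrete counterpart that is easy to try out. The following Python sketch is an illustration only and swaps the continuous Fourier transform for a naive discrete Fourier transform (DFT); the sample values are made up, and both sequences are zero-padded to length 8 so that the circular convolution implied by the DFT coincides with the ordinary one.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def convolve(x, y):
    """Direct convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for k, x_k in enumerate(x):
        for j, y_j in enumerate(y):
            out[k + j] += x_k * y_j
    return out


x = [1.0, 2.0, 3.0]
y = [1.0, 0.5, 0.25]
n = 8                                    # long enough to avoid wrap-around
x_pad = x + [0.0] * (n - len(x))
y_pad = y + [0.0] * (n - len(y))

# Multiply the spectra, transform back, and compare with direct convolution.
product = [a * b for a, b in zip(dft(x_pad), dft(y_pad))]
via_frequency = [value.real for value in inverse_dft(product)]
direct = convolve(x, y) + [0.0] * (n - len(x) - len(y) + 1)

print([round(v, 9) for v in via_frequency])
print(direct)
```

Up to rounding, both lists agree, which mirrors $\sF(x*y) = X\cdot Y$ in the discrete setting.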

We are looking for the decomposition of signals $x$ in linear combinations $x(t) = \int_{-\infty}^\infty x(\tau) \delta(t-\tau) d\tau = (x*\delta)(t)$.

That means that $\delta$ should leave the signal unchanged under convolution. In the frequency domain the spectrum $\Delta:=\sF(\delta)$ of $\delta$ should leave the spectrum $X$ of the signal $x$ unchanged under multiplication. If our signal space $\sS$ contains signals $x$ whose spectra $X$ have unbounded support, this implies that the spectrum $\Delta$ must be the unit function, i.e., $\Delta(\omega)=1$ for all $\omega\in\nR$. But if there exists an upper limit frequency $\omega_\rmu>0$ such that the spectra $X$ of all signals $x\in\sS$ are zero outside of $[-\omega_\rmu,\omega_\rmu]$, then $\Delta(\omega)$ only needs to be 1 for $\omega\in[-\omega_\rmu,\omega_\rmu]$. We can construct an appropriate generator for our base as the inverse Fourier transform of \begin{align} \Delta(\omega) &= \begin{cases} 1 &\text{ for } \omega \in [-\omega_\rmu,\omega_\rmu]\\ 0 &\text{ else } \end{cases} \end{align}

\begin{align} \delta(t) &:= \frac1{2\pi}\int_{-\infty}^\infty \Delta(\omega)e^{i\omega t} d\omega\\ &= \frac1{2\pi}\int_{-\omega_\rmu}^{\omega_\rmu} e^{i\omega t} d\omega\\ &= \begin{cases} \frac{\omega_\rmu}{\pi} &\text{ for } t=0\\ \frac1{t\pi} \sin(\omega_\rmu t)&\text{ for } t\neq 0 \end{cases}\\ &= \frac{\omega_\rmu}{\pi} \sinc(\omega_\rmu t) \end{align} with the sinc function.

So, for frequency-bounded signals you can use normal Lebesgue integration and you have an explicit representation of the base functions $\delta(\bullet-\tau) = \frac{\omega_\rmu}{\pi} \sinc(\omega_\rmu (\bullet-\tau))$.
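The claim that $\frac{\omega_\rmu}{\pi}\sinc(\omega_\rmu t)$ acts like $\delta$ on frequency-limited signals can be tested numerically. The Python sketch below is an illustration under assumptions chosen here: the test signal $x(t) = (\sin t / t)^2$ has a spectrum supported in $[-2,2]$, the limit frequency is set to $\omega_\rmu = 3$, and the convolution integral is replaced by a Riemann sum over a truncated interval, so agreement is only approximate.

```python
import math

OMEGA_U = 3.0   # assumed upper limit frequency, larger than the bandwidth of x


def delta_bl(t):
    """Band-limited stand-in for delta: sin(omega_u t) / (pi t)."""
    if t == 0.0:
        return OMEGA_U / math.pi
    return math.sin(OMEGA_U * t) / (math.pi * t)


def x(t):
    """Test signal (sin t / t)^2, whose spectrum vanishes outside [-2, 2]."""
    if t == 0.0:
        return 1.0
    return (math.sin(t) / t) ** 2


def reproduce(t, half_width=200.0, dt=0.01):
    """Riemann sum for int x(tau) * delta_bl(t - tau) d tau on [-half_width, half_width]."""
    n = int(round(2 * half_width / dt))
    total = 0.0
    for k in range(n + 1):
        tau = -half_width + k * dt
        total += x(tau) * delta_bl(t - tau)
    return total * dt


for t in (0.0, 0.7, 2.5):
    print(t, x(t), reproduce(t))
```

The two printed columns agree to several digits, so within this restricted signal space an ordinary function really does play the role of $\delta$.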

Unfortunately, this is a rather restricted signal space, only suited for educational purposes. You do not have a step signal, since this is neither frequency limited nor in $L^2$. Furthermore, all those signals are acausal since their support is unbounded in both directions.

Note, that for increasing upper frequency bound $\omega_\rmu$ the sequence $\delta_{\omega_\rmu}$ $(\omega_\rmu\rightarrow\infty)$ represents in the limit the Dirac delta distribution. This limit is one way to understand the Dirac delta distribution.

Answer to 2.: The intuitive picture is already contained in the text. The LTI system responds to a time-shifted input signal with a time-shifted output signal. That is the reason why we represent our signal $x$ as a superposition of time-shifted basis signals $x(t)=\int_{\tau=-\infty}^\infty x(\tau)\delta(t-\tau) d\tau.$ Under the integral only $\delta$ depends on the real time $t$, and $x$ only appears as a weighting function which is "indexed" through $\tau$. Thus, the system $S$ only acts on the shifted time signals $\delta(t-\tau)$, with the shifted system response $g(t-\tau)$. One obtains the system response to the input signal $x$ as the superposition $y(t) = S(x)(t) = \int_{-\infty}^\infty x(\tau)g(t-\tau) d\tau$ of the system responses $g(t-\tau)$ to the shifted $\delta(t-\tau)$. Notably, the impulse response starting at time $\tau$ is weighted with the value $x(\tau)$ of the input signal at time $\tau$. It contributes to the output $y(t)$ at time $t$ with the $x(\tau)$-weighted impulse response of the system after the elapsed time $t-\tau$, i.e. the length of the time interval from $\tau$ to $t$.

You compose your input signal as a sequence of shifted "special signals" and look what the system delivers for the shifted special signals after the elapsed time from the start time of the special signal to the current time.

I do not know why only the first label is set as hyperlink:-(. — Tobias, Mar 28 '14 at 21:28
Thank you Tobias. I will print out and read carefully. a nice link for people who are confused like me is given below. http://ocw.mit.edu/courses/mathematics/18-03-differential-equations-spring-2010/video-lectures/lecture-21-convolution-formula/ — mark leeds, Mar 29 '14 at 00:02
@markleeds Is the text readable? It would be nice if you could show me the points that are difficult to understand. — Tobias, Apr 01 '14 at 12:56
Hi Tobias: Thank you for writing what you wrote. I read it once and found it pretty difficult. But let me read it again carefully and figure out where I get confused and let you know. But your efforts are appreciated. With some other people's help off line, i think that I'm starting to understand things intuitively. But the theory that you wrote is tough for me. I'll give it another read and get back to you. thanks again. — mark leeds, Apr 02 '14 at 13:37
Hi Tobias: I think that I mostly understand what you're doing. You build up to the equation by showing that the function S can be applied to the derivative of the Heaviside function and dS(h) ends up being the g function in the equation we were trying to derive. I understood most of it except for a couple of things. 1) At the point where you introduce the Stieltjes integral, what was the integral called before that point? I know Riemann and Lebesgue but that's it. 2) If you had to give intuition for why the second term is $g(t-\tau)$ rather than just $g(t)$, what would it be? — mark leeds, Apr 04 '14 at 05:44
@markleeds: I have included the answers to your last comment at the end of the main answer above. I think especially the explicit representation of base functions $\delta$ for frequency-limited signals should make you happy. There you can work with Lebesgue integrals, which you are used to. I am having a bit of trouble with your second question, since from my point of view what I wrote in the first pass is already very intuitive to me. I tried to formulate it again in other words. — Tobias, Apr 05 '14 at 15:25
wow tobias. I appreciate your persistence. I'll read it carefully again, probably many times and let you know. I don't know what else to say except thank you. — mark leeds, Apr 06 '14 at 18:20
