What is the right terminology / methods for this standard signal processing task?

Question

Say we have a signal s = [1,2,2,1,3,3,2,2,1,1] or similar (only 1,2,3 can occur). I want to look for a particular pattern, e.g. [1,3,3,2], which can occur anywhere (0 or many times) in the signal. I would like to perform the computation as a convolution with some filter f.

I've convinced myself that f = [100,1000,1000,10] will do the job, i.e. sending the pattern p = [1,3,3,2] to f = 10.^fliplr(p), in matlab syntax. That is, the convolution s*f will take the value 1*10 + 3*1000 + 3*1000 + 2*100 = 6210 if and only if we have the pattern p in the right place, at least as long as the pattern we are looking for is short enough.

For a long enough pattern, e.g. [1,1,1,1,1,1,1,1,1,1,2] (ten 1s followed by a 2), the corresponding filter f = [100,10,..,10] would give a "false positive" for the pattern [2,2,2,2,2,2,2,2,2,2,1], since the convolution would give the values 10*10 + 2*100 = 300 for the first, and 2*10*10 + 1*100 = 300 for the second. (That I needed 10 1s to "trick" the filter is obviously related to the fact that my filter "encodings" of the 1s and 2s differ by a factor 10. Changing the encoding to e.g. {1,2,3} -> {10,1000,100000} should let me handle correspondingly longer patterns).

I have a strong feeling that this is a very standard signal processing task. What is the standard terminology and methods concerning this problem? (I.e. what key words should I google.)

"I would like to perform the computation as a convolution with some filter f" why? — msm, Apr 18 '17 at 11:13
The reason is that I'm in matlab so I want to avoid for-loops. I thought of convolutions as an operation which is pretty fast and "vectorises" the computations, if I can just choose the appropriate filter. — svangen, Apr 18 '17 at 11:25
You can use Matlab with other algorithms as well. Look for "pattern matching", "string searching", "sequence alignment", or similar things. — msm, Apr 18 '17 at 11:30

Andreas H. · Accepted Answer · 2017-04-18T11:19:00.647

-1

The thing you are looking for is the matched filter, which is essentially the (optimal) linear approach to look for a known pattern in a signal (which may be impacted by additive zero mean noise).

The impulse response (i.e., the sequence you are convolving with) of the matched filter is just the pattern reversed, i.e. $f = [2, 3, 3, 1]$ will do the job. If the input signal is normalized in RMS amplitude, the matthed filter will give maximum output if and only if the pattern occurs.

edited Apr 18 '17 at 11:19

answered Apr 18 '17 at 10:47

Andreas H.

1,525

Say that I'm looking for the pattern p = [1,1,1,3], so convolve with f = [3,1,1,1]. Then I get the value 12 both for the correct pattern, and the "false positive" [3,3,3,1], right? What am I missing here – svangen Apr 18 '17 at 11:03
Yes, this is true. But only because [3,3,3,1] is not normalized in amplitude. Its RMS (root mean square) value is higher than the one for [1,1,1,3]. This needs to be taken into account. So you need to determine the RMS value for each 4 element bin of the input and take this into account when calculating the threshold after matched filter. – Andreas H. Apr 18 '17 at 11:13
Actually, I find your approach of convolving with $[1, M, M^2, M^3, ...]$ quite smart. I would prefer it if it is known that the input data is between 0 and M and contains no noise. Then it gives the unique answers. However it does not work in presence of noise or if there is negative data. – Andreas H. Apr 18 '17 at 11:16
On the other hand, if you want to search for a pattern in data that contains no noise, a filter approach may not be the most efficient in the first place. – Andreas H. Apr 18 '17 at 11:17
yeah, point taken. Thanks! – svangen Apr 18 '17 at 11:26

Stelios · Answer 2 · 2017-04-18T11:46:03.897

Assuming that the pattern $x$ you are looking for consists of $L$ samples (in your example, $x \in \{1,2,3\}^L$), the filtering operation with a filter of length $L$ essentially performs a weighted average of all length-$L$ sequences of the incoming signal with a weighting vector $w \in \mathbb{R}^L$. Therefore, if you want perfect detection, the following condition should hold $$ w^T x \neq w^T y, \text{ for all } y \in \{1,2,3\}^L, y \neq x. $$

The above set of equation can be compactly written as $$ A^Tw=b $$ where $A$ is an $L\times (3^L-1)$ matrix with columns equal to $x-y$, for all $y \in \{1,2,3\}^L, y \neq x$, and $b \in \mathbb{R}^{3^L-1}$ is a column vector with non-zero elements.

Your problem, therefore, boils down to finding such a $w$ (and $b$). Unfortunately, I am not aware of a systematic way to do this (someone else might). You could also try asking the signal processing SE.

What is the right terminology / methods for this standard signal processing task?

2 Answers2