# GSM receiver blocks: least-squares channel estimation part 3: thinking about model mismatch

Let’s look at the case when the actual channel is slightly longer than the model. If the channel is **much** longer than the model, then the model will be unable to capture most of the channel’s properties, and the channel estimation will probably be nonsense. But if the channel is only *slightly* longer than the model, then the model will be able to capture most of the channel’s information (with some inherent inaccuracy). In this case, the channel estimation might be good enough for practical purposes.

For a practical example, let’s say that we have a training sequence of length 10, and an assumed channel of length 4. As usual, the training sequence is sandwiched between unknown data symbols.

We call the training sequence symbols \(TS = t_{1}, \cdots, t_{10}\), the data symbols (presumed unknown here) \(d_{i}\) (for \(i<1\) or \(i>10\)), and the channel coefficients \(chan = c_1,\cdots,c_4\). The channel is assumed to be causal, so the channel output is given by convolving the channel coefficients with the transmitted symbols \(d_{-2}, d_{-1}, d_{0}, t_{1}, \cdots, t_{10}, d_{11}, d_{12}, d_{13}\).

## observation model and least squares

The observation model is a convolution of the signal against the channel and looks like this (we draw the matrices aligned that way to make it clear how the multiplication works): \[\begin{align*} & \begin{bmatrix} c_{1}\\ c_{2}\\ c_{3}\\ c_{4} \end{bmatrix} & \\ \begin{bmatrix} t_{1} & d_{0} & d_{-1} & d_{-2} \\ t_{2} & t_{1} & d_{0} & d_{-1} \\ t_{3} & t_{2} & t_{1} & d_{0} \\ \hdashline t_{4} & t_{3} & t_{2} & t_{1} \\ t_{5} & t_{4} & t_{3} & t_{2} \\ t_{6} & t_{5} & t_{4} & t_{3} \\ t_{7} & t_{6} & t_{5} & t_{4} \\ t_{8} & t_{7} & t_{6} & t_{5} \\ t_{9} & t_{8} & t_{7} & t_{6} \\ t_{10} & t_{9} & t_{8} & t_{7} \\ \hdashline d_{11} & t_{10} & t_{9} & t_{8} \\ d_{12} & d_{11} & t_{10} & t_{9} \\ d_{13} & d_{12} & d_{11} & t_{10} \\ \end{bmatrix} \ast&& = \begin{bmatrix} {r_{2}} \\ {r_{3}} \\ {r_{4}} \\ \hdashline {r_{5}} \\ {r_{6}} \\ {r_{7}} \\ {r_{8}} \\ {r_{9}} \\ {r_{10}} \\ {r_{11}} \\ \hdashline {r_{12}} \\ {r_{13}} \\ {r_{14}} \\ \end{bmatrix} \end{align*}\]

Everything above and below the dashed lines is influenced by unknown data symbols, so we focus on the middle section of the matrix and the received symbols it produces.
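To make the structure concrete, here is a small numpy sketch that builds this observation matrix as a sliding window over the transmitted symbols and slices out the middle section. The ±1 training sequence is hypothetical (not a real GSM one), and the data symbols are random stand-ins:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical +/-1 training sequence of length 10 (not a real GSM sequence).
ts = np.array([1, 1, 1, -1, -1, 1, -1, 1, 1, -1], dtype=float)

# Unknown data symbols on either side of the training sequence.
rng = np.random.default_rng(0)
d_before = rng.choice([-1.0, 1.0], size=3)  # d_{-2}, d_{-1}, d_0
d_after = rng.choice([-1.0, 1.0], size=3)   # d_11, d_12, d_13
symbols = np.concatenate([d_before, ts, d_after])

L = 4  # assumed channel length
# Each row holds the L most recent symbols (newest first) feeding one
# received sample: exactly the rows of the matrix drawn above.
obs = sliding_window_view(symbols, L)[:, ::-1]
print(obs.shape)  # (13, 4)

# Rows between the dashed lines: windows fully inside the training sequence.
middle = obs[3:10]  # 7 rows, from [t4 t3 t2 t1] down to [t10 t9 t8 t7]
```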

We notice that the least squares process “tries to” estimate the received symbols as a linear combination of the four columns of this matrix:

\[\begin{bmatrix} t_{4} & t_{3} & t_{2} & t_{1} \\ t_{5} & t_{4} & t_{3} & t_{2} \\ t_{6} & t_{5} & t_{4} & t_{3} \\ t_{7} & t_{6} & t_{5} & t_{4} \\ t_{8} & t_{7} & t_{6} & t_{5} \\ t_{9} & t_{8} & t_{7} & t_{6} \\ t_{10} & t_{9} & t_{8} & t_{7} \\ \end{bmatrix}\]

The least squares process will try to find coefficients \([\hat{c}_{1}, \hat{c}_{2}, \hat{c}_{3}, \hat{c}_{4}]\) such that the received symbols are well approximated by the linear combination of the four columns. The first received symbol not affected by unknown data – the first received symbol we ingest for the least-squares – is \(r_5 = c_1 \cdot t_4 + c_2 \cdot t_3 + c_3 \cdot t_2 + c_4 \cdot t_1\), and this makes sense since the first row of the observation matrix is \([t_4, t_3, t_2, t_1]\).
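Here is what that fit looks like in numpy, using the same hypothetical training sequence and a made-up 4-tap channel; in this noiseless, perfectly-modeled case, least squares recovers the taps exactly:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical +/-1 training sequence and a made-up 4-tap channel.
ts = np.array([1, 1, 1, -1, -1, 1, -1, 1, 1, -1], dtype=float)
chan = np.array([0.9, 0.5, -0.3, 0.1])

rng = np.random.default_rng(1)
symbols = np.concatenate([rng.choice([-1.0, 1.0], 3),  # d_{-2} .. d_0
                          ts,
                          rng.choice([-1.0, 1.0], 3)]) # d_11 .. d_13

# Noiseless received burst: 13 samples, r_2 .. r_14 in the text's numbering.
r = np.convolve(symbols, chan, mode="valid")

# Least squares over the middle section only: the 7x4 matrix whose rows run
# from [t4 t3 t2 t1] down to [t10 t9 t8 t7].
A = sliding_window_view(ts, 4)[:, ::-1]
r_mid = r[3:10]  # r_5 .. r_11
c_hat, *_ = np.linalg.lstsq(A, r_mid, rcond=None)
print(c_hat)  # recovers chan exactly in this noiseless case
```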

## more channel coefficients, more problems

If we add an extra channel coefficient \(c_5\), then the observation model looks like this. We bold the \(\mathbf{d_{0}}\) in the fifth column since it is between the dashed lines but is an unknown data symbol. The red terms are the terms added by the extra channel coefficient.

\[\begin{align*} & \begin{bmatrix} c_{1}\\ c_{2}\\ c_{3}\\ c_{4} \\ \color{red}c_{5} \end{bmatrix} & \\ \begin{bmatrix} t_{1} & d_{0} & d_{-1} & d_{-2} & \color{red} d_{-3} \\ t_{2} & t_{1} & d_{0} & d_{-1} & \color{red} d_{-2} \\ t_{3} & t_{2} & t_{1} & d_{0} & \color{red} d_{-1}\\ \hdashline t_{4} & t_{3} & t_{2} & t_{1} & \color{red} \mathbf{d_{0}}\\ t_{5} & t_{4} & t_{3} & t_{2} & \color{red} t_{1} \\ t_{6} & t_{5} & t_{4} & t_{3} & \color{red} t_{2}\\ t_{7} & t_{6} & t_{5} & t_{4} & \color{red} t_{3}\\ t_{8} & t_{7} & t_{6} & t_{5} & \color{red}t_{4}\\ t_{9} & t_{8} & t_{7} & t_{6} & \color{red}t_{5}\\ t_{10} & t_{9} & t_{8} & t_{7} & \color{red}t_{6}\\ \hdashline d_{11} & t_{10} & t_{9} & t_{8} & \color{red}t_{7}\\ d_{12} & d_{11} & t_{10} & t_{9} & \color{red}t_{8}\\ d_{13} & d_{12} & d_{11} & t_{10} & \color{red}t_{9}\\ \end{bmatrix} \ast&& = \begin{bmatrix} {r_{2}} \\ {r_{3}} \\ {r_{4}} \\ \hdashline {r_{5}} \\ {r_{6}} \\ {r_{7}} \\ {r_{8}} \\ {r_{9}} \\ {r_{10}} \\ {r_{11}} \\ \hdashline {r_{12}} \\ {r_{13}} \\ {r_{14}} \\ \end{bmatrix} + c_{5} \begin{bmatrix} \color{red}d_{-3} \\ \color{red}d_{-2} \\ \color{red}d_{-1}\\ \hdashline \color{red}\mathbf{d_{0}}\\ \color{red}t_{1} \\ \color{red}t_{2}\\ \color{red}t_{3}\\ \color{red}t_{4}\\ \color{red}t_{5}\\ \color{red}t_{6}\\ \hdashline \color{red}t_{7}\\ \color{red}t_{8}\\ \color{red}t_{9} \end{bmatrix} \end{align*}\]

If \(c_5\) is significantly smaller than the other coefficients, we can treat its contribution to the received symbols (the new vector on the right) as noise, and hope that it won’t mess up the estimation too much.

If the energy is distributed pretty evenly across all five channel coefficients, we can’t do this, since it’ll significantly worsen the fit. However, if we have the opposite case – \(c_1\) significantly less than the other coefficients – we can do something similar to the previous case. We can treat the *first* column of the observation matrix as noise, and try to estimate the other four columns.
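Both regimes are easy to simulate. Here is a numpy sketch (made-up taps, same hypothetical ±1 training sequence as before): when \(c_5\) is tiny, the 4-tap fit stays close to \(c_1,\cdots,c_4\); when the energy is spread evenly across five taps, the unmodeled tap contaminates the estimate:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ts = np.array([1, 1, 1, -1, -1, 1, -1, 1, 1, -1], dtype=float)
A = sliding_window_view(ts, 4)[:, ::-1]  # the 7x4 training-only matrix

rng = np.random.default_rng(2)
symbols = np.concatenate([rng.choice([-1.0, 1.0], 4),   # d_{-3} .. d_0
                          ts,
                          rng.choice([-1.0, 1.0], 3)])  # d_11 .. d_13

def fit_4tap(chan5):
    """Fit the 4-tap model to a burst sent through a 5-tap channel."""
    r = np.convolve(symbols, chan5, mode="valid")  # r_2 .. r_14
    # Same slice as before: samples whose newest symbol is t4 .. t10.
    c_hat, *_ = np.linalg.lstsq(A, r[3:10], rcond=None)
    return c_hat

small_tail = np.array([0.9, 0.5, -0.3, 0.1, 0.02])  # tiny c5
print(fit_4tap(small_tail))  # close to [0.9, 0.5, -0.3, 0.1]

even_energy = np.full(5, 0.6)  # energy spread over all five taps
print(fit_4tap(even_energy))  # the ignored fifth tap now perturbs the fit
```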

## it’s all off-by-ones? always was!

Here’s the observation model if we’re trying to treat \(c_1\) as negligible:

\[\begin{align*} & \begin{bmatrix} \color{red}c_{1}\\ c_{2}\\ c_{3}\\ c_{4} \\ c_{5} \end{bmatrix} & \\ \left[ \begin{array}{c|cccc} \color{red}t_{1} & d_{0} & d_{-1} & d_{-2} & d_{-3} \\ \color{red}t_{2} & t_{1} & d_{0} & d_{-1} & d_{-2} \\ \color{red}t_{3} & t_{2} & t_{1} & d_{0} & d_{-1}\\ \hdashline \color{red}t_{4} & t_{3} & t_{2} & t_{1} & \mathbf{d_{0}}\\ \color{red}t_{5} & t_{4} & t_{3} & t_{2} & t_{1} \\ \color{red}t_{6} & t_{5} & t_{4} & t_{3} & t_{2}\\ \color{red}t_{7} & t_{6} & t_{5} & t_{4} & t_{3}\\ \color{red}t_{8} & t_{7} & t_{6} & t_{5} & t_{4}\\ \color{red}t_{9} & t_{8} & t_{7} & t_{6} & t_{5}\\ \color{red}t_{10} & t_{9} & t_{8} & t_{7} & t_{6}\\ \hdashline \color{red}d_{11} & t_{10} & t_{9} & t_{8} & t_{7}\\ \color{red}d_{12} & d_{11} & t_{10} & t_{9} & t_{8}\\ \color{red}d_{13} & d_{12} & d_{11} & t_{10} & t_{9}\\ \end{array}\right] \ast&& = c_{1} \begin{bmatrix} \color{red}t_{1} \\ \color{red}t_{2} \\ \color{red}t_{3} \\ \hdashline \color{red}t_{4} \\ \color{red}t_{5} \\ \color{red}t_{6} \\ \color{red}t_{7} \\ \color{red}t_{8} \\ \color{red}t_{9} \\ \color{red}t_{10} \\ \hdashline \color{red}d_{11} \\ \color{red}d_{12} \\ \color{red}d_{13} \\ \end{bmatrix} + \begin{bmatrix} {r_{2}} \\ {r_{3}} \\ {r_{4}} \\ \hdashline {r_{5}} \\ {r_{6}} \\ {r_{7}} \\ {r_{8}} \\ {r_{9}} \\ {r_{10}} \\ {r_{11}} \\ \hdashline {r_{12}} \\ {r_{13}} \\ {r_{14}} \\ \end{bmatrix} \end{align*}\]

However, we need to be careful with indices when we do this! Note that if we ignore the red entries, the submatrix between the horizontal dashed lines no longer contains the same column vectors as before! This will cause trouble! To get the **same column vectors** as before (the ones containing only known training symbols and spanning the maximum possible length) *and* have them multiplied by the correct subset of the channel, \([c_2,\cdots,c_5]\), we need to shift the indices by one.

Indeed, for the least squares to generate a sensible output, the first received symbol we use needs to be \(\textrm{noise} + c_2 \cdot t_4 + c_3 \cdot t_3 + c_4 \cdot t_2 + c_5 \cdot t_1\) (the “noise” being the contribution from the first channel tap, which we are assuming is negligible). We need this because the first row of the least-squares matrix *still* is \([t_4, t_3, t_2, t_1]\). Looking at the indices, the desired received symbol is \(r_6\), whereas in the previous case it was \(r_5\).

To write the correct observation model (which highlights the matrix we’ll use for least-squares), we simply move the horizontal dashed lines one row down!

\[\begin{align*} & \begin{bmatrix} \color{red}c_{1}\\ c_{2}\\ c_{3}\\ c_{4} \\ c_{5} \end{bmatrix} & \\ \left[ \begin{array}{c|cccc} \color{red}t_{1} & d_{0} & d_{-1} & d_{-2} & d_{-3} \\ \color{red}t_{2} & t_{1} & d_{0} & d_{-1} & d_{-2} \\ \color{red}t_{3} & t_{2} & t_{1} & d_{0} & d_{-1}\\ \color{red}t_{4} & t_{3} & t_{2} & t_{1} & \mathbf{d_{0}}\\ \hdashline \color{red}t_{5} & t_{4} & t_{3} & t_{2} & t_{1} \\ \color{red}t_{6} & t_{5} & t_{4} & t_{3} & t_{2}\\ \color{red}t_{7} & t_{6} & t_{5} & t_{4} & t_{3}\\ \color{red}t_{8} & t_{7} & t_{6} & t_{5} & t_{4}\\ \color{red}t_{9} & t_{8} & t_{7} & t_{6} & t_{5}\\ \color{red}t_{10} & t_{9} & t_{8} & t_{7} & t_{6}\\ \color{red}d_{11} & t_{10} & t_{9} & t_{8} & t_{7}\\ \hdashline \color{red}d_{12} & d_{11} & t_{10} & t_{9} & t_{8}\\ \color{red}d_{13} & d_{12} & d_{11} & t_{10} & t_{9}\\ \end{array}\right] \ast&& = c_{1} \begin{bmatrix} \color{red}t_{1} \\ \color{red}t_{2} \\ \color{red}t_{3} \\ \color{red}t_{4} \\ \hdashline \color{red}t_{5} \\ \color{red}t_{6} \\ \color{red}t_{7} \\ \color{red}t_{8} \\ \color{red}t_{9} \\ \color{red}t_{10} \\ \color{red}d_{11} \\ \hdashline \color{red}d_{12} \\ \color{red}d_{13} \\ \end{bmatrix} + \begin{bmatrix} {r_{2}} \\ {r_{3}} \\ {r_{4}} \\ {r_{5}} \\ \hdashline {r_{6}} \\ {r_{7}} \\ {r_{8}} \\ {r_{9}} \\ {r_{10}} \\ {r_{11}} \\ {r_{12}} \\ \hdashline {r_{13}} \\ {r_{14}} \\ \end{bmatrix} \end{align*}\]

This makes sense. If most of the energy is in the later channel taps and the real channel is longer than the model, we’ll indeed want to use a slightly later slice of the received symbols!
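Here is a numerical sketch of that off-by-one (made-up taps, same hypothetical ±1 training sequence): with a nearly negligible first tap, fitting the same training-only matrix against the received slice one sample later recovers \(c_2,\cdots,c_5\):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ts = np.array([1, 1, 1, -1, -1, 1, -1, 1, 1, -1], dtype=float)
A = sliding_window_view(ts, 4)[:, ::-1]  # same 7x4 training-only matrix

rng = np.random.default_rng(3)
symbols = np.concatenate([rng.choice([-1.0, 1.0], 4),   # d_{-3} .. d_0
                          ts,
                          rng.choice([-1.0, 1.0], 3)])  # d_11 .. d_13

chan5 = np.array([0.05, 0.9, 0.6, -0.4, 0.3])  # c1 nearly negligible
r = np.convolve(symbols, chan5, mode="valid")  # r_2 .. r_14

# Unshifted slice (r_5 .. r_11): aligned with c1 .. c4, so the sizable c5
# leaks in as "noise". Shifted slice (r_6 .. r_12): aligned with c2 .. c5,
# and only the tiny c1 leaks in.
unshifted, *_ = np.linalg.lstsq(A, r[3:10], rcond=None)
shifted, *_ = np.linalg.lstsq(A, r[4:11], rcond=None)

print(unshifted)  # misses the energy in c5
print(shifted)    # close to c2..c5 = [0.9, 0.6, -0.4, 0.3]
```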

## conclusions

If the actual channel is slightly longer than the model, we can use the energy distribution – which we can find via a simple correlation, provided our training sequence is reasonable (its autocorrelation “sharp”: mostly zero except at zero delay[^1]) – to figure out where on the signal to run the least-squares estimation. The amount of error presumably depends on the characteristics of the channel itself. Also, there are more advanced methods for channel estimation, most notably MMSE, which requires knowledge (or estimation) of noise and channel statistics. I can understand how one could estimate the noise statistics (if the symbols are strong enough, we can wipe them off and look at what’s left), but it’s slightly unclear to me how one estimates the channel statistics….if one’s trying to…estimate the channel? If you want to explain how this gets done in real-world systems, I would be delighted to hear about it!
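As a sanity check of the simple-correlation idea (see the footnote for the algebra), here’s a sketch using a length-13 Barker code as a stand-in training sequence – its aperiodic autocorrelation peaks at 13 with sidelobes of magnitude at most 1:

```python
import numpy as np

# Length-13 Barker code: autocorrelation of 13 at zero delay, sidelobes <= 1.
barker = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)
chan = np.array([0.9, 0.5, -0.3, 0.1])  # made-up 4-tap channel

rx = np.convolve(barker, chan)                # TS * chan
corr = np.correlate(rx, barker, mode="full")  # (TS * chan) star TS

# Equivalently chan * (TS star TS): since the autocorrelation is nearly a
# delta of height 13, the samples starting at zero delay approximate chan.
auto = np.correlate(barker, barker, mode="full")
peak = np.argmax(auto)                        # zero-delay index
est = corr[peak : peak + len(chan)] / auto[peak]
print(est)  # roughly [0.9, 0.5, -0.3, 0.1], up to small sidelobe leakage
```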

I think what I’ll do next is try to formalize and write code for the time/frequency offset estimation, and get that correctly cueing the channel estimation on the right part of the signal. The goal of this series is to do a survey of *all* the necessary “minimally viable” signal processing elements that compose a reasonable (similar data rates, similar channel properties, similar performance) GSM-ish receiver, not to explore *all* the possible methods (there are many of them, and people keep coming up with more!) for each signal processing block.

[^1]: We send \(TS\), and receive \(TS \ast chan\) (\(TS\) convolved with the channel impulse response). If we want to estimate the channel with a simple correlation, the receiver computes \((TS \ast chan) \star TS\), where \(\star\) is the *correlation* operator. The properties of convolution and correlation let us rewrite that as \(chan \ast (TS \star TS)\) – the channel impulse response convolved with the autocorrelation of the training sequence. The closer the training sequence’s autocorrelation is to zero (besides at zero delay), the more accurate the simple correlation method’s estimate of the impulse response.