GSM receiver blocks: timing synchronization (part 2: correlation edition)

In the last post, we ran a least-squares on every possible time offset, calculated the loss, and declared the optimal timing offset to be the one with the lowest loss. This approach is incredibly inefficient, but critically, we know it handles the dispersive channel correctly. In this and the next post, we’ll use it as a gold standard to validate a more efficient approach.

cut to the chase: it’s a correlation, right? #

Yep. It’s a correlation. Our intuition indeed tells us to correlate the received signal against the modulated training sequence, and look for the correlation peak. If we didn’t have a dispersive channel, the story ends here.

The dispersive channel makes things a bit more subtle! Remember, what’s received is not going to look like what’s transmitted, so we have to be careful in our analysis. When we ran the least-squares, we were careful to run it only on the slice of received training sequence that was unaffected by unknown data. It’s possible we might have to do something similar here – use a subset of the training sequence, and not the whole thing.

index reckoning #

If the channel’s delay spread is \(L\) symbol intervals long, we have a couple reasonable choices for the correlation “template”:

the full training sequence
the training sequence with \(L\) symbols removed from the beginning
the training sequence with \(L\) symbols removed from the end
the training sequence with \(L\) symbols removed from both ends

I learned enough SAGE to calculate the symbolic expressions for the first two cases (with a channel length of 4, a training sequence length of 10, and 10 symbols before and after the training sequence) to see if there was some insight attainable by looking at the output:

sage: prepend_data = list(var('XXXXXXXXXX_%d' % (i)) for i in range(10))
sage: TS = list(var('TS_%d' % (i+10)) for i in range(10))
sage: append_data = list(var('XXXXXXXXXX_%d' % (i+20)) for i in range(10))
sage: burst = prepend_data + TS + append_data
sage: chan = list(var('CHAN_%d' % (i+1)) for i in range(4))
sage: received = convolution(burst, chan)
sage: convolution(received, list(reversed(TS)))
sage: convolution(received, list(reversed(TS[4:10])))

We can see that for instance, the correlation has zero outputs unaffected by unknown data (run it yourself if you want to check, i’m not including the output here), but correlating with TS[4:10] – removing the first 4 symbols from the training sequence – has two outputs unaffected by unknown data:

(CHAN_4*TS_10 + CHAN_3*TS_11 + CHAN_2*TS_12 + CHAN_1*TS_13)*TS_14 + (CHAN_4*TS_11 + CHAN_3*TS_12 + CHAN_2*TS_13 + CHAN_1*TS_14)*TS_15 + (CHAN_4*TS_12 + CHAN_3*TS_13 + CHAN_2*TS_14 + CHAN_1*TS_15)*TS_16 + (CHAN_4*TS_13 + CHAN_3*TS_14 + CHAN_2*TS_15 + CHAN_1*TS_16)*TS_17 + (CHAN_4*TS_14 + CHAN_3*TS_15 + CHAN_2*TS_16 + CHAN_1*TS_17)*TS_18 + (CHAN_4*TS_15 + CHAN_3*TS_16 + CHAN_2*TS_17 + CHAN_1*TS_18)*TS_19,


(CHAN_4*TS_11 + CHAN_3*TS_12 + CHAN_2*TS_13 + CHAN_1*TS_14)*TS_14 + (CHAN_4*TS_12 + CHAN_3*TS_13 + CHAN_2*TS_14 + CHAN_1*TS_15)*TS_15 + (CHAN_4*TS_13 + CHAN_3*TS_14 + CHAN_2*TS_15 + CHAN_1*TS_16)*TS_16 + (CHAN_4*TS_14 + CHAN_3*TS_15 + CHAN_2*TS_16 + CHAN_1*TS_17)*TS_17 + (CHAN_4*TS_15 + CHAN_3*TS_16 + CHAN_2*TS_17 + CHAN_1*TS_18)*TS_18 + (CHAN_4*TS_16 + CHAN_3*TS_17 + CHAN_2*TS_18 + CHAN_1*TS_19)*TS_19

This is a curious phenomenon: chopping off symbols from the training sequence – causing the correlation template to have fewer symbols – causes more output values that don’t depend on unknown data. This makes total sense, since if you have a tiny little template then it’ll have more “alignments” in the un-tainted section of the received signal.

Unfortunately, this goes in the opposite direction of traditional wisdom about correlation: use the largest template you can. So we’re left to wonder, is there a “happy medium” between too much contribution from unknown data, and too few symbols in the correlation template?

There is a saving grace: the standard assumption (and in fact, standard¹ practice!) is that modulators are fed data sufficiently² indistinguishable from random – so maybe “contribution from unknown data” is not as bad as it seems.

There’s only so much thinking about it can do. It seems like we’re going to have to figure out some ways to judge our estimators, and do some simulations.

estimator quality #

Let’s think of some criteria we can use to evaluate our estimators:

Error #

The estimator gives us an estimated timing offset, and the closer it is to the “true” (determined by least-squares offset sweep) timing offset, the happier we are.

For each trial, we will compute an estimated timing offset, a true timing offset, and an error – and we can accumulate the error over multiple trials (for instance, by calculating a mean squared error). We can then generate a graph of how the error changes as a function of signal-to-noise ratios, since it’s possible that noise affects each estimator differently.

what about the bits? #

The purpose of most³ receivers is to ingest RF (or baseband) and output the best possible estimate of the bits the transmitter ingested, and it’s unclear how these timing errors affect downstream signal processing.

If this were a receiver for a nondispersive channel, we’d make the argument that symbol timing error straightforwardly translates into symbol decisions happening at the wrong times. This leads to the RRC condition being violated causing symbol errors from ISI, and presumably more influence from noise, since we’re not capturing the signal at its peak. We could look at the pulse shape and filter responses and try and eyeball how much timing errors affects bit error rate.

However, we fully intend to tackle the nastiest of dispersive channels, and therefore what lies downstream of the synchronization blocks is a channel estimator and a trellis detector. Timing errors will affect both of these in more complicated ways.

We can try and handwave and say that small enough timing errors will be compensated for by the channel estimator and so we shouldn’t worry much, but I am interested in trying to see if we can find a somewhat reasonable way to quantify timing errors.

the dark forest downstream #

We haven’t got to trellis detection yet, and there are lots of subtleties and design choices which I don’t yet⁴ understand, but from what I know, the high level operating principle of trellis detection looks like this:

we do not try and find a sufficiently-magical⁵ filter that lets us “undo” the effect of the channel and feed it into a normal decision device
instead, we determine which symbols were sent by seeing how well the received signal matches what we would expect to receive for various transmitted symbol sequences
we do this with a local modulator⁶ that generates a “template” transmitted signal before the channel, for any given symbol sequence
the channel estimator told us what the channel looks like, so we can convolve the “template” signal against the channel to see what we’d expect to have received – if it the transmitter had sent that sequence of symbols
we choose the sequence of symbols that matches best

The error in the demodulation process is probably going to be vaguely of the form (with \(\ast\) convolution):

\[(\text{hypothetical transmitted signal}) \ast (\text{estimated channel}) - (\text{actual received signal})\]

Morally we should want to minimize this error, and we can compare how good our timing estimator is by comparing it to an “ideal” timing estimator, with something that looks like this:

Run the incredibly slow least-squares estimator over all possible offsets (ok we can cheat and cue it to where we know the training sequence lives), obtain an estimated channel with the lowest possible loss.
Calculate the magnitude of this over the whole signal: \[(\text{original transmitted signal}) \ast (\text{estimated channel with best offset}) - (\text{actual received signal})\]
Run the timing estimator, obtain a timing offset
Run a least-squares at the estimated timing offset, obtain an estimated channel
Calculate the magnitude of \[(\text{original transmitted signal}) \ast (\text{estimated channel}) - (\text{actual received signal})\]
The error of the timing estimator is the difference between the two magnitudes

I haven’t written code for this yet! But it seems reasonable?

Shape of the true correlation peak #

Is it narrow/wide? are there multiple sub-peaks? If there’s a single narrow peak (like when we slid the least-squares along the signal), then everything is happy, if the peak has multiple sub-peaks or is wider then we need to be more careful about what’s going on, and figure out how to go from a vector of correlation outputs to a single timing offset.

Sidelobe levels #

How high are the other peaks? If they’re high, the likelihood of the estimator choosing the wrong peak increases. To be clear, it’s often reasonable to accept higher sidelobe levels as a tradeoff for a narrower true peak / less errors, but we should be sure that our timing detector won’t accidentally lock on the wrong peak. Common methods of doing this would look like “having a local clock telling us approximately when we’re expecting to receive a burst” (like actual GSM receivers do) or an energy detector that tells us when a burst is starting.

next steps #

I’m going to write some code to implement a correlation-based timing estimator (parametrizable with how much of the training sequence we’re using as the “template”), along with code to test it against the “ideal” timing estimator.

And most critically, there will be plenty of graphs – of correlation peaks, sidelobes, and most critically, graphs with \(E_{b}/N_{0}\) on the x-axis!

If the transmitted data is trivially distinguishable from random, then the transmitter pays Shannon for energy (joules for feeding the power amplifier) and bandwidth (how many MHz we splatter our signal over) it doesn’t need. Motivating example without calculations: you are sending a hundred bits of data, with each of those bits generated by a random process with \(p(0) = 0.99\) and \(p(1) = 0.01\). This bitstring is quite distinguishable from random (even a low-pass filter can distinguish it). We can transmit the same information in the bitstring with much less energy and bandwidth by compressing it: Run-length encoding converts it into a much smaller bitstring. The compressed bitstring will look a lot closer to random data (and if there’s still obvious ways it differs, then we can compress it further :).↩︎
Only a certain amount of statistical indistinguishability. We don’t need full statistical indistinguishability here (error correction schemes necessarily introduce deterministic relationships between transmitted bits) and certainly not something like cryptographic indistinguishability.↩︎
There are, in fact, receivers where accurate channel and/or timing estimation is the primary goal, and the data is of secondary importance. For instance, a GPS receiver designer is mighty concerned about getting timing incredibly right, and an air-defense radar designer is in the business of estimating a channel – where one of the taps might be an enemy aircraft.↩︎
Ungerboeck vs Forney observation models is the most salient but there’s lots of other stuff.↩︎
The “sufficiently-magical filter” approach in fact works for well-behaved dispersive channels! Zero-forcing equalization (use a filter that’s the inverse of the channel) can lead to suffering really quick since if you have a null in your channel, you’ll have infinite noise amplification in your equalizer, which is bad. MMSE equalization strikes a balance between noise enhancement and ISI suppression based on the SNR, but with GSM channels we can’t get away with only MMSE. There are ways to design prefilters without introducing much noise to transmogrify the channel’s impulse response to have more of its energy towards the beginning, and this can help make the trellis detector less compute-intensive. Channel-shortening will be a different story, and one we will look at in a later post!↩︎
Usually just lookup tables, sorry to ruin the mystique.↩︎