# GSM receiver blocks: timing synchronization (part 3: less talk, more fight) #

softminus (sasha, softminus@gmail.com) · https://softminus.org/posts/time-sync-part-3.html · 2023-05-05

In the last post, we thought up some ideas for how to do timing synchronization. We note that this is not exactly how GSM works – with GSM you have special bursts that are used for timing synchronization, and the receiver uses those to keep its clock in sync with the transmitter. Since we’re not actually implementing a GSM receiver (how many GSM networks remain?), we’re going to look at the generic case of doing timing synchronization with a known training sequence in the presence of severe multipath (mostly for training purposes: I think this stuff is neat and I want to get better at it).

## More samples, better peak, fewer sidelobes #

We try out various correlations and see what happens (literally, we’re eyeballing stuff here):

tiledlayout(2,2)
nexttile
[c, lagz] = xcorr(received, modulated_training_sequence(1:end)); plot(abs(c(200:end)))
title("correlation with full training sequence")
nexttile
[c, lagz] = xcorr(received, modulated_training_sequence(9:end)); plot(abs(c(200:end)))
title("correlation with TS(9:end)")
nexttile
[c, lagz] = xcorr(received, modulated_training_sequence(1:end-8)); plot(abs(c(200:end)))
title("correlation with TS(1:end-8)")
nexttile
[c, lagz] = xcorr(received, modulated_training_sequence(9:end-8)); plot(abs(c(200:end)))
title("correlation with TS(9:end-8)")
copygraphics(gcf)

and we get the following plot. Note that the vertical scale is different for each of the subplots. OK, so as we kinda suspected in the previous post, using a longer correlation template leads to lower sidelobe amplitude without obvious widening of the true correlation peak.

## Writing an estimator #

Now that we’ve decided to use the full training sequence for doing correlation in time, let’s write out an estimator. First of all, we correlate the received signal with the training sequence:

correlation_output = conv(signal, conj(flip(modulated_training_sequence)));
[val, uncorrected_offset] = max(correlation_output);
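In NumPy terms, the same estimator is just a matched filter followed by a peak pick. This is a sketch with a made-up ±1 training sequence and no channel, not the post’s MATLAB setup:

```python
import numpy as np

rng = np.random.default_rng(0)
template = 2.0 * rng.integers(0, 2, 26) - 1.0               # made-up +/-1 training sequence
burst = np.concatenate([2.0 * rng.integers(0, 2, 50) - 1.0,  # unknown data
                        template,                            # training sequence at index 50
                        2.0 * rng.integers(0, 2, 50) - 1.0]) # unknown data

# Matched filter: convolve with the conjugated, time-reversed template,
# then take the index of the largest magnitude as the raw timing estimate.
correlation_output = np.convolve(burst, np.conj(np.flip(template)))
uncorrected_offset = int(np.argmax(np.abs(correlation_output)))
# With 'full'-mode convolution, the aligned peak lands at
# start + len(template) - 1 = 50 + 25 = 75, with value sum(template^2) = 26.
```

The "uncorrected" in the name matters: the peak index includes the template length, which has to be subtracted off later (the index-offset bookkeeping shows up again further down).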

## Running it #

We use the same channel (otherwise the shape of the correlation peak will be different across runs), and we run it a bunch of times with the same training sequence and random data:


% signal creation
training_sequence = randi([0 1], 32,1); %[0,1,0,0,0,1,1,1,1,0,1,1,0,1,0,0,0,1,0,0,0,1,1,1,1,0]';

% channel creation
nominal_sample_rate = 1e6 * (13/48);
signal_channel = stdchan("gsmEQx6", nominal_sample_rate, 0);

modulated_training_sequence = minimal_modulation(training_sequence);

average_convolution = zeros(319,1);

average_convolution_energy = zeros(319,1);

for i = 1:1024
data = [randi([0 1],128,1); training_sequence; randi([0 1], 128,1)];
modulated = minimal_modulation(data);
awgned = signal_channel(modulated);   % (reconstructed line) pass through the channel; no noise yet
signal_channel(complex(zeros(30,1))); % flush the channel state between runs
first_convolution = conv(awgned, conj(flip(modulated_training_sequence)));
average_convolution_energy = average_convolution_energy + abs(first_convolution);
average_convolution = average_convolution + first_convolution;
end

figure;
plot(abs(average_convolution));
title("average convolution output");
figure;
plot(abs(average_convolution_energy));
title("average convolution energy output");

function not_really_filtered = minimal_modulation(data)
not_really_filtered = pammod(data,2);
end

Output from a single run looks like this:

And on average, we get something like this: the first image is the average of the outputs, the second image is the average of the energy of the outputs:

## Eyeball-based analysis #

We observe two things:

1. There isn’t a single sharp correlation peak – it has lots of structure, even when averaged over 1024 runs. Even more disquieting, the structure seems constant across runs on the same channel, even though the data is random.

2. For a single run, there’s a lot of sidelobe energy. It gets averaged away over many runs, but our estimator needs to work on a single run only.

## Don’t worry too much about the sidelobes #

In an actual system, we almost always have additional information about our signal:

• when the signal started (energy detector)
• when we expect it to start (local timing reference)
• where the training sequence lives in the signal (hopefully the implementer has read the standard)

Even if this information is somewhat coarse/inaccurate, it lets us remove some of the irrelevant bits (or rather, samples) of the received signal before feeding it to the timing estimator, which will reduce how many sidelobes appear.

If the sidelobes are sufficiently far away from the main peak such that we can ignore them without too much additional information about signal structure, what truly matters is the shape and position of the intended correlation peak, and what’s immediately around it.

## Channel shape influences correlation peak structure #

Even if we avoid the sidelobes, it’s unclear which part of the correlation peak we should use as our timing estimate. The highest peak? The first peak above a significant threshold? It’s ambiguous.

Training/synchronization sequences are generally selected to have negligible autocorrelation at non-zero delays – to improve, well, correlating against them. This means that when we correlate the received signal against the training sequence, we’ll get something that looks like the channel impulse response estimate – and the “sharper” the autocorrelation of the training sequence, the better that channel impulse response estimate will be.
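A quick NumPy illustration of that claim, with an invented four-tap channel and a long random ±1 sequence (whose aperiodic autocorrelation is near-impulsive, but not perfectly so):

```python
import numpy as np

rng = np.random.default_rng(1)
ts = 2.0 * rng.integers(0, 2, 512) - 1.0       # long random +/-1 sequence
channel = np.array([0.2, 1.0, 0.0, 0.5])       # invented tapped-delay-line channel
received = np.convolve(ts, channel)            # just the TS through the channel, no noise

corr = np.correlate(received, ts, mode="full") # correlate against the training sequence
peak = int(np.argmax(np.abs(corr)))

# The zero lag of the TS autocorrelation sits at index len(ts) - 1 = 511, so
# the samples right after it reproduce (a noisy copy of) the channel taps:
cir_estimate = corr[511 : 511 + len(channel)] / np.dot(ts, ts)
```

Note that the biggest output sample corresponds to the *strongest* tap (the 1.0 at delay 1), not the earliest one, which is exactly the ambiguity discussed below.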

If you think of the channel as a tapped delay line, then the peak of the correlation output corresponds to the “tap” (delay) with the highest magnitude. This doesn’t necessarily correspond to the first/“earliest” tap with a significant coefficient, nor does it generally correspond to the best timing for the least-squares channel estimation.

Naturally, if we only have a single big tap in our channel, this is probably fine: as long as the single big tap is in the window for the least-squares, we’re home free. However, if we have multiple significant taps – especially if they aren’t all bunched together – and want to do a good job, things get more complicated.

We need a better way to process this hedgehog-looking correlation peak.

## Sliding energy window heuristic #

Morally, to minimize bit error, we want to find the channel taps that give us the most energy. The more energy, the less the noise can perturb your decisions – that’s the extreme tl;dr of the whole Shannon thing. You don’t care about super-attenuated paths (their contributions are barely distinguishable from noise), and if you have a single path that’s much stronger than the others, you can ignore the others to pretend you don’t have a dispersive channel at all!

With Viterbi detection it’s a tad nontrivial to reason about it, but imagine a rake receiver: if you want to get the most signal energy into your decision device, you need to identify the channel taps with the most energy. The rake receiver can pick out a finite but arbitrary set of paths, but with Viterbi, we need to decide on a window of the channel to use – everything within that window gets used, nothing outside gets used. That window tends to be fairly small: trellis-based detection is a controlled combinatorial explosion.

Since that window is precious, we need to cram as much energy into it as possible, and this is precisely why the timing estimator matters here. If our timing is suboptimal, we’re dropping valuable signal energy on the floor. This inspires a timing estimator design: run a window of appropriate size over the coarse channel impulse response (which is generated by the correlation we previously looked at) and pick the offset with the highest total energy in that window.
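As a sketch of the heuristic (the tap positions and magnitudes here are invented), it’s one more convolution and an argmax, and it behaves differently from a bare peak pick:

```python
import numpy as np

def window_energy_timing(first_convolution, L):
    """Slide an L-sample window over the magnitude of the matched-filter
    output; return the offset whose window holds the most total magnitude."""
    second_convolution = np.convolve(np.abs(first_convolution), np.ones(L))
    return int(np.argmax(second_convolution)), second_convolution

# Toy coarse channel impulse response: a two-tap cluster and one lone spike.
cir = np.zeros(100)
cir[40], cir[45] = 0.8, 0.9   # cluster: total magnitude 1.7
cir[70] = 1.0                  # the single strongest tap
offset, _ = window_energy_timing(cir, L=8)
# A bare argmax of |cir| would lock onto index 70;
# the L=8 window instead prefers the cluster around 40-45.
```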

### Aside: timing error in the non-ISI case #

Note that with a more usual non-ISI case, timing matters because we want to sample the signal at the best time, otherwise the eye diagram closes and we get more bit errors. In this receiver architecture, channel estimation + Viterbi take care of that consideration: the fine timing estimate is, in effect, “baked into” the channel estimate.

### Creating the estimator #

To find the size of the window, we ask whoever designed our channel estimator / trellis detector what’s the biggest channel they can handle, which as usual, we call $$L$$. Here, we’ve decided $$L=8$$. Mathematically, we’re taking the absolute value of the first convolution, and convolving that with an $$L$$-long vector $$[1,\cdots,1]$$:

first_convolution = conv(signal, conj(flip(template)));
second_convolution = conv(abs(first_convolution), ones(1,8));

[val, uncorrected_offset] = max(second_convolution);

Note the abs(first_convolution). This is critical, and omitting it caused me a lot of sadness and confusion. We want the total energy in that window, and if there’s cancellation across channel coefficients/taps then we’re…not getting a total energy.
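A tiny illustration of the failure mode, with two invented opposite-phase taps:

```python
import numpy as np

first_convolution = np.array([1 + 0j, -1 + 0j])  # two opposite-phase "taps"
without_abs = np.convolve(first_convolution, np.ones(2))
with_abs = np.convolve(np.abs(first_convolution), np.ones(2))
# At full overlap, the plain sum cancels to 0 while the magnitude sum sees 2.
```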

If we change our code:


average_second_convolution = zeros(326,1);
% ...
second_convolution = conv(abs(first_convolution), ones(1,8));
average_second_convolution = average_second_convolution + second_convolution;
% ...
plot(abs(average_second_convolution));
title("average second convolution output");

We get something that looks remarkably better. A single run of the second convolution output looks like this:

This looks pretty bad sidelobe-wise, but sidelobes don’t doom us.

When we average over multiple runs (which effectively “averages out” the sidelobes), we see that our new correlation erased the structure in our correlation peak. Note that it’s not the averaging over multiple runs that’s done this!

Sometimes a little structure does show up, but it’s not nearly as bad as before:

We try it with a different channel (gsmTUx12c1) to make sure that it’s not a peculiarity of the channel model we had been using:

## Validation #

Convolving once against the training sequence, calculating the energy, then convolving again with a largest-supported-channel-length vector of ones certainly generates extremely aesthetic plots, but we’ve yet to make sure that it actually matches up against the ground truth.

Without any subtlety, we simply generate a loss vector (how much least-square loss for each offset) for each run, sum them all up, and plot them alongside the second convolution output:

average_ts_losses = zeros(1,264);

% ...
ts_losses = least_squares_offset(awgned, training_sequence);
average_ts_losses = average_ts_losses + ts_losses;
% ...

average_second_convolution = average_second_convolution / norm(average_second_convolution);
average_ts_losses = average_ts_losses / norm(average_ts_losses);

figure;
plot(abs(average_second_convolution));
hold on;
plot(average_ts_losses);
title("peak = second convolution, dip = least squares loss");

And the comparison is incredibly encouraging. Note that these are averages, so the sidelobes from the data get averaged away. We see that the convolution-based estimator has more sidelobes, but the main peak is sharp. (figure: average convolution estimator vs. least-squares loss)

To show it’s not a fluke, we show a comparison for a single run. (figure: single run, convolution estimator vs. least-squares loss)

There’s still the eternal question of the indices, which decidedly do not line up. We investigate:

correlation_indices = [];
least_squares_indices = [];

% ...
[val, correlation_index] = max(second_convolution);
correlation_indices = [correlation_indices correlation_index];
[val, least_squares_index] = min(ts_losses);
least_squares_indices = [least_squares_indices least_squares_index];
% ...

And in the command window we run:

>> sum(least_squares_indices-correlation_indices)/length(least_squares_indices)

ans =

-29.9922

>> 

The almost-integral offset might seem like a fluke, but we run it a few more times and see that it’s not – it varies a little bit:


>> clear; average_convolution_output

ans =

-33.7109

>> clear; average_convolution_output

ans =

-29.7305

>> clear; average_convolution_output

ans =

-27.7578

>> clear; average_convolution_output

ans =

-30.9141

>> 

The $$\sim30$$-ness was a bit concerning, since the GSM training sequence is $$26$$ syms long and our channel/window is $$8$$ long and there’s no obvious and morally-upstanding way to get $$\sim30$$ out of that, but looking at our source code we see that we indeed did choose a $$32$$-long training sequence (the % is the comment character in MATLAB):

training_sequence = randi([0 1], 32,1); %[0,1,0,0,0,1,1,1,1,0,1,1,0,1,0,0,0,1,0,0,0,1,1,1,1,0]';

## Residual processing #

We seem to have a little error between the least-squares timing estimator (which we’re using as a reference) and our two-correlation-based estimator, which is a bit concerning. We try to figure out what’s going on, and we start off simple:

>> min(least_squares_indices)-max(least_squares_indices)

ans =

0

>> least_squares_indices(1)

ans =

142

>> 

We notice that the least-squares estimator always gives the same index ($$142$$), so whatever is going on, it’s in the correlation-based estimator, and uh, there’s definitely something going on:

>> hold on
>> plot(least_squares_indices)
>> plot(correlation_indices)

That outlier is what’s skewing the average! While our code didn’t save the raw data for each run (only the outputs of the estimators), it’s clear what happened: most of the time, this estimator doesn’t get tricked by the sidelobes, but when it does, it gets tricked hard.

We plot a histogram of the indices:

>> histogram(correlation_indices, 'BinMethod', 'integers')

The most common (by far) value is $$174$$, which indeed is $$142+32$$:


>> 174-32

ans =

142

>> least_squares_indices(1) % remember least_squares_indices(n)=142 for all n

ans =

142

>> 

## Noise #

We have not tested this estimator in the presence of noise. Not to sound like an excerpt from a statistical signal processing textbook, but it is essential to test how well your estimators perform in the presence of noise. Graphs with $$E_{b}/N_{0}$$ on the x-axis are, strictly speaking, optional, but they do look very nice.

We won’t do a full examination of how these estimators work in noise, but we’ll take a quick look.

I ran the same code, except modified with awgned = awgn(received,4);. This adds AWGN such that the signal-to-noise ratio is $$4\text{dB}$$. From cursory inspection of the plot below, we see that both estimators (the correlation-based one more so than the LS one) are more likely to be tricked by sidelobes in higher-noise conditions. Even if we ignore the sidelobe-caused indices, we see some variation/wobble rather than a straight line. This represents error which won’t be eliminated by running the estimators on a smaller section of the signal. (figure: indices of the two estimators at $$4\text{dB}$$ SNR)

Here are the histograms for the two indices. We definitely see that the huge errors come from sidelobes and not from the correlation peak spreading out because of noise. (figures: index histograms for the least-squares and correlation-based timing estimators at $$4\text{dB}$$ SNR)

If we zoom in to eliminate the sidelobes, we see that the correlation peaks indeed spread out, but by approximately the same amount for both estimators. (figures: the same histograms at $$4\text{dB}$$ SNR with sidelobes removed)

## Conclusion #

The least-squares timing estimator we had been using as a reference is excellent, but it’s incredibly compute-intensive. Here, we derived and tested an alternate correlation-based estimator which only requires a convolution, an element-wise magnitude, and a second convolution – and the second convolution doesn’t even require any multiplies!
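To make the “no multiplies” point concrete: convolving with a vector of ones is a box filter, i.e. a running sum, which costs one add and one subtract per output sample. A NumPy sketch (np.cumsum stands in for an accumulator in hardware):

```python
import numpy as np

def box_filter_no_multiplies(x, L):
    """Equivalent to conv(x, ones(L)), computed as a running sum:
    each output is the previous one plus a new sample minus an old one."""
    padded = np.concatenate([np.zeros(L), x, np.zeros(L - 1)])
    csum = np.cumsum(padded)
    return csum[L:] - csum[:-L]

rng = np.random.default_rng(2)
x = np.abs(rng.standard_normal(50))    # e.g. |first convolution| output
out = box_filter_no_multiplies(x, 8)
reference = np.convolve(x, np.ones(8)) # same result, the expensive way
```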

To reduce computational effort and avoid getting tricked by sidelobes, if at all practical, we should use a priori information (a fancy way of saying “when we started receiving it” and/or “when we expect to receive it” alongside “where the training sequence lives in the signal”) about the signal to slice a section of the signal and only run the estimator on that section.

While previously we had wanted to concoct a measure for the “goodness” of a timing estimator that’d make sense for this context (trellis-based detection in an ISI channel), we’ll be looking at Viterbi itself in the next few posts.

Why try and fake it when we’ll learn and use the real thing?

# GSM receiver blocks: timing synchronization (part 2: correlation edition) #

https://softminus.org/posts/time-sync-part-2.html · 2023-04-18

In the last post, we ran a least-squares on every possible time offset, calculated the loss, and declared the optimal timing offset to be the one with the lowest loss. This approach is incredibly inefficient, but critically, we know it handles the dispersive channel correctly. In this and the next post, we’ll use it as a gold standard to validate a more efficient approach.

## cut to the chase: it’s a correlation, right? #

Yep. It’s a correlation. Our intuition indeed tells us to correlate the received signal against the modulated training sequence, and look for the correlation peak. If we didn’t have a dispersive channel, the story ends here.

The dispersive channel makes things a bit more subtle! Remember, what’s received is not going to look like what’s transmitted, so we have to be careful in our analysis. When we ran the least-squares, we were careful to run it only on the slice of received training sequence that was unaffected by unknown data. It’s possible we might have to do something similar here – use a subset of the training sequence, and not the whole thing.

## index reckoning #

If the channel’s delay spread is $$L$$ symbol intervals long, we have a couple reasonable choices for the correlation “template”:

• the full training sequence
• the training sequence with $$L$$ symbols removed from the beginning
• the training sequence with $$L$$ symbols removed from the end
• the training sequence with $$L$$ symbols removed from both ends

I learned enough SAGE to calculate the symbolic expressions for the first two cases (with a channel length of 4, a training sequence length of 10, and 10 symbols before and after the training sequence) to see if there was some insight attainable by looking at the output:

sage: prepend_data = list(var('XXXXXXXXXX_%d' % (i)) for i in range(10))
sage: TS = list(var('TS_%d' % (i+10)) for i in range(10))
sage: append_data = list(var('XXXXXXXXXX_%d' % (i+20)) for i in range(10))
sage: burst = prepend_data + TS + append_data
sage: chan = list(var('CHAN_%d' % (i+1)) for i in range(4))
sage: received = convolution(burst, chan)
sage: convolution(received, list(reversed(TS[4:10])))

We can see that, for instance, correlating with the full training sequence gives zero outputs unaffected by unknown data (run it yourself if you want to check, I’m not including the output here), but correlating with TS[4:10] – removing the first 4 symbols from the training sequence – has two outputs unaffected by unknown data:

(CHAN_4*TS_10 + CHAN_3*TS_11 + CHAN_2*TS_12 + CHAN_1*TS_13)*TS_14 + (CHAN_4*TS_11 + CHAN_3*TS_12 + CHAN_2*TS_13 + CHAN_1*TS_14)*TS_15 + (CHAN_4*TS_12 + CHAN_3*TS_13 + CHAN_2*TS_14 + CHAN_1*TS_15)*TS_16 + (CHAN_4*TS_13 + CHAN_3*TS_14 + CHAN_2*TS_15 + CHAN_1*TS_16)*TS_17 + (CHAN_4*TS_14 + CHAN_3*TS_15 + CHAN_2*TS_16 + CHAN_1*TS_17)*TS_18 + (CHAN_4*TS_15 + CHAN_3*TS_16 + CHAN_2*TS_17 + CHAN_1*TS_18)*TS_19,

(CHAN_4*TS_11 + CHAN_3*TS_12 + CHAN_2*TS_13 + CHAN_1*TS_14)*TS_14 + (CHAN_4*TS_12 + CHAN_3*TS_13 + CHAN_2*TS_14 + CHAN_1*TS_15)*TS_15 + (CHAN_4*TS_13 + CHAN_3*TS_14 + CHAN_2*TS_15 + CHAN_1*TS_16)*TS_16 + (CHAN_4*TS_14 + CHAN_3*TS_15 + CHAN_2*TS_16 + CHAN_1*TS_17)*TS_17 + (CHAN_4*TS_15 + CHAN_3*TS_16 + CHAN_2*TS_17 + CHAN_1*TS_18)*TS_18 + (CHAN_4*TS_16 + CHAN_3*TS_17 + CHAN_2*TS_18 + CHAN_1*TS_19)*TS_19

This is a curious phenomenon: chopping off symbols from the training sequence – causing the correlation template to have fewer symbols – causes more output values that don’t depend on unknown data. This makes total sense, since if you have a tiny little template then it’ll have more “alignments” in the un-tainted section of the received signal.

Unfortunately, this goes in the opposite direction of traditional wisdom about correlation: use the largest template you can. So we’re left to wonder, is there a “happy medium” between too much contribution from unknown data, and too few symbols in the correlation template?
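We can poke at one side of that tradeoff numerically before the next post’s simulations. A NumPy sketch, with an invented 26-symbol ±1 training sequence and no channel: both templates align at the same output index, but the truncated one has a smaller peak value over the same random data, so relatively higher sidelobes.

```python
import numpy as np

rng = np.random.default_rng(3)
ts = 2.0 * rng.integers(0, 2, 26) - 1.0
burst = np.concatenate([2.0 * rng.integers(0, 2, 64) - 1.0, ts,
                        2.0 * rng.integers(0, 2, 64) - 1.0])

corr_full = np.abs(np.convolve(burst, np.flip(ts)))       # full template
corr_short = np.abs(np.convolve(burst, np.flip(ts[8:])))  # first L=8 symbols dropped

# Both alignments land at output index 64 + 25 = 89, but the full template's
# aligned value is 26 while the truncated template's is only 18.
```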

There is a saving grace: the standard assumption (and in fact, standard1 practice!) is that modulators are fed data sufficiently2 indistinguishable from random – so maybe “contribution from unknown data” is not as bad as it seems.

There’s only so much that thinking can do. It seems like we’re going to have to figure out some ways to judge our estimators, and do some simulations.

## estimator quality #

Let’s think of some criteria we can use to evaluate our estimators:

### Error #

The estimator gives us an estimated timing offset, and the closer it is to the “true” (determined by least-squares offset sweep) timing offset, the happier we are.

For each trial, we will compute an estimated timing offset, a true timing offset, and an error – and we can accumulate the error over multiple trials (for instance, by calculating a mean squared error). We can then generate a graph of how the error changes as a function of signal-to-noise ratios, since it’s possible that noise affects each estimator differently.

### what about the bits? #

The purpose of most3 receivers is to ingest RF (or baseband) and output the best possible estimate of the bits the transmitter ingested, and it’s unclear how these timing errors affect downstream signal processing.

If this were a receiver for a nondispersive channel, we’d make the argument that symbol timing error straightforwardly translates into symbol decisions happening at the wrong times. This leads to the RRC condition being violated, causing symbol errors from ISI, and presumably more influence from noise, since we’re not capturing the signal at its peak. We could look at the pulse shape and filter responses and try to eyeball how much timing error affects bit error rate.

However, we fully intend to tackle the nastiest of dispersive channels, and therefore what lies downstream of the synchronization blocks is a channel estimator and a trellis detector. Timing errors will affect both of these in more complicated ways.

We can try and handwave and say that small enough timing errors will be compensated for by the channel estimator and so we shouldn’t worry much, but I am interested in trying to see if we can find a somewhat reasonable way to quantify timing errors.

### the dark forest downstream #

We haven’t got to trellis detection yet, and there are lots of subtleties and design choices which I don’t yet4 understand, but from what I know, the high level operating principle of trellis detection looks like this:

• we do not try and find a sufficiently-magical5 filter that lets us “undo” the effect of the channel and feed it into a normal decision device
• instead, we determine which symbols were sent by seeing how well the received signal matches what we would expect to receive for various transmitted symbol sequences
• we do this with a local modulator6 that generates a “template” transmitted signal before the channel, for any given symbol sequence
• the channel estimator told us what the channel looks like, so we can convolve the “template” signal against the channel to see what we’d expect to have received – if the transmitter had sent that sequence of symbols
• we choose the sequence of symbols that matches best

The error in the demodulation process is probably going to be vaguely of the form (with $$\ast$$ denoting convolution):

$(\text{hypothetical transmitted signal}) \ast (\text{estimated channel}) - (\text{actual received signal})$

Morally we should want to minimize this error, and we can compare how good our timing estimator is by comparing it to an “ideal” timing estimator, with something that looks like this:

1. Run the incredibly slow least-squares estimator over all possible offsets (ok we can cheat and cue it to where we know the training sequence lives), obtain an estimated channel with the lowest possible loss.
2. Calculate the magnitude of this over the whole signal: $(\text{original transmitted signal}) \ast (\text{estimated channel with best offset}) - (\text{actual received signal})$
3. Run the timing estimator, obtain a timing offset
4. Run a least-squares at the estimated timing offset, obtain an estimated channel
5. Calculate the magnitude of $(\text{original transmitted signal}) \ast (\text{estimated channel}) - (\text{actual received signal})$
6. The error of the timing estimator is the difference between the two magnitudes

I haven’t written code for this yet! But it seems reasonable?
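Here is one way steps 1–6 could look as a sketch – everything below (function names, channel taps, lengths) is invented for illustration, and the LS solve is plain lstsq on a convolution matrix:

```python
import numpy as np

def ls_residual(received, x, offset, L):
    """Least-squares channel fit, assuming the transmitted samples x start
    at `offset` within `received`; returns the norm of the fit residual."""
    # Convolution (Toeplitz) matrix built from the known transmitted samples.
    T = np.array([[x[L - 1 + i - j] for j in range(L)]
                  for i in range(len(x) - L + 1)])
    y = received[offset + L - 1 : offset + len(x)]
    h, *_ = np.linalg.lstsq(T, y, rcond=None)
    return np.linalg.norm(T @ h - y)

def timing_estimator_error(received, x, best_offset, est_offset, L=4):
    """Steps 1-6, roughly: the extra LS residual the estimated offset incurs
    relative to the best offset found by exhaustive search."""
    return (ls_residual(received, x, est_offset, L)
            - ls_residual(received, x, best_offset, L))

# Toy check: noise-free channel, so the correct offset fits perfectly and the
# metric is zero there, while a wrong offset pays a positive residual penalty.
rng = np.random.default_rng(4)
x = 2.0 * rng.integers(0, 2, 40) - 1.0
received = np.convolve(x, np.array([1.0, 0.5, -0.3, 0.1]))
```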

### Shape of the true correlation peak #

Is it narrow/wide? Are there multiple sub-peaks? If there’s a single narrow peak (like when we slid the least-squares along the signal), then everything is happy; if the peak has multiple sub-peaks or is wider, then we need to be more careful about what’s going on, and figure out how to go from a vector of correlation outputs to a single timing offset.

### Sidelobe levels #

How high are the other peaks? If they’re high, the likelihood of the estimator choosing the wrong peak increases. To be clear, it’s often reasonable to accept higher sidelobe levels as a tradeoff for a narrower true peak / fewer errors, but we should be sure that our timing detector won’t accidentally lock onto the wrong peak. Common methods of doing this would look like “having a local clock telling us approximately when we’re expecting to receive a burst” (like actual GSM receivers do) or an energy detector that tells us when a burst is starting.

## next steps #

I’m going to write some code to implement a correlation-based timing estimator (parametrizable with how much of the training sequence we’re using as the “template”), along with code to test it against the “ideal” timing estimator.

And most critically, there will be plenty of graphs – of correlation peaks, of sidelobes, and of course, graphs with $$E_{b}/N_{0}$$ on the x-axis!

1. If the transmitted data is trivially distinguishable from random, then the transmitter pays Shannon for energy (joules for feeding the power amplifier) and bandwidth (how many MHz we splatter our signal over) it doesn’t need. Motivating example without calculations: you are sending a hundred bits of data, with each of those bits generated by a random process with $$p(0) = 0.99$$ and $$p(1) = 0.01$$. This bitstring is quite distinguishable from random (even a low-pass filter can distinguish it). We can transmit the same information in the bitstring with much less energy and bandwidth by compressing it: Run-length encoding converts it into a much smaller bitstring. The compressed bitstring will look a lot closer to random data (and if there’s still obvious ways it differs, then we can compress it further :).↩︎

2. Only a certain amount of statistical indistinguishability. We don’t need full statistical indistinguishability here (error correction schemes necessarily introduce deterministic relationships between transmitted bits) and certainly not something like cryptographic indistinguishability.↩︎

3. There are, in fact, receivers where accurate channel and/or timing estimation is the primary goal, and the data is of secondary importance. For instance, a GPS receiver designer is mighty concerned about getting timing incredibly right, and an air-defense radar designer is in the business of estimating a channel – where one of the taps might be an enemy aircraft.↩︎

4. Ungerboeck vs Forney observation models is the most salient but there’s lots of other stuff.↩︎

5. The “sufficiently-magical filter” approach in fact works for well-behaved dispersive channels! Zero-forcing equalization (use a filter that’s the inverse of the channel) can lead to suffering really quick since if you have a null in your channel, you’ll have infinite noise amplification in your equalizer, which is bad. MMSE equalization strikes a balance between noise enhancement and ISI suppression based on the SNR, but with GSM channels we can’t get away with only MMSE. There are ways to design prefilters without introducing much noise to transmogrify the channel’s impulse response to have more of its energy towards the beginning, and this can help make the trellis detector less compute-intensive. Channel-shortening will be a different story, and one we will look at in a later post!↩︎

6. Usually just lookup tables, sorry to ruin the mystique.↩︎

# GSM receiver blocks: synchronization in time (part 1: the gold standard) #

https://softminus.org/posts/time-sync-part-1.html · 2023-04-15

So now that we have a good enough channel estimation mechanism, we need to figure out how to apply it to something that vaguely looks like a real-world problem. This means no hard coding indices! We’re not going to get away with that in the real world, unless maybe we’ve got a sync cable between the transmitter and receiver…

This means we need to handle a few things:

• Timing offsets: we don’t know exactly when the training sequence starts
• Frequency offsets: local oscillators aren’t perfect
• Phase offsets: transmitter and receiver local oscillators aren’t perfectly in phase

## unfazed about the phase #

Phase offsets are the easiest to handle, since those get “baked into” the channel estimate. Adding a phase offset $$\phi_{1}$$ before the channel and a phase offset $$\phi_{2}$$ after the channel is equivalent to multiplying the channel estimate by a complex number with magnitude 1 and phase $$\phi_{1}+\phi_{2}$$. With $$\ast$$ denoting convolution:

$((e^{j\phi_{1}} \cdot \text{transmitted}) \ast \text{channel})\cdot e^{j\phi_{2}} = \text{transmitted} \ast (e^{j(\phi_{1} + \phi_{2})} \cdot \text{channel})$

So we don’t even need to estimate a phase offset – the channel estimation handles it.
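This is easy to sanity-check numerically – a NumPy sketch with arbitrary made-up angles and channel taps:

```python
import numpy as np

rng = np.random.default_rng(5)
transmitted = rng.standard_normal(20) + 1j * rng.standard_normal(20)
channel = np.array([0.9 + 0.1j, 0.3 - 0.2j, 0.1j])  # invented complex taps
phi1, phi2 = 0.7, -1.2                               # arbitrary phase offsets, radians

# Phase before the channel, then phase after the channel...
lhs = np.convolve(np.exp(1j * phi1) * transmitted, channel) * np.exp(1j * phi2)
# ...equals a single rotation of the channel by phi1 + phi2.
rhs = np.convolve(transmitted, np.exp(1j * (phi1 + phi2)) * channel)
```

The equality is exact (up to floating point) because convolution is linear, so scalar phasors factor straight through it.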

### but what about the phase noise? #

We handle phase noise with the time-honored tradition of ignoring it. Seriously though, I’m not sure how much it’s a problem. I think if we have a way to update the channel estimate as it’s being used to estimate bits (“per survivor processing”), we should be able to handle it, but I’m nowhere near that yet.

## we’ll fret about the freqs later #

A frequency offset will show up as a changing phase offset, and this won’t be handled by the channel estimation – unless, again, we have a way to update the channel estimate as it’s being used in the actual demodulation.

Fortunately, we can estimate and compensate for frequency offsets earlier in the receiver and without needing the ability to estimate/update the channel estimate. In fact, we likely can get better results by compensating for frequency error with a mechanism designed for that purpose, that can operate over the entire received burst at once.

In actual GSM, coarse frequency synchronization is handled by the “frequency burst”, which carries no data but is designed to allow easy recovery of the carrier frequency at the mobile station. We’ll look at how to handle frequency offsets in a future post.

## time for timing #

Similarly, in actual GSM, coarse time synchronization is handled by the “synchronization burst” – which has an extra-long training sequence and information that identifies the base station.

We will look at how to handle time offsets in this post. The synchronization burst is a special case of a normal burst, and we can use the same methods to handle both.

## let the loss be your guide #

I spent some time guessing various indices for the least squares and eyeballing “how good” the channel estimate was (with a received signal generated by conv(modulated, [1,2,3,4,5,4,3,2]);), which was kind of elucidating but not very principled.

Fortunately, we have a better tool: the least squares loss function.

We stop using the hardcoded channel [1,2,3,4,5,4,3,2] with its mysteriously-integer-valued coefficients, and use interference_channel = stdchan("gsmEQx6", nominal_sample_rate, 0); instead, and we add some AWGN: awgned = awgn(received,10);. Here’s the code:
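For reference, awgn(received,10) adds white Gaussian noise at a 10 dB SNR (MATLAB's awgn assumes the signal has 0 dBW power unless you pass 'measured'). A rough NumPy stand-in that always measures the signal power, as a sketch (the function name is mine):

```python
import numpy as np

def awgn_measured(x, snr_db, rng):
    """Add complex white Gaussian noise at snr_db relative to the measured signal power."""
    sig_power = np.mean(np.abs(x) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(len(x)) + 1j * rng.standard_normal(len(x))
    )
    return x + noise

rng = np.random.default_rng(0)
x = np.ones(100_000, dtype=complex)          # unit-power test signal
y = awgn_measured(x, 10, rng)
measured_snr_db = 10 * np.log10(1.0 / np.mean(np.abs(y - x) ** 2))
assert 9.5 < measured_snr_db < 10.5
```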

## entire example code #


training_sequence = [0,1,0,0,0,1,1,1,1,0,1,1,0,1,0,0,0,1,0,0,0,1,1,1,1,0]';

data = [randi([0 1],64,1); training_sequence; randi([0 1], 128,1)];

modulated = minimal_modulation(data);
nominal_sample_rate = 1e6 * (13/48);

interference_channel = stdchan("gsmEQx6", nominal_sample_rate, 0);
received = interference_channel(modulated);
awgned = awgn(received,10);

nominal_channel_length = 8

modulated_training_sequence = minimal_modulation(training_sequence);
training_sequence_length = length(training_sequence)

toeplitz_column = modulated_training_sequence(nominal_channel_length:training_sequence_length);
toeplitz_row = flip(modulated_training_sequence(1:nominal_channel_length));
T = toeplitz(toeplitz_column, toeplitz_row);

clean_part_of_training_sequence = training_sequence_length - nominal_channel_length;

for offset = 1:(length(received)-clean_part_of_training_sequence)
interesting_part_of_received_signal = awgned(offset:offset+clean_part_of_training_sequence);
estimated_chan = lsqminnorm(T, interesting_part_of_received_signal);
loss_vector = T*estimated_chan - interesting_part_of_received_signal;
TS_losses(offset) = norm(loss_vector);
end

[val, best_offset] = min(TS_losses);

interesting_part_of_received_signal = awgned(offset:offset+clean_part_of_training_sequence);
best_estimated_chan = lsqminnorm(T, interesting_part_of_received_signal)

function not_really_filtered = minimal_modulation(data)
not_really_filtered = pammod(data,2);
end
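For anyone following along in Python rather than MATLAB, the Toeplitz construction above translates directly; a sketch using scipy.linalg.toeplitz (which follows the same column/row convention as MATLAB's toeplitz):

```python
import numpy as np
from scipy.linalg import toeplitz

training_sequence = np.array([0,1,0,0,0,1,1,1,1,0,1,1,0,1,0,0,0,1,0,0,0,1,1,1,1,0])
mod_ts = 2.0 * training_sequence - 1.0   # pammod(bits, 2): 0 -> -1, 1 -> +1
L = 8                                    # nominal_channel_length

# MATLAB: toeplitz(TS(L:end), flip(TS(1:L)))
T = toeplitz(mod_ts[L - 1:], mod_ts[:L][::-1])

assert T.shape == (19, 8)
# each row is a reversed length-8 window of the training sequence,
# so T @ chan is the convolution of the channel with the training symbols
assert np.array_equal(T[0], mod_ts[7::-1])
assert np.array_equal(T[:, 0], mod_ts[7:])
```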

## brute force timing estimation #

The relevant section for what we’re doing today is below. We try every possible offset, and we see which one gives us the smallest loss:



for offset = 1:(length(received)-clean_part_of_training_sequence)
interesting_part_of_received_signal = awgned(offset:offset+clean_part_of_training_sequence);
estimated_chan = lsqminnorm(T, interesting_part_of_received_signal);
loss_vector = T*estimated_chan - interesting_part_of_received_signal;
TS_losses(offset) = norm(loss_vector);
end

[val, best_offset] = min(TS_losses);

interesting_part_of_received_signal = awgned(offset:offset+clean_part_of_training_sequence);
best_estimated_chan = lsqminnorm(T, interesting_part_of_received_signal)

Yeah, we redo the least squares calculation, but it’s not a big deal. This is a terribly slow way to do timing synchronization anyway, but it’s excellent for figuring out what is going on.
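Here's the same brute-force search sketched in NumPy as a cross-check, using the contrived noiseless channel [1,2,3,4,5,4,3,2] from earlier (0-based indexing, so the zero-loss offset comes out as 71 where MATLAB reports 72):

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(1)
ts_bits = np.array([0,1,0,0,0,1,1,1,1,0,1,1,0,1,0,0,0,1,0,0,0,1,1,1,1,0])
bits = np.concatenate([rng.integers(0, 2, 64), ts_bits, rng.integers(0, 2, 128)])
modulated = 2.0 * bits - 1.0
chan = np.array([1, 2, 3, 4, 5, 4, 3, 2], dtype=float)   # contrived channel, no noise
received = np.convolve(modulated, chan)

L = 8
mod_ts = 2.0 * ts_bits - 1.0
T = toeplitz(mod_ts[L - 1:], mod_ts[:L][::-1])
clean = len(ts_bits) - L                                  # 18

losses = np.empty(len(received) - clean)
for off in range(len(losses)):
    seg = received[off:off + clean + 1]
    est, *_ = np.linalg.lstsq(T, seg, rcond=None)
    losses[off] = np.linalg.norm(T @ est - seg)

best = int(np.argmin(losses))
seg = received[best:best + clean + 1]          # recompute the estimate at the best offset
best_chan, *_ = np.linalg.lstsq(T, seg, rcond=None)

assert best == 71                              # MATLAB's 72, shifted by 0-based indexing
assert losses[best] < 1e-9
assert np.allclose(best_chan, chan)
```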

## this is loss #

We plot(TS_losses) and get an unambiguous and deep dip in the loss:

Looking at how we calculated TS_losses, we see that it’s only calculated over the training sequence, not over the whole burst. We are curious to see what happens if we calculate the “error” over the whole burst. We’ll do this by convolving the transmitted signal (before the channel) with the estimated channel, and then subtracting the received signal:

>> fake_received_signal = conv(modulated, best_estimated_chan);
>> tiledlayout(2,1)
>> nexttile
>> plot(abs(received))
>> nexttile
>> plot(abs(fake_received_signal))
>> copygraphics(gcf)

and they don’t look very similar at all :(

Either we made a big mistake, or there’s behavior in the real channel that a convolution with the estimated channel is failing to capture.

## the fake, in its attempt to be real #

We try and plot the difference, praying that there might be something interesting there…and get an error.

>> plot(abs(received-fake_received_signal))
Arrays have incompatible sizes for this operation.

This leads us to look at the sizes, and more specifically, the difference in sizes:

>> size(fake_received_signal)

ans =

225     1

>> size(received)

ans =

218     1

>> 218-225

ans =

-7

7 is sus because it’s almost 8, the putative channel size in GSM. We look at the plots again, and we observe that the actual received signal is preceded by samples that look…zero-ish, and exactly 7 of them at that:

>> received

-0.0000 + 0.0001i % 1
-0.0014 - 0.0006i % 2
0.0027 + 0.0010i % 3
-0.0003 - 0.0022i % 4
0.0019 + 0.0056i % 5
-0.0040 - 0.0044i % 6
0.0018 + 0.0033i % 7
0.1531 + 0.2801i
0.2313 + 0.2729i
0.3761 - 0.1057i
-0.1923 - 0.1956i
-0.7524 - 0.6256i

This hasn’t had AWGN added – this is just from the effect of stdchan("gsmEQx6", nominal_sample_rate, 0). It looks like this Matlab channel simulation is flushing the channel with some noise, since the modulated signal doesn’t look like this at all:

>> modulated

modulated =

1.0000 + 0.0000i
1.0000 + 0.0000i
-1.0000 + 0.0000i
-1.0000 + 0.0000i
-1.0000 + 0.0000i
1.0000 + 0.0000i
-1.0000 + 0.0000i
-1.0000 + 0.0000i
1.0000 + 0.0000i
1.0000 + 0.0000i

If we run a cross-correlation between this “fake” received signal (generated by convolving the original modulated signal with the estimated channel) and the actual received signal, we get something remarkably disappointing:

>> fake_received_signal = conv(best_estimated_chan, modulated);
>> [c, lagz] = xcorr(received, fake_received_signal);
>> stem(lagz,abs(c))

If we zoom in on the center, we see it’s unfortunately nowhere near sharp:

It’s still unclear exactly what is going on – the behavior differences at the beginning/end of the channel simulation fail to explain the catastrophic lack of similarity.

We go back to our contrived, integer-valued channel:

received =  conv(modulated, [1,2,3,4,5,4,3,2]);

and run this again. To make things really simple, we turn off the noise:

awgned = received;

We plot TS_losses and see a perfect zero loss at an offset of 72 (64 + 8):

but the best estimated channel is nowhere near the actual channel, which is [1,2,3,4,5,4,3,2]:

>> best_estimated_chan

best_estimated_chan =

1.2873
0.3579
-1.0171
-0.6468
-0.5164
-0.3579
-3.1032
-2.9164

>> 

### it’s a bug #

So we take a look at our code again, and we find a plain old bug:

for offset = 1:(length(received)-clean_part_of_training_sequence)
interesting_part_of_received_signal = awgned(offset:offset+clean_part_of_training_sequence);
estimated_chan = lsqminnorm(T, interesting_part_of_received_signal);
loss_vector = T*estimated_chan - interesting_part_of_received_signal;
TS_losses(offset) = norm(loss_vector);
end

[val, best_offset] = min(TS_losses)

interesting_part_of_received_signal = awgned(offset:offset+clean_part_of_training_sequence);
best_estimated_chan = lsqminnorm(T, interesting_part_of_received_signal)

In the penultimate line, where we slice out the part of the received signal we’ll run a least-squares on, the indices for the slice are (offset:offset+clean_part_of_training_sequence).

offset is the loop variable, and its value there is quite simply the offset of the last least-squares computed in the loop. It is not best_offset, which is the index of the minimum loss.
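The same trap exists in any language where the loop variable outlives the loop; a tiny Python illustration:

```python
losses = [5.0, 0.1, 3.0]
for offset in range(len(losses)):
    loss = losses[offset]

# after the loop, `offset` is just the last index visited...
assert offset == 2
# ...while the index of the minimum loss is something else entirely:
best_offset = min(range(len(losses)), key=losses.__getitem__)
assert best_offset == 1
```

(MATLAB behaves the same way: the loop variable keeps its final value after the loop ends.)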

We fix this, and now the best estimated channel is indeed what we expect it to be:

best_estimated_chan =

1.0000
2.0000
3.0000
4.0000
5.0000
4.0000
3.0000
2.0000

## a brief detour in shoggoth-land #

Out of curiosity, I gave the above excerpt to ChatGPT-4, preceded by the prompt “find the bug:”, and it figured the bug out perfectly!

GPT-3.5 didn’t clue in on it, and even GPT-4 didn’t clue in on it when fed the whole file’s contents rather than the excerpt.

## validating the fix #

With this fix in place, we see that the fake received signal (generated by convolving the transmitted signal with the best estimated channel) is indeed identical to the original received signal, at least with the contrived channel:

## looking at real channels again #

We go back to the real channel (well, it’s not an actual IRL channel but it’s a simulation of an IRL channel), and see how much better our “fake” (generated by convolving the transmitted signal with the best estimated channel) received signal is:

Now this is a specimen!

This is, effectively, the same signal, except for subtleties in how the beginning and end are handled. It looks like the channel simulator flushes (seasons?) the beginning of the channel with very low-amplitude noise, and handles the end by “cutting off” the simulation before letting the channel “drain”, whereas the simplistic convolution doesn’t prepend low-amplitude noise at the beginning and lets the channel completely drain out.

If we add sufficient zeros to the beginning and end of the modulated signal, and feed it to both the channel simulation and the convolution:

>> received = interference_channel([zeros(16,1);modulated;zeros(16,1)]);
>> fake_received_signal = conv([zeros(16,1);modulated;zeros(16,1)], best_estimated_chan);
>> tiledlayout(2,1)
>> nexttile
>> plot(abs(received))
>> nexttile
>> plot(abs(fake_received_signal))
>> copygraphics(gcf)

Since these are complex signals, we should be sure that the real/imaginary parts match, and not just the magnitudes:

>> received = interference_channel([zeros(16,1);modulated;zeros(16,1)]);
>> tiledlayout(2,2)
>> nexttile
>> plot(real(received))
>> nexttile
>> plot(imag(received))
>> nexttile
>> plot(real(fake_received_signal))
>> nexttile
>> plot(imag(fake_received_signal))
>> copygraphics(gcf)

Looks good!

## a simpler method will wait for next time #

We’ve actually made a working time synchronization estimator! Unfortunately it’s incredibly inefficient – requiring a whole least-squares estimate for every possible offset. However, it will serve as an ironclad gold standard1 and help us build a more efficient synchronization estimator next time.

1. “ironclad gold standard” is funny if you take it literally (usually you plate cheap metals with gold, not the other way around :D)↩︎

]]>
GSM receiver blocks: least-squares channel estimation part 3: thinking about model mismatch https://softminus.org/posts/bigger-channel.html 2023-04-13 2023-04-13T00:00:00Z GSM receiver blocks: least-squares channel estimation part 3: thinking about model mismatch

Let’s look at the case when the actual channel is slightly longer than the model. If the channel is much longer than the model, then the model will be unable to capture most of the channel’s properties, and the channel estimation will probably be nonsense. But if the channel is only slightly longer than the model, then the model will be able to capture most of the channel’s information (with some inherent inaccuracy). In this case, the channel estimation might be good enough for practical purposes.

For a practical example, let’s say that we have a training sequence of length 10, and an assumed channel of length 4. As usual, the training sequence is sandwiched between unknown data symbols.

We call the training sequence symbols $$TS = t_{1}, \cdots, t_{10}$$, the data symbols (presumed unknown here) $$d_{i}$$ (for $$i<1$$ or $$i>10$$), and the channel coefficients $$chan = c_1,\cdots,c_4$$. The channel is assumed to be causal, so the channel output is given by convolving the channel coefficients with the transmitted symbols $$d_{-1}, d_{0}, t_{1}, \cdots, t_{10}, d_{11}, d_{12}$$.

## observation model and least squares #

The observation model is a convolution of the signal against the channel and looks like this (we draw the matrices aligned that way to make it clear how the multiplication works): \begin{align*} & \begin{bmatrix} c_{1}\\ c_{2}\\ c_{3}\\ c_{4} \end{bmatrix} & \\ \begin{bmatrix} t_{1} & d_{0} & d_{-1} & d_{-2} \\ t_{2} & d_{1} & d_{0} & d_{-1} \\ t_{3} & t_{2} & t_{1} & d_{0} \\ \hdashline t_{4} & t_{3} & t_{2} & t_{1} \\ t_{5} & t_{4} & t_{3} & t_{2} \\ t_{6} & t_{5} & t_{4} & t_{3} \\ t_{7} & t_{6} & t_{5} & t_{4} \\ t_{8} & t_{7} & t_{6} & t_{5} \\ t_{9} & t_{8} & t_{7} & t_{6} \\ t_{10} & t_{9} & t_{8} & t_{7} \\ \hdashline d_{11} & t_{10} & t_{9} & t_{8} \\ d_{12} & d_{11} & t_{10} & t_{9} \\ d_{13} & d_{12} & d_{11} & t_{10} \\ \end{bmatrix} \ast&& = \begin{bmatrix} {r_{2}} \\ {r_{3}} \\ {r_{4}} \\ \hdashline {r_{5}} \\ {r_{6}} \\ {r_{7}} \\ {r_{8}} \\ {r_{9}} \\ {r_{10}} \\ {r_{11}} \\ \hdashline {r_{12}} \\ {r_{13}} \\ {r_{14}} \\ \end{bmatrix} \end{align*}

Everything above and below the dashed lines is influenced by unknown data symbols, so we focus on the middle section of the matrix and the corresponding produced received symbols.

We notice that the least squares process “tries to” estimate the received symbols as a linear combination of the four columns of this matrix:

$\begin{bmatrix} t_{4} & t_{3} & t_{2} & t_{1} \\ t_{5} & t_{4} & t_{3} & t_{2} \\ t_{6} & t_{5} & t_{4} & t_{3} \\ t_{7} & t_{6} & t_{5} & t_{4} \\ t_{8} & t_{7} & t_{6} & t_{5} \\ t_{9} & t_{8} & t_{7} & t_{6} \\ t_{10} & t_{9} & t_{8} & t_{7} \\ \end{bmatrix}$

The least squares process will try to find coefficients $$[\hat{c}_{1}, \hat{c}_{2}, \hat{c}_{3}, \hat{c}_{4}]$$ such that the received symbols are well approximated by the linear combination of the four columns. The first received symbol not affected by unknown data – the first received symbol we ingest for the least-squares – is $$r_5 = c_1 \cdot t_4 + c_2 \cdot t_3 + c_3 \cdot t_2 + c_4 \cdot t_1$$, and this makes sense since the first row of the observation matrix is $$[t_4, t_3, t_2, t_1]$$.

## more channel coefficients, more problems #

If we add an extra channel coefficient $$c_5$$ , then the observation model looks like this. We bold the $$\mathbf{d_{0}}$$ in the fifth column since it is between the dashed lines but is an unknown data symbol. The red terms are the terms added by the extra channel coefficient.

\begin{align*} & \begin{bmatrix} c_{1}\\ c_{2}\\ c_{3}\\ c_{4} \\ \color{red}c_{5} \end{bmatrix} & \\ \begin{bmatrix} t_{1} & d_{0} & d_{-1} & d_{-2} & \color{red} d_{-3} \\ t_{2} & d_{1} & d_{0} & d_{-1} & \color{red} d_{-2} \\ t_{3} & t_{2} & t_{1} & d_{0} & \color{red} d_{-1}\\ \hdashline t_{4} & t_{3} & t_{2} & t_{1} & \color{red} \mathbf{d_{0}}\\ t_{5} & t_{4} & t_{3} & t_{2} & \color{red} t_{1} \\ t_{6} & t_{5} & t_{4} & t_{3} & \color{red} t_{2}\\ t_{7} & t_{6} & t_{5} & t_{4} & \color{red} t_{3}\\ t_{8} & t_{7} & t_{6} & t_{5} & \color{red}t_{4}\\ t_{9} & t_{8} & t_{7} & t_{6} & \color{red}t_{5}\\ t_{10} & t_{9} & t_{8} & t_{7} & \color{red}t_{6}\\ \hdashline d_{11} & t_{10} & t_{9} & t_{8} & \color{red}t_{7}\\ d_{12} & d_{11} & t_{10} & t_{9} & \color{red}t_{8}\\ d_{13} & d_{12} & d_{11} & t_{10} & \color{red}t_{9}\\ \end{bmatrix} \ast&& = \begin{bmatrix} {r_{2}} \\ {r_{3}} \\ {r_{4}} \\ \hdashline {r_{5}} \\ {r_{6}} \\ {r_{7}} \\ {r_{8}} \\ {r_{9}} \\ {r_{10}} \\ {r_{11}} \\ \hdashline {r_{12}} \\ {r_{13}} \\ {r_{14}} \\ \end{bmatrix} + c_{5} \begin{bmatrix} \color{red}d_{-3} \\ \color{red}d_{-2} \\ \color{red}d_{-1}\\ \hdashline \color{red}\mathbf{d_{0}}\\ \color{red}t_{1} \\ \color{red}t_{2}\\ \color{red}t_{3}\\ \color{red}t_{4}\\ \color{red}t_{5}\\ \color{red}t_{6}\\ \hdashline \color{red}t_{7}\\ \color{red}t_{8}\\ \color{red}t_{9} \end{bmatrix} \end{align*}

If $$c_5$$ is significantly less than the other coefficients, we can treat its contribution (the new vector on the right) to the received symbols like noise (and hope that it won’t mess up the estimation too much).

If the energy is distributed pretty evenly across all five channel coefficients, we can’t do this, since it’ll significantly worsen the fit. However, if we have the opposite case – $$c_1$$ significantly less than the other coefficients – we can do something similar to the previous case. We can treat the first column of the observation matrix as noise, and try to estimate the other four columns.

## it’s all off-by-ones? always was! #

Here’s the observation model if we’re trying to treat $$c_1$$ as negligible:

\begin{align*} & \begin{bmatrix} \color{red}c_{1}\\ c_{2}\\ c_{3}\\ c_{4} \\ c_{5} \end{bmatrix} & \\ \left[ \begin{array}{c|cccc} \color{red}t_{1} & d_{0} & d_{-1} & d_{-2} & d_{-3} \\ \color{red}t_{2} & d_{1} & d_{0} & d_{-1} & d_{-2} \\ \color{red}t_{3} & t_{2} & t_{1} & d_{0} & d_{-1}\\ \hdashline \color{red}t_{4} & t_{3} & t_{2} & t_{1} & \mathbf{d_{0}}\\ \color{red}t_{5} & t_{4} & t_{3} & t_{2} & t_{1} \\ \color{red}t_{6} & t_{5} & t_{4} & t_{3} & t_{2}\\ \color{red}t_{7} & t_{6} & t_{5} & t_{4} & t_{3}\\ \color{red}t_{8} & t_{7} & t_{6} & t_{5} & t_{4}\\ \color{red}t_{9} & t_{8} & t_{7} & t_{6} & t_{5}\\ \color{red}t_{10} & t_{9} & t_{8} & t_{7} & t_{6}\\ \hdashline \color{red}d_{11} & t_{10} & t_{9} & t_{8} & t_{7}\\ \color{red}d_{12} & d_{11} & t_{10} & t_{9} & t_{8}\\ \color{red}d_{13} & d_{12} & d_{11} & t_{10} & t_{9}\\ \end{array}\right] \ast&& = c_{1} \begin{bmatrix} \color{red}t_{1} \\ \color{red}t_{2} \\ \color{red}t_{3} \\ \hdashline \color{red}t_{4} \\ \color{red}t_{5} \\ \color{red}t_{6} \\ \color{red}t_{7} \\ \color{red}t_{8} \\ \color{red}t_{9} \\ \color{red}t_{10} \\ \hdashline \color{red}d_{11} \\ \color{red}d_{12} \\ \color{red}d_{13} \\ \end{bmatrix} + \begin{bmatrix} {r_{2}} \\ {r_{3}} \\ {r_{4}} \\ \hdashline {r_{5}} \\ {r_{6}} \\ {r_{7}} \\ {r_{8}} \\ {r_{9}} \\ {r_{10}} \\ {r_{11}} \\ \hdashline {r_{12}} \\ {r_{13}} \\ {r_{14}} \\ \end{bmatrix} \end{align*}

However, we need to be careful with indices when we do this! Note that if we ignore the stuff in red, the submatrix between the horizontal dashed lines no longer consists of the same column vectors as before! This will cause trouble! In order to get the same column vectors (those only containing known training symbols and spanning the maximum possible length) as before and have them multiplied with the correct subset of the channel $$[c_2,\cdots,c_5]$$, we need to shift the indices by one.

Indeed, to have a least-squares that generates a sensible output, the first received symbol we use needs to be $$\textrm{noise} + c_2 \cdot t_4 + c_3 \cdot t_3 + c_4 \cdot t_2 + c_5 \cdot t_1$$ (with the “noise” being the contribution from the first channel tap, which we are assuming is negligible). We need this since the first row of the least-squares matrix still is $$[t_4, t_3, t_2, t_1]$$. Looking at the indices of this desired received symbol, we see that this would be $$r_6$$, not $$r_5$$ in the previous case.

To write the correct observation model (which highlights the matrix we’ll use for least-squares), we simply move the horizontal dashed lines one row down!

\begin{align*} & \begin{bmatrix} \color{red}c_{1}\\ c_{2}\\ c_{3}\\ c_{4} \\ c_{5} \end{bmatrix} & \\ \left[ \begin{array}{c|cccc} \color{red}t_{1} & d_{0} & d_{-1} & d_{-2} & d_{-3} \\ \color{red}t_{2} & d_{1} & d_{0} & d_{-1} & d_{-2} \\ \color{red}t_{3} & t_{2} & t_{1} & d_{0} & d_{-1}\\ \color{red}t_{4} & t_{3} & t_{2} & t_{1} & \mathbf{d_{0}}\\ \hdashline \color{red}t_{5} & t_{4} & t_{3} & t_{2} & t_{1} \\ \color{red}t_{6} & t_{5} & t_{4} & t_{3} & t_{2}\\ \color{red}t_{7} & t_{6} & t_{5} & t_{4} & t_{3}\\ \color{red}t_{8} & t_{7} & t_{6} & t_{5} & t_{4}\\ \color{red}t_{9} & t_{8} & t_{7} & t_{6} & t_{5}\\ \color{red}t_{10} & t_{9} & t_{8} & t_{7} & t_{6}\\ \color{red}d_{11} & t_{10} & t_{9} & t_{8} & t_{7}\\ \hdashline \color{red}d_{12} & d_{11} & t_{10} & t_{9} & t_{8}\\ \color{red}d_{13} & d_{12} & d_{11} & t_{10} & t_{9}\\ \end{array}\right] \ast&& = c_{1} \begin{bmatrix} \color{red}t_{1} \\ \color{red}t_{2} \\ \color{red}t_{3} \\ \color{red}t_{4} \\ \hdashline \color{red}t_{5} \\ \color{red}t_{6} \\ \color{red}t_{7} \\ \color{red}t_{8} \\ \color{red}t_{9} \\ \color{red}t_{10} \\ \color{red}d_{11} \\ \hdashline \color{red}d_{12} \\ \color{red}d_{13} \\ \end{bmatrix} + \begin{bmatrix} {r_{2}} \\ {r_{3}} \\ {r_{4}} \\ {r_{5}} \\ \hdashline {r_{6}} \\ {r_{7}} \\ {r_{8}} \\ {r_{9}} \\ {r_{10}} \\ {r_{11}} \\ {r_{12}} \\ \hdashline {r_{13}} \\ {r_{14}} \\ \end{bmatrix} \end{align*}

This makes sense. If most of the energy was in the later channel taps and the real channel is bigger than the model, we’ll indeed want to use a slightly later slice of the received symbols!

## conclusions #

If we have a slightly larger channel than the model, we can use the energy distribution – which we can find out via a simple correlation, if our training sequences are reasonable (their autocorrelation “sharp”: mostly zero except at zero delay 1) – to figure out where on the signal to run the least-squares estimation. The amount of error presumably depends on the characteristics of the channel itself. Also, there are more advanced methods for channel estimation, most notably MMSE, which requires knowledge (or estimation) of noise and channel statistics. I can understand how one could estimate the noise statistics (if we have strong enough symbols we can wipe off the noise), but it’s slightly unclear to me how one estimates the channel statistics… if one’s trying to… estimate the channel? If you want to explain how this gets done in real-world systems, I would be delighted to hear about it!

I think what I’ll do next is try to formalize and write code for the time/frequency offset estimation, and get that correctly cueing the channel estimation on the right part of the signal. The goal of this series is to do a survey of all the necessary “minimally viable” signal processing elements that compose a reasonable (similar data rates, similar channel properties, similar performance) GSM-ish receiver, not to explore all the possible methods (there are many of them, and people keep coming up with more!) for each signal processing block.

1. We send $$TS$$, and receive $$TS \ast chan$$ ($$TS$$ convolved with the channel impulse response). If we want to estimate the channel with a simple correlation, the receiver computes $$(TS \ast chan) \star TS$$, where $$\star$$ is the correlation operator. The properties of convolution and correlation let us rewrite that as $$chan \ast (TS \star TS)$$ – the channel impulse response itself, correlated with the autocorrelation of the training sequence. The closer the training sequence autocorrelation is to zero (besides at zero delay), the more accurate the simple correlation method’s estimate of the impulse response.↩︎

]]>
I have the power of equations! https://softminus.org/posts/equations.html 2023-04-12 2023-04-12T00:00:00Z I have the power of equations!

we use the code from these two posts to enable mathjax and syntax highlighting with hakyll (errors, omissions, and indentation butchery below are my own, I am not (yet?) a professional haskell programmer)


syntaxHighlightingStyle :: Style

mathExtensions =
[ Ext_tex_math_dollars
, Ext_tex_math_double_backslash
, Ext_latex_macros
]
codeExtensions =
[ Ext_fenced_code_blocks
, Ext_backtick_code_blocks
, Ext_fenced_code_attributes
]

defaultExtensions = writerExtensions defaultHakyllWriterOptions
newExtensions = foldr enableExtension defaultExtensions (mathExtensions <> codeExtensions)

pandocWriterSoupedUpOptions = defaultHakyllWriterOptions {
writerHTMLMathMethod = MathJax "",
writerExtensions = newExtensions,
writerHighlightStyle = Just syntaxHighlightingStyle
}
-- [...]
create ["css/syntax.css"] $ do
route idRoute
compile $ do
makeItem $ styleToCss syntaxHighlightingStyle

and without further ado, we have

### the equations! #

$\begin{eqnarray} x+1 = 2 \\ y+2 = 3 \end{eqnarray}$

### matrices! #

$\begin{bmatrix} 1 & 2 & 3\\ a & b & c \end{bmatrix}$

### and even extremely fancy matrices! #

the code for this was inspired by this stackexchange post:

\begin{align*} & \begin{bmatrix} m_{0} & m_{1} & m_{2} & m_{3} \\ m_{4} & m_{5} & m_{6} & m_{7} \\ m_{8} & m_{9} & m_{10} & m_{11} \\ m_{12} & m_{13} & m_{14} & m_{15} \end{bmatrix} \\ \begin{bmatrix} v_{0} & v_{1} & v_{2} & v_{3} \end{bmatrix} & \mspace{5mu} \bigl[\begin{matrix} {r_{0}} & \mspace{15mu} {r_{1}} & \mspace{15mu} {r_{2}} & \mspace{15mu} {r_{3}} \end{matrix} \mspace{15mu} \bigr] \end{align*}

]]>

Here’s some interesting papers and webpages that I have hanging around in open browser tabs. Better to have them here than languishing in browser tabs/history/bookmarks!

## signal processing stuff #

### spicy signal processing: beyond circularity and linearity #

• normal filters add up multiple copies of the same signal, but time-offset
• array processing adds up multiple copies of the same signal, but space-offset
• FRESH (FREquency SHift) filters add up multiple copies of the same signal, but frequency-offset
• this is useful because many signals (like communication/radar RF signals) have redundancy/correlation in their frequency domain (a property called cyclostationarity)
• “Noncircularity exploitation in Signal Processing Overview and Application to Radar” by F. Barbaresco, Pascal Chevalier; about widely linear processing/filtering/estimation

• a lot of the time it’s justified to assume that complex-valued signals through complex-valued systems behave the same as real-valued signals and systems (and using the same sort of filters / estimators you’d use for real-valued everythings)
• pretending that complex signals work just like real signals depends on an assumption called “second-order circularity”
• second-order circularity doesn’t always hold!
• for instance if the signal (prior to passing through the channel) only takes a real value (like -1 or 1, like with a BPSK), then there’s a fundamental asymmetry between the inphase and quadrature channels, and that violates the second-order circularity assumption.
• note: a symmetric QAM signal (modulated with random data, as always) is itself not circularly symmetric (add a phase offset and the little square lattice gets tilted) but it is second-order circular
• if second-order circularity doesn’t hold and you process the received signal in a way that can’t tease apart the asymmetry then you are leaving signal on the table.
• in the case where the modulated signal is only real-valued (or can be transformed to be only real-valued), that special signal structure morally lets you get a sort of processing gain, because you know that any variation in the complex axis is noise/interference/etc
• a linear filter looks like $$y = h\cdot x$$ ($$y$$ output, $$h$$ coefficients, $$x$$ input); the widely-linear model looks like $$y = g \cdot x + h \cdot x^*$$ ($$y$$ output, $$h$$ and $$g$$ coefficients, $$x$$ input, and $$x^*$$ the complex conjugate of $$x$$) – so it’s linear in both $$x$$ and its complex conjugate $$x^*$$
• as i understand it, this lets the system do stuff like “take only the real part of the signal” (because the noise all lives in the imaginary axis) but in a principled way
• “Widely Linear Estimation with Complex Data”, by Bernard Picinbono, Pascal Chevalier, also about widely linear processing

• “Receivers with widely linear processing for frequency-selective channels” by H. Gerstacker; R. Schober; A. Lampe: more about widely linear processing

• Widely linear filtering isn’t new: “Conjugate linear filtering” by W. Brown; R. Crane is from 1969!

• “Enhanced widely linear filtering to make quasi-rectilinear signals almost equivalent to rectilinear ones for SAIC/MAIC” by Pascal Chevalier, Rémi Chauvat, Jean-Pierre Delmas

• we saw earlier that if a signal (as transmitted) has a special form and only lives in the reals (like BPSK or a PAM), this allows for a form of processing gain at the receiver
• even more interestingly, this allows for signal separation / interference cancellation (if both the desired and interfering signal are of this form): the receiver can adjust the phase of the received signal until the desired signal lives only on the reals (this is a linear operation), and trash the imaginary component of the signal altogether
• the real-world realization is more complex since there are two channels (desired signal channel, interferer signal channel) that need to be taken into account, but this actually works: it’s called “single antenna interference cancellation” (SAIC)
• the titles of those papers imply that this is deployed for GSM networks, which notably uses GMSK – definitely not BPSK nor a PAM
• however, it turns out we can use this “single antenna interference cancellation” for certain modulations that aren’t BPSK or a PAM, with an additional step: the infamous “derotation”, which converts an MSK into BPSK, and converts GMSK into an almost-BPSK (“almost” because of the second Laurent pulse)
• this paper goes well beyond the standard SAIC; looking into both widely-linear filtering and FRESH filtering, in order to exploit the spectral structure of the signal of interest
• two books i found that might be useful later
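To make the “take only the real part, but in a principled way” point concrete, a small NumPy sketch: BPSK in circularly-symmetric noise, where a widely-linear filter with g = h = ½ reduces to taking Re(x) and discards the imaginary-axis noise (all values here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
s = rng.choice([-1.0, 1.0], size=n)                # real-valued BPSK symbols
sigma2 = 0.5                                       # noise power
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
x = s + noise                                      # received samples

y_linear = x                                       # strictly linear "filter": pass-through
y_wl = 0.5 * x + 0.5 * np.conj(x)                  # widely linear: g*x + h*conj(x), g = h = 1/2

mse_linear = np.mean(np.abs(y_linear - s) ** 2)    # ~ sigma2
mse_wl = np.mean(np.abs(y_wl - s) ** 2)            # ~ sigma2 / 2: imaginary-axis noise gone
assert mse_wl < mse_linear
```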

## other people’s websites i liked (it’s nice seeing what other people get up to with static site generators) #

]]>
GSM receiver blocks: rough notes on coarse timing estimation https://softminus.org/posts/coarse-timing-notes.html 2023-01-05 2023-01-05T00:00:00Z GSM receiver blocks: rough notes on coarse timing estimation

In this GSMish scenario we don’t actually need pinpoint/“fine” timing/phase accuracy, since a good enough Viterbi demodulator effectively “cleans up” remaining timing/phase offset as long as it’s fed with an accurate enough channel estimate (especially if it’s able to update its channel estimate).

### small timing offsets don’t matter too much for this demodulator #

In a simplistic scenario, if our channel looks like $$[1,1]$$, it doesn’t matter if the channel estimator outputs $$[1,1,0,0,0,0,0,0]$$ or $$[0,1,1,0,0,0,0,0]$$ (here we are using the classic GSM design choice of making our channel estimator handle channels of length 8) or anything up to $$[0,0,0,0,0,0,1,1]$$ – we get the same results at the end. If we’re misaligned enough to get $$[0,0,0,0,0,0,0,1]$$ we are leaving half the energy in the received signal on the table, so we do want as much energy possible in the actual channel’s impulse response to appear within the channel estimate the demodulator is given.

Of course, with a more realistic case, the actual channel won’t be just two symbols long, this is terrestrial radio, not a PCB trace / transmission line nor an airplane-to-satellite radio channel :p

### coarse timing offsets matter a lot to the channel estimator #

In the case where the physical channel has a length commensurate with the channel length designed in the channel estimator / demodulator, we want to make sure that our least-squares channel estimator gets aimed at the right place in the burst – if it ingests lots of signal affected by unknown data (as opposed to known training sequence data affected by an unknown channel), its output will be kinda garbage.

### form is emptiness, cross-correlation is matched filtering #

We’d be at an impasse1 if the least squares estimator was our only tool here, but we have a simpler tool that’s more forgiving of misalignments: cross-correlating the received signal against the modulated training sequence. Another way of thinking of this is that we’re running our received signal through a matched filter (with the reference/template signal the modulated training sequence) – it’s literally the same convolution.

Doing this gives us something that looks like this:

Using the Mk I eyeball, it’s pretty clear where the training sequence lives – at the tallest peak.

For implementation in software or gateware, we can encode this logic pretty easily: calculate the correlation, then iterate and look for the biggest peak. However, we notice that there’s a bunch of spurious peaks all around, and it’d be quite bad if we accidentally matched on a spurious peak: the channel estimate would be garbage, and the output of the demodulator would be beyond useless, since it wouldn’t even be starting off at the right spot in the signal.
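A minimal “correlate, then take the biggest peak” in NumPy (identity channel and no noise for clarity; the burst layout – 64 data symbols, the training sequence, 128 data symbols – is borrowed from the earlier posts and is an assumption here):

```python
import numpy as np

rng = np.random.default_rng(3)
ts_bits = np.array([0,1,0,0,0,1,1,1,1,0,1,1,0,1,0,0,0,1,0,0,0,1,1,1,1,0])
mod_ts = 2.0 * ts_bits - 1.0
burst = np.concatenate([rng.integers(0, 2, 64), ts_bits, rng.integers(0, 2, 128)])
received = 2.0 * burst - 1.0        # identity channel, no noise, for clarity

# cross-correlate against the modulated training sequence, take the biggest peak
c = np.correlate(received, mod_ts, mode="valid")
peak = int(np.argmax(np.abs(c)))

assert peak == 64                   # the training sequence starts 64 symbols in
assert c[peak] == len(ts_bits)      # perfect alignment: correlation value 26
```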

We can avoid this failure case by running the correlation on a smaller window, which reduces the chances of hitting a false correlation peak. We determine the position of the smaller window using our prior knowledge of the transmitted signal structure – where the training sequence lives relative to the start of the signal – and an estimator to determine when the start of the signal happens.

It’s pretty easy to determine when the start of the signal happens: square and sum each incoming I/Q pair to get a power value, keep a little window of those values, and when the window’s sum exceeds a threshold – well, that’s when the signal started.
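A sketch of that start-of-signal detector (the function name, window length, and threshold are arbitrary illustrative choices):

```python
import numpy as np

def detect_burst_start(iq, window=8, threshold=4.0):
    """Return the first index where the summed power over `window` samples exceeds threshold."""
    power = np.abs(iq) ** 2                       # I^2 + Q^2 for each incoming sample
    windowed = np.convolve(power, np.ones(window), mode="valid")
    over = windowed > threshold
    return int(np.argmax(over)) if over.any() else None

# silence, then a burst of unit-amplitude samples starting at index 50:
iq = np.concatenate([np.zeros(50), np.ones(30)]).astype(complex)
assert detect_burst_start(iq) == 47   # first window catching 5 burst samples (sum 5 > 4)
```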

We use this to narrow down the possible locations for the training sequence in the received signal. However, we still should run the correlation since this energy-detection start-of-signal estimator has more variance than the correlation-based timing offset estimator.

### why training sequences are like that #

Incidentally, the GSM training sequences (and lots of training sequences in other well-designed wireless communications systems) have interesting properties:

• their power spectra are approximately flat
• their autocorrelations have a tall narrow peak that approximates an impulse, with much less energy elsewhere

The former is a desired property since we want to evenly probe the frequency response of the bandpass channel. Spreading the training sequence’s power unevenly (lots of power in one part of the passband and much less in another part of the passband) causes a worse signal-to-noise2 ratio in the parts of the passband with less training sequence power. It’s a zero-sum affair since the transmitter has finite transmit power.

The autocorrelation property not only lets us use these training sequences for time synchronization, but it lets us use correlation as a rough channel impulse response estimate. If we’re satisfied with a very suboptimal receiver, we can just use the correlation as our channel estimate. However, least-squares generally will give us a more accurate channel impulse response, since the autocorrelation of the training sequence is not 1 at zero lag and 0 elsewhere – there’s little sidelobes.

### truck-correlation (it’s like auto-correlation but with 18 wheels) #

If you don’t have a good intuition for what a narrow autocorrelation does here, you can develop one by going to a loading dock or a construction site and paying attention when big trucks or earthmoving equipment back up. See, those big rigs are required to have a back-up beeper to warn bystanders that the driver is backing up and can’t see well what’s behind the vehicle.

#### pure tone: easy to detect, hard to localize #

There’s two common types of back-up beeper, and unfortunately the more common kind outputs a series of beeps of a single pure tone (without changing frequency between beeps). If you close your eyes and only use that sound to determine where the truck is, you’ll find it’s quite a difficult task: it seems like the sound is coming from everywhere! The brain has a variety of mechanisms to localize sources of sound, and besides the ultra-basic “find which ear is receiving the loudest signal” method, many of them kinda boil down to doing cross-correlations of variously-delayed versions of the left ear’s signal against variously-delayed versions of the right ear’s signal, and looking for correlation peaks. Seems familiar!

Unfortunately, the pure sine tone is the worst possible signal for this, since there’ll be tons of correlation peaks (each oscillation of the sine wave is identical to its precursor and successor), and if there’s audio-reflective surfaces around you and the truck, there’ll be tons of echoes too. Ambiguities galore! More spurs than a cowboy convention!

Ironically, the most useful (for angle-of-arrival localization) part of the pure-tone truck beeper’s signal is the moment the beep starts3, since the precursor is zero – the rest of the beep is comparatively useless for localization (an estimation task) but extremely useful for knowing that there’s indeed a truck somewhere in the neighborhood backing up (a detection task). The start and end of the beep are the most spectrally rich part of the beeper’s output, and this is indeed what we expect.

The pure sine wave is the easiest possible signal to detect (with our friend the matched filter), but the worst possible signal for localization; and this irony is why you can hear truck back-up beepers from uselessly far away but can’t easily tell which truck is backing up.

#### white noise: hard to detect, easy to localize #

Fortunately, there’s truck back-up beepers that output sounds far more amenable to localization: little bursts of white noise. If you haven’t heard those, you can find a youtube video of those in action, play it on your computer, and try and localize your computer’s speakers with your eyes closed.

You’ll notice that this is basically the optimal signal if you want to do angle-of-arrival estimation with delays and correlations – there’s only one correlation peak, and it’s exactly where you want it. It’s also extremely spectrally rich, and it has to be, since spectrally poor signals have worse autocorrelation properties. It also has the advantage of “blending in” with other noise: on-and-off bursts of white noise get “covered up” by white noise (and become indistinguishable from white noise) very quickly, while a pure tone is much more difficult to cover up with white noise.

This is what a good training sequence looks like: simple correlation gets you a passable estimate for the channel impulse response along with the timing offset, since the autocorrelation approximates an impulse. Also, the spectral richness ensures that all the frequency response of the bandpass channel is probed.
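
To make that concrete, here’s a Python sketch of the whole move: correlate the received signal against the known training sequence, take the biggest peak as the timing estimate, and read off the correlations around the peak as a rough channel estimate. The training sequence, data bits, and the 3-tap channel are all invented for illustration:

```python
import random

def xcorr_peak(received, template):
    """Slide template over received; return (lag of biggest |peak|, correlations)."""
    N = len(template)
    corrs = [sum(received[lag + k] * template[k].conjugate() for k in range(N))
             for lag in range(len(received) - N + 1)]
    best = max(range(len(corrs)), key=lambda i: abs(corrs[i]))
    return best, corrs

random.seed(1)
training = [complex(random.choice((-1, 1))) for _ in range(26)]
data = [complex(random.choice((-1, 1))) for _ in range(20)]
burst = data + training                      # training sequence starts at index 20
channel = [1.0, 0.4, 0.15j]                  # made-up dispersive channel

# pass the burst through the channel (a plain convolution)
received = [sum(channel[k] * burst[m - k]
                for k in range(len(channel)) if 0 <= m - k < len(burst))
            for m in range(len(burst) + len(channel) - 1)]

lag, corrs = xcorr_peak(received, training)
rough_channel = [corrs[lag + k] / len(training) for k in range(len(channel))]
print(lag)  # timing estimate: where the training sequence starts
```

With this setup the peak should land at lag 20, and `rough_channel` should come out near $$[1, 0.4, 0.15j]$$ – near, but polluted by the training sequence’s autocorrelation sidelobes and by the unknown data bleeding into the correlation, which is exactly why least-squares does better.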

### A Viterbi-style demodulator is a controlled combinatorial explosion #

I don’t think there’s too much useful we can do with the coarse correlation-based channel estimation to enable a more accurate channel estimation with more advanced (least-squares) methods – I had imagined looking at the coarse correlation-based channel estimate and looking for a window with the most energy and then doing a least-squares channel estimate only on that window, but I don’t think that actually has realistic benefits.

However, that idea (focusing on where energy is concentrated in the channel impulse response) does point to a more fructuous4 game we can play with channel impulse response: transforming the channel to squash the channel’s energy as much as possible into the earlier channel coefficients, and this is called “channel shortening”. Channel shortening is interesting because rather than having to delay decisions until the last possible moment, we can commit to decisions earlier, which reduces the computational burden (and area/power requirements) on a Viterbi-style demodulator pretty significantly.

If the channel’s impulse response is highly front-loaded into, say, the first 3 symbols, we can force a decision after only 3 symbol periods, since it’s very unlikely that anything arriving after that would make us change our mind. We still keep track of the effect of our decisions for as long as the channel lasts, since otherwise we’d be introducing avoidable error (even if we make all the right decisions): once we’ve made the decisions, figuring out their effect is as simple as feeding them through a channel-length FIR filter.
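
Here’s a toy Python sketch of that commit-early idea, using hard decision feedback as a stand-in for a shortened-channel Viterbi demodulator (BPSK symbols and a made-up front-loaded channel, no noise, to keep the mechanics visible):

```python
# Commit to each symbol off the strong leading tap, then cancel the tails
# of already-committed decisions with a channel-length FIR.
channel = [1.0, 0.3, 0.1, 0.05]          # energy front-loaded into tap 0
symbols = [1, -1, -1, 1, 1, -1, 1, -1]

# received[n] = sum_k channel[k] * symbols[n-k]  (noiseless convolution)
received = [sum(channel[k] * symbols[n - k]
                for k in range(len(channel)) if 0 <= n - k < len(symbols))
            for n in range(len(symbols) + len(channel) - 1)]

decisions = []
for n in range(len(symbols)):
    # feed past decisions through the channel's tail taps and subtract
    tail = sum(channel[k] * decisions[n - k]
               for k in range(1, len(channel)) if 0 <= n - k < len(decisions))
    decisions.append(1 if received[n] - tail >= 0 else -1)

print(decisions == symbols)  # → True in this noiseless sketch
```

With noise, a wrong committed decision propagates through the FIR into later decisions, which is exactly why delaying decisions (Viterbi-style) is worth its cost when the channel isn’t front-loaded.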

1. maybe not; I am unsure if looking at the least-squares residuals would be enough to determine lack of time synchronization↩︎

2. which I am assuming to be distributed evenly across the passband↩︎

3. the moment the beep ends is theoretically the same but your ears are more desensed than when the beep starts↩︎

4. I’ve always wanted to use that word (or rather, its French cognate “fructueux”) in writing.↩︎

]]>
GSM receiver blocks: least-squares channel estimation part 2: working out indices and lengths https://softminus.org/posts/least-squares-indices.html 2022-12-19 2022-12-19T00:00:00Z GSM receiver blocks: least-squares channel estimation part 2: working out indices and lengths

In my post on least-squares channel estimation, I had done some reasoning about which received samples can safely be used (i.e., which aren’t affected by unknown data) for a least-squares channel estimation:

The simple way to cope with this is to refuse to touch the first $$L-1$$ samples, and run our channel impulse response estimate over the $$M-L+1$$ samples after those. In GSM, this still gives us good performance, since for $$M=26$$, $$L=8$$ we have 19 samples to estimate 8 channel coefficients. Note that we also can’t use the trailing (in the scan, the last 4 rows) received symbols, since those also are affected by unknown data.

Now, our convolution matrix has dimensions $$M-L+1$$ by $$L$$, which makes sense, the only “trustworthy” (unaffected by unknown data) symbols are $$M-L+1$$ long, and we are convolving by a channel of length $$L$$.
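
As a concrete check on those dimensions, here’s a pure-Python sketch that builds the convolution matrix from a training sequence. The bit pattern is what I believe GSM TSC0 to be; treat it as illustrative:

```python
# Building the (M-L+1)-by-L convolution matrix row by row: row i covers
# received sample n = L-1+i and holds training[n], training[n-1], ...,
# training[n-L+1], so that (matrix @ h) gives the "trustworthy" samples.
M, L = 26, 8
tsc0_bits = "00100101110000100010010111"           # believed to be TSC0
training = [1 if b == "1" else -1 for b in tsc0_bits]
assert len(training) == M

A = [[training[L - 1 + i - k] for k in range(L)] for i in range(M - L + 1)]

print(len(A), len(A[0]))  # → 19 8
```

The 19-by-8 shape matches the $$M-L+1$$ by $$L$$ reasoning above: 19 trustworthy samples, 8 channel coefficients to estimate.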

Figuring out the exact offset for interference_rx_downsampled has been a bit tricky, and I haven’t yet dived into writing the right correlation to estimate the exact timing offset required.

From playing around some more in MATLAB with my source code, I realized I still don’t have a strong understanding of the exact offsets/indices/lengths at play here.

## let’s think step by step (and draw pictures) #

Rather than stare at algebraic expressions, we will draw pictures that speak to the physical meaning of the problem to help us reach expressions we actually understand.

We’ll take a generic GSM-like1 transmitted burst that is composed of $$D_1$$ data bits, followed by a midamble of $$TS$$ training symbol bits, and $$D_2$$ data bits.

Here’s what the burst looks like. I’ve written down the indices (starting at 1) for the first and last bit in each section. We note that all the lengths are correct:

• First data section is from $$1$$ to $$D_1$$ so its length is $$D_1-1+1 = D_1$$
• Midamble is from $$D_1+1$$ to $$D_1 + TS$$ so its length is $$D_1+TS-(D_1+1)+1 = TS-1+1 = TS$$
• Second data section is from $$D_1+TS+1$$ to $$D_1+TS+D_2$$ so its length is $$D_1 + TS + D_2 - (D_1 + TS + 1) + 1 = D_1-D_1 + TS - TS + D_2 - 1 + 1 = D_2$$.
• Total burst is from $$1$$ to $$D_1+TS+D_2$$ so its length is $$D_1 + TS + D_2 - 1 + 1 = D_1 + TS + D_2$$.
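
Those bullet points are easy to machine-check; a quick Python sketch with the same numbers as the worked example further down:

```python
# Section bounds and lengths (1-indexed, both ends inclusive).
D1, TS, D2 = 11, 13, 14

sections = {
    "data1":    (1,           D1),
    "midamble": (D1 + 1,      D1 + TS),
    "data2":    (D1 + TS + 1, D1 + TS + D2),
}
lengths = {name: hi - lo + 1 for name, (lo, hi) in sections.items()}
print(lengths)                # → {'data1': 11, 'midamble': 13, 'data2': 14}
print(sum(lengths.values()))  # → 38 == D1 + TS + D2
```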

It’s clear how we can isolate any particular section of this burst before it has passed through a dispersive channel.

### notation review for intervals of integers#

• $$[1,5]$$ means “1 to 5, inclusive of the bounds (“closed”) on both sides”, and represents $$\{1,2,3,4,5\}$$
• $$(1,5)$$ means “1 to 5, non-inclusive of the bounds (“open”) on both sides”, and represents $$\{2,3,4\}$$
• We also can have left-closed right-open: $$[1,5)$$ is inclusive of the $$1$$ but not of the $$5$$ so we have: $$\{1,2,3,4\}$$
• And likewise with left-open right-closed: $$(1,5]$$ represents $$\{2,3,4,5\}$$

As the subtitle in the header insinuates, a dispersive channel is represented by a convolution. The structure of convolution tells us that each transmitted sample will affect multiple received samples, and the channel vector’s finite length tells us it’s not gonna be all of them.

We note that a single sample will be “smeared out” by a channel of length $$L$$ onto a span that’s $$L$$ long:

(figure: single element convolution, channel length $$L=5$$)

As for the indices, if this sample lives at index $$n$$, the index of this little “span of influence” will be $$[n, n+L-1]$$. Why these indices?

• the starting index: We currently don’t care3 about absolute delays, just what happens inside the delay spread. Remember the “ideal coaxial cable” thought experiment from our last post: the problem remains identical no matter how much ideal coaxial cable lives between our receiver antenna and our receiver frontend. We can therefore say that the input sample at index $$n$$ gets transmogrified by an “identity channel” (impulse response of $$[1]$$, it doesn’t change the signal at all) to be an output sample at index $$n$$ – no need to add any offset.
• This means that the first output sample to be affected by our input sample will be at index $$n$$, which justifies the left-closed (includes its boundary): $$[n,$$

• the ending index: If the “span of influence” is $$L$$ long, the last sample that is affected by our input sample will be at index $$n+L-1$$. This justifies the right-closed (includes the boundary): $$,n+L-1]$$

Going back to our “single element convolution” example, if the $$x$$ input sample lives at index $$10$$, the first nonzero output sample lives at index $$10$$ by fiat. We observe nonzero output samples at $$11, 12, 13, 14$$ as well. Output sample $$15$$ and beyond are zero, as are samples $$9$$ and lower. This means that we have nonzero output at $$[10, 14]$$, and if we let $$n=10$$ and $$L=5$$ we get $$[10, 10+5-1]=[10,14]$$, which matches up with what we see.
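
The same single-element example as code – one nonzero input at index $$n=10$$, a length $$L=5$$ channel (the taps themselves are arbitrary), and the indices of nonzero output:

```python
# "Span of influence" of one input sample through a length-5 channel.
n, L, total = 10, 5, 30
x = [0.0] * total
x[n] = 1.0
h = [1.0, 0.8, 0.6, 0.4, 0.2]    # any length-5 channel with nonzero taps
y = [sum(h[k] * x[m - k] for k in range(L) if 0 <= m - k < total)
     for m in range(total + L - 1)]
nonzero = [m for m, v in enumerate(y) if v != 0.0]
print(nonzero)  # → [10, 11, 12, 13, 14], i.e. [n, n+L-1]
```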

## how the burst structure gets preserved by the convolution #

There is a definite structure to the transmitted burst: known data (the training sequence) sandwiched by unknown data. In realistic systems, the designers will select a training sequence length longer than any reasonable channel they expect to contend with, and so we expect:

• some received samples will be a function only of unknown data
• some received samples will be a function of unknown data and training sequence bits
• some received samples will be a function only of training sequence bits

To figure out which received samples are which, let’s draw out what happens when our burst gets convolved with a channel of length $$L$$. Each transmitted symbol will get “smeared out” onto an $$L$$-long span, and we focus on the symbols at the boundaries of each section.

The center line represents what the receiver hears, and for clarity, we draw the unknown data sections above the center line and the training sequence below the center line.

(figure: the $$D_1 = 11$$, $$TS=13$$, $$D_2=14$$, channel length $$L=5$$ case, full resolution here. The samples live on the lines of the graph paper, not in the spaces.)

Things are much more clear now!

• $$[1, D_1]$$, with length $$(D_1)-(1)+1=D_1$$: the output’s only affected by the first data section
• $$[D_1+1, D_1+L-1]$$ with length $$(D_1+L-1)-(D_1+1)+1=L-1$$: the output is affected by the first data section and the training sequence
• $$[D_1+L, D_1+TS]$$ with length $$(D_1+TS)-(D_1+L)+1=D_1+TS-D_1-L+1=TS-L+1$$: the output is only affected by the training sequence. This is the section we use for a least-squares channel estimate!
• $$[D_1+TS+1, D_1+TS+L-1]$$ with length $$(D_1+TS+L-1) - (D_1+TS+1) + 1= D_1 + TS +L -1 -D_1 -TS -1 +1= L-1$$: the output is affected by the training sequence and the second data section
• $$[D_1+TS+L, D_1+TS+D_2+L-1]$$ with length $$(D_1+TS+D_2+L-1) - (D_1+TS+L) + 1= D_1 +TS + D_2+L - 1 -D_1 -TS -L +1 = D_2$$. This part is only affected by the second data section.

Now let’s sum4 up all those lengths to see if our work checks out: $$(D_1) + (L-1) + (TS-L+1) + (L-1) + (D_2) = D_1 + L -1 +TS -L +1 +L -1 +D_2 = D_1 +D_2 +TS +L -1$$. This is indeed what we get when we convolve a vector with length $$D_1+D_2+TS$$ (the total length of the burst as it’s transmitted) by a vector with length $$L$$ (the channel)!
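
The five intervals and the checksum, machine-checked for the $$D_1=11$$, $$TS=13$$, $$D_2=14$$, $$L=5$$ example:

```python
# The five output segments (1-indexed, both ends inclusive) and their lengths.
D1, TS, D2, L = 11, 13, 14, 5

segments = [
    ("data1 only",  1,              D1),
    ("data1 + TS",  D1 + 1,         D1 + L - 1),
    ("TS only",     D1 + L,         D1 + TS),
    ("TS + data2",  D1 + TS + 1,    D1 + TS + L - 1),
    ("data2 only",  D1 + TS + L,    D1 + TS + D2 + L - 1),
]
lengths = [hi - lo + 1 for _, lo, hi in segments]
print(lengths)       # → [11, 4, 9, 4, 14]
print(sum(lengths))  # → 42 == D1 + TS + D2 + L - 1
```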

As usual, if you notice an error in my work, I’d be very grateful if you could point it out to me.

1. GSM’s “stealing bits” act like regular bits for modulation/demodulation, and the tail bit structure is not relevant for channel estimation (it will be relevant when we look at trellises).↩︎

2. or rather, convolving in↩︎

3. We will soon need to care about absolute delays to solve the time synchronization problem. Not the question of how to get synchronized to UTC or TAI, but rather figuring out when exactly we receive each burst. This is critical since for instance, if the time sync is incorrect, the channel estimator could end up being fed modulated unknown data rather than the midamble!↩︎

4. a sum to check our work, call that a check-sum :p↩︎

]]>
probability review problems https://softminus.org/posts/problem-set-0.html 2022-12-18 2022-12-18T00:00:00Z Edgar Lin wrote up some probability review problems for me to work through, and here they are (along with my solutions so far).

## probability review problems #

## my solutions (so far, i’ve got a few problems left) #

]]>
GSM receiver blocks: GMSK and Laurent https://softminus.org/posts/gmsk-and-laurent.html 2022-12-08 2022-12-08T00:00:00Z GSM receiver blocks: GMSK and Laurent

Not all modulation schemes have the zero-ISI property that RRC-filtered1 linear modulations have. Continuous-phase modulations (like GMSK, which we’ll be looking at) generally introduce inter-symbol interference: if your receiver recovers symbols by slicing-and-thresholding the received-and-filtered signal, it will have degraded performance – even if its timing is perfect.

This doesn’t prevent us from making high-performance (approaching optimal) receivers for GMSK. If the transmitter has a direct line-of-sight to the receiver and there’s not much else in the physical environment to allow for alternate paths, the channel won’t have much dispersive effect. This lets us approximate the channel as a non-frequency-selective attenuation followed by additive white Gaussian noise. In this case, you can use the Laurent decomposition of the GMSK amplitude-domain waveform to make a more complex receiver that’s quite close to optimal.

The former case is common in aerospace applications: if an airplane/satellite is transmitting a signal to an airplane/satellite or to a ground station, there usually is a quite good line of sight between the two – with not many radio-reflective objects in between that could create alternate paths. The received signal will look very much like the transmitted signal, only much weaker.

If your transmitter and receiver antennae aren’t in the sky or in space, they’re probably surrounded by objects that can reflect radio waves. In fact, they might not even have any line of sight to each other at all! You can use your cell phone anywhere with service, not just anywhere you have a cellular base station within line of sight.

If you’ve ever spoken loudly in a quiet tunnel/cave/parking garage, you hear echoes – replicas of your voice, except delayed and attenuated. A similar phenomenon occurs when there’s multiple paths the radio waves can take from the transmitter to the receiver. Think of the channel as a tapped delay line: the receiver receives multiple copies of the signal superimposed on each other, with each copy delayed by the corresponding path delay and attenuated by the corresponding path loss.

### in which we ignore doppler #

Imagine an extreme case: sending symbols at $$1$$ symbol per second, and leaving the channel silent for $$1$$ second between each symbol. Let’s say we have four fixed paths with equal attenuation, with delays $$50$$ms, $$100$$ms, $$150$$ms, and $$210$$ms. The difference between the shortest path (the path that will start contributing its effect at the receiver the earliest) and the longest path (the path that takes the longest time to start contributing its effect at the receiver) is known as the “delay spread” and here, it’s $$210-50$$ms$$=160$$ms. Initially, the receiver gets something very much non-constant: as each of the paths “gets filled up”, it appears at the receiver, but this only happens in the first $$160$$ms of the symbol. After that $$160$$ms, the channel reaches equilibrium, and for the remaining $$1000$$ms$$-160$$ms$$=840$$ms, the receiver receives a constant signal. If the receiver ignores the first $$160$$ms of each symbol, it can ignore the multipath altogether!

(figure: big $$T$$ is the symbol length, small $$t$$ is the delay spread)
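
The thought experiment is easy to simulate; here’s a Python sketch sampled at 1 ms resolution (the sample rate is an assumption for the sketch):

```python
# Four equal-attenuation paths at 50/100/150/210 ms, driven by a constant
# 1-second symbol; watch the received signal settle once the slowest path
# "fills up".
delays_ms = [50, 100, 150, 210]
T = 1000                                   # one symbol = 1 second = 1000 samples
h = [0.0] * (max(delays_ms) + 1)           # channel as a tapped delay line
for d in delays_ms:
    h[d] = 0.25                            # equal gain on every path

x = [1.0] * T                              # the constant symbol
y = [sum(h[k] * x[m - k] for k in range(len(h)) if 0 <= m - k < T)
     for m in range(T)]

settle = next(m for m in range(T) if y[m] == 1.0)
print(settle)  # → 210: shortest delay (50 ms) plus the 160 ms delay spread
```

Note that the settling happens $$160$$ms after the first path’s arrival at $$50$$ms, which is the delay-spread picture from the text in absolute-time coordinates.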

Note that the absolute delay of the paths does impact the latency of the system, but it doesn’t impact how the channel corrupts the signal. You could imagine the same system, except that there’s 3,000,000 kilometers of ideal (doesn’t attenuate or change the signal, just delays it) coaxial cable between the transmitter and the transmit antenna. That’s gonna add 10 seconds2 of delay, but it won’t alter the received signal at all.

This dynamic (symbol time much greater than delay spread) is why analog voice modulation doesn’t need fancy signal processing to cope with multipath. The limit of human hearing is 20 kilohertz, and $$c/(20kHz)=15$$ kilometers, which is pretty big – paths with multiple kilometers of additional distance are gonna be pretty attenuated and won’t be very significant to the receiver3.

The higher the data rate compared to the delay spread, the less you can ignore multipath. Increase the symbol rate to GSM’s $$270$$ kilosymbols per second, and we get $$c/(270kHz)=1$$ kilometer. Paths with hundreds of meters of additional distance aren’t negligible in lots of circumstances!
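
The back-of-envelope “symbol length in space” numbers above are just the speed of light divided by the signaling rate:

```python
# Symbol length in space: c divided by the signaling rate.
c = 3.0e8                 # speed of light, m/s
print(c / 20e3)           # → 15000.0 m at audio bandwidth (20 kHz)
print(round(c / 270e3))   # → 1111 m at GSM's ~270 ksym/s
```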

A high-performance demodulator has to function4 despite this channel-induced ISI. It turns out that the same mechanism that needs to handle the channel-induced ISI (which changes based on the physical arrangement of the scatterers in the environment, and is estimated by the receiver, often using known symbols) can also handle the modulation-induced ISI as well.

## the Laurent decomposition #

The “Gaussian” in “GMSK” isn’t a filter that gets applied to the time-domain samples. Rather, it’s a filter that gets applied in the frequency-domain, and this frequency-domain signal gets used to feed an oscillator – and it’s that oscillator that generates the time-domain baseband signal.

The following 3 diagrams are from the wonderful Chapter 2 of Volume 3 of the JPL DESCANSO Book Series.

The Laurent decomposition tells us that the Gaussian-shaped GMSK frequency-domain pulse, after it gets digested by an oscillator, ends up being equivalent to two time-domain pulses (there are more but they are truly negligible), $$C_0$$ (the big one) and $$C_1$$ (the small one):

(figure: The two Laurent decomposition pulses for GMSK. Source: Page 74, Chapter 2, Volume 3, JPL DESCANSO Book Series)

(figure: How to use the two Laurent pulses to produce a GMSK waveform. Source: Page 79, Chapter 2, Volume 3, JPL DESCANSO Book Series)

The first Laurent pulse is excited by a function of the current data symbol5. So far, so good. A suboptimal receiver can pretend that a GMSK waveform is only made of $$C_0$$ Laurent pulses. If you ignore the $$C_1$$ pulse, this reduces GMSK to MSK. MSK is not a linear modulation, and has nonzero ISI: the amplitude-domain pulse doesn’t have the zero-ISI property that RRC has.

However, if we have a good phase estimate, we can separate the MSK signal into in-phase ($$I$$) and quadrature ($$Q$$) signals. MSK6 has a wonderful property once we’ve decomposed it this way: The “useful channel” alternates between $$I$$ and $$Q$$ for every symbol and contains no ISI, and the “other channel” (which alternates between $$Q$$ and $$I$$) contains all the ISI.

To phrase it another way, on even symbols, the information needed to estimate the symbol is all in $$I$$, and the ISI is all in $$Q$$, and on odd symbols, the information needed to estimate the symbol is all in $$Q$$, and the ISI is all in $$I$$. Looking at $$I$$ and $$Q$$ separately eliminates the ISI, and this lets us make a receiver that looks much like a linear modulation receiver (integrate-and-dumps, comparators, etc) with close to ideal performance.

## the modulator has memory #

Stuff gets more interesting if you don’t ignore the second Laurent pulse. What’s that one excited by? Well, it’s a function of the current bit, the previous bit, and the bit before that! There’s even a little shift register on the bottom left!

Incidentally, that shift register isn’t just theoretical. If you implement a GMSK modulator with precomputed waveforms in a ROM (as opposed to using a Gaussian filter / integrator / NCO), there’s gonna be a shift register that looks much like that, which helps you index the ROM and postprocess the ROM output. I implemented a GMSK modulator in Verilog that uses precomputed waveforms, with the paper “Efficient implementation of an I-Q GMSK modulator” (doi://10.1109/82.481470, by Alfredo Linz and Alan Hendrickson) as a guide.

(figure: GMSK modulator architecture. Source: Linz1996, doi://10.1109/82.481470)

(figure: I define that shift register in my Verilog. Source: a GMSK modulator I wrote a while ago, based on the design in Linz1996)

(figure: Shifting in new data symbols into the shift register. Source: a GMSK modulator I wrote a while ago, based on the design in Linz1996)

There’s 16 possible waveforms you need to be able to generate (8 possible values of the shift register; I and Q for each), but the structure of the modulation lets you cut down on ROM required: if you can time-reverse (index the ROM backwards) and/or sign-reverse (flip the sign of the samples coming out of the ROM), you can store just 4 basic curves in the ROM and generate all 16 waveforms that way.

(figure: The shift register determines which basic curves get used. Source: a GMSK modulator I wrote a while ago, based on the design in Linz1996)

(figure: One bit of the shift register, along with another bit of state (accumulated phase quadrant), determines how we post-process the output of the ROM. Source: a GMSK modulator I wrote a while ago, based on the design in Linz1996)
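
A Python sketch of the ROM-sharing trick in the abstract: store a few base curves, and rebuild the rest by indexing backwards (time reversal) and/or negating samples (sign reversal). The curves and the 2-curve mapping here are invented for illustration – the real assignment of shift-register states to curves comes from working through Linz1996:

```python
# Derive many waveforms from few stored curves via time/sign reversal.
N = 16
base = {
    "c0": [i / (N - 1) for i in range(N)],         # stand-ins for the real
    "c1": [(i / (N - 1)) ** 2 for i in range(N)],  # precomputed waveforms
}

def read_rom(curve, time_reverse=False, sign_reverse=False):
    samples = base[curve][::-1] if time_reverse else list(base[curve])
    return [-s for s in samples] if sign_reverse else samples

# 2 base curves x 2 (time) x 2 (sign) = 8 waveforms from 2 stored curves
variants = [read_rom(c, tr, sr)
            for c in base for tr in (False, True) for sr in (False, True)]
print(len(variants))  # → 8
```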

Unlike with RRC, there’s no magic filter that nulls out GMSK’s ISI/memory. Unlike with MSK, separating $$I$$ and $$Q$$ doesn’t neatly separate the data and the ISI.

## detection theory tl;dr #

Every time a demodulator receives a new sample (or receives $$n$$ new samples if there are $$n$$ samples per symbol), it needs to decide what symbol was most likely to generate that sample. If it didn’t do something like that, it wouldn’t be much of a demodulator.

If the modulator has no memory, this task is pretty simple: we look at the sample values each possible symbol would have generated, and we compare each of those gold-standard values against the value we actually received. Which symbol was most likely to have been sent? The symbol whose value is the closest to what was actually received.

How accurate is this? Depends on how many possible symbols there are! Increase the number of possible symbols (“bits per symbol”, “modulation order”), and this decreases the amplitude of noise necessary to sufficiently shift the received sample such that the closest symbol is incorrect.
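
The memoryless decision rule is tiny in code; here’s a Python sketch with an arbitrary 4-level PAM constellation:

```python
# Memoryless minimum-distance decision: compare the received sample against
# each symbol's ideal value and pick the closest.
constellation = [-3, -1, 1, 3]

def decide(sample):
    return min(constellation, key=lambda s: abs(sample - s))

print(decide(0.8))   # → 1
print(decide(-2.6))  # → -3
```

Pack more levels into the same span and the decision regions shrink – less noise is needed to push a sample across a boundary, which is the modulation-order tradeoff described above.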

## the third eye of the demodulator #

If the modulator has memory, this task is more complicated. The signal that the modulator generates for a symbol doesn’t just depend on the current symbol, but on a certain number of past symbols as well.

If the demodulator wants to extract the most possible information from the received signal, it needs to read the modulator’s mind.

Assume the demodulator has access to a perfect mind-reading channel: we can see into all of the modulator’s state – except for what’s affected by the current symbol. The latter proviso prevents the demodulator’s task from becoming trivial. Via the mind-reading channel, the demodulator knows the last two bits the modulator sent: call them $$b_1$$ and $$b_2$$. There’s a standard assumption that the transmitted signal is a random bitstream, so knowing $$b_1$$ and $$b_2$$ gives the demodulator strictly zero information about $$b_3$$.

The demodulator actually has to estimate $$b_3$$ from the noisy received signal, like usual. However, that task is actually solvable now! We have a local copy of a GMSK modulator, and we generate two candidate signals: one with the sequence $$(b_1, b_2, 0)$$, and one with the sequence $$(b_1, b_2, 1)$$. If what was actually received is closer to the former, we decide a $$0$$ was sent, if the latter is closer, we decide a $$1$$ was sent.

You see where this is going! We estimated a value for $$b_3$$ – call it $${b\_estimated}_3$$ – by comparing the two possible alternatives. Now, when the modulator sends $$b_4$$, we don’t need the mind-reading channel anymore! We already have our best estimate for what $$b_3$$ was, and we can use that $${b\_estimated}_3$$ to find $$b_4$$! Indeed, we use our local GMSK modulator to modulate $$(b_2, {b\_estimated}_3, 0)$$, and $$(b_2, {b\_estimated}_3, 1)$$ and use that to determine what $$b_4$$ likely is.
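
Here’s that two-candidate game as a Python sketch. The “modulator” is an invented 3-symbol-memory mapping (a stand-in for GMSK’s memory, not the real Laurent pulses), and the channel is noiseless so the mechanics stay visible:

```python
# At each step, modulate both hypotheses for the newest bit (using past
# estimates for the older bits) and keep whichever lands closer.
def modulate(b1, b2, b3):
    # invented memoryful mapping; bits are +/-1
    return b3 + 0.5 * b2 - 0.25 * b1

bits = [1, -1, 1, 1, -1, 1, -1, -1]
rx = [modulate(bits[i - 2], bits[i - 1], bits[i]) for i in range(2, len(bits))]

est = bits[:2]            # the "mind-reading channel" seeds the first two bits
for r in rx:
    cands = {b: modulate(est[-2], est[-1], b) for b in (-1, 1)}
    est.append(min(cands, key=lambda b: abs(r - cands[b])))

print(est == bits)  # → True: noiseless, so every two-candidate duel is won
```

Add noise and one lost duel contaminates the references for the next duel – which is exactly the error-propagation problem the next section gets into.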

## third eye closed #

Unfortunately, eschewing the mind-reading channel isn’t free. The clunky $${b\_estimated}_3$$ notation foreshadowed that $${b\_estimated}_3$$ and $$b_3$$ aren’t guaranteed to be equal. $${b\_estimated}_3$$ might be the best possible estimate we can make but it still can be incorrect!

If $${b\_estimated}_3 \neq b_3$$ and we try and guess what $$b_4$$ is by using $$(b_2, {b\_estimated}_3, 0)$$ and $$(b_2, {b\_estimated}_3, 1)$$ as references, we’re in for a world of hurt. The error with $${b\_estimated}_3$$ is forgivable (there’s noise, errors happen), but using an incorrect value of $$b_3$$ to estimate $$b_4$$ propagates that error into $${b\_estimated}_4$$…which will propagate into $${b\_estimated}_5$$, and so on.

We want to average out errors, not propagate them!

If we still had our mind-reading channel, we would know what the true value of $$b_3$$ was (of course, only after we commit ourselves to $${b\_estimated}_3$$, otherwise the game is trivial), and could use that to estimate $$b_4$$, by using $$(b_2, b_3, 0)$$ and $$(b_2, b_3, 1)$$ for comparison against our received signal.

We’re at a loss here, because mind-reading channels don’t exist, but if we don’t use the mind-reading channel, our uncertain guesses can amplify errors.

## you are the third eye #

It turns out we were almost on the right track. We can turn this error-amplification7 scheme into something truly magical (a sort of magic that actually exists) if we

# avoid making decisions until the last possible moment.

We have to make decisions on uncertain data. However, this doesn’t oblige us to make a decision for $$b_i$$ as soon as it is possible to make a better-than-chance decision for $$b_i$$! If there’s useful data that arrives after we have committed to a decision on $$b_i$$, we’re throwing that data away – at least when it comes to estimating $$b_i$$.

In fact, if we want to do the best job we can, we’ll keep accumulating incoming data until the incoming data tells us nothing about $$b_i$$. Only then will we make a decision for $$b_i$$, since we’ve collected all the relevant data that could possibly be useful for its estimation.

But how do we add up all that information? What metrics get used to compare different possibilities? How will this series of selections estimate the sequence of symbols that most likely entered the modulator? And how do we avoid a combinatorial explosion?

2. ideal coaxial cable has a velocity factor of 1↩︎

3. unless you’re on shortwave/HF, where it is possible to get echoes since the ionosphere sometimes does give rise to paths with drastically different distances and without catastrophic attenuation↩︎

4. The equalization task with OFDM is greatly simplified: orthogonal frequency-domain subcarriers + circular prefixes create a circulant matrix. The receiver does a big FFT, and the properties of the circulant matrix means the effect of a dispersive channel is limited to multiplying the output of each subcarrier by a complex coefficient. That complex coefficient is merely the amplitude/phase response of the channel, measured at that subcarrier’s frequency. In real-world systems you need a way to estimate those complex coefficients for each subcarrier (symbols with known/fixed values are useful for this), a way to adapt them as the channel changes over time, and a way to cope with Doppler.↩︎

5. This figure says “precoded” which means that if you want to get the same result, you need to put a differential encoder in front of the bitstream input; but using this diagram (instead of “Fig. 2-33” in the same chapter) more clearly demonstrates that GSM has a 3-symbol memory.↩︎

6. for $$h=0.5$$ full-response continuous-phase modulations more generally↩︎

7. This scheme actually works fine if most of the energy in the channel/modulator impulse response lives in the earliest coefficient; since the guesses will just…tend to be right most of the time! However, that’s not generally the case, RF channels are rarely this friendly, unless line of sight dominates. You can shorten an unfriendly channel by decomposing its impulse response into an all-pass filter and a minimum-phase filter (whose energy will indeed be front-loaded), but it probably won’t guarantee you a channel that lets you get away with avoiding a trellis altogether…↩︎

]]>