Here’s some interesting papers and webpages that I have hanging around in open browser tabs. Better to have them here than languishing in browser tabs/history/bookmarks!

- two sets of lecture notes on spatial/array processing that look at different criteria (zero forcing vs error minimization) and deterministic vs stochastic (which I think is a synonym for Bayesian) approaches
- “Beamforming: a versatile approach to spatial filtering” by B.D. Van Veen; K.M. Buckley
- The entire set of notes from the NATO “Advanced Radar Systems, Signal and Data Processing” (RTO-EN-SET-086bis) lecture series

“Cyclic Wiener filtering: Theory and Model” by Gardner

- normal filters add up multiple copies of the same signal, but *time*-offset
- array processing adds up multiple copies of the same signal, but *space*-offset
- FRESH (FREquency SHift) filters add up multiple copies of the same signal, but *frequency*-offset
- this is useful because many signals (like communication/radar RF signals) have redundancy/correlation in their frequency domain (a property called cyclostationarity)

“Noncircularity exploitation in Signal Processing Overview and Application to Radar” by F. Barbaresco, Pascal Chevalier; about widely linear processing/filtering/estimation

- a lot of the time it’s justified to assume that complex-valued signals through complex-valued systems behave the same as real-valued signals and systems (and to use the same sort of filters / estimators you’d use for real-valued everything)
- pretending that complex signals work just like real signals depends on an assumption called “second-order circularity”
- second-order circularity doesn’t always hold!
- for instance, if the signal (prior to passing through the channel) only takes real values (like -1 or 1, as with BPSK), then there’s a *fundamental asymmetry* between the in-phase and quadrature channels, and that violates the second-order circularity assumption
- note: a symmetric QAM signal (modulated with random data, as always) is itself not circularly symmetric (add a phase offset and the little square lattice gets tilted) but it *is* second-order circular
- if second-order circularity doesn’t hold and you process the received signal in a way that can’t tease apart the asymmetry, then you are leaving signal on the table
- in the case where the modulated signal is only real-valued (or can be transformed to be only real-valued), that special signal structure morally lets you get a sort of *processing gain*, because you know that any variation in the complex axis is noise/interference/etc.
- a linear filter looks like *y* = *h* ⋅ *x* (*y* output, *h* coefficients, *x* input); the *widely-linear* model looks like *y* = *g* ⋅ *x* + *h* ⋅ *x*^{*} (*y* output, *h* and *g* coefficients, *x* input, and *x*^{*} the complex conjugate of *x*) – so it’s linear in both *x* and its complex conjugate *x*^{*}
- as i understand it, this lets the system do stuff like “take only the real part of the signal” (because the noise all lives in the imaginary axis) but in a principled way
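To make this concrete, here’s a minimal numerical sketch (my own toy example; the noise power and sample count are made up) that estimates a real-valued BPSK symbol from a noisy complex observation, once with a strictly linear estimator and once with a widely-linear one:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
sigma2 = 1.0                                    # total complex noise power (assumed)

s = rng.choice([-1.0, 1.0], size=N)             # real-valued BPSK symbols
# circularly-symmetric ("second-order circular") AWGN: power split evenly I/Q
n = np.sqrt(sigma2 / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))
x = s + n

# strictly linear MMSE estimate: s_hat = w * x
w = 1.0 / (1.0 + sigma2)
mse_linear = np.mean(np.abs(s - w * x) ** 2)

# widely-linear MMSE estimate: s_hat = g*x + h*conj(x); for real-valued s the
# optimum has g = h, which amounts to scaling the *real part* only --
# everything on the imaginary axis is pure noise and gets trashed
a = 1.0 / (1.0 + sigma2 / 2)
mse_wl = np.mean(np.abs(s - a * x.real) ** 2)

print(mse_linear, mse_wl)
```

With the noise power split evenly across I and Q, the widely-linear estimator only sees half the noise, and that shows up directly in the measured MSE (about 1/3 versus 1/2 here).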

“Widely Linear Estimation with Complex Data”, by Bernard Picinbono, Pascal Chevalier, also about widely linear processing

“Receivers with widely linear processing for frequency-selective channels” by H. Gerstacker; R. Schober; A. Lampe: more about widely linear processing

Widely linear filtering isn’t new: “Conjugate linear filtering” by W. Brown; R. Crane is from 1969!

“Enhanced widely linear filtering to make quasi-rectilinear signals almost equivalent to rectilinear ones for SAIC/MAIC” by Pascal Chevalier, Rémi Chauvat, Jean-Pierre Delmas

- we saw earlier that if a signal (as transmitted) has a special form and only lives in the reals (like BPSK or a PAM), this allows for a form of processing gain at the receiver
- even more interestingly, this allows for *signal separation* / *interference cancellation* (if both the desired and interfering signal are of this form): the receiver can adjust the phase of the received signal until the desired signal lives only on the reals (this is a linear operation), and *trash* the imaginary component of the signal altogether
- the real-world realization is more complex since there are two channels (desired signal channel, interferer signal channel) that need to be taken into account, but this actually works: it’s called “single antenna interference cancellation” (SAIC)
- some papers about SAIC:
- “Performance bounds for cochannel interference cancellation within the current GSM standard”
- “A Single Antenna Interference Cancellation Algorithm for Increased GSM Capacity”
- “Single antenna interference cancellation (SAIC) for GSM networks”

- the titles of those papers imply that this is deployed in GSM networks, which notably use GMSK, which is definitely not BPSK nor a PAM
- however, it turns out we can use this “single antenna interference cancellation” for certain modulations that aren’t BPSK or a PAM, with an additional step: the infamous “derotation”, which converts an MSK into BPSK, and converts GMSK into an almost-BPSK (“almost” because of the second Laurent pulse)
- this paper goes well beyond the standard SAIC, looking into both widely-linear filtering *and* FRESH filtering, in order to exploit the spectral structure of the signal of interest
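Here’s a toy symbol-rate sketch of derotation (mine, not from the paper; it ignores pulse shaping entirely and works on one sample per symbol): multiplying the *n*-th MSK sample by e^{−jπn/2} collapses the signal onto the real axis, making it BPSK-like.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.choice([-1, 1], size=32)                # random data bits

# symbol-rate MSK: the phase advances by +/- 90 degrees per symbol
phase = (np.pi / 2) * np.cumsum(a)
x = np.exp(1j * phase)

# derotation: multiply sample n by exp(-j*pi*n/2), i.e. by (-j)^n
n = np.arange(1, len(a) + 1)
d = x * np.exp(-1j * (np.pi / 2) * n)

# every derotated sample is now real-valued (+1 or -1): BPSK-like
print(np.max(np.abs(d.imag)))   # ~0 up to floating point
```

For GMSK the same trick leaves a small residual (the second Laurent pulse), which is why the result is only “almost-BPSK”.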

two books i found that might be useful later

- Wideband SDR Platform on a Budget, Update # 2, Observations of Starlink Downlink w/ Software Defined Radio by reddit user christianhahn09: amazing SDR built with devboards for:
- TI ADC16DX370 dual-channel, 370 Msps, 16-bit ADC
- ADI LTC5594 wideband I/Q demodulator
- Xilinx Kintex-7 KC705 FPGA

- a Windfreak SynthHD PRO v2, dual channel RF signal generator

the reduceron reconfigured and re-evaluated (paper and slides) and Graph Reduction Hardware Revisited: a microarchitecture that does graph reduction

some stuff about haskell’s STG-machine and execution model:

- “A Haskell Compiler” by David Terei
- SPJ’s Implementing Lazy Functional Languages on Stock Hardware: The Spineless Tagless G-machine
- “Lazy evaluation illustrated for Haskell divers” by Takenobu Tani: “The STG-machine is the marriage of Lambda calculus and Turing machine”

- “EverParse: Verified Secure Zero-Copy Parsers for Authenticated Message Formats”:
- real-world data formats are rife with protocol-meaningful numbers (indices/offsets/counts/lengths/ranges), and therefore context-sensitive
- trying to parse them with hand-written code often leads to parsing/semantic validation/action code being blended together in unprincipled and insecure ways (“shotgun parsers”)
- using parsers generated from language descriptions would improve the situation; except that most parser generators are meant for context-free grammars (stuff that looks like a programming language, not an IP packet or a PDF file)
- EverParse addresses this task for TLVish (tag-length-value) formats

- The computational power of Parsing Expression Grammars
- Implementation and Optimization of PEG Parsers for Use on FPGAs
- Research Report: The Parsley Data Format Definition Language
- A Verified Packrat Parser Interpreter for Parsing Expression Grammars

- overleaf: in-browser LaTeX editor/typesetter
- 0xabadidea’s backlog post – which inspired me to do this poast
- ask useful questions
- some math tricks, poasted by Terence Tao
- maintaining momentum
- Why and how to write things on the Internet
- Transformers from scratch
- bird SQL
- not knowing

In this GSMish scenario we don’t actually need pinpoint/“fine” timing/phase accuracy, since a good enough Viterbi demodulator effectively “cleans up” remaining timing/phase offset as long as it’s fed with an accurate enough channel estimate (especially if it’s able to *update* its channel estimate).

In a simplistic scenario, if our channel looks like [1,1], it doesn’t matter if the channel estimator outputs [1,1,0,0,0,0,0,0] or [0,1,1,0,0,0,0,0] (here we are using the classic GSM design choice of making our channel estimator handle channels of length 8) or anything up to [0,0,0,0,0,0,1,1] – we get the same results at the end. If we’re misaligned enough to get [0,0,0,0,0,0,0,1] we *are* leaving half the energy in the received signal on the table, so we do want as much energy possible in the actual channel’s impulse response to appear within the channel estimate the demodulator is given.

Of course, in a more realistic case, the actual channel won’t be just two symbols long; this *is* terrestrial radio, not a PCB trace / transmission line nor an airplane-to-satellite radio channel :p

In the case where the physical channel has a length commensurate with the channel length designed in the channel estimator / demodulator, we want to make sure that our least-squares channel estimator gets aimed at the right place in the burst – if it ingests lots of signal affected by unknown data (as opposed to known training sequence data affected by an unknown channel), its output will be kinda garbage.

We’d be at an impasse^{1} if the least squares estimator was our only tool here, but we have a simpler tool that’s more forgiving of misalignments: cross-correlating the received signal against the modulated training sequence. Another way of thinking of this is that we’re running our received signal through a matched filter (with the reference/template signal the modulated training sequence) – it’s literally the same convolution.
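A minimal sketch of that correlation (the sequence length, burst length, offset, and noise level here are all made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(2)
ts = rng.choice([-1.0, 1.0], size=26)           # stand-in modulated training sequence
rx = rng.normal(scale=0.3, size=300)            # received burst: mostly noise...
offset = 117                                     # ...with the sequence hidden here
rx[offset:offset + len(ts)] += ts

# cross-correlate against the known sequence; this is literally the same
# operation as a matched filter whose template is the training sequence
corr = np.correlate(rx, ts, mode='valid')
estimated_offset = int(np.argmax(np.abs(corr)))
print(estimated_offset)   # 117: the tallest peak sits right on the sequence
```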

Doing this gives us something that looks like this:

Using the Mk I eyeball, it’s pretty clear where the training sequence lives – at the tallest peak.

For implementation in software or gateware, we can encode this logic pretty easily: calculate the correlation, then iterate and look for the biggest peak. However, we notice that there’s a bunch of spurious peaks all around, and it’d be quite bad if we accidentally matched on a spurious peak: the channel estimate would be garbage, and the output of the demodulator would be beyond useless, since it wouldn’t even be starting off at the right spot in the signal.

We can avoid this failure case by running the correlation on a smaller window, which reduces the chances of hitting a false correlation peak. We determine the position of the smaller window using our prior knowledge of the transmitted signal structure – where the training sequence lives relative to the start of the signal – and an estimator to determine when the start of the signal happens.

It’s pretty easy to determine when the start of the signal happens: square and sum each incoming I/Q pair to get a per-sample power, keep a little window of those values, and when their sum exceeds a threshold, well, that’s when the signal started.
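A sketch of such an energy detector (the window length, threshold, and signal parameters are all assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
N, start = 1000, 400                             # signal turns on at sample 400
iq = 0.05 * (rng.normal(size=N) + 1j * rng.normal(size=N))   # noise floor
iq[start:] += np.exp(2j * np.pi * 0.01 * np.arange(N - start))  # unit-power tone

power = iq.real**2 + iq.imag**2                  # per-sample power
W = 16                                           # little window of powers
windowed = np.convolve(power, np.ones(W), mode='valid')  # sliding sum of W samples
threshold = 0.5 * W                              # half of full signal power times W
detected = int(np.argmax(windowed > threshold))  # first window over the threshold
print(detected)                                  # lands within a window of the start
```

Note the detected index lands a few samples early (the first window that’s mostly signal starts before sample 400), which is exactly the sort of variance that the follow-up correlation cleans up.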

We use this to narrow down the possible locations for the training sequence in the received signal. However, we still should run the correlation since this energy-detection start-of-signal estimator has more variance than the correlation-based timing offset estimator.

Incidentally, the GSM training sequences (and lots of training sequences in other well-designed wireless communications systems) have interesting properties:

- their power spectra are approximately flat
- their autocorrelations have a tall narrow peak that approximates an impulse, with much less energy elsewhere

The former is a desired property since we want to evenly probe the frequency response of the bandpass channel. Spreading the training sequence’s power unevenly (lots of power in one part of the passband and much less in another part of the passband) causes a worse signal-to-noise^{2} ratio in the parts of the passband with less training sequence power. It’s a zero-sum affair since the transmitter has finite transmit power.

The autocorrelation property not only lets us use these training sequences for time synchronization, but also lets us use correlation as a rough channel impulse response estimate. If we’re satisfied with a very suboptimal receiver, we can just use the correlation as our channel estimate. However, least-squares generally will give us a more accurate channel impulse response, since the autocorrelation of the training sequence is not 1 at zero lag and 0 elsewhere – there are little sidelobes:
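Here’s a noiseless sketch comparing the two (the channel taps are made up, and I’m using a random ±1 sequence as a stand-in training sequence, whose sidelobes are worse than a real GSM training sequence’s):

```python
import numpy as np

rng = np.random.default_rng(4)
M, L = 26, 8                                     # GSM-ish sequence/channel lengths
ts = rng.choice([-1.0, 1.0], size=M)             # stand-in training sequence
h = np.array([0.9, 0.5, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0])   # made-up channel

rx = np.convolve(ts, h)                          # noiseless received training part

# coarse estimate: correlation against the training sequence, scaled by 1/M;
# the sequence's autocorrelation sidelobes leak into every coefficient
h_corr = np.correlate(rx, ts, mode='full')[M - 1:M - 1 + L] / M

# least-squares estimate over the M-L+1 samples unaffected by unknown data
trusted = rx[L - 1:M]
A = np.column_stack([ts[L - 1 - k:M - k] for k in range(L)])
h_ls, *_ = np.linalg.lstsq(A, trusted, rcond=None)

print(np.abs(h_corr - h).max(), np.abs(h_ls - h).max())
```

Even with zero noise, the correlation-based estimate is off (sidelobe leakage), while least-squares recovers the channel to machine precision.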

If you don’t have a good intuition for what a narrow autocorrelation does here, you can develop one by going to a loading dock or a construction site and paying attention when big trucks or earthmoving equipment back up. See, those big rigs are required to have a back-up beeper to warn bystanders that the driver is backing up and can’t see well what’s behind the vehicle.

There’s two common types of back-up beeper, and unfortunately the more common kind outputs a series of beeps of a single pure tone (without changing frequency between beeps). If you close your eyes and only use that sound to determine where the truck is, you’ll find it’s quite a difficult task: it seems like the sound is coming from *everywhere*! The brain has a variety of mechanisms to localize sources of sound, and besides the ultra-basic “find which ear is receiving the loudest signal” method many of them kinda boil down to doing cross-correlations of variously-delayed versions of the left ear’s signal against variously-delayed versions of the right ear’s signal, and looking for correlation peaks. Seems familiar!

Unfortunately, the pure sine tone is the *worst* possible signal for this, since there’ll be tons of correlation peaks (each oscillation of the sine wave is identical to its precursor and successor), and if there’s audio-reflective surfaces around you and the truck, there’ll be tons of echoes too. Ambiguities galore! More spurs than a cowboy convention!

Ironically, **the most useful** (for angle-of-arrival localization) **part of the pure-tone truck beeper’s signal is the moment the beep starts**^{3}, since the precursor is *zero* – the rest of the beep is comparatively useless for localization (an estimation task) but extremely useful for knowing that there’s indeed a truck **somewhere in the neighborhood** backing up (a detection task). The start and end of the beep are the most spectrally rich part of the beeper’s output, and this is indeed what we expect.

The pure sine wave is the *easiest* possible signal to detect (with our friend the matched filter), but the *worst* possible signal for localization; and this irony is why you can hear truck back-up beepers from *uselessly* far away but can’t easily tell which truck is backing up.

Fortunately, there’s truck back-up beepers that output sounds far more amenable to localization: little bursts of white noise. If you haven’t heard those, you can find a youtube video of those in action, play it on your computer, and try and localize your *computer’s speakers* with your eyes closed.

You’ll notice that this is basically the optimal signal if you want to do angle-of-arrival estimation with delays and correlations – there’s only one correlation peak, and it’s exactly where you want it. It’s also extremely spectrally rich, and it has to be, since spectrally poor signals have worse autocorrelation properties. It also has the advantage of “blending in” with other noise: on-and-off bursts of white noise get “covered up” by white noise (and become indistinguishable from white noise) very quickly, a pure tone is much more difficult to cover up with white noise.

This is what a good training sequence looks like: simple correlation gets you a passable estimate for the channel impulse response along with the timing offset, since the autocorrelation approximates an impulse. Also, the spectral richness ensures that all the frequency response of the bandpass channel is probed.

I don’t think there’s too much useful we can do with the coarse correlation-based channel estimation to enable a more accurate channel estimation with more advanced (least-squares) methods – I had imagined looking at the coarse correlation-based channel estimate and looking for a window with the most energy and then doing a least-squares channel estimate only on that window, but I don’t think that actually has realistic benefits.

However, that idea (focusing on where energy is concentrated in the channel impulse response) *does* point to a more fructuous^{4} game we can play with channel impulse response: transforming the channel to *squash* the channel’s energy as much as possible into the earlier channel coefficients, and this is called “channel shortening”. Channel shortening is interesting because rather than having to delay decisions until the last possible moment, we can commit to decisions earlier, which reduces the computational burden (and area/power requirements) on a Viterbi-style demodulator pretty significantly.

If the channel’s impulse response is highly front-loaded into, say, the first 3 symbols, we can force a decision after only 3 symbol periods, since it’s very unlikely that anything arriving after that would make us change our mind. We still keep track of the *effect* of our decisions for as long as the channel lasts, since otherwise we’d be introducing actual error (even if we make all the right decisions) that’s easy to avoid: once we’ve made the decisions, figuring out their effect is as simple as feeding them through a channel-length FIR filter.

maybe not, i am unsure if looking at the least-squares residuals would be enough to determine lack of time synchronization↩︎

which I am assuming to be distributed evenly across the passband↩︎

the moment the beep ends is theoretically the same, but your ears are more desensed than when the beep *starts*↩︎

I’ve *always* wanted to use that word (or rather, its French cognate “fructueux”) in writing.↩︎

In my post on least-squares channel estimation, I had done some reasoning about which received samples can be safely (they’re not affected by unknown data) used for a least-squares channel estimation:

The simple way to cope with this is to refuse to touch the first *L* − 1 samples, and run our channel impulse response estimate over the *M* − *L* + 1 samples after those. In GSM, this still gives us good performance, since for *M* = 26, *L* = 8 we have 19 samples to estimate 8 channel coefficients. Note that we also can’t use the trailing (in the scan, the last 4 rows) received symbols, since those *also* are affected by unknown data.

Now, our convolution matrix has dimensions *M* − *L* + 1 by *L*, which makes sense: the only “trustworthy” (unaffected by unknown data) symbols are *M* − *L* + 1 long, and we are convolving by a channel of length *L*.

Figuring out the exact offset for `interference_rx_downsampled` has been a bit tricky, and I haven’t yet dived into writing the right correlation to estimate the exact timing offset required.

From playing around some more in MATLAB with my source code, I realized I still don’t have a strong understanding of the exact offsets/indices/lengths at play here.

Rather than stare at algebraic expressions, we will draw pictures that speak to the physical meaning of the problem to help us reach expressions we actually *understand*.

We’ll take a generic GSM-like^{1} transmitted burst that is composed of *D*_{1} data bits, followed by a *midamble* of *T*_{S} training symbol bits, and *D*_{2} data bits.

Here’s what the burst looks like. I’ve written down the indices (starting at 1) for the first and last bit in each section.

We note that all the lengths are correct:

- First data section is from 1 to *D*_{1}, so its length is *D*_{1} − 1 + 1 = *D*_{1}
- Midamble is from *D*_{1} + 1 to *D*_{1} + *T*_{S}, so its length is *D*_{1} + *T*_{S} − (*D*_{1} + 1) + 1 = *T*_{S} − 1 + 1 = *T*_{S}
- Second data section is from *D*_{1} + *T*_{S} + 1 to *D*_{1} + *T*_{S} + *D*_{2}, so its length is *D*_{1} + *T*_{S} + *D*_{2} − (*D*_{1} + *T*_{S} + 1) + 1 = *D*_{1} − *D*_{1} + *T*_{S} − *T*_{S} + *D*_{2} − 1 + 1 = *D*_{2}
- Total burst is from 1 to *D*_{1} + *T*_{S} + *D*_{2}, so its length is *D*_{1} + *T*_{S} + *D*_{2} − 1 + 1 = *D*_{1} + *T*_{S} + *D*_{2}

It’s clear how we can isolate any particular section of this burst before it has passed through a dispersive channel.

- [1,5] means “1 to 5, inclusive of the bounds (‘closed’) on both sides”, and represents 1, 2, 3, 4, 5
- (1,5) means “1 to 5, non-inclusive of the bounds (‘open’) on both sides”, and represents 2, 3, 4
- We also can have left-closed right-open: [1, 5) is inclusive of the 1 but not of the 5, so we have: 1, 2, 3, 4
- And likewise with left-open right-closed: (1, 5] represents 2, 3, 4, 5

As the subtitle in the header insinuates, a dispersive channel is represented by a convolution. The structure of convolution tells us that each transmitted sample will affect multiple received samples, and the channel vector’s finite length tells us it’s not gonna be all of them.

We note that a single sample will be “smeared out” by a channel of length *L* onto a span that’s *L* long:

As for the indices, if this sample lives at index *n*, the index of this little “span of influence” will be [*n*,*n*+*L*−1]. Why these indices?

- the starting index: We currently don’t care^{3} about *absolute delays*, just what happens *inside the delay spread*. Remember the “ideal coaxial cable” thought experiment from our last post: the problem remains identical no matter how much ideal coaxial cable lives between our receiver antenna and our receiver frontend. We can therefore say that the input sample at index *n* gets transmogrified by an “identity channel” (impulse response of [1], it doesn’t change the signal at all) to be an output sample at index *n* – no need to add any offset. This means that the **first** output sample to be affected by our input sample will be at index *n*, which justifies the left-closed (includes its boundary): [*n*,
- the ending index: If the “span of influence” is *L* long, the last sample that is affected by our input sample will be at index *n* + *L* − 1. This justifies the right-closed (includes the boundary): , *n* + *L* − 1]

Going back to our “single element convolution” example, if the *x* input sample lives at index 10, the first nonzero output sample lives at index 10 by fiat. We observe nonzero output samples at 11, 12, 13, 14 as well. Output sample 15 and beyond are zero, as are samples 9 and lower. This means that we have nonzero output at [10,14], and if we let *n* = 10 and *L* = 5 we get [10,10+5−1] = [10,14], which matches up with what we see.
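The single-sample example above can be checked in a couple of lines:

```python
import numpy as np

x = np.zeros(20)
x[10] = 1.0                      # a single input sample at index n = 10
h = np.ones(5)                   # a channel of length L = 5
y = np.convolve(x, h)

print(np.flatnonzero(y))         # [10 11 12 13 14], i.e. the span [n, n+L-1]
```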

There is a definite structure to the transmitted burst: known data (the training sequence) sandwiched by unknown data. In realistic systems, the designers will select a training sequence length longer than any reasonable channel they expect to contend with, and so we expect:

- some received samples will be a function only of unknown data
- some received samples will be a function of unknown data and training sequence bits
- some received samples will be a function only of training sequence bits

To figure out which received samples are which, let’s draw out what happens when our burst gets convolved with a channel of length *L*. Each transmitted symbol will get “smeared out” onto an *L*-long span, and we focus on the symbols at the boundaries of each section.

The center line represents what the receiver hears, and for clarity, we draw the unknown data sections above the center line and the training sequence below the center line.

Things are much more clear now!

- [1, *D*_{1}], with length (*D*_{1}) − (1) + 1 = *D*_{1}: the output’s only affected by the first data section
- [*D*_{1} + 1, *D*_{1} + *L* − 1], with length (*D*_{1} + *L* − 1) − (*D*_{1} + 1) + 1 = *L* − 1: the output is affected by the first data section *and* the training sequence
- [*D*_{1} + *L*, *D*_{1} + *T*_{S}], with length (*D*_{1} + *T*_{S}) − (*D*_{1} + *L*) + 1 = *D*_{1} + *T*_{S} − *D*_{1} − *L* + 1 = *T*_{S} − *L* + 1: the output is only affected by the training sequence. **This is the section we use for a least-squares channel estimate!**
- [*D*_{1} + *T*_{S} + 1, *D*_{1} + *T*_{S} + *L* − 1], with length (*D*_{1} + *T*_{S} + *L* − 1) − (*D*_{1} + *T*_{S} + 1) + 1 = *D*_{1} + *T*_{S} + *L* − 1 − *D*_{1} − *T*_{S} − 1 + 1 = *L* − 1: the output is affected by the training sequence *and* the second data section
- [*D*_{1} + *T*_{S} + *L*, *D*_{1} + *T*_{S} + *D*_{2} + *L* − 1], with length (*D*_{1} + *T*_{S} + *D*_{2} + *L* − 1) − (*D*_{1} + *T*_{S} + *L*) + 1 = *D*_{1} + *T*_{S} + *D*_{2} + *L* − 1 − *D*_{1} − *T*_{S} − *L* + 1 = *D*_{2}: the output is only affected by the second data section

Now let’s sum^{4} up all those lengths to see if our work checks out: (*D*_{1}) + (*L* − 1) + (*T*_{S} − *L* + 1) + (*L* − 1) + (*D*_{2}) = *D*_{1} + *L* − 1 + *T*_{S} − *L* + 1 + *L* − 1 + *D*_{2} = *D*_{1} + *D*_{2} + *T*_{S} + *L* − 1. This is indeed what we get when we convolve a vector with length *D*_{1} + *D*_{2} + *T*_{S} (the total length of the burst as it’s transmitted) by a vector with length *L* (the channel)!
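We can also let the computer do the check-sum (the burst parameters here are hypothetical, not the exact GSM numbers):

```python
import numpy as np

# hypothetical burst parameters: D1/D2 data bits, TS midamble, channel length L
D1, TS, D2, L = 58, 26, 58, 8

burst = np.ones(D1 + TS + D2)                    # any burst of the right length
h = np.ones(L)                                   # any channel of length L
y = np.convolve(burst, h)

segment_lengths = [D1, L - 1, TS - L + 1, L - 1, D2]
print(sum(segment_lengths), len(y))              # both D1 + TS + D2 + L - 1 = 149
```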

As usual, if you notice an error in my work, I’d be very grateful if you could point it out to me.

GSM’s “stealing bits” act like regular bits for modulation/demodulation, and the tail bit structure is not relevant for channel estimation (it will be relevant when we look at trellises).↩︎

or rather, convolving in↩︎

We will soon need to care about absolute delays to solve the *time synchronization* problem. Not the question of how to get synchronized to UTC or TAI, but rather figuring out when exactly we receive each burst. This is critical since, for instance, if the time sync is incorrect, the channel estimator could end up being fed *modulated unknown data* rather than the midamble!↩︎

a sum to check our work, call that a check-sum :p↩︎


Not all modulation schemes have the zero-ISI property that RRC-filtered ^{1} linear modulations have. Continuous-phase modulations (like GMSK, which we’ll be looking at) generally introduce inter-symbol interference: if your receiver recovers symbols by slicing-and-thresholding the received-and-filtered signal, it will have degraded performance – even if its timing is perfect.

This doesn’t prevent us from making high-performance (approaching optimal) receivers for GMSK. If the transmitter has a direct line-of-sight to the receiver and there’s not much else in the physical environment to allow for alternate paths, the channel won’t have much dispersive effect. This lets us approximate the channel as a non-frequency-selective attenuation followed by additive white Gaussian noise. In this case, you can use the Laurent decomposition of the GMSK amplitude-domain waveform to make a more complex receiver that’s quite close to optimal.

The former case is common in aerospace applications: if an airplane/satellite is transmitting a signal to an airplane/satellite or to a ground station, there usually is a quite good line of sight between the two – with not many radio-reflective objects in between that could create alternate paths. The received signal will look very much like the transmitted signal, only much weaker.

If your transmitter and receiver antennae aren’t in the sky or in space, they’re probably surrounded by objects that can reflect radio waves. In fact, they might not even have *any* line of sight to each other at all! You can use your cell phone anywhere with service, not just anywhere you have a cellular base station within line of sight.

If you’ve ever spoken loudly in a quiet tunnel/cave/parking garage, you hear echoes – replicas of your voice, except delayed and attenuated. A similar phenomenon occurs when there’s multiple paths the radio waves can take from the transmitter to the receiver. Think of the channel as a tapped delay line: the receiver receives multiple copies of the signal superimposed on each other, with each copy delayed by the corresponding path delay and attenuated by the corresponding path loss.

Imagine an extreme case: sending symbols at 1 symbol per second, and leaving the channel silent for 1 second between each symbol. Let’s say we have four fixed paths with equal attenuation, with delays 50ms, 100ms, 150ms, and 210ms. The difference between the shortest path (the path that will start contributing its effect at the receiver the earliest) and the longest path (the path that takes the longest time to start contributing its effect at the receiver) is known as the “delay spread” and here, it’s 210ms − 50ms = 160ms. Initially, the receiver gets something very much non-constant: each of the paths “fills up” and appears at the receiver one by one, but this only happens in the first 160ms of the symbol. After those 160ms, the channel reaches equilibrium, and for the remaining 1000ms − 160ms = 840ms, the receiver receives a constant signal. If the receiver ignores the first 160ms of each symbol, it can ignore the multipath altogether!
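The four-path thought experiment is easy to simulate at 1 sample per millisecond:

```python
import numpy as np

# 1 ms per sample; four equal-gain paths at the delays from the text
h = np.zeros(211)
for d in [50, 100, 150, 210]:                    # path delays in ms
    h[d] = 0.25                                  # tapped delay line

symbol = np.ones(1000)                           # one 1-second constant symbol
y = np.convolve(symbol, h)

# after the first path arrives (50 ms) plus the 160 ms delay spread,
# the channel has reached equilibrium: the output is perfectly constant
settled = y[50 + 160:1000]
print(settled.min(), settled.max())              # 1.0 1.0
```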

Note that the *absolute delay* of the paths does impact the latency of the system, but it doesn’t impact how the channel corrupts the signal. You could imagine the same system, except that there’s 3,000,000 kilometers of ideal (doesn’t attenuate or change the signal, just delays it) coaxial cable between the transmitter and the transmit antenna. That’s gonna add 10 seconds^{2} of delay, but it won’t alter the received signal at all.

This dynamic (symbol time much greater than delay spread) is why analog voice modulation doesn’t need fancy signal processing to cope with multipath. The limit of human hearing is 20 kilohertz, and *c*/(20 kHz) = 15 kilometers, which is pretty big – paths with multiple kilometers of additional distance are gonna be pretty attenuated and won’t be very significant to the receiver^{3}.

The higher the data rate compared to the delay spread, the less you can ignore multipath. Increase the symbol rate to GSM’s 270 kilosymbols per second, and we get *c*/(270 kHz) ≈ 1.1 kilometers. Paths with hundreds of meters of additional distance aren’t negligible in lots of circumstances!

A high-performance demodulator has to function^{4} despite this channel-induced ISI. It turns out that the same mechanism that needs to handle the channel-induced ISI (which changes based on the physical arrangement of the scatterers in the environment, and is estimated by the receiver, often using known symbols) can also handle the modulation-induced ISI as well.

The “Gaussian” in “GMSK” isn’t a filter that gets applied to the *time-domain* samples. Rather, it’s a filter that gets applied in the *frequency-domain*, and this frequency-domain signal gets used to feed an oscillator – and it’s that oscillator that generates the time-domain baseband signal.

The following 3 diagrams are from the wonderful Chapter 2 of Volume 3 of the JPL DESCANSO Book Series.

The Laurent decomposition tells us that the Gaussian-shaped GMSK frequency-domain pulse, after it gets digested by an oscillator, ends up being equivalent to two time-domain pulses (there are more but they are truly negligible), *C*_{0} (the big one) and *C*_{1} (the small one):

The first Laurent pulse is excited by a function of the current data symbol^{5}. So far, so good. A suboptimal receiver can pretend that a GMSK waveform is only made of *C*_{0} Laurent pulses. If you ignore the *C*_{1} pulse, this reduces GMSK to MSK. **MSK is not a linear modulation, and has nonzero ISI**: the amplitude-domain pulse doesn’t have the zero-ISI property that RRC has.

However, if we have a good phase estimate, we can separate the MSK signal into in-phase (*I*) and quadrature (*Q*) signals. MSK^{6} has a wonderful property once we’ve decomposed it this way: The “useful channel” alternates between *I* and *Q* for every symbol and contains no ISI, and the “other channel” (which alternates between *Q* and *I*) contains all the ISI.

To phrase it another way, on even symbols, the information needed to estimate the symbol is all in *I*, and the ISI is all in *Q*, and on odd symbols, the information needed to estimate the symbol is all in *Q*, and the ISI is all in *I*. Looking at *I* and *Q* separately eliminates the ISI, and this lets us make a receiver that looks much like a linear modulation receiver (integrate-and-dumps, comparators, etc) with close to ideal performance.
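The alternation is easy to see in a symbol-rate sketch (my own idealization, which ignores the half-sine pulse shaping; also, whether the even or the odd symbols land on *I* depends on the phase reference – the point is the strict alternation):

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.choice([-1, 1], size=64)                 # data bits

phase = (np.pi / 2) * np.cumsum(a)               # MSK: +/- 90 degrees per symbol
x = np.exp(1j * phase)

# the samples strictly alternate between the imaginary and the real axis:
# with this phase reference, even-indexed symbols carry information in Q,
# odd-indexed symbols carry it in I
print(np.abs(x[0::2].real).max(), np.abs(x[1::2].imag).max())   # both ~0
```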

Stuff gets more interesting if you don’t ignore the second Laurent pulse. What’s that one excited by? Well, it’s a function of the current bit, the previous bit, **and the bit before that**! There’s even a little shift register on the bottom left!

Incidentally, that shift register isn’t just theoretical. If you implement a GMSK modulator with precomputed waveforms in a ROM (as opposed to using a Gaussian filter / integrator / NCO), there’s gonna be a shift register that looks much like that, which helps you index the ROM and postprocess the ROM output. I implemented a GMSK modulator in Verilog that uses precomputed waveforms, with the paper “Efficient implementation of an I-Q GMSK modulator” (doi:10.1109/82.481470, by Alfredo Linz and Alan Hendrickson) as a guide.

There’s 16 possible waveforms you need to be able to generate (8 possible values of the shift register; I and Q for each), but the structure of the modulation lets you cut down on ROM required: if you can time-reverse (index the ROM backwards) and/or sign-reverse (flip the sign of the samples coming *out* of the ROM), you can store just 4 basic curves in the ROM and generate all 16 waveforms that way.
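The reverse/negate trick is easy to sketch. Here’s a toy Python illustration – the base curves and indexing below are made up for demonstration and are not the paper’s actual tables:

```python
import numpy as np

# Toy illustration: 4 stored base curves, each of which can be read out
# forwards or backwards and with either sign, covering 16 waveforms with a
# quarter of the ROM.
rom = np.array([[0.0, 0.3, 0.8, 1.0],
                [0.0, 0.1, 0.5, 0.9],
                [1.0, 0.9, 0.6, 0.2],
                [0.5, 0.6, 0.7, 0.8]])

def read_waveform(curve, time_reverse, sign_reverse):
    w = rom[curve]
    if time_reverse:
        w = w[::-1]          # index the ROM backwards
    if sign_reverse:
        w = -w               # flip the sign of samples coming out of the ROM
    return w

all_waveforms = {tuple(read_waveform(c, t, s))
                 for c in range(4) for t in (0, 1) for s in (0, 1)}
# 4 curves x 2 time directions x 2 signs = 16 distinct waveforms
assert len(all_waveforms) == 16
```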

Unlike with RRC, there’s no magic filter that nulls out GMSK’s ISI/memory. Unlike with MSK, separating *I* and *Q* doesn’t neatly separate the data and the ISI.

Every time a demodulator receives a new sample (or receives *n* new samples if there are *n* samples per symbol), it needs to decide what symbol was most likely to generate that sample. If it didn’t do something like that, it wouldn’t be much of a demodulator.

If the modulator has no memory, this task is pretty simple: we look at the sample values **each possible** symbol would have generated, and we compare each of those gold-standard values against the value we *actually received*. Which symbol was most likely to have been sent? The symbol whose value is the closest to what was actually received.

How accurate is this? It depends on how many possible symbols there are! Increase the number of possible symbols (“bits per symbol”, “modulation order”), and you decrease the amplitude of noise needed to shift the received sample closer to a wrong symbol than to the correct one.
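A minimal Python sketch of that memoryless decision rule, and of how a denser constellation shrinks the noise margin (constellations are my own example values):

```python
import numpy as np

def nearest_symbol(received, constellation):
    """Memoryless decision: pick the constellation point closest to the
    received sample (minimum Euclidean distance)."""
    constellation = np.asarray(constellation)
    return constellation[np.argmin(np.abs(constellation - received))]

bpsk = np.array([-1.0, 1.0])
pam4 = np.array([-1.0, -1/3, 1/3, 1.0])   # 4-PAM with the same peak amplitude

# A +1 symbol hit by -0.5 of noise is still decided correctly for BPSK,
# but the denser 4-PAM constellation already decides wrongly:
assert nearest_symbol(1.0 - 0.5, bpsk) == 1.0
assert nearest_symbol(1.0 - 0.5, pam4) == 1/3
```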

If the modulator has memory, this task is more complicated. The signal that the modulator generates for a symbol doesn’t just depend on the current symbol, but on a certain number of past symbols as well.

If the demodulator wants to extract the most possible information from the received signal, it needs to read the modulator’s mind.

Assume the demodulator has access to a perfect mind-reading channel: we can see into all of the modulator’s state – except for what’s affected by the current symbol. The latter proviso prevents the demodulator’s task from becoming trivial. Via the mind-reading channel, the demodulator knows the last two bits the modulator sent: call them *b*_{1} and *b*_{2}. There’s a standard assumption that the transmitted signal is a random bitstream, so knowing *b*_{1} and *b*_{2} gives the demodulator strictly zero information about *b*_{3}.

The demodulator actually has to estimate *b*_{3} from the noisy received signal, like usual. However, that task is actually solvable now! We have a local copy of a GMSK modulator, and we generate two candidate signals: one with the sequence (*b*_{1},*b*_{2},0), and one with the sequence (*b*_{1},*b*_{2},1). If what was actually received is closer to the former, we decide a 0 was sent; if the latter is closer, we decide a 1 was sent.

You see where this is going! We estimated a value for *b*_{3} – call it *b*_{estimated,3} – by comparing the two possible alternatives. Now, when the modulator sends *b*_{4}, we don’t need the mind-reading channel anymore! **We already have our best estimate for what b_{3} was, and we can use that b_{estimated,3} to find b_{4}!** Indeed, we use our local GMSK modulator to modulate (*b*_{2},*b*_{estimated,3},0) and (*b*_{2},*b*_{estimated,3},1), and compare each candidate against what we received.

Unfortunately, eschewing the mind-reading channel isn’t free. The clunky *b*_{estimated,3} notation foreshadowed that *b*_{estimated,3} and *b*_{3} aren’t guaranteed to be equal. *b*_{estimated,3} might be the best possible estimate we can make, but it still can be incorrect!

If *b*_{estimated,3} ≠ *b*_{3} and we try to guess what *b*_{4} is by using (*b*_{2},*b*_{estimated,3},0) and (*b*_{2},*b*_{estimated,3},1) as references, we’re in for a world of hurt. The error in *b*_{estimated,3} is forgivable (there’s noise, errors happen), but using an incorrect value of *b*_{3} to estimate *b*_{4} **propagates that error into b_{estimated,4}**… which will propagate into every estimate after it.

We want to *average* out errors, not *propagate* them!

If we still had our mind-reading channel, we would know what the true value of *b*_{3} was (of course, only after we commit ourselves to *b*_{estimated,3}, otherwise the game is trivial), and use that to estimate *b*_{4}, by using (*b*_{2},*b*_{3},0) and (*b*_{2},*b*_{3},1) for comparison against our received signal.

We’re at a loss here, because mind-reading channels don’t exist, but if we don’t use the mind-reading channel, our uncertain guesses can amplify errors.

It turns out we were almost on the right track. We can turn this error-amplification^{7} scheme into something truly magical (a sort of magic that actually exists) if we *delay our decisions*.

We have to make decisions on uncertain data. However, this doesn’t oblige us to make a decision for *b*_{i} *as soon as it is possible to make a better-than-chance decision* for *b*_{i}! If there’s useful data that arrives *after* we have committed to a decision on *b*_{i}, we’re throwing that data away – at least when it comes to estimating *b*_{i}.

In fact, if we want to do the best job we can, we’ll keep accumulating incoming data until the incoming data tells us *nothing* about *b*_{i}. Only then will we make a decision for *b*_{i}, since we’ve collected all the relevant data that could possibly be useful for its estimation.

But how do we add up all that information? What metrics get used to compare different possibilities? How will this series of selections estimate the sequence of symbols that most likely entered the modulator? And how do we avoid a combinatorial explosion?
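Those questions lead to trellis processing. Here’s a toy maximum-likelihood sequence estimator in Python – the memoryful mapping is made up and stands in for the Laurent-pulse modulator – showing the key trick: instead of scoring all 2^N candidate sequences, we keep only the best-metric path into each of the 4 possible (*b*_{k−1},*b*_{k}) states.

```python
import itertools
import numpy as np

def modulator_sample(b2, b1, b0):
    # Stand-in for a modulator with two symbols of memory (a made-up mapping;
    # a real GMSK modulator would emit Laurent-pulse waveform samples here).
    return 1.0 * (2 * b0 - 1) + 0.5 * (2 * b1 - 1) + 0.25 * (2 * b2 - 1)

def viterbi(received):
    """Maximum-likelihood sequence estimation: instead of committing to each
    bit as soon as it can be guessed, keep the best-metric path into each
    (b[k-1], b[k]) state and decide the whole sequence at the end. Work per
    sample is 4 states x 2 transitions, not 2^N candidate sequences."""
    states = list(itertools.product((0, 1), repeat=2))
    metric = {s: 0.0 for s in states}     # any starting state allowed
    paths = {s: [] for s in states}
    for r in received:
        new_metric, new_paths = {}, {}
        for prev in states:
            for b in (0, 1):
                nxt = (prev[1], b)
                m = metric[prev] + (r - modulator_sample(prev[0], prev[1], b)) ** 2
                if nxt not in new_metric or m < new_metric[nxt]:
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[prev] + [b]
        metric, paths = new_metric, new_paths
    return paths[min(metric, key=metric.get)]

rng = np.random.default_rng(1)
bits = [int(b) for b in rng.integers(0, 2, 30)]
clean = [modulator_sample(*([0, 0] + bits)[k:k + 3]) for k in range(len(bits))]
noisy = [c + 0.05 * rng.standard_normal() for c in clean]
assert viterbi(noisy) == bits
```

Note how no bit is decided until all the samples are in; each surviving path quietly carries its own hypothesis about the earlier bits, which is exactly the “accumulate data until it tells us nothing more” idea.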

- “Efficient implementation of an I-Q GMSK modulator” (doi:10.1109/82.481470), by Alfredo Linz and Alan Hendrickson
- “Comparison of Demodulation Techniques for MSK” by Uwe Lambrette, Ralf Mehlan, and Heinrich Meyr
- “GMSK Demodulator Implementation for ESA Deep-Space Missions”, by Gunther M. A. Sessler; Ricard Abello; Nick James; Roberto Madde; Enrico Vassallo
- Chapter 2 of Volume 3 (“Bandwidth-Efficient Digital Modulation with Application to Deep-Space Communications”) of the JPL DESCANSO Book Series, by Marvin K. Simon

(with an RRC receive filter at the receiver)↩︎

ideal coaxial cable has a velocity factor of 1↩︎

unless you’re on shortwave/HF, where it *is* possible to get echoes, since the ionosphere sometimes *does* give rise to paths with drastically different distances and without catastrophic attenuation↩︎

The equalization task with OFDM is greatly simplified: orthogonal frequency-domain subcarriers + cyclic prefixes create a circulant matrix. The receiver does a big FFT, and the properties of the circulant matrix mean the effect of a dispersive channel is limited to multiplying the output of each subcarrier by a complex coefficient. That complex coefficient is merely the amplitude/phase response of the channel, measured at that subcarrier’s frequency. In real-world systems you need a way to estimate those complex coefficients for each subcarrier (symbols with known/fixed values are useful for this), a way to adapt them as the channel changes over time, and a way to cope with Doppler.↩︎

This figure says “precoded” which means that if you want to get the same result, you need to put a differential encoder in front of the bitstream input; but using this diagram (instead of “Fig. 2-33” in the same chapter) more clearly demonstrates that GSM has a 3-symbol memory.↩︎

for *h* = 0.5 full-response continuous-phase modulations more generally↩︎

This scheme actually works fine if most of the energy in the channel/modulator impulse response lives in the earliest coefficient, since the guesses will just… tend to be right most of the time! However, that’s not generally the case; RF channels are rarely this friendly unless line of sight dominates. You can shorten an unfriendly channel by decomposing its impulse response into an all-pass filter and a minimum-phase filter (whose energy will indeed be front-loaded), but it probably won’t guarantee you a channel that lets you get away with avoiding a trellis altogether…↩︎

I had a pretty good math tutoring session today; we looked at the least squares channel estimation problem and got it working for a simple hardcoded channel, without a transmit filter, and without upconversion/downconversion. I learned how to make a Toeplitz matrix that represents the convolution (modulated training sequence convolved against the dispersive channel) I want.

It’s a bit counter-intuitive – I’m used to thinking in terms of a vector signal being convolved by a matrix channel – but convolution commutes, so I put the signal I know (the known preamble / training sequence) into the matrix, and then do the least squares estimation by taking the pseudoinverse of that. The dimensions and contents of this matrix were unintuitive to me at first, but going over it with my tutor helped clarify it, and I’m going to explain it below.

Our channel is *L* (8 for GSM, which I’m using as a guide here) symbol intervals long, and the full training sequence is *M* (26 for GSM) symbol intervals. If there is enough radio silence before and after the training sequence is sent, we will be convolving an *M*-long modulated symbol vector with an *L*-long channel impulse response vector – in order to get an (*M* + *L* − 1)-long received signal vector. If we’re putting the transmitted symbols inside the convolution matrix, that matrix would have to have *M* + *L* − 1 rows and *L* columns to make the dimensions match; multiply it by the channel impulse response (*L*-long) and get *M* + *L* − 1 received samples out.
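The dimension bookkeeping can be sanity-checked in a few lines of numpy (a sketch with made-up symbols and channel, using the *L* = 4, *M* = 9 example sizes):

```python
import numpy as np

M, L = 9, 4                        # training length and channel length
rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], M)     # modulated training symbols
h = rng.standard_normal(L)         # channel impulse response

# Build the (M + L - 1) x L convolution matrix: column j holds the symbol
# vector delayed by j samples, with zeros before/after (the radio silence).
T = np.zeros((M + L - 1, L))
for j in range(L):
    T[j:j + M, j] = s

# Multiplying by the channel reproduces ordinary convolution:
assert np.allclose(T @ h, np.convolve(s, h))
```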

Here’s a visualisation of how the matrix multiplication works out (I let *L* = 4 and *M* = 9 to save me writing, and I’m not going to figure out how to typeset LaTeX equations with pandoc/hakyll tonight). Yeah, I forgot the *c*_{1}, *c*_{2}, *c*_{3}, *c*_{4} channel coefficients when I was writing out the dot product results :p

The received signal at any given time is the dot product of the CIR with the corresponding row of the matrix. Before we begin transmitting, the received signal is zero. The first three sampling instants are a bit odd – we start receiving signal, but not all the channel impulse response is in play.

Only at the fourth sampling instant do we have all four channel impulse response points in play. It took *L* − 1 = 3 timesteps of the signal being transmitted for the channel to be “filled up”. Once the channel is “filled up”, we have *M* − *L* + 1 = 6 instants we can use (and then the training sequence ends).

Now, in GSM, there isn’t radio silence before the training sequence is sent. The training sequence lives in the middle^{1} of each burst, so there are data bits before and after it. The placid zeros in the convolution aren’t placid anymore – they’re data symbols, and if we need to estimate the channel before^{2} we can figure out the transmitted symbols, they’re *unknown* data symbols at that!

The simple way to cope with this is to refuse to touch the first *L* − 1 samples, and run our channel impulse response estimate over the *M* − *L* + 1 samples after those. In GSM, this still gives us good performance, since for *M* = 26, *L* = 8 we have 19 samples to estimate 8 channel coefficients. Note that we also can’t use the trailing received samples (in the scan, the last *L* − 1 rows), since those *also* are affected by unknown data.

Now, our convolution matrix has dimensions *M* − *L* + 1 by *L*, which makes sense, the only “trustworthy” (unaffected by unknown data) symbols are *M* − *L* + 1 long, and we are convolving by a channel of length *L*.

In MATLAB, we can make the appropriate Toeplitz matrix with `T = toeplitz(modulated_training_sequence(8:26), flip(modulated_training_sequence(1:8)));`, and we use it to estimate the channel by doing `estimated_chan = lsqminnorm(T, interference_rx_downsampled(64+8:64+18+8))`. Figuring out the exact offset for `interference_rx_downsampled` has been a bit tricky, and I haven’t yet dived into writing the right correlation to estimate the exact timing offset required.
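For comparison, here’s a numpy analogue of the same middle-rows-only least-squares estimate – the symbols, channel, and offsets below are my own synthetic sketch, not the values from the MATLAB script:

```python
import numpy as np

rng = np.random.default_rng(2)
M, L = 26, 8                                    # GSM-ish lengths
s = rng.choice([-1.0, 1.0], M)                  # known modulated training sequence
h = 0.5 * rng.standard_normal(L)                # true channel, to be estimated

# Unknown data bits surround the training sequence, so only the M - L + 1
# received samples where the channel is "filled up" with known symbols count.
burst = np.concatenate([rng.choice([-1.0, 1.0], 20), s,
                        rng.choice([-1.0, 1.0], 20)])
received = np.convolve(burst, h)
received += 0.001 * rng.standard_normal(len(received))

# Row k holds the L known symbols in play at sampling instant k:
T = np.array([s[k - L + 1:k + 1][::-1] for k in range(L - 1, M)])  # (M-L+1) x L
trustworthy = received[20 + L - 1: 20 + M]      # the matching received samples
h_est, *_ = np.linalg.lstsq(T, trustworthy, rcond=None)
assert np.allclose(h_est, h, atol=0.02)
```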

Also, using actual channels (with MATLAB’s `stdchan`) has led to results that don’t seem right visually, and I’m unsure why that’s the case. My suspicion is that when I call MATLAB’s `resample`, something weird happens to the signal, since `resample` has a filter of its own, and that’s going to be incorporated into the channel impulse response as well in a way that doesn’t make sense visually. Moving forward, I will try and figure out a way to measure the accuracy of the channel impulse response by looking at the sent and received signals, rather than by eyeballing it.

The channel changes over time, so if you estimate it just once and don’t do any adaptation, it’s better to have it be in the middle than at either end.↩︎

It’s possible to estimate the channel just from the known training sequence, run the Viterbi demodulator to estimate the symbols, then use those estimated symbols to run a *second*, more accurate (you have more data points!) channel estimation, and use that improved channel estimate to run a *second* Viterbi demodulation. If you want to get extra-fancy you can make all those components ingest / produce soft values and get the channel decoder in the loop as well…↩︎

Due to personal circumstances, I’m currently unable to devote adequate time for this project. I’ll leave this post up.

PLONKish arithmetizations look like^{1} FPGAs made out of polynomials. We have a rectangular array that ingests signals (elements of a finite field) from the outside world, highly-parametrizable computations that get evaluated on subsections of the array, and routing wires (also carrying elements of a finite field) connecting different parts of the array – and cost factors for all of these.

I am going to learn about arithmetic circuits, non-interactive zero-knowledge proving systems, Boolean logic synthesis, and place-and-route methods by trying to create a flow that ingests an arithmetic circuit (in some suitable intermediate representation) and outputs an optimized, placed, and routed version of the circuit, suitable for feeding into a PLONKish proof system like Halo2.

This first post is intended as a high-level summary of FPGA logic synthesis and of non-interactive zero-knowledge proof systems.

Logic-synthesis and place-and-route algorithms help FPGA designers convert high[ish]-level descriptions of hardware – boolean circuits – into a configuration bitstream that gets shifted into an FPGA’s configuration bits. A lot of these optimization problems are NP-hard, but they are realistically feasible thanks to industry and academic research that’s given the field a lot of clever data structures and algorithms that leverage appropriate local heuristics for the task in question. Lots of CPU time helps too.

A generic FPGA is composed of:

- Input/output cells
- Routing: a fabric of programmable wires / switch boxes between the logic blocks
- Logic blocks, which usually come in multiple flavors and are tesselated in various patterns on the FPGA’s fabric. For instance, a generic FPGA might have:
  - 100,000 4-input LUTs, each of which is a little box that can implement *any arbitrary* 4-input 1-output boolean function (and is therefore parametrized with 16 configuration bits)
  - 100 32-bit multiplier-accumulator blocks (usually^{2} referred to as Digital Signal Processing (DSP) blocks), each of which has a multiplier / accumulator / preadder / etc, to implement rapid convolutions or filters or FFTs. Likewise, *each* DSP block has a bunch of configuration bits associated with it that will determine how that specific DSP block acts, and it’s the flow’s responsibility to decide what those configuration bits will be.
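The “16 configuration bits” claim is concrete enough to demo: a LUT’s configuration word is literally its truth table. A quick Python sketch (my own toy model, not any vendor’s bitstream format):

```python
# A 4-input LUT is just 16 configuration bits: the truth table itself.
def lut4(config_bits, a, b, c, d):
    # The four inputs select one of the 16 configuration bits.
    index = (a << 3) | (b << 2) | (c << 1) | d
    return (config_bits >> index) & 1

# Configure one as a 4-input AND by setting only the bit at index 0b1111:
AND4 = 1 << 0b1111
assert lut4(AND4, 1, 1, 1, 1) == 1
assert lut4(AND4, 1, 0, 1, 1) == 0
```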

A generic FPGA flow behaves somewhat like this:

- Ingest the hardware description language and convert it to an IR
- Simplify the IR
- Cut the now-simplified IR into appropriate pieces, and map each of the pieces to logic block types
  - What’s an appropriate piece? Ideally, each cut corresponds nicely to a specific *flavor* of logic block that lives on the target FPGA.
  - For instance, if your FPGA has lots of 4-input LUTs, we’re gonna try and cut up our combinatorial logic into 4-input Boolean functions.
  - Hard multiplier-accumulate blocks are more efficient than implementing those functions out of LUTs, so there’s code that looks for any section of the IR that can be substituted with an appropriately parametrized hard DSP block.
  - This doesn’t always work perfectly, so there are directives available in the HDL to explicitly specify that you want a specific type of DSP block with a specific parametrization, rather than using the generic multiply/add operators in the HDL.

- Now that we have turned the circuit into a connected graph of logic blocks (along with specific parametrization bits for each specific logic block), we assign a physical location (on the FPGA die) to each logic block in question
- Find a configuration for the routing fabric that connects all the logic blocks properly (and connects the input/output cells to the appropriate logic blocks)
- Move the logic blocks around on the fabric until the delays in the routing fabric are minimized

The mathematical inner workings of non-interactive zero-knowledge proofs are well beyond me at the moment, but I’ll try to explain why they’re useful, describe high-level moving parts, and explain how arithmetic circuits are involved.

It’s easy to prove that a piece of data corresponds to a hash value. You look up the specification of the hash, implement it, and feed the data in. For instance, the SHA1 of “eggs” (without a newline at the end) is `bd111dcb4b343de4ec0a79d2d5ec55a3919c79c4`.

You can make a commitment scheme out of this: get some salt (`dd if=/dev/urandom bs=1 count=8|xxd -p`), pour the salt on the eggs, and hash it. SHA1(“`eggs c13aa680`”) = `0f0feae06f795589c1d22f8e0efd774af631144d`. By publishing `0f0feae06f795589c1d22f8e0efd774af631144d`, I *force myself* to commit to “`eggs`” (and not, say, “`lettuce`”), because there’s no way I can feasibly come up with a value for `salt` that’ll make “`lettuce [salt]`” hash to `0f0feae06f795589c1d22f8e0efd774af631144d`. Moreover, my argument that there’s “`eggs`” in `0f0feae06f795589c1d22f8e0efd774af631144d` is ironclad. Anyone who can run and understand SHA1 is convinced.
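A minimal sketch of that commitment scheme in Python (using SHA1 to match the example above, though a modern design would pick SHA-256):

```python
import hashlib

def commit(message: str, salt: str) -> str:
    # Commitment = hash of the salted message, published up front.
    return hashlib.sha1(f"{message} {salt}".encode()).hexdigest()

def verify(commitment: str, message: str, salt: str) -> bool:
    # Opening the commitment later: reveal message and salt; anyone re-hashes.
    return commit(message, salt) == commitment

c = commit("eggs", "c13aa680")
assert verify(c, "eggs", "c13aa680")
assert not verify(c, "lettuce", "c13aa680")
```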

In order to make a convincing argument about the preimage of a given hash, it seems like I would need to disclose the full preimage. For instance, if I want to prove that a certain hash output has a preimage that starts with one of `{"apple", "butter", "cheese", "eggs"}` (without telling you which one), there’s no straightforward way to show that `14f84ddd83c41687598752c95564e6f5676b8c5c` meets the specification: it’s indistinguishable from `afba1413f970af24337c61208390a755b6791d4c`, which does not.

Ex nihilo, this is indeed the case. A random cryptographic hash by itself is indistinguishable^{3} from any other random hash. However, there’s a set of generic mathematical constructions (non-interactive zero-knowledge proofs) that allow us to generate a proof (which is really just Special Data with certain properties) – to accompany the cryptographic hash output – and will convince anyone that our secret preimage starts with one of `{"apple", "butter", "cheese", "eggs"}` **without disclosing which one**.

An NIZK system has certain properties that make cryptographers happy, like “no information about the secret data can be derived from the proof” and “trying to prove something false or tampering with a valid proof always renders the proof invalid”.

The overall picture is this:

- We have some public data and some private data:
  - Here, the public data is the hash output.
  - Here, the private data is the preimage.

- We have a statement, made of a set of constraints, which expresses relations across the ensemble of the public data and the private data:
  - Here, constraint 1 is “The secret preimage starts with one of `{"apple", "butter", "cheese", "eggs"}`”
  - Here, constraint 2 is “Running SHA1 on the secret preimage gives the public hash output”

The proof system (and there are many, with names like Groth16 and Halo2) is generic for *all possible statements* that can be expressed in a finite size. You can use a proof system to prove, for instance, statements like “the Merkle tree with root hash *X* has a leaf that meets a set of properties *Y*” **without telling anyone which leaf it is**!

It’s like a compiler, except instead of running code to produce a value, the proof system produces code that’ll prove/verify that a certain agreed-upon computation got executed faithfully and that certain agreed-upon constraints hold. To do so, you write up your statement as constraints in a specialized constraint language, and in exchange for that hard work, the proof system generates two pieces of software, both of which are generic for all (valid) sets of public data and private data:

- a prover, which ingests a specific set of public/private data and outputs a proof that the specific statement holds
- a verifier, which ingests a specific set of public data and a proof; and tells us whether the proof of that statement is valid

The constraints and computations that we want to prove are described (by the user) in a specialized constraint language and get rendered (by the proving system) into a set of arithmetic circuits^{4} with a specific form. The specific form depends on the proving system in question. Certain proving systems use a rank-1 constraint system (R1CS), which has a relatively simple form:

- The constraint system is composed of multiple constraints and is satisfied by a single solution.
- The solution vector *s⃗* contains the values of all the public, private, and intermediate values in the specific instance we are trying to prove.
- **Each** constraint has three vectors (*a⃗*, *b⃗*, *c⃗*).
- To satisfy each constraint with the solution vector *s⃗*, we must have (*s⃗* ⋅ *a⃗*)(*s⃗* ⋅ *b⃗*) − (*s⃗* ⋅ *c⃗*) = 0.
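A toy R1CS check in numpy (over plain integers for readability; a real proving system works over a large finite field):

```python
import numpy as np

def r1cs_satisfied(s, constraints):
    # Each constraint (a, b, c) must satisfy (s.a)(s.b) - (s.c) = 0.
    return all(np.dot(s, a) * np.dot(s, b) - np.dot(s, c) == 0
               for a, b, c in constraints)

# Encode the statement "x * y = z" over the solution vector s = [1, x, y, z]:
a = np.array([0, 1, 0, 0])   # picks out x
b = np.array([0, 0, 1, 0])   # picks out y
c = np.array([0, 0, 0, 1])   # picks out z

assert r1cs_satisfied(np.array([1, 3, 4, 12]), [(a, b, c)])
assert not r1cs_satisfied(np.array([1, 3, 4, 13]), [(a, b, c)])
```

The leading 1 in the solution vector is the usual trick that lets constraints include constants via dot products.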

PLONK-style arithmetization schemes are more complex and in fact look like little integrated circuits (the Halo2 documentation and code explicitly use this metaphor!) or slices of FPGA fabric rather than a grab-bag of variables and quadratic constraints.

Not “look implementable on FPGAs”; though presumably there are clever ways to implement PLONKish proving/verification on FPGAs.↩︎

If you abbreviate “Multiply-ACcumulate” as “MAC” without context it’s confusing, because “Media Access Controller” is another thing that lives on some FPGAs, goes by “MAC” too, and there’s no common alternate acronym for *those*!↩︎

Assuming your hash function isn’t broken, that the inputs aren’t identical, and that you are not an immortal who has counted to 2^{128}.↩︎

Over a large finite field, because this is cryptography.↩︎

I’m designing a prototype Mode-A/C/S 1030MHz receiver so I can start testing demodulation algorithms (with real signals) and implementing them as gateware. In a previous post I mentioned multiple candidate receiver topologies and for simplicity’s sake, I’ve decided to go with the AD9363 approach. The AD9363 might not give me enough dynamic range to meet the standard’s requirements, but it’s enough to get started with algorithm development/testing.

I’m planning to use an ECP5 FPGA for the baseband processor because they’re well-supported by the yosys/nextpnr open-source FPGA toolchain. I already have an ECP5 evaluation board (the one from Lattice without the PCIe edge connector), so I’m currently designing an AD9363 board that can plug into that evaluation board’s pin headers.

Today I planned out the power supplies for the AD9363. I’d just copy what the reference manual suggests, but I can’t do that exactly, because this ECP5 board lets me set the FPGA’s VCCIO to 2.5V but not to 1.8V. Honestly, I don’t know how to develop a preference for VCCIO voltage, given that the AD9363 and the ECP5 both seem pretty flexible about it. Is a lower voltage better? I don’t know. Also, I want to tap off the 12V, because the 3.3V/2.5V regulators on the ECP5 board don’t seem to have a lot of margin for added loads.

I’m thinking of feeding that 12V to a two-output low-noise switching regulator (an LT8653S) to generate 1.8V and 2.5V. The 2.5V gets used for the interface and GPO (general purpose output) supplies on the AD9363, which aren’t that critical in terms of noise sensitivity.

The 1.8V doesn’t get used directly; instead, I feed it into two separate ADP1754 low-noise LDOs, and each of the LDOs feeds a different set of power supply pins on the AD9363, like the reference manual suggests. I’m also thinking of doing what’s done in the ADALM-PLUTO and using an ADM7160 ultra-low-noise LDO to power the oscillator.

Actually, I’m sort of reconsidering using that evaluation board due to signal integrity reasons (this Analog Devices webpage states that the CMOS drivers in the AD9363 are purposefully weak and recommends using LVDS instead, and I have no idea how to use LVDS) and instead making my own board from scratch with an ECP5 *and* an AD9363. It’s not like there’s a lot more to keeping the ECP5 happy than there is for the AD9363 – it’s just power supplies and decoupling capacitors. I think I’ll do it, it looks like a fun challenge!

Also I’m using a draft version of DO-260A (which I found on the internet) as reference for all of this. If you can get me DO-260B and/or DO-181E (or the EUROCAE equivalents) I’d be very happy. RTCA charges **a lot** for these documents – to the point that it’s a lot cheaper to actually pay membership fees (which lets you have the documents for free) which I’d absolutely do except they say “membership is restricted to organizations that are doing business in aviation” and I’m not sure I qualify as that.

I’ve decided to put my 3-phase motor controller project on hold. I don’t really have the equipment/knowledge/experience to pursue it at the level I want to, and I’ve come to terms with the fact that there’ll always be plenty of really interesting subjects for which this holds – it’s better to do work where I *can* get interesting results with what I have. Currently, those are my RF projects, so that’s what I’m spending time on.

I got a MATLAB Home License and the relevant toolboxes, which has been surprisingly useful since it provides tested implementations of channel models and modulators/demodulators. Previously, I had been trying to make physical-layer simulation software from scratch *and* validate it, *and* use it for developing receiver subcomponents. While I do appreciate what I’ve learned from making a channel/fading simulator, I never did quite understand how exactly to implement the Doppler model and I’ve been making significantly more progress now that I can use MATLAB and its Communications Toolbox as a source for golden/reference implementations. After all, people who work in industry use these tools all the time, so why shouldn’t I?

There’s a bunch of subcomponents in a full-featured continuous-phase modulation^{1} receiver:

- digital frontend: corrects for imperfections such as DC offset, I/Q imbalance, coarse frequency error
- feedforward channel estimation: estimates the channel impulse response based on known symbols
- channel shortening prefilter: concentrates the energy of the effective channel impulse response in the initial taps, without causing noise enhancement
- interference cancellation / space-time adaptive filters: nulls out co-channel interference
- trellis-style equalizer: given a channel impulse response, what’s the most likely sequence of symbols that led to the observed channel output?
- channel coding decoder: given the output symbols (or log-likelihood-ratios) from the equalizer, what bits were fed into the channel coder?

For each of those subcomponents, there’s plenty of subtle implementation details and even different mathematical approaches that can be used. For instance, the channel prefiltering can be done with a linear prediction filter or by spectral factorization of the channel impulse response into an all-pass and a minimum-phase component. The trellis-style equalizer can be expressed in the Ungerboeck or Forney formulations, and I still don’t understand exactly why (or if) they’re equivalent, or how exactly to construct the branch metrics or the matched filter in front of the equalizer. Also, there’s per-survivor processing, in which the most-likely candidate sequence is continually used by the equalizer to revise the channel impulse response estimate.

Rather than jumping straight to trying to implement all of these, I’ve been playing around in MATLAB just trying to get the math to work (and comparing different approaches). For example, I had already implemented a GMSK modulator and used it to try correlation-based channel estimation in Python with some GSM midamble training sequences. In MATLAB I’ve been trying out least-squares channel estimation, understanding exactly how it works theoretically, looking at what coefficients go into which matrix/vector, and looking at its performance with various levels of noise. I’m going to do similar with the digital frontend, prefilter, equalizer, and channel decoder to gain sufficient theoretical background and intuition to go into the implementation phase.

I designed a PCB for the 1030MHz transponder receiver with a Mini-Circuits lowpass filter, coupling capacitors, ESD protection, edge-mount SMA connectors, and grounded coplanar waveguide for the transmission lines. Kicad screenshots and photos of the unassembled boards and stencil are at the bottom of the page. I will populate it with components and have it tested on a friend’s vector network analyzer to verify that the impedance is close enough to 50 ohms.

A Mode-A/C/S receiver is subject to pretty strict requirements. It has to function over a large dynamic range, it has to preserve phase to decode the DPSK burst, and it has to preserve enough amplitude information for the receiver to compare the pulse amplitudes with the Minimum Triggering Level, and with each other (sidelobe suppression). Since the pulses are submicrosecond, I cannot use the actual AGC function of a standard RF-to-IQ-samples RFIC, and since phase must be preserved, a straightforward demodulating log amp is inapplicable.

Nevertheless, I came up with several candidate approaches:

- a downconverter whose output passes into a log amp with a separate limiter output (AD641 / AD8306 / AD8309), then ADC the log video (for amplitude) and ADC the limiter output (for phase)
- fast variable attenuator in the signal chain controlled by the FPGA, to allow for per-pulse “AGC”
- using a fully-integrated RFIC like the AD9361 but adjusting the internal gain parameters for some intentional power-compression / limiting action, since I don’t need to preserve all the amplitude information, just enough to compare pulses with the MTL and each other
- Doing the same except with discrete RF/IF amplifiers

Rather than deciding now which option to select, I’m going to try and make a quick-and-dirty prototype that intentionally does not meet the dynamic range requirements but that *does* incorporate all the relevant parts of the signal chain (RF frontend, downconverter and LO, IF filters, ADC, FPGA), in order to have *something* working such that I can start working on the actual demodulation code.

I’m trying to make a GSM/EDGE baseband because of the availability of test equipment and documentation, but there’s other systems that use continuous-phase modulations, such as aerospace telemetry and certain tactical data links.↩︎

Sorry for the lack of blog post this week, I’ve been busy trying to set up a workspace and procure equipment to work with PCBs. I had already bought a toaster oven to use as a reflow oven but I hadn’t been able to operate it since I needed a way to evacuate flux fumes (since I have a rather strong preference to not have flux fumes in my bedroom). I’ve ordered a fan with appropriate diameter, a metal hose, and also steel plates to make an enclosure (sorta like a fume hood) around the toaster oven / hot air station / soldering iron; with the idea that the fan will be able to suck air from the enclosure and blow the air through the hose and eventually, out the window.

I’ve also ordered a “microscope” (actually just a camera that outputs HDMI, let’s hope it doesn’t have too much lag) so I can place 0402’s, a thermocouple (to calibrate the toaster oven), and also bolts / nuts / brackets / drill bit / a hole saw / steel bars to assemble everything together.
