
# Procedural Audio in McRogueFace
I added a complete procedural audio system to my game engine in one sitting. About 5,000 lines of C++ and Python, zero segfaults. Here's the story.
## Why Procedural Audio?
McRogueFace is a roguelike game engine I've been building in C++ with Python scripting. 7DRL 2026 was coming up and I needed sound effects. I could hunt down free sound packs, but I wanted something more interesting: sounds generated from code, so every playthrough could have unique audio.
There's a practical reason too. McRogueFace compiles to WebAssembly, and the web build story for asset-heavy games is miserable. Every time you change a sound file, you repack, re-upload, and pray the browser cache cooperates. But if your sounds are generated from code, redeploying is just rebuilding the binary. The sounds come along for free because they are the code. That turned out to matter more than I expected.
If you've ever played a game and heard those satisfying little blip and zap sounds, there's a good chance they came from sfxr, a legendary tool by DrPetter that generates retro-style sound effects from a handful of parameters. I decided to build sfxr synthesis directly into the engine.
## The SoundBuffer API
The core of the system is `mcrfpy.SoundBuffer`, a new Python type that wraps raw PCM audio data. You can create sounds in several ways:
```python
import mcrfpy

# Load from file
music = mcrfpy.SoundBuffer("theme.wav")

# Generate a pure tone with an ADSR envelope
tone = mcrfpy.SoundBuffer.tone(440.0, 0.5, "sine",
                               attack=0.01, decay=0.1,
                               sustain=0.7, release=0.2)

# Generate a retro sound effect
coin = mcrfpy.SoundBuffer.sfxr("coin")
laser = mcrfpy.SoundBuffer.sfxr("laser", seed=42)

# Play it
s = mcrfpy.Sound(coin)
s.play()
```
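The ADSR envelope behind `tone` is simple enough to sketch in plain Python. This is an illustrative approximation of a piecewise-linear attack-decay-sustain-release curve, not the engine's actual C++:

```python
def adsr(n_samples, attack, decay, sustain, release, sample_rate=44100):
    """Piecewise-linear ADSR envelope as a list of gains in [0, 1].
    attack/decay/release are in seconds; sustain is a level (0..1).
    A sketch of the idea, not McRogueFace's implementation."""
    a = int(attack * sample_rate)
    d = int(decay * sample_rate)
    r = int(release * sample_rate)
    s = max(0, n_samples - a - d - r)  # whatever time is left sustains
    env = []
    env += [i / max(1, a) for i in range(a)]                          # 0 -> 1
    env += [1.0 + (sustain - 1.0) * i / max(1, d) for i in range(d)]  # 1 -> sustain
    env += [sustain] * s                                              # hold
    env += [sustain * (1.0 - i / max(1, r)) for i in range(r)]        # sustain -> 0
    return env[:n_samples]

# Same shape as the tone() call above: 0.5 s at 44100 Hz
env = adsr(22050, attack=0.01, decay=0.1, sustain=0.7, release=0.2)
```

Multiplying this envelope sample-by-sample against a raw waveform gives the tone its shape.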
Every DSP method returns a new buffer, so you can chain effects without worrying about mutating the original:
```python
processed = coin.low_pass(2000).echo(150, 0.4, 0.3).normalize()
```
I agonized a bit over the immutable-return pattern vs. in-place mutation, and I'm glad I went with immutability. It makes the API predictable and composable, even if it's slightly less memory-efficient.
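A toy illustration of the immutable-return pattern, in standalone Python (not the engine's implementation, and with only a couple of stand-in effects): each method copies into a new buffer, so the original is never touched.

```python
class Buffer:
    """Toy immutable PCM buffer: every effect returns a NEW Buffer."""
    def __init__(self, samples):
        self._samples = list(samples)  # private copy; never mutated after this

    def gain(self, factor):
        return Buffer(s * factor for s in self._samples)

    def reverse(self):
        return Buffer(reversed(self._samples))

    def normalize(self):
        peak = max((abs(s) for s in self._samples), default=0.0)
        if peak == 0.0:
            return Buffer(self._samples)
        return Buffer(s / peak for s in self._samples)

    @property
    def samples(self):
        return list(self._samples)

coin = Buffer([0.1, -0.4, 0.2])
processed = coin.gain(0.5).reverse().normalize()
# coin.samples is still [0.1, -0.4, 0.2]; only `processed` changed
```

The cost is an allocation per effect, but chains stay short in practice, and no call site ever has to wonder whether a buffer it holds was modified behind its back.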
## sfxr Under the Hood
The sfxr synthesis engine is based on DrPetter's original algorithm. It runs at 44100 Hz mono with 8x supersampling. There are 24 parameters controlling everything from waveform shape to frequency slides, duty cycle, vibrato, a phaser, and a 3-stage envelope with "sustain punch." The entire synthesizer is about 350 lines of C++ that touch only `<cmath>`, `<cstdint>`, and `<random>`: no audio framework, no platform headers. That's what lets it compile to WASM without a single `#ifdef`.
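For a rough sense of that core loop, here's a supersampled square-wave generator with a duty cycle and a linear frequency slide, in plain Python. This is a sketch of the general technique, not DrPetter's actual code, and the parameter handling is far simpler than real sfxr:

```python
SAMPLE_RATE = 44100
SUPERSAMPLE = 8  # average 8 sub-samples per output sample, as sfxr does

def square_blip(freq_hz, slide, duration_s, duty=0.5):
    """Square wave with a duty cycle and a per-sample frequency slide,
    rendered with naive supersampling. Illustrative only."""
    n = int(duration_s * SAMPLE_RATE)
    out = []
    phase = 0.0
    f = freq_hz
    for _ in range(n):
        acc = 0.0
        for _ in range(SUPERSAMPLE):
            phase = (phase + f / (SAMPLE_RATE * SUPERSAMPLE)) % 1.0
            acc += 1.0 if phase < duty else -1.0  # square wave via duty cycle
        out.append(acc / SUPERSAMPLE)  # average the sub-samples
        f += slide  # linear frequency slide, one of sfxr's signature moves
    return out

samples = square_blip(440.0, slide=2.0, duration_s=0.05)
```

The supersampling is what keeps the hard square-wave edges from aliasing badly once the frequency starts sliding.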
I implemented 7 presets: coin, laser, explosion, powerup, hurt, jump, and blip. Each preset randomizes parameters within characteristic ranges, so every call with a different seed gives you a unique variation on the theme. And if you want fine control, you can pass all 24 parameters explicitly.
The best part is mutation: `buf.sfxr_mutate(0.1, seed=99)` jitters every parameter by a small amount, producing a near-identical variant of a sound. Start with a satisfying "coin pickup" and mutate it slightly for each different item the player collects.
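The mutation idea itself is tiny. A hedged sketch in plain Python, with made-up parameter names (the real synthesizer has 24 of them):

```python
import random

def mutate_params(params, amount=0.1, seed=None):
    """Jitter each normalized parameter by up to +/- amount, clamped
    to [0, 1]. A sketch of the idea behind sfxr_mutate; the parameter
    names below are illustrative, not the engine's."""
    rng = random.Random(seed)  # seeded for reproducible variants
    return {k: min(1.0, max(0.0, v + rng.uniform(-amount, amount)))
            for k, v in params.items()}

coin = {"base_freq": 0.6, "env_decay": 0.4, "env_punch": 0.5}
variant = mutate_params(coin, amount=0.1, seed=99)
```

Seeding matters: the same seed always yields the same variant, so a "mutated" sound can be regenerated identically on every run.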
## The DSP Effects Chain
Once you have a buffer, you can process it through a chain of effects. I implemented these from scratch in C++:
- Pitch shift: linear-interpolation resampling
- Low/high pass filters: single-pole IIR filters
- Echo: circular delay buffer with feedback
- Reverb: simplified Freeverb, 4 parallel comb filters + 2 series allpass filters
- Distortion: tanh soft clipping
- Bit crush: quantize to N bits, with optional sample-rate reduction
- Gain, normalize, reverse, slice: the basics
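For a concrete sense of how little code a single-pole IIR filter takes, here is a textbook low-pass in plain Python. It's a sketch of the filter type named above, using the standard RC-discretization coefficient, not the engine's C++:

```python
import math

def low_pass(samples, cutoff_hz, sample_rate=44100):
    """Single-pole IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1]).
    Illustrative sketch, not McRogueFace's implementation."""
    # Coefficient from the usual one-pole RC-filter discretization
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)  # each output leans toward the input by factor a
        out.append(y)
    return out
```

The matching high-pass is just the input minus the low-passed signal, which is why the two filters come almost for free together.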
The reverb was the most fun to implement. The Freeverb algorithm uses specific "magic number" delay lengths (1116, 1188, 1277, 1356 samples for the comb filters) that were apparently discovered through experimentation in the late '90s. I scaled them proportionally for different sample rates, but the originals are tuned for 44100 Hz, which is what I'm using anyway.
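A minimal sketch of one Freeverb-style comb filter plus the four-comb parallel sum, in plain Python. This is simplified (real Freeverb adds stereo spread, eight combs, and the series allpass stages), but the delay-line-with-damped-feedback structure is the same idea:

```python
def comb_filter(samples, delay, feedback=0.84, damp=0.2):
    """One feedback comb filter with a one-pole low-pass inside the
    loop, which is what makes Freeverb's tail darken as it decays.
    Illustrative sketch, not the engine's implementation."""
    buf = [0.0] * delay   # circular delay line
    idx = 0
    store = 0.0           # damping filter state
    out = []
    for x in samples:
        y = buf[idx]
        store = y * (1.0 - damp) + store * damp   # damp the fed-back signal
        buf[idx] = x + store * feedback           # write input + feedback
        idx = (idx + 1) % delay
        out.append(y)
    return out

def mini_reverb(samples):
    """Sum four parallel combs using the classic 44.1 kHz delay lengths."""
    combs = [comb_filter(samples, d) for d in (1116, 1188, 1277, 1356)]
    return [sum(vals) / 4.0 for vals in zip(*combs)]
```

The mutually prime-ish delay lengths are the point: because the four echoes never line up, the summed tail smears into something that reads as a room rather than a flutter echo.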
## Then I Got Carried Away: Animalese
Once tone generation worked, I had an idea: what if NPCs could "talk" in the babble-speak style of Animal Crossing? In that game, characters make vowel-ish sounds at high speed as their text displays, giving the impression of speech without actual voice acting.
I built it entirely in Python using the SoundBuffer API — no new C++ needed. That was a satisfying test of the API's expressiveness: if I could implement something this involved purely through the Python bindings, the abstractions were pulling their weight. Here's the approach:
1. **Letter-to-vowel mapping.** Every letter of the alphabet maps to one of 5 vowel types based on phonetic similarity: 'A' maps to 'ah', 'E' to 'eh', 'U' to 'oo', and so on.
2. **Formant synthesis.** Each vowel has characteristic formant frequencies (F1 and F2). I generate a sawtooth tone at the character's base pitch, then run it through a low-pass filter tuned to F1. It's a crude approximation of real formant synthesis, but it works.
3. **Consonant bursts.** Letters like B, D, K, and S get a short noise burst at 2500 Hz prepended before their vowel, giving the impression of articulation.
4. **Assembly.** Letters are concatenated with 25% overlap for that characteristic babble effect. Spaces and punctuation become silence gaps. A light reverb pass at the end adds warmth.
5. **Personality presets.** Different characters get different voices:
| Personality | Pitch | Rate | Notes |
|---|---|---|---|
| Cranky | 90 Hz | 10/s | Low, slow, breathy |
| Normal | 180 Hz | 12/s | Middle of the road |
| Peppy | 280 Hz | 18/s | High, fast, chirpy |
| Lazy | 120 Hz | 8/s | Low, slow, very breathy |
Each letter's pitch is jittered by a random number of semitones, so the speech has natural melodic variation. The "peppy" preset sounds genuinely chirpy and the "cranky" one sounds like a grumpy old shopkeeper.
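The steps above can be sketched as a per-letter synthesis plan. This is standalone Python with an abbreviated, illustrative mapping table (the real module covers all 26 letters and then renders each entry with formant-filtered sawtooth tones):

```python
import random

# Illustrative subset of the letter-to-vowel table; the real one maps
# every letter of the alphabet to one of 5 vowel types.
VOWEL_OF = {
    "a": "ah", "e": "eh", "i": "ee", "o": "oh", "u": "oo",
    "b": "ah", "d": "eh", "k": "ah", "s": "ee",
}
CONSONANT_BURST = set("bdks")  # letters that get a noise burst prefix

def babble_plan(text, base_pitch_hz=180.0, jitter_semitones=2, seed=0):
    """Turn text into a per-letter plan of (vowel, pitch_hz, burst).
    A sketch of the Animalese pipeline; actual audio rendering and
    the 25% overlap-add happen downstream of this."""
    rng = random.Random(seed)
    plan = []
    for ch in text.lower():
        if ch not in VOWEL_OF:
            plan.append(("rest", 0.0, False))  # spaces/punctuation -> silence
            continue
        semis = rng.uniform(-jitter_semitones, jitter_semitones)
        pitch = base_pitch_hz * 2.0 ** (semis / 12.0)  # jitter in semitones
        plan.append((VOWEL_OF[ch], pitch, ch in CONSONANT_BURST))
    return plan

plan = babble_plan("ok bud", base_pitch_hz=280.0)  # "peppy"-range pitch
```

Note the jitter is applied in semitones rather than raw Hz, so the melodic wobble sounds equally wide for cranky 90 Hz voices and peppy 280 Hz ones.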
The first time I heard it come out of my speakers, I laughed out loud. It doesn't sound like speech, exactly — it sounds like speech the way a cartoon sounds like real life. Plausible, pleasant, and instantly recognizable as "someone talking." When it plays synchronized with word-by-word text display, the effect is surprisingly compelling. Text goes from dead words on screen to something that feels alive.
## Zero Assets, Full Audio
Here's the punchline for the web build angle: the entire McRogueFace audio system ships with zero sound files. Every sound in the game — sfxr effects, Animalese dialogue, ambient tones — is synthesized at runtime from parameters. The WASM binary doesn't need to bundle a single `.wav` or `.ogg`. When I rebuild and push, the new audio is just there, because it was never a file in the first place.
Because the synthesis code is pure math — no platform-specific audio libraries — the same C++ compiles for desktop (SFML) and web (SDL2 via Emscripten) without conditional compilation. The DSP effects chain is the same way: ~300 lines of vector-to-vector transforms that don't know or care what platform they're running on. McRogueFace already has a `make playground` target that builds a browser-based Python REPL, so in principle you could type `mcrfpy.SoundBuffer.sfxr("coin").play()` in your browser tab right now and hear the result. I'm working on embedding that as a live demo in this post.
This is the kind of thing that only makes sense for a certain kind of project. If you need orchestral music or recorded dialogue, procedural synthesis isn't your tool. But for a roguelike where you want 50 variations on "monster got hit" and NPC babble, generating everything from a handful of numbers is both lighter and more fun than managing an asset pipeline.
## No Segfaults
I want to note this because it surprised me: approximately 5,000 lines of new C++ code integrating with SFML's audio subsystem and Python's C API, and not a single segfault during development. The immutable buffer pattern probably helped a lot — no shared mutable state between the C++ audio thread and Python. All the interesting memory problems were saved for the next day (that's a whole other story).
## How It Played Out
This all shipped about a week before 7DRL 2026. In the jam build, every monster faction got a unique sfxr "voice" built from mutated presets — undead have this wet, crunchy hit sound while fire elementals get bright crackling ones. Item pickups have procedurally varied sounds so grabbing a potion never sounds exactly the same twice. And the NPCs babble at you in Animalese as their dialogue scrolls by, which is exactly as charming as I hoped.
The full audio system shipped with 62 unit tests, which is more tests than I wrote for any other single feature in this engine. I think that's because the API is genuinely fun to play with — I kept thinking of edge cases I wanted to verify because I was having a good time.
This article was scaffolded with backblog.