
# Procedural Audio in McRogueFace
I added a complete procedural audio system to my game engine in one sitting. About 5,000 lines of C++ and Python, zero segfaults. Here's the story.
## Why Procedural Audio?
McRogueFace is a roguelike game engine I've been building in C++ with Python scripting. 7DRL 2026 was coming up and I needed sound effects. I could hunt down free sound packs, but I wanted something more interesting: sounds generated from code, so every playthrough could have unique audio.
There's a practical reason too. McRogueFace compiles to WebAssembly, and the web build story for asset-heavy games is miserable. Every time you change a sound file, you repack, re-upload, and pray the browser cache cooperates. But if your sounds are generated from code, redeploying is just rebuilding the binary. The sounds come along for free because they are the code. That turned out to matter more than I expected.
If you've ever played a game and heard those satisfying little blip and zap sounds, there's a good chance they came from sfxr, a legendary tool by DrPetter that generates retro-style sound effects from a handful of parameters. I decided to build sfxr synthesis directly into the engine.
## The SoundBuffer API
The core of the system is `mcrfpy.SoundBuffer`, a new Python type that wraps raw PCM audio data. You can create sounds in several ways:
```python
import mcrfpy

# Load from file
music = mcrfpy.SoundBuffer("theme.wav")

# Generate a pure tone with an ADSR envelope
tone = mcrfpy.SoundBuffer.tone(440.0, 0.5, "sine",
                               attack=0.01, decay=0.1,
                               sustain=0.7, release=0.2)

# Generate a retro sound effect
coin = mcrfpy.SoundBuffer.sfxr("coin")
laser = mcrfpy.SoundBuffer.sfxr("laser", seed=42)

# Play it
s = mcrfpy.Sound(coin)
s.play()
```
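The ADSR envelope behind `tone` is simple enough to sketch in plain Python. This is an illustrative approximation of a piecewise-linear attack-decay-sustain-release curve, not the engine's actual C++:

```python
def adsr(n_samples, attack, decay, sustain, release, sample_rate=44100):
    """Piecewise-linear ADSR envelope as a list of gains in [0, 1].
    attack/decay/release are in seconds; sustain is a level (0..1).
    A sketch of the idea, not McRogueFace's implementation."""
    a = int(attack * sample_rate)
    d = int(decay * sample_rate)
    r = int(release * sample_rate)
    s = max(0, n_samples - a - d - r)  # whatever time is left sustains
    env = []
    env += [i / max(1, a) for i in range(a)]                          # 0 -> 1
    env += [1.0 + (sustain - 1.0) * i / max(1, d) for i in range(d)]  # 1 -> sustain
    env += [sustain] * s                                              # hold
    env += [sustain * (1.0 - i / max(1, r)) for i in range(r)]        # sustain -> 0
    return env[:n_samples]

# Same shape as the tone() call above: 0.5 s at 44100 Hz
env = adsr(22050, attack=0.01, decay=0.1, sustain=0.7, release=0.2)
```

Multiplying this envelope sample-by-sample against a raw waveform gives the tone its shape.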
Every DSP method returns a new buffer, so you can chain effects without worrying about mutating the original:
```python
processed = coin.low_pass(2000).echo(150, 0.4, 0.3).normalize()
```
I agonized a bit over the immutable-return pattern vs. in-place mutation, and I'm glad I went with immutability. It makes the API predictable and composable, even if it's slightly less memory-efficient.
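A toy illustration of the immutable-return pattern, in standalone Python (not the engine's implementation, and with only a couple of stand-in effects): each method copies into a new buffer, so the original is never touched.

```python
class Buffer:
    """Toy immutable PCM buffer: every effect returns a NEW Buffer."""
    def __init__(self, samples):
        self._samples = list(samples)  # private copy; never mutated after this

    def gain(self, factor):
        return Buffer(s * factor for s in self._samples)

    def reverse(self):
        return Buffer(reversed(self._samples))

    def normalize(self):
        peak = max((abs(s) for s in self._samples), default=0.0)
        if peak == 0.0:
            return Buffer(self._samples)
        return Buffer(s / peak for s in self._samples)

    @property
    def samples(self):
        return list(self._samples)

coin = Buffer([0.1, -0.4, 0.2])
processed = coin.gain(0.5).reverse().normalize()
# coin.samples is still [0.1, -0.4, 0.2]; only `processed` changed
```

The cost is an allocation per effect, but chains stay short in practice, and no call site ever has to wonder whether a buffer it holds was modified behind its back.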
## sfxr Under the Hood
The sfxr synthesis engine is based on DrPetter's original algorithm. It runs at 44100 Hz mono with 8x supersampling. There are 24 parameters controlling everything from waveform shape to frequency slides, duty cycle, vibrato, a phaser, and a 3-stage envelope with "sustain punch." The entire synthesizer is about 350 lines of C++ that touch only `<cmath>`, `<cstdint>`, and `<random>`: no audio framework, no platform headers. That's what lets it compile to WASM without a single `#ifdef`.
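For a rough sense of that core loop, here's a supersampled square-wave generator with a duty cycle and a linear frequency slide, in plain Python. This is a sketch of the general technique, not DrPetter's actual code, and the parameter handling is far simpler than real sfxr:

```python
SAMPLE_RATE = 44100
SUPERSAMPLE = 8  # average 8 sub-samples per output sample, as sfxr does

def square_blip(freq_hz, slide, duration_s, duty=0.5):
    """Square wave with a duty cycle and a per-sample frequency slide,
    rendered with naive supersampling. Illustrative only."""
    n = int(duration_s * SAMPLE_RATE)
    out = []
    phase = 0.0
    f = freq_hz
    for _ in range(n):
        acc = 0.0
        for _ in range(SUPERSAMPLE):
            phase = (phase + f / (SAMPLE_RATE * SUPERSAMPLE)) % 1.0
            acc += 1.0 if phase < duty else -1.0  # square wave via duty cycle
        out.append(acc / SUPERSAMPLE)  # average the sub-samples
        f += slide  # linear frequency slide, one of sfxr's signature moves
    return out

samples = square_blip(440.0, slide=2.0, duration_s=0.05)
```

The supersampling is what keeps the hard square-wave edges from aliasing badly once the frequency starts sliding.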
I implemented 7 presets: coin, laser, explosion, powerup, hurt, jump, and blip. Each preset randomizes parameters within characteristic ranges, so every call with a different seed gives you a unique variation on the theme. And if you want fine control, you can pass all 24 parameters explicitly.
The best part is mutation: `buf.sfxr_mutate(0.1, seed=99)` jitters every parameter by a small amount, producing a near-identical variant of a sound. Start with a satisfying "coin pickup" and mutate it slightly for each different item the player collects.
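The mutation idea itself is tiny. A hedged sketch in plain Python, with made-up parameter names (the real synthesizer has 24 of them):

```python
import random

def mutate_params(params, amount=0.1, seed=None):
    """Jitter each normalized parameter by up to +/- amount, clamped
    to [0, 1]. A sketch of the idea behind sfxr_mutate; the parameter
    names below are illustrative, not the engine's."""
    rng = random.Random(seed)  # seeded for reproducible variants
    return {k: min(1.0, max(0.0, v + rng.uniform(-amount, amount)))
            for k, v in params.items()}

coin = {"base_freq": 0.6, "env_decay": 0.4, "env_punch": 0.5}
variant = mutate_params(coin, amount=0.1, seed=99)
```

Seeding matters: the same seed always yields the same variant, so a "mutated" sound can be regenerated identically on every run.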
## The DSP Effects Chain
Once you have a buffer, you can process it through a chain of effects. I implemented these from scratch in C++:
- Pitch shift: linear-interpolation resampling
- Low/high pass filters: single-pole IIR filters
- Echo: circular delay buffer with feedback
- Reverb: simplified Freeverb, 4 parallel comb filters + 2 series allpass filters
- Distortion: tanh soft clipping
- Bit crush: quantize to N bits, with optional sample-rate reduction
- Gain, normalize, reverse, slice: the basics
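For a concrete sense of how little code a single-pole IIR filter takes, here is a textbook low-pass in plain Python. It's a sketch of the filter type named above, using the standard RC-discretization coefficient, not the engine's C++:

```python
import math

def low_pass(samples, cutoff_hz, sample_rate=44100):
    """Single-pole IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1]).
    Illustrative sketch, not McRogueFace's implementation."""
    # Coefficient from the usual one-pole RC-filter discretization
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)  # each output leans toward the input by factor a
        out.append(y)
    return out
```

The matching high-pass is just the input minus the low-passed signal, which is why the two filters come almost for free together.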
The reverb was the most fun to implement. The Freeverb algorithm uses specific "magic number" delay lengths (1116, 1188, 1277, 1356 samples for the comb filters) that were apparently discovered through experimentation in the late '90s. I scaled them proportionally for different sample rates, but the originals are tuned for 44100 Hz, which is what I'm using anyway.
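A minimal sketch of one Freeverb-style comb filter plus the four-comb parallel sum, in plain Python. This is simplified (real Freeverb adds stereo spread, eight combs, and the series allpass stages), but the delay-line-with-damped-feedback structure is the same idea:

```python
def comb_filter(samples, delay, feedback=0.84, damp=0.2):
    """One feedback comb filter with a one-pole low-pass inside the
    loop, which is what makes Freeverb's tail darken as it decays.
    Illustrative sketch, not the engine's implementation."""
    buf = [0.0] * delay   # circular delay line
    idx = 0
    store = 0.0           # damping filter state
    out = []
    for x in samples:
        y = buf[idx]
        store = y * (1.0 - damp) + store * damp   # damp the fed-back signal
        buf[idx] = x + store * feedback           # write input + feedback
        idx = (idx + 1) % delay
        out.append(y)
    return out

def mini_reverb(samples):
    """Sum four parallel combs using the classic 44.1 kHz delay lengths."""
    combs = [comb_filter(samples, d) for d in (1116, 1188, 1277, 1356)]
    return [sum(vals) / 4.0 for vals in zip(*combs)]
```

The mutually prime-ish delay lengths are the point: because the four echoes never line up, the summed tail smears into something that reads as a room rather than a flutter echo.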
## Then I Got Carried Away: Animalese
Once tone generation worked, I had an idea: what if NPCs could "talk" in the babble-speak style of Animal Crossing? In that game, characters make vowel-ish sounds at high speed as their text displays, giving the impression of speech without actual voice acting.
I built it entirely in Python using the SoundBuffer API — no new C++ needed. That was a satisfying test of the API's expressiveness: if I could implement something this involved purely through the Python bindings, the abstractions were pulling their weight. Here's the approach:
1. **Letter-to-vowel mapping.** Every letter of the alphabet maps to one of 5 vowel types based on phonetic similarity: 'A' maps to 'ah', 'E' to 'eh', 'U' to 'oo', and so on.
2. **Formant synthesis.** Each vowel has characteristic formant frequencies (F1 and F2). I generate a sawtooth tone at the character's base pitch, then run it through a low-pass filter tuned to F1. It's a crude approximation of real formant synthesis, but it works.
3. **Consonant bursts.** Letters like B, D, K, and S get a short noise burst at 2500 Hz prepended before their vowel, giving the impression of articulation.
4. **Assembly.** Letters are concatenated with 25% overlap for that characteristic babble effect. Spaces and punctuation become silence gaps. A light reverb pass at the end adds warmth.
5. **Personality presets.** Different characters get different voices:
| Personality | Pitch | Rate | Notes |
|---|---|---|---|
| Cranky | 90 Hz | 10/s | Low, slow, breathy |
| Normal | 180 Hz | 12/s | Middle of the road |
| Peppy | 280 Hz | 18/s | High, fast, chirpy |
| Lazy | 120 Hz | 8/s | Low, slow, very breathy |
Each letter's pitch is jittered by a random number of semitones, so the speech has natural melodic variation. The "peppy" preset sounds genuinely chirpy and the "cranky" one sounds like a grumpy old shopkeeper.
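The steps above can be sketched as a per-letter synthesis plan. This is standalone Python with an abbreviated, illustrative mapping table (the real module covers all 26 letters and then renders each entry with formant-filtered sawtooth tones):

```python
import random

# Illustrative subset of the letter-to-vowel table; the real one maps
# every letter of the alphabet to one of 5 vowel types.
VOWEL_OF = {
    "a": "ah", "e": "eh", "i": "ee", "o": "oh", "u": "oo",
    "b": "ah", "d": "eh", "k": "ah", "s": "ee",
}
CONSONANT_BURST = set("bdks")  # letters that get a noise burst prefix

def babble_plan(text, base_pitch_hz=180.0, jitter_semitones=2, seed=0):
    """Turn text into a per-letter plan of (vowel, pitch_hz, burst).
    A sketch of the Animalese pipeline; actual audio rendering and
    the 25% overlap-add happen downstream of this."""
    rng = random.Random(seed)
    plan = []
    for ch in text.lower():
        if ch not in VOWEL_OF:
            plan.append(("rest", 0.0, False))  # spaces/punctuation -> silence
            continue
        semis = rng.uniform(-jitter_semitones, jitter_semitones)
        pitch = base_pitch_hz * 2.0 ** (semis / 12.0)  # jitter in semitones
        plan.append((VOWEL_OF[ch], pitch, ch in CONSONANT_BURST))
    return plan

plan = babble_plan("ok bud", base_pitch_hz=280.0)  # "peppy"-range pitch
```

Note the jitter is applied in semitones rather than raw Hz, so the melodic wobble sounds equally wide for cranky 90 Hz voices and peppy 280 Hz ones.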
The first time I heard it come out of my speakers, I laughed out loud. It doesn't sound like speech, exactly — it sounds like speech the way a cartoon sounds like real life. Plausible, pleasant, and instantly recognizable as "someone talking." When it plays synchronized with word-by-word text display, the effect is surprisingly compelling. Text goes from dead words on screen to something that feels alive.
## Zero Assets, Full Audio
Here's the punchline for the web build angle: the entire McRogueFace audio system ships with zero sound files. Every sound in the game — sfxr effects, Animalese dialogue, ambient tones — is synthesized at runtime from parameters. The WASM binary doesn't need to bundle a single `.wav` or `.ogg`. When I rebuild and push, the new audio is just there, because it was never a file in the first place.
Because the synthesis code is pure math — no platform-specific audio libraries — the same C++ compiles for desktop (SFML) and web (SDL2 via Emscripten) without conditional compilation. The DSP effects chain is the same way: ~300 lines of vector-to-vector transforms that don't know or care what platform they're running on. McRogueFace already has a `make playground` target that builds a browser-based Python REPL, so in principle you could type `mcrfpy.SoundBuffer.sfxr("coin").play()` in your browser tab right now and hear the result. I'm working on embedding that as a live demo in this post.
This is the kind of thing that only makes sense for a certain kind of project. If you need orchestral music or recorded dialogue, procedural synthesis isn't your tool. But for a roguelike where you want 50 variations on "monster got hit" and NPC babble, generating everything from a handful of numbers is both lighter and more fun than managing an asset pipeline.
## No Segfaults
I want to note this because it surprised me: approximately 5,000 lines of new C++ code integrating with SFML's audio subsystem and Python's C API, and not a single segfault during development. The immutable buffer pattern probably helped a lot — no shared mutable state between the C++ audio thread and Python. All the interesting memory problems were saved for the next day (that's a whole other story).
## How It Played Out
This all shipped about a week before 7DRL 2026. In the jam build, every monster faction got a unique sfxr "voice" built from mutated presets — undead have this wet, crunchy hit sound while fire elementals get bright crackling ones. Item pickups have procedurally varied sounds so grabbing a potion never sounds exactly the same twice. And the NPCs babble at you in Animalese as their dialogue scrolls by, which is exactly as charming as I hoped.
The full audio system shipped with 62 unit tests, which is more tests than I wrote for any other single feature in this engine. I think that's because the API is genuinely fun to play with — I kept thinking of edge cases I wanted to verify because I was having a good time.
This article was scaffolded with backblog.