Strategy

Sound Is More Than Generation

AI game music can sketch ideas, but composers still own emotion, timing, mix, implementation, rights, and the final sound of the game.

May 28, 2026•7 min read

If you write music for games, the AI pitch probably sounds familiar.

It is not just "this tool can make a melody." It is "we do not have a budget for a composer, so we will generate something." It is "we only need background music." It is "can you clean this up?" It is "the AI track is close, so maybe we do not need a real score." It is the old unpaid-composer problem with better marketing.

That is why the fear lands so hard: AI music becomes the unpaid composer.

Training data matters when music generators are accused of learning from copyrighted recordings without permission. The RIAA cases against Suno and Udio made that concern concrete, not theoretical. Composers say it more directly. The composer Lena Raine, who scored Celeste, Chicory, and Minecraft updates, described AI music tools as "people profiting off of the body of work of humanity in a way that is insidious," and the output, in her words, is "being made from the re-jumbled refuse of what people have already done." Budget collapse matters when a free generator gets treated as a replacement for paid scoring work. Craft devaluation matters when "just use a track from a prompt" erases the difference between a two-minute audio file and a cue that actually works in a game.

Composers also hear what is missing from the pitch.

No one talks about the loop point. No one talks about stems. No one talks about the combat layer that has to enter without sounding like a second song got dropped on top. No one talks about how the music ducks under dialogue, how the ambience leaves room for UI, how silence creates tension, how middleware changes the shape of the cue, or how the same theme needs to survive menu, exploration, failure, and victory without turning into mush.

That work is not optional. It is the job.

Composers also hear the rights problem hiding under the shortcut. If nobody can explain where the model learned its sound, who owns the generated cue, whether the license survives commercial release, or whether the output can be safely edited, stemmed, and shipped, the "cheap" track is not cheap. It is risk with a melody on top.

They hear the credit problem too. A generated track does not ask for a line in the credits. A composer does. When a project already treats audio as the last thing to bolt on before release, that difference becomes convenient in the worst possible way.

So Clowdr's answer is not "make peace with generators." That is too thin. It ignores the actual craft and the actual fear.

The serious answer is this: a track is not a score.

A track is not a score

A generator can produce audio. Sometimes convincing audio. Sometimes useful audio.

It can sketch a mood. It can make a temp cue for a prototype. It can give a designer something to react to before a composer is involved. It can help compare instrumentation, pacing, or tonal direction. It can create a rough "not this, more like this" reference for a conversation.

That is not the same thing as scoring a game.

A score has to know what the player is doing. It has to know when the fight starts, when the pressure drops, when a reveal needs space, when a joke needs silence, when a fail state should sting, when a menu loop should disappear into muscle memory, and when the best music is no music at all.

That is why generated music often fails in a way that is hard for non-audio people to name. It sounds plausible in a browser tab. Then it sits inside the build and feels wrong. Too busy. Too static. Too emotionally certain. Too much low end under footsteps. Too much melody over dialogue. Too short for the level. Too long for the loop. Too generic to become identity.

The problem is not that a machine made it. The problem is that nobody scored the moment.

Game audio has to survive the build

Game music is not delivered to a player as a WAV file in isolation.

It is delivered through states, triggers, transitions, buses, middleware, platform constraints, UI sounds, dialogue, ambience, footsteps, weapons, failure stingers, victory stingers, and every other sound competing for space. It has to work at midnight through cheap headphones and on a TV with the bass boosted. It has to loop for ten minutes without irritating the player. It has to stop cleanly when the scene changes. It has to leave room for the game.

[Figure: A generated WAV file enters at the very start of a long game-audio pipeline. The composer's work is everything that follows before the player hears anything.]

This is where a composer or audio contributor earns trust.

They know when a cue is emotionally correct and technically wrong. They know when a theme is strong but the arrangement is too dense. They know when a loop needs a tail, when a stem needs separation, when a hit needs silence before it, and when a generated texture creates more implementation debt than it saves.

That is not anti-tool. That is production.

This is also why the work that survives the build is rule-authoring, not just track delivery. Winifred Phillips, the Grammy-winning composer behind God of War, Assassin's Creed, and LittleBigPlanet, described interactive game scoring as a problem of authoring "a set of general musical rules and a core batch of music content," not just delivering finished cues. A generator that produces a WAV file does not author the rules. It produces material the rule system might or might not be able to use.
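To make "rules plus content" concrete, here is a minimal sketch of a rule table that decides which stems are audible in each game state. Everything in it is an assumption for illustration: the state names, the stem labels, the fade times, and the `plan_transition` helper are hypothetical, not taken from Phillips's work or from any real middleware.

```python
from dataclasses import dataclass

# Hypothetical names throughout: the states, stems, and fade times below
# are illustrative assumptions, not from a real project or audio middleware.


@dataclass(frozen=True)
class Rule:
    """Which stems should be audible in a given game state."""
    state: str
    stems: frozenset
    fade_seconds: float  # how long a change into this state should take


RULES = {
    "menu": Rule("menu", frozenset({"pad"}), 2.0),
    "explore": Rule("explore", frozenset({"pad", "melody"}), 1.5),
    "combat": Rule("combat", frozenset({"pad", "melody", "percussion"}), 0.5),
    "dialogue": Rule("dialogue", frozenset({"pad"}), 0.3),
}


def plan_transition(current: str, target: str) -> dict:
    """Return which stems fade in, fade out, and hold when the state changes."""
    before = RULES[current].stems
    after = RULES[target].stems
    return {
        "fade_in": sorted(after - before),
        "fade_out": sorted(before - after),
        "hold": sorted(before & after),
        "fade_seconds": RULES[target].fade_seconds,
    }
```

Calling `plan_transition("explore", "combat")` in this sketch brings in only the percussion stem over half a second while the pad and melody keep playing. A single generated WAV file has no stems to plan with, which is the gap the paragraph above describes.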

The GameSoundCon 2025 survey found professional game-audio use of generative AI was still limited, not universal. That tracks with the reality on the ground: audio people are not waiting for permission to use tools. They are careful because the wrong shortcut can break the cue, the mix, the rights chain, or the player's emotional read.

What AI can be useful for

AI can still help a composer or audio contributor.

It can create a temp track when a prototype needs emotional direction before the real cue exists. It can help a non-audio project lead explain mood without asking for "epic but cozy." It can generate rough variations that make it easier to reject a direction early. It can sketch instrumentation options. It can help test whether the team wants brittle horror texture, warm pastoral pad, chiptune bite, or chamber strings before anybody spends a week polishing the wrong thing.

Used that way, it is a conversation tool.

The danger starts when the conversation tool gets promoted to final delivery because it sounds good enough to someone who does not own the audio.

The useful version has a human owner. A composer decides what survives, what gets rewritten, what gets re-recorded, what gets mixed, what gets stemmed, what gets implemented, what gets disclosed, and what gets rejected because the rights are not clean.

That is the same argument as the sibling artist post, Taste Is Still the Job: output is not direction. More material does not equal better judgment.

The Clowdr bar

The Clowdr standard is the same one from How We Ship:

No generated work ships without human ownership and an appropriate verification pass.

For music and audio, human ownership means a composer or audio contributor takes responsibility for the cue's emotional intent, timing, mix, implementation, rights, and fit inside the game.

Not "the generator made it." Not "the prompt was good." Not "the track sounded fine on its own." If it ships, somebody owns the decision to use it.

An appropriate verification pass means the audio is tested inside the build.

Does the loop hold up after five minutes? Does the transition land? Does the mix leave room for dialogue and gameplay-critical effects? Does the cue support the scene instead of narrating over it?

Stems, rights, and implementation are either clean enough to ship or they are not.

If the answer is no, it is not shippable yet.

It may still be useful. It may be reference. It may be a temp cue. It may help the team find the emotional lane.

Temp is not delivery.
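As a rough sketch, the bar above could be written down as a checklist that a cue either clears or does not. The `CueReview` class and its field names are hypothetical. They mirror the questions in this section and are not an actual Clowdr tool.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical structure: the fields mirror the questions in this post,
# but the class itself is an illustration, not a real Clowdr system.


@dataclass
class CueReview:
    cue_name: str
    owner: Optional[str]        # the human who takes responsibility for the cue
    tested_in_build: bool       # heard inside the game, not in isolation
    loop_holds_up: bool
    transition_lands: bool
    mix_leaves_room: bool       # dialogue and gameplay-critical effects stay clear
    supports_scene: bool
    stems_clean: bool
    rights_clear: bool
    implementation_clean: bool

    def blockers(self) -> List[str]:
        """List every unmet check; an empty list means the cue can ship."""
        problems = []
        if not self.owner:
            problems.append("no human owner")
        checks = {
            "not tested in the build": self.tested_in_build,
            "loop does not hold up": self.loop_holds_up,
            "transition does not land": self.transition_lands,
            "mix crowds dialogue or effects": self.mix_leaves_room,
            "cue does not support the scene": self.supports_scene,
            "stems not clean": self.stems_clean,
            "rights not clear": self.rights_clear,
            "implementation not clean": self.implementation_clean,
        }
        for label, passed in checks.items():
            if not passed:
                problems.append(label)
        return problems

    def is_shippable(self) -> bool:
        """Temp is not delivery: any single blocker keeps the cue out."""
        return not self.blockers()
```

The design choice worth noting is that there is no score or weighting. A cue with one unresolved blocker is a temp cue, however good it sounds on its own.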

What does not clear the bar

Here is what fails inside this standard.

An AI-generated boss track that sounds big in isolation but loops badly after ninety seconds fails.

A cozy village cue that covers the dialogue range and makes every NPC scene feel crowded fails.

A horror ambience bed with unclear rights and no source notes fails.

A combat layer that cannot be stemmed, faded, or synced to gameplay states fails.

A generated soundtrack whose cues all share the same generic emotional temperature fails.

A cue accepted because it is cheap, not because it serves the scene, fails.

Asking a composer to "polish" a legally unclear generated track without the authority to replace it fails as a working process.

[Figure: Seven AI audio failure modes mapped across four dimensions (build, mix, rights, process). They break in different places but share one root cause: no human owned the final sound.]

The through-line is not "AI touched it." The through-line is that nobody owned the final sound.

The same rule applies to handmade audio. A hand-written cue that muddies the mix, ignores implementation, or fights the scene still fails. Craft matters. So does product judgment.

The composer's leverage is context

The composer's leverage is not only melody.

Melody matters. Harmony matters. Sound design matters. Taste matters. But in games, the deeper leverage is context: knowing what the player needs to feel now, what the system might do next, and what the cue has to leave unsaid.

That is why the "AI replaces composers" story is frightening and incomplete.

Bad teams will use AI music to avoid paying for audio judgment. They already tried to avoid it with stock tracks, exposure promises, last-minute asks, and vague credits. AI gives them a faster excuse.

Good teams need composers more because the volume of possible sound creates a harder direction problem. More tracks do not create identity. They create noise until somebody turns them into a score.

That is the job.

What kind of composer belongs here

Clowdr is for composers and audio contributors who want their work inside games that ship.

Not as a last-minute mp3 attachment. Not as a credit that disappears when the project dies. Not as an unpaid replacement for the budget nobody planned. Inside the product, shaping the product, tested with the product.

That does not require treating every AI use as acceptable. It does require working from a standard that judges the final work, not the purity story around it.

If you want a place where generated music gets treated as final because it sounds expensive in a browser tab, this is not it.

If you want a place where composers are expected to clean up whatever a project lead generated overnight, this is not it.

If you want a place where tools are banned before the work is evaluated, this is not it either.

Clowdr's line is narrower and more useful: use tools when they help, reject them when they do not, own the work, verify it in context, keep rights clear, and ship something that sounds like it belongs.

That standard answers to the manifesto at How We Ship. The companion post for developers is The Tool Is Not the Architect. The operational version is The Clowdr AI Standard, which defines the per-domain checks in more detail.

If that sounds like the kind of standard you want to work under, sign up.