I Lost $150 in 20 Minutes Market-Making on Kalshi. Here's Every Bug That Did It.
Published 2025-03-05
I built an automated market maker for Kalshi, the CFTC-regulated event contracts exchange. It uses an Avellaneda-Stoikov model to quote binary options — the kind of thing that sounds great in a Jupyter notebook and costs you real money in production.
The codebase is located at github.com/rodlaf/KalshiMarketMaker.
After deploying to prod, the bot burned through $150 in about 20 minutes. Not a slow bleed over weeks. Twenty minutes. I sat down to audit the whole thing line by line. Here's what I found.
Bug 1: The inventory skew was inverted
The Avellaneda-Stoikov model adjusts your reservation price away from your inventory direction. If you're long, it nudges your quotes lower to encourage selling. If you're short, it nudges higher to encourage buying. This is the core mean-reversion mechanism that keeps you from accumulating a massive directional position.
My implementation had the sign flipped:
```python
# What I had (WRONG — pushes quotes INTO inventory)
inventory_skew = inventory * self.inventory_skew_factor * mid_price

# What it should be (pushes quotes AWAY from inventory)
inventory_skew = -inventory * self.inventory_skew_factor * mid_price
```
One character. One minus sign. Instead of mean-reverting, the bot was trend-following its own inventory. Long? Buy more. Short? Sell more. A feedback loop that guaranteed losses.
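The sign direction is easy to lock in with a one-off sanity check. A minimal sketch (the function and parameter values are mine, not the repo's):

```python
# Sanity check for the skew direction (hypothetical names and values).
def reservation_price(mid_price, inventory, skew_factor):
    # Correct sign: long inventory pushes the reservation price BELOW mid,
    # so quotes centered on it encourage selling back toward flat.
    return mid_price - inventory * skew_factor * mid_price

mid = 0.50
assert reservation_price(mid, inventory=10, skew_factor=0.01) < mid   # long: quote lower
assert reservation_price(mid, inventory=-10, skew_factor=0.01) > mid  # short: quote higher
```

Two asserts like these in CI would have caught the inverted sign before it ever touched an exchange.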
Bug 2: The spread calculation was neutered
The Avellaneda-Stoikov spread formula was being multiplied by 0.01 before being applied:
```python
return max(base_spread * spread_multiplier * 0.01, self.min_spread)
```
This collapsed every computed spread down to the floor (min_spread). The entire model — gamma, sigma, k, time decay — all of it was dead code. The bot was quoting with a fixed 2-cent spread on every market regardless of volatility or inventory risk.
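For reference, the standard Avellaneda-Stoikov total spread without the stray 0.01. This is a sketch with illustrative parameter values, not the repo's implementation:

```python
import math

# Avellaneda-Stoikov optimal total spread:
#   delta = gamma * sigma^2 * tau + (2 / gamma) * ln(1 + gamma / k)
# gamma: risk aversion, sigma: mid-price volatility, k: order-flow decay,
# tau: time remaining to the horizon. All values below are illustrative.
def as_spread(gamma, sigma, k, tau, min_spread):
    base_spread = gamma * sigma**2 * tau + (2 / gamma) * math.log(1 + gamma / k)
    return max(base_spread, min_spread)
```

With this form, higher sigma or longer tau widens the quote instead of every market collapsing to the 2-cent floor.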
Bug 3: Sigma was off by 100x
`sigma: 0.001` in config. Binary option prices live in the $0.00–$1.00 range, so sigma needs to be in the neighborhood of 0.05–0.15. At 0.001, the model thought volatility was essentially zero, so it quoted razor-thin spreads on everything. Combined with bug #2, this was belt-and-suspenders wrong.
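One fix is to estimate sigma from recent mid prices instead of hard-coding it in config. A rough sketch (the window and sample data are illustrative):

```python
import statistics

# Rough realized-vol estimate: standard deviation of recent mid-price
# changes, in dollars. Window length and any annualization are judgment
# calls; this is an illustration, not the repo's implementation.
def realized_sigma(mid_prices):
    returns = [b - a for a, b in zip(mid_prices, mid_prices[1:])]
    return statistics.stdev(returns)

# A market bouncing around the mid-40s to high-50s in cents:
mids = [0.40, 0.52, 0.45, 0.58, 0.47]
sigma = realized_sigma(mids)  # lands in the 0.05-0.15 neighborhood
```

Even a crude estimator like this would have flagged 0.001 as two orders of magnitude too small.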
Bug 4: Market selection preferred wide spreads
The scoring function normalized spread and ranked markets. But it forgot to invert the spread score:
```python
# What I had — higher spread = higher score (WRONG)
spread_norm = normalize(market["spread_cents"], min_spread, max_spread)

# What it should be — tighter spread = higher score
spread_norm = 1.0 - normalize(market["spread_cents"], min_spread, max_spread)
```
The bot was actively seeking out the most illiquid, widest-spread markets on the exchange. Exactly the markets where a small retail market maker gets picked off by informed flow.
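A minimal version of the corrected scoring, with a hypothetical `normalize` helper and a weighting of my own choosing:

```python
# Hypothetical helper: map a value into [0, 1] over the range [lo, hi].
def normalize(value, lo, hi):
    if hi == lo:
        return 0.0
    return max(0.0, min(1.0, (value - lo) / (hi - lo)))

# Illustrative scorer: reward volume, penalize wide spreads. The 50/50
# weighting and the 10,000 volume cap are assumptions, not the repo's values.
def score(market, min_spread, max_spread):
    volume_norm = normalize(market["volume"], 0, 10_000)
    # Invert: a 2-cent spread should outrank a 20-cent spread.
    spread_norm = 1.0 - normalize(market["spread_cents"], min_spread, max_spread)
    return 0.5 * volume_norm + 0.5 * spread_norm
```

With the inversion in place, the tight liquid markets float to the top of the ranking instead of the illiquid ones.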
Bug 5: The bot bought combo markets it couldn't exit
This was the kill shot, and probably where most of the $150 went.
Kalshi has "multivariate event" (MVE) markets — combo contracts spanning multiple events. They have limited liquidity and can be extremely difficult to exit once you're in. The API has an `mve_filter=exclude` parameter, but MVE markets were still slipping through, and I had no local safety check behind it.
The bot happily loaded up on positions in `KXMVECROSSCATEGORY-*` and `KXMVESPORTSMULTIGAMEEXTENDED-*` tickers. Once in, there was no getting out at a reasonable price. I watched my account value drop in real time as these positions sat there, untradeable.
The fix was a multi-layer filter: a hard ticker-prefix check (all MVE tickers start with `KXMVE`), plus checks on the `mve_collection_ticker`, `mve_selected_legs`, and `strike_type` fields from the API response. Trust but verify — especially when "trust" means losing money you can't recover.
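A sketch of that filter, reconstructed from the description above; the field names come from the post, the function itself is mine:

```python
# Defense-in-depth MVE check: don't rely solely on mve_filter=exclude.
def is_mve(market: dict) -> bool:
    """Return True if a market looks like a multivariate event (MVE) combo."""
    # Hard prefix check: per the post, all MVE tickers start with KXMVE.
    if market.get("ticker", "").startswith("KXMVE"):
        return True
    # Any populated MVE-specific field is disqualifying on its own.
    for field in ("mve_collection_ticker", "mve_selected_legs"):
        if market.get(field):
            return True
    return False
```

Filtering the market list through `is_mve` locally means a single missed API-side exclusion can no longer put on an untradeable position.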
Bug 6: top_n was 50
The bot was trying to market-make 50 markets simultaneously with a 20-contract global cap. That's not market making, that's a fractional position scattered across 50 illiquid markets. Too thin to earn the spread, too spread out to manage risk.
Why it blew up so fast
Every one of these bugs individually would have been a slow drag on returns. Together, they created a $150-in-20-minutes situation:
- Inverted market selection sent the bot to the widest, most illiquid markets
- No MVE filter meant it bought untradeable combo positions
- Inverted skew meant it piled into losing positions instead of mean-reverting
- Neutered spread + wrong sigma meant it quoted 2 cents wide on everything
- 50 simultaneous markets meant it was spraying orders everywhere
The bot was basically a liquidity-taker masquerading as a liquidity-provider. It was buying every combo it could find at the worst possible prices, then doubling down when the position moved against it.
What I'd do differently
- **Paper trade with assertions.** Before going live, run the model on historical data and assert that inventory mean-reverts. Assert that the spread widens as inventory grows. If your model can't pass basic sanity checks on fake data, it won't pass them with real money either.
- **Test the economics, not just the code.** Unit tests that check `spread > 0` are useless. Test that `reservation_price < mid` when long. Test that tighter-spread markets score higher. The bugs weren't in the plumbing; they were in the math.
- **Filter defensively.** Don't trust API query parameters to fully filter out what you don't want. Kalshi's `mve_filter=exclude` missed markets. A hard ticker-prefix check (`KXMVE*`) would have caught them immediately. Defense in depth isn't just for security.
- **Start with 1 market.** Not 50. Not even 5. One market, watched closely, with a kill switch, for a week.
- **Set a drawdown kill switch.** The bot had position limits but no P&L circuit breaker. If I'd had a "stop if down $20" rule, this would be a story about $20 instead of $150.
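That last item is only a few lines of code. A sketch of a drawdown circuit breaker, with illustrative thresholds:

```python
class DrawdownKillSwitch:
    """Halt trading once account value drops past a fixed limit (illustrative)."""

    def __init__(self, starting_balance: float, max_drawdown: float):
        self.starting_balance = starting_balance
        self.max_drawdown = max_drawdown
        self.halted = False

    def check(self, current_balance: float) -> bool:
        """Return True if trading should stop. Once tripped, stays tripped."""
        if self.starting_balance - current_balance >= self.max_drawdown:
            self.halted = True  # requires human review to reset, by design
        return self.halted
```

Calling `check()` before every quote cycle, and cancelling all open orders when it returns True, caps the damage any single bad session can do.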
The money's gone. But at least I know exactly where it went — and it makes for a good cautionary tale about deploying quantitative strategies without proper testing infrastructure.