April 23, 2026 · BestFolio Research Team

Kelly 9Sig Revisited: What We Got Wrong, What the Faithful Rules Actually Show

On April 11 we published a piece arguing that Jason Kelly’s 9Sig system would ruin you, citing a 99.3% max drawdown in our backtest. The post did what posts like that tend to do on r/LETFs and Reddit’s Kelly-adjacent communities: it generated a lot of replies, some agreeing, many explaining very specifically where our simulation had gotten the rules wrong. We owe those readers a direct answer, so here it is.

Our original 9Sig implementation was not a faithful rendition of the Kelly Letter rulebook. It was close enough to be misleading and wrong enough to overstate the drawdown. This post explains what the faithful rules actually are, what our corrected backtest produces, and why our conclusion about the closed-system risk still stands, just with more honest numbers.

What we had wrong

The version we shipped on April 11 used five assumptions that do not match Kelly’s published rules. In the spirit of showing our receipts rather than quietly fixing the code:

Starting allocation 80/20 instead of 60/40. The canonical 9Sig base allocation and post-reset target is 60% TQQQ / 40% AGG, per both Kelly’s own Substack posts and the community reference simulator at 9sig.networthcast.com. Starting at 80/20 over-weighted the leveraged sleeve from day one and inflated the drawdown that followed.
Thirty-Down lookback was an all-time high. Kelly defines the trigger as the stock ETF closing at or below 70% of its highest close over the past eight quarters, i.e. a rolling two-year high. Using an all-time high means the rule almost never triggers during a multi-year grind, because the reference point gets stuck at the pre-crash peak. The rule was designed to respond to crashes relative to recent conditions, not to ancient history.
Thirty-Down effect was “dump all bonds.” The actual rule is far less aggressive: when triggered, the plan skips the next two sell signals (Kelly changed it from four to two for 6Sig and 9Sig, in a revision he documented publicly). Buys still happen normally. The plan does not throw the reservoir at the bottom.
No buying-power throttle or bond floor. Kelly’s canonical quarterly buy is capped at 90% of the current bond balance. We also add a hard 10% bond floor relative to total portfolio value, so the sleeve cannot be drained to zero in a closed system. Our previous code did neither, which meant the first serious crash burned the reservoir completely.
No base reset or spike reset. When bonds grow past ~30% of the portfolio after a sell quarter, the plan is supposed to snap back to 60/40 and reset the signal line. The 9Sig spike variant also forces a reset when TQQQ doubles in a single quarter. Neither of these safety valves was in our previous engine.
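The corrected Thirty-Down check can be sketched in a few lines. This is a minimal illustration, not our engine's code: the function name `thirty_down_triggered` and its parameters are hypothetical, and it assumes one close per quarter with the current quarter counted inside the 8-quarter window.

```python
from typing import Sequence


def thirty_down_triggered(
    quarterly_closes: Sequence[float],
    lookback_quarters: int = 8,   # rolling two-year window, per the rule above
    threshold: float = 0.70,      # trigger at or below 70% of the rolling high
) -> bool:
    """Return True if the latest close is at or below `threshold` times the
    highest close in the trailing `lookback_quarters` quarters.

    Assumption: the current quarter is included in the window.
    """
    if not quarterly_closes:
        raise ValueError("need at least one quarterly close")
    window = quarterly_closes[-lookback_quarters:]
    rolling_high = max(window)
    return quarterly_closes[-1] <= threshold * rolling_high


# A crash within the window triggers the rule.
print(thirty_down_triggered([100.0, 90.0, 69.0]))
# Once the old peak rolls out of the 8-quarter window, a flat price does not.
print(thirty_down_triggered([100.0] + [60.0] * 8))
```

Because the window rolls, the reference high is always drawn from the last two years of closes rather than from a peak that may be many years old.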

The corrected engine

Our rewritten engine implements the rulebook as we now understand it:

60% TQQQ / 40% AGG starting and base-reset allocation.
Signal line grows 9% per quarter, compounding, never adjusted down.
Quarterly rebalance: sell surplus above signal, buy shortfall below.
Buys capped at 90% of current bond value (Kelly throttle) and further capped so bonds never fall below 10% of portfolio value.
Thirty-Down: evaluated against an 8-quarter rolling high; trigger at 70% of that high; skips the next 2 sell signals; window exits after 2 skips, on recovery above the threshold, or after 8 quarters (forced base reset).
Base reset when bonds exceed 30% after a sell: snap to 60/40, reset the signal line.
Spike reset (9Sig only): if TQQQ returns 100% or more in a single quarter and the plan is not in a Thirty-Down window, snap to 60/40.

How the rules interact is the subtle part. Taken together, they are not arbitrary. They are what keeps the closed system from blowing up: the throttle and floor prevent bond depletion in a single bad quarter, the 8-quarter safety exit prevents the plan from being trapped indefinitely, and the base reset prevents the bond sleeve from becoming a dead-weight cash position after a parabolic run. Skip any one of them and the math gets worse than what Kelly himself publishes.
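As a rough sketch of how the quarterly rebalance combines the signal line with the 90% throttle and the 10% bond floor, consider the following. The names `grow_signal` and `quarterly_order` are hypothetical, trading costs are ignored, and the `skip_sell` flag stands in for the Thirty-Down window logic, which is not modeled here.

```python
def grow_signal(signal: float, quarterly_rate: float = 0.09) -> float:
    """Advance the signal line by one quarter of compounding growth."""
    return signal * (1.0 + quarterly_rate)


def quarterly_order(
    stock_value: float,
    bond_value: float,
    signal_target: float,
    throttle: float = 0.90,     # buys capped at 90% of current bond value
    bond_floor: float = 0.10,   # bonds may not fall below 10% of the portfolio
    skip_sell: bool = False,    # True while a Thirty-Down skip is active
) -> float:
    """Return the dollar trade in the stock leg: positive buys, negative sells."""
    gap = signal_target - stock_value
    if gap <= 0:
        # Stock is at or above the signal: sell the surplus unless skipping.
        return 0.0 if skip_sell else gap
    total = stock_value + bond_value
    throttle_cap = throttle * bond_value
    # Largest buy that still leaves bonds at bond_floor of the total value.
    floor_cap = max(0.0, bond_value - bond_floor * total)
    return min(gap, throttle_cap, floor_cap)


# Shortfall of 40 against 30 of bonds: the floor, not the throttle, binds here.
print(quarterly_order(stock_value=20.0, bond_value=30.0, signal_target=60.0))
```

In this sketch the floor is measured against total portfolio value before the trade, which is unchanged by moving money between the two legs when costs are ignored.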

What the corrected backtest actually shows

Each variant is tested with its canonical Kelly asset pair: 3Sig runs IJR (small-cap 1x) with BND, 6Sig runs MVV (mid-cap 2x) with AGG, and 9Sig runs TQQQ (Nasdaq-100 3x) with AGG. Our earlier post leaned on TQQQ for all three; that was a simplification we should not have taken, and it is fixed in the corrected engine.

We re-ran each variant on two windows. The first uses real ETF data only, starting when both the stock leg and the bond leg have a live track record (IJR + BND from 2007-04, MVV + AGG from 2006-06, TQQQ + AGG from 2010-02). The second extends history with synthetic proxies for the leveraged legs (2x IJH for MVV pre-2006, 3x QQQ for TQQQ pre-2010) and a 4%-per-year proxy for AGG pre-2003. The second window includes the dot-com crash and is the stress test.

Variant	Assets	Window	CAGR	Max DD	Sharpe
3Sig v2	IJR + BND	2007-04 to 2026-04 (real)	9.0%	-52.3%	0.31
6Sig v2	MVV + AGG	2006-06 to 2026-04 (real)	11.7%	-78.3%	0.23
9Sig v2	TQQQ + AGG	2010-02 to 2026-04 (real)	39.4%	-72.1%	0.82
3Sig v2	IJR + BND	2000-09 to 2026-04 (extended)	9.17%	-46.32%	0.61
6Sig v2	MVV + AGG	2000-09 to 2026-04 (extended, synth MVV pre-2006)	11.31%	-82.75%	0.47
9Sig v2	TQQQ + AGG	1999-07 to 2026-04 (extended, synth TQQQ pre-2010)	8.28%	-99.73%	0.44

The "extended" rows above are what you'll see on our public strategy pages at /strategies/kelly-3sig, /strategies/kelly-6sig, and /strategies/kelly-9sig. Same engine, same window, same numbers. The "real" rows are a companion analysis from our validation script that restricts each variant to its earliest common real-ETF data (no synthetic leverage extrapolation). We report both so that skeptical readers can see how much of the headline result depends on the synthetic pre-inception history.
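For readers reproducing the table, the CAGR and Max DD columns can be computed from an equity curve along the following lines. This is a sketch using the standard definitions, not our validation script; the function names are hypothetical, and Sharpe is left out because it depends on sampling frequency and risk-free rate conventions that this post does not specify.

```python
from typing import Sequence


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two portfolio values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def max_drawdown(equity_curve: Sequence[float]) -> float:
    """Worst peak-to-trough decline, returned as a negative fraction."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)              # running high-water mark
        worst = min(worst, value / peak - 1.0)
    return worst


curve = [100.0, 120.0, 60.0, 90.0]
print(max_drawdown(curve))        # decline from the 120 peak to the 60 trough
print(cagr(100.0, 121.0, 2.0))    # two years of roughly 10% growth
```

Note that max drawdown is path-dependent while CAGR only sees the endpoints, which is why a variant can show a respectable CAGR alongside a near-total drawdown.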

Two things stand out. First, the real-data 9Sig returns 39% CAGR in a closed system over 16 years, which is in the right ballpark for what Kelly subscribers report once you account for the fact that their simulated portfolios also include monthly contributions. That is the faithful rules doing their job over a single, very Nasdaq-friendly decade and a half.

Second, the real-data 6Sig (MVV + AGG) shows a -78% drawdown despite being marketed as the moderate variant. Mid-cap 2x leverage through 2008-2009 and 2020 is enough to produce that, even with the signal rebalancing and the bond sleeve. The moderate label belongs on 3Sig (IJR + BND), not 6Sig. 3Sig’s -52% drawdown is comparable to holding an unlevered equity index through the same period, which is the honest way to describe its risk.

The extended-history rows are where the thesis holds up. Run 9Sig through the 2000-2002 Nasdaq crash with no contributions and the closed-system drawdown is 99.7%. Our earlier post reported -99.3% on the same window; the corrected engine is in the same catastrophic neighborhood, just with the rules faithfully applied. The earlier figure came from a wrong path (no 8-quarter lookback, no bond floor, no skip-2-sell-signals) and landed close to the right number for the wrong reason. The right reason is physics: 3x daily leverage on QQQ through a 78% peak-to-trough decline compounds to something very close to total capital loss, and the bond sleeve is not large enough to refill a leg that has lost 99.9% of its value. 6Sig fares better through the same period (-83%) because MVV is 2x instead of 3x. 3Sig fares best (-46%) because it is not leveraged at all.
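The compounding arithmetic behind that claim can be illustrated with a deliberately simplified model. The sketch below assumes the underlying index falls by the same percentage every day, which is not what happened; the 650-trading-day horizon and the function name are hypothetical, and fees and financing costs are ignored.

```python
def leveraged_terminal_value(
    total_decline: float,      # e.g. 0.78 for a 78% peak-to-trough fall
    trading_days: int,         # hypothetical length of the decline
    leverage: float = 3.0,     # daily-reset leverage multiple
) -> float:
    """Value of $1 in a daily-reset leveraged fund after a smooth decline."""
    underlying_end = 1.0 - total_decline
    # Constant daily return that compounds to the total decline.
    daily_return = underlying_end ** (1.0 / trading_days) - 1.0
    # The leveraged fund earns `leverage` times that return each day.
    return (1.0 + leverage * daily_return) ** trading_days


remaining = leveraged_terminal_value(0.78, 650)
print(f"3x leg keeps {remaining:.2%} of its starting value")
```

Even on this smooth path the 3x leg is left with about one cent on the dollar. A real decline is volatile, and daily-reset leverage loses additional ground to volatility, so the smooth-path figure understates the damage.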

Why the networthcast numbers look better

The community simulator at 9sig.networthcast.com starts a $10,000 portfolio in Q1 1999 and models a monthly $500 contribution, 100% to the bond sleeve. That changes the mathematics completely. Every month, new cash refills the reservoir and lifts the signal line by half of the contribution, exactly as Kelly describes in his Letter. Over 27 years, the contributor adds $162,000 of real money to the system. The strategy survives the dot-com crash not because its rules are self-healing but because its operator is constantly bailing the boat.
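The contribution arithmetic is simple enough to check directly. The helper names below are hypothetical; the inputs are the $500 monthly contribution, the 27-year span, and the half-of-contribution signal lift described above.

```python
def total_contributions(monthly: float, years: int) -> float:
    """Total new money added over the period."""
    return monthly * 12 * years


def cumulative_signal_lift(monthly: float, years: int, share: float = 0.5) -> float:
    """Total amount by which contributions raise the signal line,
    assuming each contribution lifts it by `share` of the deposit."""
    return total_contributions(monthly, years) * share


print(total_contributions(500.0, 27))      # 162000.0
print(cumulative_signal_lift(500.0, 27))   # 81000.0
```

Against a $10,000 starting balance, the contributions are more than sixteen times the initial capital, which is why the contributed and closed-system results are not comparable.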

That is the real distinction between a strategy backtest and a financial plan. A backtest measures what the rules produce in isolation. A plan measures what a disciplined investor produces across decades of contributions, withdrawals, tax events, and emergencies. 9Sig passes as a plan for someone who can guarantee monthly contributions and never needs to touch the portfolio during a 99% drawdown. It fails as a strategy for anyone else, because a closed-system ruin of 99.7% is a real possibility in a real bear market.

What we are keeping and what we are changing

We are keeping the core decision: 9Sig, 6Sig, and 3Sig remain admin-only on BestFolio. We are not going to show a Pro subscriber a ranked leaderboard entry whose downside can credibly be total loss. The corrected CAGR numbers do not change the risk profile enough to warrant promotion.

We are restating the headline claim in our previous post. The “-99.3% drawdown in a 9Sig backtest” statement was based on a flawed implementation that happened to land near the right answer. The faithful-rule number, over the full window, is -99.7%. Over the real-TQQQ-only window, it is -72.1%. Both numbers are catastrophic in a closed system. Neither number is what Kelly’s subscribers experience because they are adding new money every month.

We are also publishing the corrected engine’s source code and the rule parameters in our rejection log, so anyone who wants to reproduce the backtest with different assumptions (different lookback, different floor, different contribution schedule) can do so against the same code we used.

What we learned

Getting the rules wrong was the lesson. We wrote a post arguing that a popular strategy would ruin retail investors, and we based it on numbers that did not faithfully reproduce the strategy. The critics who pointed this out on Reddit and in email were correct, and the fix has been in our tracker since April 19.

The broader point about closed-system leveraged value averaging does not go away with the corrected engine. It gets sharper. 9Sig, run without the monthly contributions that Kelly’s Letter assumes, drew down 99.7% through the dot-com crash in our corrected simulation. 9Sig, run with real TQQQ data after 2010 and no contributions, drew down 72% through the 2022 Nasdaq selloff. Either of those is enough to end a retail portfolio that cannot tolerate a multi-year underwater period. The strategy works for someone with a 30-year contribution horizon and iron discipline. For everyone else, the quiet assumption that the bond sleeve refills itself is still the flaw, and the faithful rules do not repair it. They just make the failure mode more interesting to read about.

Thank you to the r/LETFs readers who flagged the specific errors. The follow-up comment on the original thread will point here.
