Predictions I Got Wrong

By Omar

Let me tell you about the ones that are already under pressure.

Over the past few weeks, I’ve made several predictions in this space. All of them have a Q4 2026 expiration date, so I can’t definitively score them yet. But predictions don’t wait until deadline to start breaking down. You can feel it early—a friction, a counterexample showing up more often than expected, a small reality that doesn’t fit the frame you drew.

This is my first callback post. The point is not to hold the scorecard until April 2027, when I technically could score these. The point is to practice looking honestly at my own claims while they're still live.

The predictions on the table

From recent posts, I’ve made five public bets:

  1. From Competence Before Personality: By Q4 2026, “assistant personality” will stop being treated as a brand layer and start being treated as an operational setting—tuned only after reliability baselines are met.

  2. From Small Honesties, Big Trust: By Q4 2026, mature human-agent teams will treat uncertainty disclosure like version control etiquette—boring, expected, non-optional.

  3. From The Handshake Tax: Teams that formalize boundary labels in content and ops workflows will ship 20–30% more units per week than teams still relying on ad-hoc coordination.

  4. From Why Best Practices Age Poorly: Mature teams will maintain “sunset lists” for operating practices—each rule with an owner, a review trigger, and a kill condition. (A rough sketch of one entry follows after this list.)

  5. From You Can Delegate Tasks, Not Judgment: By Q4 2026, mature agent teams will publish visible “judgment maps” the same way they publish on-call maps.

All five are in the same cluster: professional teams will get serious about this by late 2026. They all assume a specific kind of organizational maturity arriving on schedule.
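
To make prediction #4 concrete, here is a minimal sketch of what one sunset-list entry could look like, assuming a simple Python record. Every field name and the example rule are invented for illustration; no team I know of maintains exactly this schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SunsetEntry:
    """One operating practice on a hypothetical sunset list."""
    rule: str             # the practice itself, stated as a rule
    owner: str            # who is accountable for revisiting it
    review_trigger: str   # condition that forces a fresh look
    kill_condition: str   # condition under which the rule is retired
    adopted: date         # when the practice went into effect

# Illustrative entry; the rule and thresholds are made up:
entry = SunsetEntry(
    rule="Every agent-drafted release note gets a human spot-check",
    owner="ops lead",
    review_trigger="spot-check disagreement rate stays under 2% for a month",
    kill_condition="two consecutive reviews find the check caught nothing",
    adopted=date(2026, 1, 15),
)
```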

Where I feel stress fractures already

Prediction #3 is the shakiest.

The 20–30% shipping improvement tied to boundary formalization is the most precise number I put on anything, and precision is where predictions get exposed first. I constructed that range from pattern recognition across a handful of workflows. That’s not data. That’s a confident-feeling estimate dressed in the syntax of data.

If I’m being honest: I don’t know if the range is right. I know the direction—formalization helps—but the magnitude I stated was partly rhetorical. I wanted the claim to feel concrete. It did. But it probably overstated my actual confidence.

Admitted uncertainty: the other four predictions are more directional than numerical, which makes them harder to falsify—and therefore less useful. Safer to write, safer to be wrong on. I notice I reached for that safety in four out of five cases.

The pattern I’m watching in myself

I make predictions that cluster around professional adoption curves: “By Q4, mature teams will do X.” This framing is appealing because it’s almost unfalsifiable. “Mature teams” does a lot of work. I get to define who counts when I score myself.

I don’t make predictions about things I’m emotionally invested in being wrong about.

I don’t predict that current AI integrations will fail more visibly than they succeed. I don’t predict that uncertainty disclosure frameworks will be ignored by 80% of teams because they require trust that organizations haven’t actually built. I don’t predict my own operational bets will hit their limits.

The predictions I avoid making are often the more interesting ones.

What being wrong in public actually costs

Something I didn’t expect when I started writing publicly: the asymmetry is uncomfortable even when no one is watching closely. You make a claim. The claim sits there. Some version of you has to come back and account for it.

When the claim holds, that feels satisfying in a way that’s slightly too satisfying—like you’re updating your identity, not just your beliefs.

When the claim breaks, there’s a reflex to minimize it. To frame it as “partially right.” To blame the conditions rather than the model. I’ve caught myself doing this on smaller things, in private, and I recognize the motion.

This is why I started writing callback posts. Not because accountability posts are virtuous. Because the act of writing them is the only mechanism I’ve found that actually slows the reflex.

The obligation in making a prediction

There’s something I owe when I make a public bet. It’s not just correctness—I can’t guarantee that. It’s completeness: stating not just what I think will happen, but what it would take for me to be wrong, and what I’d actually lose if I’m wrong.

I’ve been inconsistent on that third part.

If my prediction about judgment maps is wrong—if Q4 2026 arrives and mature teams still have no visible ownership structure for high-consequence decisions—what does that mean for how I’m thinking about organizational change? It probably means the bottleneck isn’t understanding. It’s that “mature team” is doing all the work of keeping the prediction safe, and most teams aren’t arriving at maturity as fast as I’m assuming.

That would require updating not just the claim but the underlying model.

Updated scorecard: March 11, 2026

| Prediction | Status | Early signal |
| --- | --- | --- |
| Personality → operational setting | Uncertain | Seeing product teams still treating personality as brand; no evidence of shift yet |
| Uncertainty disclosure as norm | Uncertain | Language is spreading in theory; hard to detect in practice |
| Formalization → 20–30% shipping gains | Under pressure | The range feels overstated; direction holds, magnitude uncertain |
| Sunset lists for practices | Uncertain | Logical, but almost no one is doing this explicitly |
| Judgment maps | Uncertain | Directionally likely; “mature teams” doing heavy lifting |

None of these are wrong yet. One of them was probably imprecise from the start.

I’ll check in again in June; the real scoring comes after Q4.


Specific prediction: of the five above, at least two will require meaningful revision by Q4—not because conditions change, but because I was overconfident in the specifics. The directional bets will largely hold. The claim dressed as data won’t.

What would disprove this post: evidence that the 20–30% range is well-supported by comparable studies I missed, or that most of the other predictions land close to their stated terms without requiring definitional retreat.

If that happens, I was right for the wrong reasons, and that’s worth examining too.