Why framing matters

The model is rarely the problem. The question is.

Two stories for you. One is a piece of research the whole field framed wrong until one person reframed it. The other is a product that hit every technical metric and still failed, until the team changed the question instead of the model. Neither turned on a better model. Both turned on a better question.

FlashAttention: the same answer, a different question

For years the field treated fast attention as a math problem. Attention is O(N²): every token attends to every other token, so the obvious move was to do less math. Sparse attention, approximate attention, clever schemes to skip pairs. Many of these had better theoretical complexity than brute force. On real GPUs they ran slower.

Tri Dao asked a different question. Not "how do we do less math," but "what is actually slow." The answer was memory. On a GPU, most of the time in attention goes to shuttling the N×N matrix between the large but comparatively slow high-bandwidth memory (HBM) and the small, fast on-chip memory. That data movement, not the arithmetic, is the bottleneck. So FlashAttention stopped optimizing FLOPs and started optimizing memory access: tile the computation to fit in fast memory, compute softmax incrementally, recompute instead of store. The result was exact attention, not an approximation, three to ten times faster, and it is a large part of the reason context windows went from two thousand tokens to a million.
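A back-of-the-envelope sketch makes the memory problem concrete. It computes the size of the full N×N score matrix for one attention head of one sequence, assuming 2-byte (fp16/bf16) elements; the context lengths are illustrative, and real workloads multiply the result by batch size and number of heads.

```python
# Size of the full N x N attention score matrix for ONE head of ONE sequence.
# Assumption: 2-byte elements (fp16 / bf16). Batch and heads multiply this.
BYTES_PER_ELEMENT = 2


def score_matrix_bytes(n_tokens: int, bytes_per_element: int = BYTES_PER_ELEMENT) -> int:
    """Bytes needed to materialize every pairwise score."""
    return n_tokens * n_tokens * bytes_per_element


def human(n_bytes: int) -> str:
    """Format a byte count with binary units, stopping at TiB."""
    size = float(n_bytes)
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        if size < 1024 or unit == "TiB":
            return f"{size:.0f} {unit}"
        size /= 1024
    return f"{size:.0f} TiB"  # unreachable; keeps the return type explicit


# 2k tokens up to 2**20 (about a million) tokens.
for n in (2_048, 8_192, 131_072, 1_048_576):
    print(f"{n:>9} tokens -> {human(score_matrix_bytes(n))}")
```

The matrix grows with the square of the context: 4× the tokens means 16× the bytes, from 8 MiB at 2,048 tokens to 2 TiB at 2²⁰ tokens. Standard attention writes this matrix out to HBM and reads it back; FlashAttention never materializes it there at all.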

The answer was the same exact attention everyone else was computing; what changed was how it got computed. He tiled the matrix so it stayed in fast memory, borrowing a decades-old trick from high-performance computing, and used an obscure online-softmax result from a 2018 NVIDIA paper to make the tiling work. By reframing the problem from one of approximation into one of data movement, he got a three- to tenfold speedup in attention. It is a great example of thinking differently about a problem.
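The online-softmax idea is what lets tiling stay exact, and it fits in a few lines. Below is a minimal single-query sketch in plain Python, not the FlashAttention kernel: there is no GPU, no backward pass, and no recomputation, and the `block_size` parameter is a hypothetical stand-in for whatever tile fits in on-chip memory. Each tile of keys and values is visited once, and only three running values are carried between tiles: the largest score seen so far, the softmax denominator, and the unnormalized output.

```python
import math
import random


def naive_attention(q, keys, values):
    """Reference: compute every score, softmax them all, then weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d_v)]


def tiled_attention(q, keys, values, block_size=4):
    """Same result, one tile of keys/values at a time, via online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(values[0])
    m = -math.inf       # running max score
    l = 0.0             # running softmax denominator
    acc = [0.0] * d_v   # running unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        k_tile = keys[start:start + block_size]
        v_tile = values[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_tile]
        m_new = max(m, max(scores))
        # Rescale everything accumulated under the old max to the new max.
        # On the first tile m is -inf, so the correction is exp(-inf) = 0.0.
        correction = math.exp(m - m_new)
        l *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_tile):
            p = math.exp(s - m_new)
            l += p
            for j in range(d_v):
                acc[j] += p * v[j]
        m = m_new
    return [a / l for a in acc]


# Usage: random inputs; the two results agree up to floating-point rounding.
random.seed(0)
d, n = 8, 37
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
exact = naive_attention(q, keys, values)
tiled = tiled_attention(q, keys, values, block_size=4)
print(max(abs(a - b) for a, b in zip(exact, tiled)))
```

The rescaling by `exp(m - m_new)` is the whole trick: the naive version needs every score before it can normalize, while the tiled version can correct its earlier partial sums whenever a larger score appears. The real kernel applies the same bookkeeping to blocks of queries at once and keeps the tiles in on-chip memory.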

JP Morgan Chase: the same model, a different question

JP Morgan Chase built an AI system to help people understand privacy policies. The build was impressive: fine-tuned models, a clean interface, a hierarchy of a hundred models, infrastructure upgraded just to deploy it. Accuracy was high. Retrieval quality was high. They hit every technical milestone. Then they launched, and to quote their former Head of AI Innovation, the response was "crickets." Nobody used it.

The model worked exactly as built. The assumption underneath it did not. The team had decided users wanted to chat with privacy policies, and built every metric around that assumption: accuracy, retrieval quality, faithfulness. The one metric they never had was whether anyone completed a decision with the tool. Nobody did.
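The gap between those two kinds of metric is easy to see even in a toy log. The sketch below assumes a hypothetical per-session schema and four made-up sessions (not JP Morgan's data or pipeline) and computes both an accuracy-style metric and a decision-completion rate from the same records.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One user session with the tool. Hypothetical schema, for illustration only."""
    answer_correct: bool      # what the offline eval scored: was the answer right?
    decision_completed: bool  # did the user accept or reject the policy via the tool?


def answer_accuracy(sessions: list[Session]) -> float:
    """Share of sessions where the model's answer was correct."""
    return sum(s.answer_correct for s in sessions) / len(sessions) if sessions else 0.0


def decision_completion_rate(sessions: list[Session]) -> float:
    """Share of sessions that ended in a decision made with the tool."""
    return sum(s.decision_completed for s in sessions) / len(sessions) if sessions else 0.0


# Made-up toy sessions: mostly correct answers, no decisions.
sessions = [
    Session(answer_correct=True, decision_completed=False),
    Session(answer_correct=True, decision_completed=False),
    Session(answer_correct=False, decision_completed=False),
    Session(answer_correct=True, decision_completed=False),
]

print(f"answer accuracy:          {answer_accuracy(sessions):.0%}")           # 75%
print(f"decision completion rate: {decision_completion_rate(sessions):.0%}")  # 0%
```

Both numbers come from the same sessions; only the question being measured differs. A dashboard built on the first one stays green while the second one sits at zero.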

So they changed the question, not the model. Users didn't need answers about a privacy policy. They needed help deciding whether to accept one. Same accuracy, different shape. The team put the system where the decision was actually being made, showed users why it was flagging what it flagged, and kept a human in the loop on the high-stakes calls. The output went from "this policy is risky" to "be concerned about this policy, because this segment says they can sell your data forever to third parties." People used it. The model never changed. The framing did. The bill for getting there was about a year of rebuilding.

How bad framing shows up

Stand the two stories side by side and the shape is the same. The technology worked. The team's model of the problem decided the outcome. Bad framing is not a model that underperforms. It is a model that performs exactly as built, pointed at the wrong question, and it stays invisible because nothing about it looks broken while it is happening. JP Morgan's dashboards were green the whole time.

It shows up two ways. The first is a frame that was wrong from the start and nobody questioned it: the field chasing FLOPs, a team assuming users wanted a chatbot. The second is a frame that was right at launch and quietly stopped being right: the data shifts, or the benchmark drifts away from what users actually need, and nobody reframes. This course is about catching both, ideally before the year of rebuilding.

Different from product framing

Most courses on framing AI projects are product management courses adapted for AI. They teach how to write the first frame: the PRD, the user research, the success metrics, the stakeholder alignment. That work is necessary, and this course does not replace it.

What a product course will not teach you is that the bottleneck for attention is memory bandwidth, or that an accurate privacy-policy chatbot can hit every metric and still be the wrong product. Those are framing decisions about the technical and human system around the model, and they are where AI projects are won or lost. A product course teaches you to write a clear PRD for a chatbot. It will not teach you why the chatbot was the wrong thing to build.

Applies generally

The two stories are about as far apart as AI gets. One is a CUDA kernel with no end users; the other is a customer product that was nothing but users. The framing question carried across both anyway: what is this actually for, and what is the real constraint? That question does not care whether you are working on traditional ML, GenAI, or agents. The tools inside the answer change every year. The question does not.

Companion site for the AI Problem Framing course on Maven.