Desirable Difficulty: Why the Strategies That Feel Worst Work Best
The previous three posts in this series documented a pattern. Retrieval practice depresses accuracy during study but outperforms rereading a week later. Spacing slows acquisition but produces stronger retention across every meaningful interval. Interleaving reduces practice scores but sharpens discrimination on delayed tests. In each case, the strategy that felt less productive during study turned out to be more effective in the long run, and learners predicted the wrong outcome.
That pattern is not coincidental. It reflects something systematic about how memory works, and about why our intuitions about learning are so reliably wrong. The concept that ties these findings together is called desirable difficulty, a term introduced by Robert Bjork in the 1990s. But the label is easy to misunderstand. Desirable difficulty is not a strategy, not a technique, and not something you directly apply. It is a conceptual lens for understanding why certain effective learning conditions feel counterproductive while they are happening.
Two Kinds of Strength
The crossover patterns documented in Posts 2–4 become less puzzling once you draw a distinction between two properties of any memory.
The first is retrieval strength: how accessible a piece of information is right now. Can you recall it quickly? Does it come to mind without effort? Retrieval strength is what you experience during a study session when the material feels fluent and familiar. It is also what in-the-moment confidence ratings capture: your subjective sense of how well you know something right now.
The second is storage strength: how durably that information is embedded in long-term memory, how richly connected it is to other knowledge, and how resistant it is to forgetting over time. Storage strength is what determines whether you'll still know that material next month. You cannot directly observe it during a study session. It reveals itself only later, on delayed tests or in real-world application.
These two properties do not move in lockstep (Bjork & Bjork, 1992; Soderstrom & Bjork, 2015). In fact, they often move in opposite directions. Rereading a chapter three times in an evening produces high retrieval strength by the end of the session — the material feels solid, accessible, known. A week later, much of that accessibility has evaporated. Conversely, struggling through a retrieval practice session where you get half the answers wrong feels unproductive in the moment, but the effortful reconstruction builds the kind of durable trace that resists forgetting.
The theoretical account proposes that gains in storage strength are largest precisely when retrieval strength is low at the time of practice. When information comes to mind easily, additional encounters produce diminishing returns for long-term learning. When it requires effort to retrieve, that effortful processing generates the conditions under which storage strength increases most (Soderstrom & Bjork, 2015). This is the proposed mechanism behind each of the strategies covered earlier: spacing allows retrieval strength to decay so the next encounter involves genuine reconstruction. Retrieval practice forces generation rather than recognition. Interleaving disrupts the fluency that blocked repetition creates.
Why We Choose Wrong
If the evidence for these strategies is so strong, why do most learners avoid them? The metacognitive research offers a clear answer: people use how productive studying feels as their guide to how productive it actually is, and the two are poorly correlated.
In surveys of study habits, the results are stark. When asked to choose between massed and spaced study, the vast majority of students endorse massing as the more effective approach, despite spacing being one of the most reliably replicated findings in the entire memory literature (Soderstrom & Bjork, 2015). Similarly, most students rank rereading among their top strategies, while very few report using retrieval practice at all. The strategies that feel most fluent during practice are the ones learners gravitate toward, even though they produce weaker long-term outcomes.
The research on judgments of learning shows the same bias at a more granular level. When learners predict how well they'll remember specific items, their predictions are heavily influenced by how easily those items come to mind during study — not by factors that actually predict later recall, like the length of the retention interval or the depth of processing (Soderstrom & Bjork, 2015). Items that feel easy now get high confidence ratings. But ease of access during study often reflects recency and surface fluency, not the kind of durable encoding that supports recall days or weeks later.
One study illustrates the mismatch vividly. Participants learned to identify twelve artists' painting styles under either blocked conditions (all paintings by one artist, then all paintings by the next) or interleaved conditions (paintings by different artists mixed together). On a subsequent identification test, interleaving produced clearly better performance. But when asked afterward which condition had helped them learn more effectively, a large majority endorsed blocking — the condition that had produced the worse outcome (Bjork & Bjork, 2011). Their experience during practice told them one thing. The evidence told them the opposite.
A workplace example sharpens the point. When the British Post Office needed to train postmen to type postal codes on a new sorting-machine keyboard, researchers compared training schedules ranging from one hour a day to four. The most distributed schedule, a single hour per day, reached proficiency in the fewest total training hours and retained the skill best when workers were retested months later. Yet the workers on that schedule were the least satisfied with their training. They resented spending roughly twelve weeks to learn what colleagues on massed schedules covered in about three (Baddeley & Longman, 1978). The arrangement that produced the most efficient, most durable learning was the one that felt worst to the people living through it.
When Difficulty Stops Being Desirable
Nothing in the preceding section should be read as "harder is always better." The original framing of desirable difficulties included an explicit warning: difficulties are desirable only when they trigger the encoding and retrieval processes that support learning. When the learner lacks the background knowledge or skills to manage those difficulties, they become undesirable, producing confusion and frustration rather than productive struggle (Bjork & Bjork, 2011).
A more principled account of where the line falls comes from the Challenge Point Framework, developed in motor learning research and subsequently applied to cognitive tasks (Guadagnoli & Lee, 2004; Nelson & Eliasz, 2023). The framework proposes an inverted-U relationship between difficulty and learning. When practice is too easy, there is insufficient challenge to drive learning forward. As difficulty increases toward an optimal point, learning improves — but performance during practice declines. Beyond that point, difficulty exceeds the learner's capacity to process available information, and both performance and learning deteriorate.
The optimal challenge point is not fixed. It shifts with the learner's skill level and the complexity of the task. What constitutes productive difficulty for an advanced learner may be overwhelming for a beginner. Likewise, a practice condition that adds the right amount of challenge to a simple task may add far too much to a complex one. This means difficulty cannot be prescribed generically. It must be calibrated to where the learner actually is.
This framework helps explain a set of puzzling results around combining strategies. Spacing retrieval practice across sessions has sometimes produced additive benefits. Anatomy studies pairing distributed scheduling with self-testing found stronger retention than either strategy alone. But other combinations have fallen flat. In one study of foreign-language vocabulary, retrieval practice clearly outperformed passive restudy, yet layering an interleaved schedule on top of it added nothing further (Abel & Roediger, reported in Nelson & Eliasz, 2023). If each strategy independently increases difficulty, layering them may push the total past the point where additional challenge helps. The practical lesson: these strategies are not meant to be stacked indiscriminately. Each works, but whether adding another layer of difficulty helps depends on how much the learner is already managing.
The framework also clarifies why some superficially difficult manipulations fail entirely. A widely discussed finding suggested that presenting materials in a hard-to-read font could enhance memory. The idea had intuitive appeal: if difficulty helps, then perceptual difficulty should help too. But subsequent studies have struggled to reproduce that effect (Nelson & Eliasz, 2023). A disfluent font adds perceptual difficulty that doesn't engage the processes through which learning actually occurs. It increases extraneous demands without triggering the effortful retrieval, comparison, and discrimination that build durable knowledge. Not all difficulty is created equal. The difficulty must be the right kind.
What This Looked Like for Me
When I entered medical school, I studied the way most students do at first. I prepared for lectures, reread my notes, highlighted key passages, and reviewed material until it felt familiar. This certainly felt productive. However, over the first four in-house exams, it felt like my energy and effort weren't being reflected in my course performance. Don't get me wrong: I was doing okay on exams, hovering around the class average. But after so many passes through the information, I felt I should have known the material at a deeper and more comprehensive level. The material felt solid going into exams. I could look at a page of notes and think "yes, I know this." My confidence was high, but my performance didn't match. Perhaps more importantly, it ultimately felt like I was studying the way I had in college, just in greater volume.
Over those first four exams, I had been gradually incorporating various evidence-based study strategies, the ones outlined in this series. By the fourth exam, I was tired of the mismatch between effort and performance and decided to go all-in on what I perceived to be an "unconventional," and even risky, study routine. Even though I wasn't dedicating more total time to studying, this routine felt much harder, which, truthfully, was a source of anxiety.
What I didn't understand at first was that the doubt was the signal. The struggle that came with retrieval practice, with reviewing content from days to weeks earlier, and with giving up the false sense of security that rereading and highlighting provided was driving my performance in a way I couldn't truly appreciate at the time. In retrospect, those were the conditions under which storage strength was quietly increasing most. My old approach produced confidence; the new methods were producing competence. The strategies that felt least productive were the ones building the kind of knowledge that showed up months later on board exams.
The hardest part wasn't learning the strategies. It was tolerating the discomfort that came with using them: trusting that feeling uncertain during practice didn't mean I was failing to learn, and that feeling certain during rereading didn't mean I had learned.
The Core Lesson
Desirable difficulty is not a prescription for suffering. It is the observation that the strategies most supported by evidence share a common feature: they make practice harder in ways that engage the cognitive processes responsible for durable learning. Retrieval rebuilds memories rather than merely re-exposing you to them. Spacing forces reconstruction after decay. Interleaving demands discrimination among alternatives. Each of these processes depresses immediate performance while strengthening long-term retention.
But difficulty has limits. When the learner lacks sufficient background knowledge, when task demands exceed processing capacity, or when the difficulty is perceptual rather than cognitive, the benefits disappear. Desirable difficulty does not mean any difficulty. It means specific kinds of difficulty, the kinds that require the learner to reconstruct, generate, compare, and discriminate, applied at a level the learner can productively manage.
The most practically important takeaway is about metacognition: how you evaluate your own learning. If you are choosing study methods based on how productive they feel during the session, you are almost certainly optimizing for the wrong outcome. The fluency and confidence that come from rereading, massed practice, and blocked study are real — but they track retrieval strength, which decays rapidly. The discomfort that comes from testing yourself, spacing your reviews, and mixing your practice tracks something more valuable. Learning to tolerate that discomfort, and to distrust the false comfort of fluency, may be the most consequential adjustment a learner can make.
AceMedEd
This post is part of a series on the science of learning. Each post covers one evidence-based principle and how to apply it to your own studying. Follow us on Instagram @acemeded to keep up with future blog posts and related content.
References
Baddeley, A. D., & Longman, D. J. A. (1978). The influence of length and frequency of training session on the rate of learning to type. Ergonomics, 21(8), 627–635.
Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the Real World (pp. 56–64). Worth Publishers.
Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From Learning Processes to Cognitive Processes: Essays in Honor of William K. Estes (Vol. 2, pp. 35–67). Erlbaum.
Guadagnoli, M. A., & Lee, T. D. (2004). Challenge point: A framework for conceptualizing the effects of various practice conditions in motor learning. Journal of Motor Behavior, 36(2), 212–224.
Nelson, A., & Eliasz, K. L. (2023). Desirable difficulty: Theory and application of intentionally challenging learning. Medical Education, 57, 123–130.
Soderstrom, N. C., & Bjork, R. A. (2015). Learning versus performance: An integrative review. Perspectives on Psychological Science, 10(2), 176–199.