This post analyzes the disconnect between modern educational goals and outdated assessment methods. While curricula increasingly emphasize high-level competencies like critical thinking, collaboration, and creativity, traditional exams continue to prioritize surface-level memorization and individual performance under pressure. Research suggests that this misalignment leads to curriculum narrowing, where teachers focus primarily on tested content rather than the deep learning required for the future workforce. Furthermore, the rise of generative AI has exposed the fragility of standard testing, as machines can now easily replicate the tasks typically used to measure student success. CANeLearn advocates for a shift toward formative and performance-based assessments, which better predict post-secondary readiness and reflect real-world professional environments. Ultimately, we argue that current standardized testing serves as a barrier to meaningful educational reform and fails to value the skills students actually need.
The Assessment Paradox
Educational institutions today operate within a profound contradiction. Our mission statements and modern curricula are saturated with the rhetoric of the 21st century—promising to cultivate creativity, critical judgment, and complex problem-solving. Yet, the moment we seek to measure success, we retreat to 19th-century testing architectures designed for industrial compliance. This is the “Assessment Paradox”: we claim to value the innovator, but we certify the memorizer.
Assessment is never a neutral diagnostic; it defines our systemic behaviour. As Michael Fullan (2016; 2023) argues, assessment remains the “least changed and most constraining” element of education reform. It acts as a cognitive ceiling, incentivizing surface-level content recall while penalizing the deep, transformative learning required in a rapidly changing, AI-driven world.
Tell Me What You Test, and I’ll Tell You What You Value
A fundamental principle of research on learning is that assessment directly shapes pedagogy. Research by Black & Wiliam (1998; 2009) confirms that teachers—often acting in a “survival mode” dictated by institutional pressure—will inevitably align their instruction with the specific content and format of high-stakes exams.
The result is what Wayne Au (2007) terms “curriculum narrowing.” When a system prioritizes basic cognitive recall over complex competencies, teachers are compelled to abandon broader, essential skills in favour of the “testable” ones. This is not a failure of teacher professionalism; it is a systemic design flaw. The OECD (2023) continues to highlight this global misalignment: 21st-century goals strangled by 19th-century metrics.
“Tell me what you test, and I’ll tell you what your system actually believes.”
The “Cram, Dump, Forget” Cycle vs. Deep Learning
Traditional high-stakes exams typically reward “surface learning”—the short-term retention of isolated facts. This creates a wasteful cycle where students “cram” for an exam, “dump” information onto a page, and promptly forget it once the pressure subsides. Roediger & Karpicke (2006) demonstrate that while this may produce passing marks tomorrow, it fails to generate long-term retention.
In contrast, the National Research Council (2012) emphasizes “deep learning,” defined as the ability to transfer and apply knowledge to new situations and contexts. The PISA framework (OECD, 2017), for example, measures 15-year-olds’ capacity to apply their reading, mathematics, and science knowledge to real-life challenges rather than merely reproduce curriculum content, identifying this “transfer” as the true mark of competence. Yet, because transfer is harder to measure, we continue to prioritize the 48-hour memory window over the development of lifelong expertise.
“We’ve confused remembering for 48 hours with learning for life.”
The AI Litmus Test: If a Machine Can Pass It, the Test is Broken
The emergence of Generative AI has acted as a stress test for modern education, exposing the fragility of our metrics. ChatGPT and other Large Language Models can now perform traditional academic tasks—essays, coding, and standard problem-solving—at or above human levels (OpenAI, 2023; UNESCO, 2023).
Rather than engaging in pedagogical redesign, many institutions have responded defensively, with a retreat to increased surveillance and handwritten exams. As Selwyn (2024) and the OECD (2025) note, this is an attempt to “outsmart the photocopier” rather than acknowledging a hard truth: if an algorithm can pass your test, your test is no longer a reliable indicator of human ability. This technological disruption makes human-only skills—like collaborative ethics and critical judgment—more important than ever, yet our summative tests and exams continue to ignore them.
“If a machine can pass your test, maybe your test isn’t measuring what matters.”
Individual Performance in a Collaborative World
There is an astounding disconnect between the isolation of the exam hall and the collaborative modern workplace. The World Economic Forum (2023) ranks collaboration and team-based problem-solving among the most vital workforce skills. However, our assessment systems are almost exclusively individual and tool-restricted.
While Barron (2003) has shown that collaborative environments produce deeper understanding, we continue to test students in an artificial vacuum. We prohibit the use of the very tools and social networks that students will be expected to master in their careers. By ignoring “student agency” and “co-agency” (OECD, 2023), we are essentially preparing students for a world that has already ceased to exist.
“We prepare students for a world that no longer exists—and then test them as if it still does.”
The Case of Québec: A “Great Curriculum” Held Hostage
The Québec Education Program (QEP) is built on “Cross-Curricular Competencies”—intellectual, methodological, and social goals designed to develop over multi-year cycles. However, this sophisticated framework is frequently undermined by an “episodic” assessment system that prioritizes short-term snapshots over long-term progression.
| QEP Curriculum Goals | Ministry Exam Reality |
| --- | --- |
| Cross-Curricular Competencies: Intellectual & social growth. | Numerical Ranking: Performance reduced to a single percentage. |
| Collaboration: Cooperating and working in teams. | Isolation: Prohibiting assistance or social interaction. |
| Methodological Competency: Effective use of ICT and tools. | Tool Restriction: Banning technology and “outsmarting” AI. |
| Real-World Application: Meaningful, contextualized learning. | Artificial Tasks: Decontextualized and predictable formats. |
| Progression Over Time: Development across multi-year cycles. | Episodic Snapshot: High-stakes performance in a few hours. |
| Creativity: Generating original ideas (Lucas & Spencer, 2017). | Standardization: Rewarding predictable, “safe” answers. |
“Québec didn’t design a bad curriculum. Quite the opposite—it designed a very good one. We just assess it as if none of it matters.”
The Solution Exists—We Just Don’t Scale It
We are not lacking in evidence for what works. Formative assessment—the iterative process of feedback during learning—boasts a massive effect size of 0.68 to 0.90 (Hattie, 2009; 2023). Crucially, Black & Wiliam (1998) found that formative practices particularly benefit lower-achieving students, making it one of our most effective tools for reducing achievement gaps.
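For readers less familiar with the metric, a brief aside: Hattie reports effect sizes as standardized mean differences (Cohen’s d), the gap between the average outcomes of learners who received an intervention and those who did not, expressed in units of pooled standard deviation. A minimal sketch of the interpretation, assuming roughly normal, equal-variance outcome distributions (Hattie’s syntheses aggregate many study designs, so this is illustrative rather than his exact procedure):

```latex
% Cohen's d: standardized mean difference between an intervention
% group (here, students receiving rich formative feedback) and a
% comparison group.
\[
d = \frac{\bar{X}_{\mathrm{formative}} - \bar{X}_{\mathrm{comparison}}}{s_{\mathrm{pooled}}}
\]
% Under these normal, equal-variance assumptions, the intervention-group
% mean sits at the \(\Phi(d)\) percentile of the comparison group,
% where \(\Phi\) is the standard normal CDF:
\[
\Phi(0.68) \approx 0.75, \qquad \Phi(0.90) \approx 0.82
\]
```

In plain terms, under these assumptions the average student receiving sustained formative feedback would score around the 75th to 82nd percentile of an otherwise comparable group that did not.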
Furthermore, performance-based tasks and portfolios better predict college readiness than any standardized test (Darling-Hammond et al., 2014). Why, then, are they the least used? Because they are “inconvenient.” They require teacher expertise, time, and professional judgment rather than automated scoring (OECD, 2013). Moreover, assessment remains anchored to its gatekeeping role in university admissions, creating a systemic barrier to the paradigm shift we desperately need.
“The thing that works best is the thing we use least—because it’s inconvenient.”
Beyond the Bubble Test
In the age of AI, the purpose of education must pivot. Producing efficient test-takers who can replicate information is now a redundant exercise. We must move toward assessing what only humans can do: exercise ethical reasoning, apply creativity to novel problems, and adapt within collaborative frameworks.
We must decide what we want our schools to produce. Do we want individuals who can navigate a rubric, or problem-solvers who can navigate the world? The current crisis is not about a lack of data; it is about a lack of courage to measure what truly matters.
“We’re not against assessment. We’re against assessing things that don’t matter.”
References
Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher, 36(5), 258–267. https://doi.org/10.3102/0013189X07306523
Barron, B. (2003). When smart groups fail. The Journal of the Learning Sciences, 12(3), 307–359. https://doi.org/10.1207/S15327809JLS1203_1
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74. https://doi.org/10.1080/0969595980050102
Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21(1), 5–31. https://doi.org/10.1007/s11092-008-9068-5
Darling-Hammond, L., Adamson, F., & Cook-Harvey, C. (2014). Beyond the bubble test: How performance assessments support 21st century learning. Jossey-Bass.
Fullan, M. (2016). The new meaning of educational change (5th ed.). Teachers College Press.
Fullan, M. (2023). The right drivers for whole system success (updated ed.). Centre for Strategic Education.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.
Hattie, J. (2023). Visible learning: The sequel. Routledge.
Kumar, S., et al. (2024). Large language models and reasoning: A systematic analysis. arXiv preprint. https://arxiv.org/abs/2404.01036
Lucas, B., & Spencer, E. (2017). Teaching creative thinking: Developing learners who generate ideas and can think critically. Crown House Publishing.
National Research Council. (2012). Education for life and work: Developing transferable knowledge and skills in the 21st century. National Academies Press. https://doi.org/10.17226/13398
OECD. (2013). Synergies for better learning: An international perspective on evaluation and assessment. OECD Publishing. https://doi.org/10.1787/9789264190658-en
OECD. (2017). PISA 2015 results (Volume V): Collaborative problem solving. OECD Publishing. https://doi.org/10.1787/9789264285521-en
OECD. (2018). The future of education and skills 2030. OECD Publishing.
OECD. (2023). OECD future of education and skills 2030: Progress report. OECD Publishing.
OECD. (2025). AI adoption in education systems. OECD Publishing.
OpenAI. (2023). GPT-4 technical report. arXiv preprint. https://arxiv.org/abs/2303.08774
Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255. https://doi.org/10.1111/j.1467-9280.2006.01693.x
Selwyn, N. (2024). AI and the future of education: Critical perspectives. Polity Press.
Sturgis, C. (2016). Reaching the tipping point: Insights on advancing competency education in New England. CompetencyWorks.
UNESCO. (2023). Guidance for generative AI in education and research. UNESCO. https://unesdoc.unesco.org
World Economic Forum. (2023). The future of jobs report 2023. World Economic Forum.