LLM “intelligence” is a dark pattern
LLM services lean on hidden scaffolding and user psychology to appear as slabs of raw intelligence, sustaining a costly illusion.
This post is a follow-up to last week’s post: A less technical guide for understanding large language models. You don’t need to read it, but it might make parts of this post easier to follow.
In 2010, user experience (UX) consultant Harry Brignull began documenting software design choices he dubbed dark patterns (now called deceptive design patterns by Brignull and some researchers). Dark patterns are subtle ways of shaping user behavior within a software interface. Can’t find the button to make your Threads post private?[1] Contrast that with how easy it is to create an account and publish posts. In a former life, I was a consumer educator writing weekly about data breaches. I quickly realized that to keep my audience safe, I’d need to inform them about dark patterns, as many online services steer users toward insecure and privacy-poor settings. For example, Venmo transactions are public by default[2] because, for some reason, the app has always pretended to be a social media platform.
Dark patterns, or deceptive patterns, weren’t brand new in 2010. Since the dawn of advertising, companies have worked with psychologists to shape consumer behavior. For instance, US supermarkets are a behavioral laboratory optimized to weaken impulse control. Even the checkout lane is a trial: candy bars are placed at a child’s eye level, with the wait long enough for temptation to take hold. While the psychology of deception hasn’t changed, the medium has. Given that American adults spend an estimated 18 cumulative years online, Brignull’s dark patterns show how manipulation became standardized in the interfaces that dominate our waking hours. If supermarkets are laboratories, then dark patterns are transforming our entire society into a research complex. That’s why learning about dark patterns is necessary to understand how the internet contributes to mental health crises, misinformation, democratic erosion, and political violence.
What makes deceptive patterns so “dark”?
Although dark patterns collectively have societal impacts, understanding the anatomy of a dark pattern requires looking at a specific example to see how it harms individual users.
Regular readers of the blog may recognize that I’ve talked about dark patterns before. That’s because they’re clear examples of information asymmetry, a concept that’s central to Misaligned Markets. In a post titled The lethal economics of corporate kung fu, I alluded to my first experience with a dark pattern. Over a decade ago, I subscribed to The Economist, and the subscription auto-renewed without my knowledge.[3] Upon logging in, I couldn’t find a cancellation button or a subscription management page. After a few minutes, I discovered a dedicated UK phone line, available only on weekdays during standard business hours. As a US resident who discovered this at 4 pm on a Saturday, I was livid.
The real insidiousness of a dark pattern becomes clear when you contrast the friction it adds with how easy the opposite action is. Want to subscribe to The Economist? You can do so online in a few clicks from their homepage, where each click is part of a transparent and logical flow. The homepage takes you to the subscription page, and then to a payment area. But if you want to unsubscribe, you must call a hidden phone number because... the transatlantic cables turn off for teatime? To this day, The Economist (still?!) prefers to raise the marginal cost of unsubscribing for the average reader rather than trust in the quality of its content.
This experience, while far from the worst thing that’s happened to me, is etched into my mind because it disrespected my autonomy and violated my trust. At their core, that’s exactly what dark patterns are: they disregard our intentions to nudge us toward choices that benefit a service provider at our expense. Companies can get away with this behavior because they know more about what they’re offering than we do, which enables them to frame and present information in a way that advantages them.
Fifteen years after Harry Brignull first catalogued them, dark patterns are now ubiquitous. While a single dark pattern is merely annoying, navigating multiple in succession across services can be mentally exhausting. Did you remember to cancel that new trial you just signed up for? Don’t forget your gym membership auto-renews tomorrow! Strange charge on your credit card? Must be because your phone bill’s sticker price differs from the actual cost! The result is a death-by-a-thousand-cuts assault on your attention and mental energy. Philosopher Peter Wolfendale created the concept of cognitive economics to describe how modern systems create and exploit asymmetries in cognitive resources like time, attention, and knowledge. He doesn’t mention dark patterns by name, but they fit perfectly into his framework of demand-side inefficiencies.[4]
Market capitalism, especially in countries in the Anglo-capitalist tradition, celebrates consumers as the center of the economy—as implied by idioms like “freedom of choice” or “one dollar, one vote.” However, this notion hides the comprehensive asymmetry between buyer and seller.[5] It’s just trivially true that producers almost always know more about their products and services; if we knew as much as they did, we likely wouldn’t rely on them to solve a problem. This dependency can easily be abused, and subtle exploitation of trust is at the heart of why our markets are misaligned. We live in a world where the path of least resistance to profits lies in increasingly innovative ways of deception rather than treating customers and workers with dignity and respect. So long as companies continue to increase the monetary, social, and cognitive costs of exercising choice, we won’t truly have “free” markets.
Contemporary dark patterns in AI
Dark patterns require us to remain vigilant if we want to avoid having them shape our behavior, but as technology evolves, so too do design patterns. As personalized, interactive systems, large language models (LLMs) are the logical next frontier to watch for emerging deceptive design. Although LLM interfaces are nascent, they’re already showing worrying signs in this regard. This year, there’s been growing concern over chatbot behaviors like sycophantic “glazing,” overconfidence in falsehoods, and misaligned intent, where a chatbot insists on carrying out tasks in a manner against a user’s wishes. Some of these tendencies have been explored in technology literature like Benchmarking Dark Patterns in LLMs (DarkBench).[6] Researchers are even investigating dark patterns in AI-generated code, downstream of LLM services.
I think efforts to document dark and deceptive design patterns in LLM interfaces are extremely important, but they risk missing the forest for the trees. Individual LLM dark patterns don’t exist in isolation. Together, they sustain the illusion that LLMs possess “human-level” intelligence, which in turn increases users’ dependence and engagement. Essentially, the way LLM service providers launder intelligence—through dark design patterns, marketing, and hidden orchestration layers—is the defining feature of LLMs. The consequences of this are costly for everyone involved. Providers, locked into perpetuating this illusion, must spend vast sums scaling infrastructure and staff. Users, in turn, pay the price financially, psychologically, and socially.
Arguably, everything—from failed and expensive AI pilots to cases of chatbot psychosis, general delusion, and even murder—is downstream of the belief that today’s models are plug-and-play, human-like minds ready to serve as service workers, customer support, white-collar employees, friends, film stars, therapists, teachers, doctors… basically anything and everything to everyone. As AI pilots fail, “cyberpsychosis”[7] grows, and people are (re)hired to clean up the mess from thoughtless pivots to AI, we will come to reckon with the costs of what is effectively a “mass delusion event,” as The Atlantic’s Charlie Warzel put it.
Seeing LLMs as a dark pattern
To truly appreciate the idea of LLMs as a dark pattern, you first need to understand the concept of sociotechnical systems. OpenAI researchers actually used this term in a September 2025 paper titled Why Language Models Hallucinate.[8] Though it sounds like a buzzword, the term is critical to understanding systems where expectations and continuous human feedback shape a technology in a feedback loop.[9]
The paper argues that the way language models are trained and tested matters as much as, if not more than, their underlying architecture. Training and benchmarking are institutional processes informed by research but driven by norms (hence “socio”), while model design is “technical.” Hallucinations emerge both from the statistical limits of probabilistic models and from training and evaluation regimes that reward producing answers over expressing uncertainty. This is how we get language models’ trademark behavior of confidently asserting obvious falsehoods.
The idea of intelligence as a dark UX pattern is similar. Media hype and public discourse prime users to interact with LLMs in specific, almost human-like ways, which then shape how models and their orchestration layers are adjusted. The result is a reinforcing cycle best illustrated by GPT-4o’s high degree of sycophancy in early 2025. By OpenAI’s own admission, this happened because users rated sycophantic responses favorably, and engineers adjusted internal systems accordingly. The relationship between culture, user, system, and designer is much more complex than this, but I’ll save a deeper dive into the layers of this sociotechnical process for a future post.
I want to make it clear that by referring to intelligence as a dark UX pattern, I’m not suggesting that LLMs are completely useless or that any efficacy is entirely imagined. However, by their nature, LLMs are unscoped systems with poorly defined limitations and capabilities. Good user design, by contrast, provides affordances intended to guide users exclusively toward appropriate uses of an interface. For example, most cups have handles for your fingers. Touch interfaces are so intuitive that people grasp how to use them in seconds. Even something as complex as a video game like Super Mario Bros. has a score for feedback, as well as glowing coins to guide you toward optimal routes within a level.

Conversely, most LLM interfaces consist of a single textbox inviting users to prompt models to do anything imaginable. In the absence of technical knowledge, users default to whatever expectations marketing and the media have set as a baseline. Then, through trial and error, they must learn which capabilities and use cases have been embellished or exaggerated.
For most, this process involves copious amounts of vibes. Do the model’s outputs feel right? Then they’re “good.” In circumstances where you’d want to evaluate an LLM’s output extensively, you’ll end up replicating much of the effort that would have gone into doing the task yourself. This is without even considering contexts, like education, where an LLM is explicitly relied on as an expert. Thus, LLM intelligence depends in part on a specific psychological affect that keeps users sufficiently amazed rather than critically aware of limitations. In response to heightened scrutiny, companies are pushing users toward use cases where accuracy matters less, in order to sustain that positive affect. Sexbots, anyone?[10]
Despite this flawed design, there are high-value (and boring) LLM use cases, mostly around business process automation. But arguably, these don’t compensate for the fact that, by design, LLMs are poorly scoped. Nor are they likely to justify the frenzied multi-hundred billion dollar spending spree and social costs of the tech in its current state. However, this story has played out before, and there are ways to make new technologies accountable to people.[11] One way I hope to contribute to this effort is by having honest conversations about the technology’s limitations.
The role of orchestration in LLM intelligence
Beyond their sociotechnical nature, LLMs rely on a hidden orchestration layer, made up of multiple subsystems, to guide their behavior. This was something that, even as a power user of these systems, I didn’t appreciate until I began working directly with LLMs on my home network. Over the summer, I built a local inference server, and I’m currently learning about retrieval-augmented generation (RAG), knowledge graphs, Model Context Protocol (MCP), and more. This experience was the direct inspiration for this post. Planning local LLM projects has shown me firsthand that the locus of an LLM’s intelligence, if you’re inclined to use that word, is not in the model itself. Rather, it’s in how the model is set up in relation to the subsystems that enable it.
There are teams of humans whose job is deciding what tools a model needs, how to enable these within a model’s environment, and then testing tooling configurations to ensure there’s an acceptable rate of error on well-defined tasks. Just like any other UX, LLM intelligence is sculpted, even if there’s a non-deterministic switch at the core of that experience “deciding” when to take specific actions.
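To make that concrete, here’s a minimal sketch of what an orchestration layer does, under a few assumptions of my own: a local OpenAI-compatible endpoint of the kind llama.cpp or Ollama exposes, a toy keyword matcher standing in for a real vector store, and some made-up notes. This isn’t any provider’s actual pipeline; the point is that the scaffolding, not the model, decides what the model gets to see.

```python
# A minimal sketch of retrieval-augmented orchestration. Assumptions (mine,
# not the post's): a local OpenAI-compatible server (e.g., llama.cpp or
# Ollama) listening at localhost:8080, and a toy keyword retriever standing
# in for a real vector store. The notes are made up.
import requests

NOTES = [
    "The nightly backup job runs at 02:00 and writes to the NAS.",
    "The media server transcodes on demand; hardware encoding is enabled.",
    "The router reserves 192.168.1.10-50 for static DHCP leases.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy retrieval: rank notes by how many words they share with the query."""
    words = set(query.lower().split())
    ranked = sorted(NOTES, key=lambda n: -len(words & set(n.lower().split())))
    return ranked[:k]

def ask(query: str) -> str:
    # The scaffolding decides what context the model sees and how it's framed.
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer using only the notes below. If they don't cover the question, say so.\n"
        f"Notes:\n{context}\n\nQuestion: {query}"
    )
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # assumed local endpoint
        json={
            "model": "local-model",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        },
        timeout=60,
    )
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("When do backups run?"))
```

Swap the retriever, the prompt template, or the notes, and the “intelligence” of the answers changes, even though the model weights haven’t moved.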
I learned this lesson rather painfully after wasting weeks attempting to replicate the “vibe” of a large frontier model for brainstorming. Soft use cases like this fundamentally rely on armies of raters and engineers tweaking large models, with engaging affective design prioritized above all else. I’m now working on “point solutions” (to use IT speak). Instead of building a single system that can do everything, I’m going to build separate tools, such as a local project manager that can organize my calendar, a natural language search interface for my huge game library, and a search engine for resources on my network.
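As an illustration of what I mean by a point solution, here’s a hypothetical sketch of the game library search idea. The catalog, the field names, the JSON schema, and the local endpoint are all placeholders I made up for this post; the design point is that the model only translates a request into a structured filter, and boring, deterministic code does the actual searching.

```python
# A hypothetical point-solution sketch: the model translates a natural-language
# query into a small JSON filter; plain Python applies it to a local catalog.
# The games, schema, endpoint, and model name are all made-up placeholders.
import json
import requests

GAMES = [
    {"title": "Hades", "genre": "roguelike", "year": 2020, "hours": 60},
    {"title": "Outer Wilds", "genre": "exploration", "year": 2019, "hours": 25},
    {"title": "Civilization VI", "genre": "strategy", "year": 2016, "hours": 300},
]

SYSTEM_PROMPT = (
    "Convert the user's request into JSON with optional keys "
    '"genre" (string), "max_hours" (number), and "min_year" (number). '
    "Return only the JSON object."
)

def query_to_filter(query: str) -> dict:
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # assumed local endpoint
        json={
            "model": "local-model",  # placeholder model name
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": query},
            ],
            "temperature": 0,
        },
        timeout=60,
    )
    # A real system would validate this and retry on malformed output.
    return json.loads(resp.json()["choices"][0]["message"]["content"])

def search(query: str) -> list[dict]:
    f = query_to_filter(query)
    results = GAMES
    if "genre" in f:
        results = [g for g in results if g["genre"] == f["genre"]]
    if "max_hours" in f:
        results = [g for g in results if g["hours"] <= f["max_hours"]]
    if "min_year" in f:
        results = [g for g in results if g["year"] >= f["min_year"]]
    return results

if __name__ == "__main__":
    print(search("something short I can finish, released after 2018"))
```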
I’m sure there’s some way to create a larger-than-life multi-agent super AGI workflow that can serve me breakfast in bed. But after lurking in subreddits like r/RAG, r/AI_Agents, and r/LLMdev, I’ve learned that bigger is not always better. Scope needs to be the utmost consideration when it comes to getting value out of LLMs.
Multiple studies have vindicated this view. The frequently cited MIT report, The State of AI In Business 2025, indicates that just 5% of AI pilots successfully operationalize the technology. What are these 5% doing? They’re building systems that act autonomously “within defined parameters” by using tooling like MCP and RAG systems to provide “memory” and “adaptability.”
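To give a rough, made-up picture of what “within defined parameters” can look like in practice, here’s a toy dispatcher where the model can only request tools from an explicit allowlist and everything else is refused. The tool names and the hard-coded replies are hypothetical; in a real pipeline the JSON would come from the model.

```python
# A toy illustration of acting "within defined parameters": the orchestration
# layer only executes tools from an explicit allowlist. Tool names and the
# hard-coded model replies below are hypothetical examples.
import json

def check_backup_status() -> str:
    return "Last backup completed at 02:00 with no errors."

def list_todays_events() -> str:
    return "10:00 stand-up, 14:00 dentist."

ALLOWED_TOOLS = {
    "check_backup_status": check_backup_status,
    "list_todays_events": list_todays_events,
}

def dispatch(model_reply: str) -> str:
    """Run the requested tool only if it appears on the allowlist."""
    try:
        request = json.loads(model_reply)  # expecting {"tool": "<name>"}
    except json.JSONDecodeError:
        return "Refused: the model's reply was not valid JSON."
    tool = ALLOWED_TOOLS.get(request.get("tool"))
    if tool is None:
        return f"Refused: '{request.get('tool')}' is not an allowed tool."
    return tool()

# In a real pipeline these replies would come from the model; here they're
# hard-coded to show the guardrail.
print(dispatch('{"tool": "list_todays_events"}'))  # executed
print(dispatch('{"tool": "delete_all_files"}'))    # refused by the scaffold
```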
This means the value of LLMs isn’t unlocked by adopting them for everything on the assumption that they’re smart. Their utility is only realized when technical people, with no ulterior motive, build scaffolding for specific workflows and test it for reliability.
But even when these conditions are met, there are still things LLMs aren’t very good at, and it’s hard to know in advance what those are. This places the cognitive burden and responsibility of deciding how to use LLMs on every user, even technically capable ones. By pushing non-technical people, en masse, toward these systems without explaining any of this to them, we are failing them. This isn’t a gotcha for anyone who finds LLMs impressive or who is ontologically committed to the idea that LLM systems, and computers more generally, resemble minds. Frankly, I don’t want to have that discussion.[12] My main point is that successful LLM use cases are designed experiences, and orchestration in one context doesn’t always generalize to others.
It’s true that no UX is entirely neutral, as all interfaces are designed. But the completely hidden nature of LLM orchestration quite literally contributes to a dark UX pattern. Because of it, users are led to believe that the model alone produces the outputs they see. Combine this with our discourse around models being “human-level” at most productive tasks, and you get users gaslighting themselves into accepting falsehoods from these systems. Even for the discerning user, most LLM interfaces don’t provide ways of interrogating or correcting outputs aside from prompting.[13] I think the sooner we can get away from naive discussions that anthropomorphize LLMs, the sooner we can move on from this mass delusion event and have productive conversations about LLMs and the role of machine learning in society more broadly.
The cost of the intelligence illusion
I’ve spent a lot of time highlighting how LLMs are currently failing users, but honestly, the illusion is hurting the tech industry too. I’m not asking you to shed a tear for the largest corporations in history, but it looks like the commoditization of LLMs might have come too early. What this means is that the eye-popping levels of investment going into infrastructure for today’s transformer-based language models[14] are limiting investment into more capable architectures that could actually secure long-term value. This isn’t just me saying this: Llion Jones, one of the researchers who created transformers, voiced this opinion last month. Jones is not alone; other leaders in the field, like Andrej Karpathy (coiner of the term vibe coding), have also recently spoken candidly about the limitations of transformer-based LLMs.[15]
Most of the money, hardware, workforce, electricity, data centers, and training data being allocated today is going towards scaling transformer language models and little else. Demand for today’s models is not high enough to justify their costs, and so prices are heavily subsidized—even the $200 per month ChatGPT Pro subscription loses money.[16] Providers are promising that scale will bring better models, but collectively in 2026 alone, the industry is projected to spend nearly the entire GDP of Norway to finance this growth.[17] Even assuming there’s greater demand for the next generation of transformer models, they’re going to be much more expensive to train and possibly run.
This is why there are now fears of an industry-wide bubble. It’s become so apparent that prominent voices in tech, from Jeff Bezos to Mark Zuckerberg to Sundar Pichai to Sam Altman himself, are saying it. The media, in tandem with these leaders, is suggesting this might be a productive bubble,[18] though no one seems to be answering the critical question of what will be left when the bubble pops.[19]
Aside from data center buildings, most investment is going to consumables—labor, computer hardware, and electricity. These are things that will not fully carry over to the next business cycle. Large-scale transformer language models could also be abandoned in favor of smaller, more efficient architectures, given how expensive and unreliable current LLMs are without extensive scaffolding. But there’s no guarantee that future architectures will be compatible with today’s hardware or orchestration workflows.
Even if you want to make the bullish case for transformers, the question is how quickly companies can move from a “scale is all you need” paradigm towards models that are cheaper to train and that perform better. Performance benchmarks for most general-use LLMs are converging, with many models performing similarly. This is partly a flaw with benchmarking as a paradigm, but it also partly captures the fact that most state-of-the-art models, including ones that can be run locally, are interchangeable for the most common use cases.[20]
It’s true that frontier providers like OpenAI and Anthropic offer larger models with industry-leading orchestration, which gives their models a more robust “common sense,” but this doesn’t always translate to business value. Even as early as 2023 and 2024, industry investors and leaders were telling us not to expect a massive leap between GPT-4 and GPT-5, partly because many suspected that scale alone would not make a difference.
Given all this, why is the industry still scaling? Partly out of inertia; some scaling was expected given the excitement around transformers. But now, much of this expenditure is literally locked in. Permitting and contracts for data centers take years, so they are a long-term commitment, and some companies are going as far as taking on debt (and keeping it off their books) to finance these obligations. There is also a perverse incentive to announce new deals. Companies’ valuations and their ability to borrow capital depend on the illusion of progress. So long as stock prices move upwards with every deal announcement, there’s motivation to double down on growth through scale.
What’s so strange to me is that the market had a similar bubble scare back in January 2025, when the open-source Chinese model DeepSeek launched, and the exact same fears came to light. DeepSeek was smaller and nearly at parity with GPT-4. The question back then, as now, was: what is all this scaling for? Had investors done any homework in the past ten months, they would understand that there’s a whole ecosystem of companies whose technologies are addressing the limitations of transformers today.[21]
Investing in these companies does not require burning gas or putting stress on our electrical grid. There’s a saner version of the LLM investment cycle, where excitement is focused on the LLM orchestration layer, and computer science researchers can focus on more promising forms of machine learning. Unfortunately, we do not live in that world.
There’s so much more to talk about...
Originally, I wrote a version of this blog post that tried to fit in most of my thoughts. I decided to give each of my ideas breathing room instead, so there are at least two more posts I want to publish after this one, and possibly six. Here’s what I’m potentially thinking about writing next:
- What to expect when your economy is expecting a bubble. This will mainly be about residual effects. Just like economists couldn’t predict 2008, I’m not trying to predict when the next crash will happen or how bad it will be. Rather, I want to discuss the idea that this will be like the dot-com bubble, and why bubbles happen at all.
- How to read stories about AI hype. This will be a set of heuristics for cutting through the noise when it comes to stories around machine learning or large language model developments.
- The sociotechnical nature of LLM intelligence. I want to elaborate more on the relationship between society, user, and LLM engineer. I had some of that content in this post, but cut it for clarity.
- LLMs are the perfect microcosm of what’s wrong with capitalism. What do alienation, commodity fetishism, capitalist serialization, LLMs, and F.A. Hayek have in common? Hopefully, you find out soon.
- A ranking of LLM use case categories. Based on my personal experience trying to build things with these systems locally. I’m not sure how committed I am to this one, though, as I’m trying to avoid explicitly IT-centric posts. A less technical guide for understanding large language models is an exception, as it helped me organize my thoughts for this dark pattern post.
- How societies push for positive technological change. I’m still in the planning stages on this one, and it has me dusting off some old books from college. It might be a while before I get to this, but the goal is to write something that generalizes beyond AI.
If any of these topics sound interesting, I’d love to hear your thoughts on what you think should come next. Subscribe to share your opinions and to stay updated about upcoming posts!
- I've never used Threads, but originally, I had Facebook here and realized that wasn't relatable (I don't really use new social media 😝).
- In addition to making transactions public, Venmo used to have a public API that let anyone on the internet scrape this data. The API is no longer active, but the 2018 project I linked to was done when the API was available.
- I promise I’m not a 40-year-old white dude. It was mandatory for a college course.
- Wolfendale's demand-side inefficiencies have some overlap with the concept of a phishing equilibrium from Phishing for Phools, a book I've talked about before. Even though it's just a sketch, Wolfendale's approach is more robust and a better jumping-off point for the adversarial profit-seeking tendencies that I talk about here at Misaligned Markets.
- I’ve started outlining this in ideas like corporate kung fu, capitalist serialization, and the paradoxes of market capitalism.
- DarkBench is an attempt to evaluate dark patterns in chatbots; it grew out of the authors' earlier work. See here and here or read the current DarkBench paper.
- Fans of Mike Pondsmith's Cyberpunk have taken to calling AI delusions cyberpsychosis, after a condition that occurs in that universe's lore. Fans of the SCP collaborative web fiction community are also using the term.
- This paper is fairly high-level and not exactly groundbreaking, though I don't think it was intended to be. It highlights the ways pre- and post-training of LLMs influence model behavior.
- Some people argue all technologies are like this, in that society and culture always mediate how we use systems. This may be true, but understanding LLMs requires making this notion more explicit, which is what I'm aiming to do.
- OpenAI is allowing adult content on their platform, supposedly starting next month.
- Anglo-capitalist societies have long embraced the Kehoe approach to measuring the social costs and harms of technology. Yes, everything goes back to leaded gasoline for me. This and Karl Polanyi's Double Movement have informed my thinking on most issues of social progress.
- While I like philosophy, I'm kind of burnt out on this particular topic. Go read Elizabeth Sandifer's recent post at Eruditorum Press instead. She covers some of the history of computational functionalism ("brain as a computer").
- I'm using the notion of interface very broadly here. For most foundational model services like ChatGPT, custom prompts and prompting are pretty much the only way you interact with the model. For more tailored services or custom local LLMs, you often have other ways of tweaking model behavior, like temperature, top_k, and penalties that influence vocabulary and word usage. These still don't guarantee reliability, but they provide some control.
- Transformer refers to the machine learning architecture of LLM systems.
- Richard Sutton, Yann LeCun, and Gary Marcus are other machine learning veterans saying this. Though, as I pointed out in a footnote in "What do AI alignment fears reveal about market capitalism?", scholars who are skeptical of the concept of AGI have been calling out this technology's limitations for far longer.
- Ed Zitron, who recently received Microsoft's OpenAI revenue-share numbers, revealed that OpenAI's costs for running models are going up steadily, quarter-over-quarter, much faster than revenue. AI is a loss leader for every player, but foundation model services like Anthropic and OpenAI are the most at risk, given they're not decades-old tech giants. They are where a bubble could start.
- Industry spend numbers are notoriously tricky to nail down, but at a casual glance I've seen estimates of anywhere between $400bn and $600bn in spend next year, depending on the source. The GDP of Norway is currently about $506bn.
- Bubbles can be "productive" if they leave behind residual productive capital that can kick off the next economic boom. The media has taken to calling this a "good bubble," which is the dumbest thing I've heard. Bubbles are never good, even with silver linings. We also live in a strange time when our tools for fighting recessions and depressions are constrained and money is only flowing to a handful of asset classes. This will make even a "good" bubble more painful than productive bubbles of the past.
- I'm working on a framework called The Great Relay Race to help me think through this. I'll hopefully be able to write about this soon.
- I'm sure someone's pulling out a benchmark trendline in response to what I wrote. IMO, benchmarks really only matter to enthusiasts, which I guess I technically am. It is really difficult to extrapolate from benchmarks alone how well a model is going to do on a task. And even if benchmarks were strong signals of performance, for most LLM use cases the extra performance only benefits edge cases. Companies are going to need to make models more reliable across a much broader range of tasks to bring in new users.
- Neo4j, Chroma, Pinecone, CrewAI, and Letta are ones I'm passingly familiar with. The fact that the average person doesn't understand the role frameworks and digital infrastructure play in making LLMs actually work outside a chat interface is criminal. These are all private companies, so it's hard to know what they're valued at, but given how much investment has gone to inference, this ecosystem feels undervalued.