Why Businesses Are Turning to Lower-Cost AI Models for Everyday Tasks
As AI costs rise, companies are increasingly exploring cheaper AI models to handle routine workloads. Discover how cost-efficient models could reshape enterprise AI spending and adoption.
The rapid growth of artificial intelligence has largely been driven by a simple belief: larger models deliver better performance, and the most capable models ultimately dominate the market. But the industry may soon face a different reality if that assumption stops holding.
Rising AI costs have already prompted many users to consider smaller, more affordable models. This shift toward cost-conscious model selection is still in its early stages, and its long-term impact remains uncertain. However, the consequences could be significant if the trend continues to gain momentum.
One of the clearest forecasts comes from Coinbase co-founder Brian Armstrong, who believes most AI workloads will eventually migrate to lower-cost systems.
“[D]emand for intelligence is near infinite, but 80% of workloads will be running on 99% cheaper models within 12-18 months,” Armstrong wrote on X. “20% of workloads will still run on latest gen models where IQ maxing is important.”
If that prediction proves accurate, it could represent a major turning point for the AI sector.
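To get a feel for the scale Armstrong is describing, the arithmetic can be sketched in a few lines. This is only an illustration under a strong simplifying assumption that is not part of his post: every workload costs the same to run on a frontier model, so the share of workloads equals the share of spend. The function name `blended_cost` is hypothetical.

```python
def blended_cost(share_moved: float, discount: float) -> float:
    """Relative spend (1.0 = today's bill) after moving `share_moved` of
    workloads to a model that is `discount` cheaper.

    Assumption (not from the article): all workloads cost the same on the
    frontier model, so workload share equals spend share.
    """
    if not (0.0 <= share_moved <= 1.0 and 0.0 <= discount <= 1.0):
        raise ValueError("share_moved and discount must be between 0 and 1")
    kept_on_frontier = 1.0 - share_moved
    moved_to_cheap = share_moved * (1.0 - discount)
    return kept_on_frontier + moved_to_cheap


# Armstrong's figures: 80% of workloads on models that are 99% cheaper.
relative_spend = blended_cost(0.80, 0.99)
print(f"Relative spend: {relative_spend:.3f}")
```

Under that assumption, total spend would fall to roughly a fifth of its current level, with almost all of what remains going to the 20% of workloads that stay on the latest models. In practice the most demanding workloads are likely to be the most expensive ones, so the real saving could be smaller.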
Until now, AI providers have largely competed on performance, encouraging customers to adopt the most advanced models available. However, if the same tasks can be completed with smaller and less expensive models while maintaining comparable quality, the economics of AI could change dramatically. A substantial portion of those savings would likely come at the expense of major AI labs, potentially impacting companies such as OpenAI and Anthropic as they move closer to public market offerings.
The prospect raises an important question for the industry: are organisations prepared to transition to smaller models?
Early evidence suggests that lower-cost models can handle many tasks without sacrificing performance when deployed effectively. In a recent experiment conducted by the legal AI company Harvey, inference costs were reduced roughly threefold without a decline in output quality. The test, carried out alongside inference platform Fireworks AI, combined Claude Opus with Fireworks’ GLM 5.1 model, reserving Opus only for the most demanding workloads. The result was lower infrastructure usage and significantly reduced operating costs.
“Quality comes first, and in legal it always will,” Harvey co-founder Gabe Pereyra said. “However, the definition of quality is evolving from simply using the most powerful model for everything to using the best model that gets the right answer most efficiently.”
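The general idea of reserving the largest model for the hardest requests can be sketched as a simple router. This is not Harvey's or Fireworks AI's implementation, which the article does not describe. The model names, prices, difficulty scores, and threshold below are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical price in arbitrary units


# Hypothetical models and prices, not real price lists.
SMALL = Model("small-model", 1.0)
LARGE = Model("frontier-model", 10.0)


def route(difficulty: float, threshold: float = 0.8) -> Model:
    """Send a request to the large model only if its difficulty score
    (assumed to come from some upstream classifier) meets the threshold."""
    return LARGE if difficulty >= threshold else SMALL


def total_cost(requests, threshold: float = 0.8) -> float:
    """Sum the cost of a batch of (difficulty, token_count) requests."""
    return sum(
        route(difficulty, threshold).cost_per_1k_tokens * tokens / 1000
        for difficulty, tokens in requests
    )


# Four made-up requests: (difficulty score, tokens used).
requests = [(0.2, 2000), (0.5, 1000), (0.9, 3000), (0.1, 4000)]

routed = total_cost(requests)                     # only the 0.9 request hits the large model
all_large = total_cost(requests, threshold=0.0)   # every request hits the large model
print(f"routed: {routed}, all on large model: {all_large}")
```

The hard part in practice is the difficulty score itself: a router only saves money without hurting quality if it reliably recognises which requests need the larger model.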
Discussions about this shift are often framed as a battle between major Western AI labs and Chinese or open-weight models. But that interpretation may overlook the more important distinction.
The key divide is increasingly between large models and small ones. While businesses can cut costs by moving from GPT-5.5 to a model such as DeepSeek V4 Flash, they may achieve similar savings by adopting smaller offerings from the same providers, such as GPT-5.4-mini.
Meanwhile, competition over pricing continues to intensify between proprietary AI services and independently hosted open-weight alternatives. Yet when it comes to the broader debate over model size, the specific source of a smaller model may matter less than its ability to deliver acceptable results at a lower cost.
Although the idea of using only the amount of computing power necessary may seem obvious, it contrasts sharply with the scaling-focused strategy that has defined the AI industry for years. Influenced by the belief that larger models lead to better outcomes, AI labs have invested heavily in training increasingly compute-intensive systems that push the boundaries of model capabilities. With investor funding helping to subsidise pricing, customers had little incentive to choose anything other than the most advanced option.
Now, however, token costs are rising, and subsidies are becoming less generous, exposing users to cost pressures that were previously less noticeable.
Whether those pressures will push enterprises toward smaller models remains unclear. Organisations may instead reduce the number of AI requests they make, limit context windows, or abandon deployments that generate less value.
Still, if companies discover that most AI applications can operate just as effectively on smaller, cheaper models, it could significantly reduce demand for expensive inference resources and create new challenges for companies seeking to justify the enormous costs of training frontier AI systems.