2025 AI Year in Review: Industry Breakthroughs and 2026 Stakes
If you look at 2025 through the lens of hype headlines, the year passed under the slogan "another model release, another demo video." But when I sat down to tally my personal results, it turned out that entirely different things stuck in my memory: not the loudest announcements, but the shifts that changed my routine, my expectations of tools, and the very bar of "what is even normal to ask of AI." In this article, I am deliberately not doing a "top news" review; instead, I am gathering what feels like it actually pushed practice forward: technologies that created a new class of tasks, delivered a tangible economic effect, brought a fresh UX, and most importantly — proved to be reproducible rather than one-off conference tricks.
I am writing this as a practitioner who lived inside this "zoo" of models, hardware, agentic workflows, and regulatory news all year. The selection here is extremely subjective and intentionally based on personal experience: what simplified daily work, what I was willing to pay for voluntarily, what I recommended to friends, colleagues, and even parents, and what I safely forgot after a week. In some places, a new UX caught my attention; in others, the effect of scale; and in some, the feeling that "this is now the baseline, not a toy." These are the pieces that made up my 2025.

The High Bar of 2025 Models
If we simplify the picture of the year to a single axis, it would be the axis of "which model do you open by default when you need to do something non-trivial." In 2025, this upper bar shifted significantly: it became normal to expect not just a "smart chat," but stable speed, long reasoning chains, adequate code, and an absence of sudden failures in the middle of a task. It feels as though a new understanding of what constitutes "normal" AI performance in daily development and life has formed around a few flagships.
GPT‑5.1: The Daily Driver
For me, the end of 2025 is a time when GPT-5.1 became something of a reference point: if you want to judge whether something works well, you instinctively compare it against this model. It sets the bar for the quality of long reasoning, for speed, and for how predictably it behaves in complex tasks — from architectural discussions to careful code refactoring.
In November and December, I practically live in GPT-5.1 Thinking mode: it is my daily work tool, the one I reach for first. I like the combination of its style — calm, structured — with how quickly it responds and how rarely it "crumbles" in the middle of a complex sequence of actions. At some point, you catch yourself thinking: if another model behaves noticeably worse, it’s no longer "well, it's also not bad," but "why should I even tolerate this when 5.1 exists."
Yes, GPT-5.2 is already available, but in the first two weeks after its release, I still couldn't see how it is practically better than GPT-5.1. Given that it costs 40% more ($1.75/$14.00 per million input/output tokens versus $1.25/$10.00 for GPT-5.1), the difference in feel doesn't yet justify the difference in the bill.
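To make that bill concrete, here is some napkin math. The monthly token volumes below are invented purely for illustration; only the per-token rates come from the paragraph above.

```python
# Back-of-the-envelope comparison of GPT-5.1 vs GPT-5.2 API costs.
# The monthly token volumes below are made-up assumptions for illustration.

PRICES = {  # USD per 1M tokens: (input, output)
    "gpt-5.1": (1.25, 10.00),
    "gpt-5.2": (1.75, 14.00),
}

def monthly_cost(model: str, input_tokens_m: float, output_tokens_m: float) -> float:
    """Cost in USD for a month of usage, volumes given in millions of tokens."""
    in_price, out_price = PRICES[model]
    return input_tokens_m * in_price + output_tokens_m * out_price

# Hypothetical workload: 300M input tokens, 40M output tokens per month.
for model in PRICES:
    print(model, round(monthly_cost(model, 300, 40), 2))
# gpt-5.1 -> 775.0, gpt-5.2 -> 1085.0: the same 40% gap simply scales with usage.
```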
Gemini 3: Betting on Speed and a Sense of Presence
If GPT-5.1 sets the bar for quality, Gemini has aggressively claimed the title of champion for speed and the feeling of a "live" response this year. An important shift: when the answer arrives almost instantly, communicating with the model stops being like a "feedback form" and turns into an interactive loop — you change queries, edit code, and move in small steps rather than writing walls of text and waiting for a miracle.
I had been actively using Gemini 2.5 Flash and 2.5 Pro since their release at the end of March: I wrote code with them, just chatted, tested various ideas — and during that period, they seemed like a very successful balance of speed and quality. But in the fall, a sharp decline set in: the model increasingly got confused, especially in agentic scenarios for writing code, where you need a sequence of actions with files and context rather than a one-off "complete the function." With the release of Gemini 3 in December, I want to consciously give the ecosystem a second chance and try using it as my primary model: for me, it is the main competitor to GPT-5.1 Thinking in the "daily driver" role. At the same time, Gemini 2.5 Flash Lite remains my number one model when I need to quickly process large volumes of information and get a coherent summary without unnecessary waiting.
Claude 4.5: The Model as a Full Participant in the Development Process
Claude 4.5 clearly defined another line of competition: not just "writing a piece of code," but integrating into work chains — planning, making changes, and careful iterations on a project. It is no longer a "chat that helps," but a component of the process: you give it a task, it suggests a plan, performs steps sequentially, saves artifacts, and returns when the architecture needs to be rethought.
Judging by Reddit discussions and developer reviews, Claude Sonnet 4.5 is what many are now calling the best model for writing code. In tandem with Claude Code, it truly feels like an ultimate tool: it’s one thing to get a one-off snippet, and another when the system itself leads you through the project, changes files, and tests hypotheses. The only problem is that you have to pay for this level of "magic": due to the high cost, I cannot fully switch to Claude for all coding and use it selectively, mainly in planning and complex analysis modes, where every run is truly worth the money.
Local Models: The Practical World of Open-Weight
2025 became the year when the open-weights ecosystem stopped being a purely experimental toy for enthusiasts. The ability to run a model locally and control data and infrastructure became a perfectly viable alternative, rather than an exotic choice for those who enjoy struggling with drivers. And most importantly — it’s not just about text models anymore: voice (TTS/STT) and embeddings have seriously entered the game, used as separate building blocks in products.
In this field, gpt-oss, Qwen3, Gemma 3, and Llama 4 are prominent — entire stacks are forming around them for private installations and hybrid "cloud + local" scenarios. One of my most popular articles this year was specifically a note about running LLMs on your own hardware, where I break down configuration options and trade-offs in quality and speed. I like this topic to a ridiculous degree: there is something very right about the feeling that you have a personal "intellectual accelerator" under your desk that doesn't depend on the mood of external services and regulators.
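If you want to feel this for yourself, the entry barrier is genuinely low. A minimal sketch, assuming an Ollama server running on this machine and an open-weights model already pulled (the model name here is just an example):

```python
# Talking to a locally served open-weights model through an OpenAI-compatible
# endpoint. Assumes Ollama is running locally and "qwen3:8b" (an example) is pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                       # any non-empty string works locally
)

response = client.chat.completions.create(
    model="qwen3:8b",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the trade-offs of running LLMs locally."},
    ],
)
print(response.choices[0].message.content)
```

The same snippet works against llama.cpp's server or any other OpenAI-compatible frontend; the point is that the "personal accelerator under the desk" speaks the same protocol as the cloud.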
Chinese Models: Pressure on Quality and Price
A separate line is Chinese models, which in 2025 significantly increased pressure on both quality and compute costs. DeepSeek 3.2, Qwen3, Kimi k2, MiniMax M2, GLM 4.7 — these are no longer "regional alternatives," but serious players that force teams to rethink their "build vs. buy" strategies. When you have access to models that are affordable and decent in quality, the economic calculation of a product changes very sharply.
The hype around DeepSeek r1/v3 in January became almost a cultural moment in Russia: people who had previously only heard the word "chatbot" found out about neural networks. An important detail — the model is free and doesn't require workarounds, so it is currently being used for masterclasses and internal corporate courses to explain to employees how to work with such tools. I personally recommended DeepSeek to my parents as an entry point into the world of neural networks: the low entry barrier and lack of "voodoo magic" around access help a lot when a person is just beginning to master this new layer of reality.

Multimodality, Video, and Science
At some point in 2025, it began to feel that "just a text chat" was already yesterday's news. If in the first act we talked about the upper bar of response quality, then the interesting part begins where models stop being purely textual and start to see, hear, and model the world — from short clips to protein structures. And here it suddenly turned out that the main event of the year was not a single demo, but the fact that stable production processes can be built from all of this.
Sora 2 and Veo: Video Grows Up
Video generation in 2025 finally moved out of its teenage years, when everyone rejoices at a one-off "wow" but can't really repeat the result twice in a row. Now the key measurable thing is not just the beauty of the clip, but how predictably the model behaves in a series of tasks: ad inserts, explainer videos, educational videos where frame clarity, character stability, and plot controllability are important.
When Veo 3 was released in June, I conducted a small scientific experiment on my own wallet: in one evening, I spent about two hundred euros in Vertex AI just trying different prompts and scenarios. The feeling was very bittersweet: on one hand — "a bit expensive for an evening of play," on the other — for the first time, there was a sense that this is no longer magic for a presentation, but a tool from which you can assemble a real video production pipeline. Then Sora 2 hit the stage, and OpenAI essentially made an analog of TikTok, but for generated clips — a feed of synthetic video you can scroll through endlessly. At one point, the internet simply drowned in memes about the reseller grandma: millions of people watched these clips not because "oh, it's a neural network!", but because it’s just funny and works like regular content. At that moment, something clicked for me personally: if the viewer no longer cares how it’s made, it means the technology has truly reached the "adult" stage.
Images: From Memes to Production
The story with images is similar but even more grounded. The era of "six fingers and melted faces" is in the past: in 2025, image generators moved from the realm of entertainment to the sphere of normal work routines. What became important was not how "beautifully" a model can draw a dragon against a nebula, but how much you can control the result: adding text to a banner, carefully replacing an object, redrawing a cover in the same style as yesterday.
For me, this evolution was very utilitarian: in the second half of the year, I needed to mass-produce illustrations for blog articles, and I got more involved in generation. At first, I actively used Google's Imagen — it gave quite decent quality if you had a little patience with prompt quirks. But I truly fell in love with GPT Image: it drew exactly how I needed, without the constant struggle for adequate style and detail. In parallel, there was a qualitative leap in capabilities: models learned to confidently print text directly on images, edit existing pictures, and maintain style from iteration to iteration. At the end of the year, Nano Banana Pro and GPT Image 1.5 arrived, delivering such a level of quality that for many tasks, the question of "generate or order from an illustrator" no longer arises.
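A minimal sketch of that illustration routine against the OpenAI Images API; the prompt and size are arbitrary examples, and "gpt-image-1" is simply the identifier I would reach for, without claiming it maps onto the exact version naming above:

```python
# Generating a blog cover via the OpenAI Images API and saving it locally.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

result = client.images.generate(
    model="gpt-image-1",
    prompt="Flat minimalist illustration of a developer pairing with an AI agent, muted palette",
    size="1536x1024",
)

# gpt-image-1 returns the image as base64; decode and save it next to the draft.
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("cover.png", "wb") as f:
    f.write(image_bytes)
```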
Multimodal Models: Understanding Better than an Ex
When text, images, sound, and video converge in one interface, not only convenience changes, but the very usage model as well. In the second half of 2025, it is almost no longer surprising that you can send a document, a voice note, a photo, a screenshot, or a video to the same system, and it will digest it all. At some point, you stop thinking "this is for text, and this isn't" and start perceiving the model as a universal interface to the tasks around you.
A very telling marker is short videos on social media where people solve mundane but non-trivial problems through the model's "vision": changing oil in a car, assembling complex furniture, or fixing wiring using the Vision mode in the ChatGPT app, which literally guides them step-by-step. This isn't about "evaluating a picture," it's about the link: "I see → I understand context → I suggest what to do next." I analyzed this shift separately in an article on multimodal AI: it clearly shows how products are gradually moving away from being centered on the chat window and are starting to be designed around the scheme "model sees the world and takes action."
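The plumbing behind such scenarios is unspectacular: a photo plus a question in a single request. A minimal sketch, where the file name, the question, and the model name are all assumptions for illustration:

```python
# "Model sees the world": send a photo and a question, get guidance back.
import base64
from openai import OpenAI

client = OpenAI()

with open("fuse_box.jpg", "rb") as f:
    photo_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-5.1",  # whichever vision-capable model you actually use
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which breaker do I flip to cut power to the kitchen outlets?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{photo_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```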
AlphaFold 4 and Models for Science: Beyond Office Tasks
Amidst all this beauty with images and video, it’s easy to forget that in 2025, another, less noisy but much more fundamental front was forming — models for science. The story with AlphaFold 4 and other systems of this class is important not because they "also predict something," but because they radically accelerate the cycle of hypotheses, experiments, and research in biology, chemistry, and materials science. Where scientists previously waited weeks for simulation results or sorted through options in the lab, a significant part of the work is now moving into an interactive dialogue with a model.
I covered this topic in detail in an article on how LLMs are changing fundamental science. I was greatly inspired by an interview with Andrey Doronichev: he talked with such fire in his eyes about startups at the intersection of science and medicine that it becomes clear — this is not just another market, but a chance to quite literally improve the quality of life for millions of people. In this sense, 2025 is remembered not only as the year of multimodal memes but as the moment when artificial intelligence seriously took its place in laboratories and research centers, expanding the picture far beyond "content and office routine."

The Engineering Kitchen: Agents, Devices, and Hardware
After all the talk about multimodality and scientific models, it becomes clear: the magic ends where engineering begins — specific workflows, constraints, and the very hardware that either handles it all or doesn't. This is where the main shift happened for me in 2025: AI stopped being a "place you visit when you're in the mood" and turned into a normal working layer — in the terminal, in the IDE, in the phone, and under the desk in the form of a system unit.
Developer Agents: Vibe Coding as the New Normal
For me, the word of the year in development is "vibe coding." After Vastrik's article, I finally stopped writing code in a web chat and moved to tools that live right next to the repository. Instead of explaining a task to the model piece by piece in a browser, it is much more pleasant to watch it work in the same space as you: in the console, in the editor, in a real project.
After that, it clicked very quickly: Cline, Aider, Cursor, and other similar things became the primary entry point for work rather than an experiment. Now, about 80% of the code I write is through Kilo Code or Claude Code: I formulate a goal, look at the proposed plan, let the agent go through the steps, and only then manually refine the important parts. This radically speeds up iterations; writing code has now become so fast and cheap that instead of fixing code already written by an agent, it’s much easier to ask it to rewrite everything from scratch. In the end, developer agents feel not like an "autopilot," but like another team member who still needs code reviews and rules of engagement.
On-Device Intelligence: Apple, Android, and Life Without the Cloud
In parallel, the feeling grew throughout the year that the second important front is right inside the device (on-device AI). When part of the logic works locally, three advantages emerge immediately: lower latency, better privacy, and the feeling that this isn't a separate service but a natural function of the system.
Over the year, I got my hands on a Google Pixel 8 Pro, a Galaxy S24 Ultra, and an iPhone 16 Pro, and on each, I pushed all the "smart" features to the max. On Android, this all arrived earlier and looked more organic: built-in assistants, smart image and video editing, local scenarios — the feeling that AI is truly baked into the platform. Apple is moving more cautiously but has clearly accelerated: the company essentially admitted it couldn't handle it alone and began cooperating with OpenAI and Google to pull Siri and system scenarios up to the new level of expectations. In 2025, discussing a phone without mentioning its "smart" capabilities and hybrid "local + cloud" mode already feels strange.
Hardware: Why RTX 5090 is More Important than Another Demo
And finally, the quiet but key hero of the year — hardware. It is hardware that determines whether you can actually run large models locally, drive multimodal pipelines and complex agents — or remain in the role of a "thin client" to someone else's cloud, no matter how beautiful the demos look at conferences. The gap between what a model can do on slides and what a team can launch in production is often explained not by a lack of talent, but by a simple fact: you don't have the right class of video cards, memory capacity, or disks to get it all off the ground.
This year, I built myself a workstation for local AI model runs — and I very quickly felt where the real pain lies: it's not just about top-tier GPUs like the RTX 5090, but also about how much memory and disk space a "normal" stack requires to work with models. By December, another unpleasant acceleration was visible: RAM and SSD prices rose 3-4 times, so "adding a few terabytes and sticks" stopped being a harmless upgrade and turned into a significant decision that has to be budgeted as seriously as the choice of model. If you are interested in chip architecture, I've already dived into that rabbit hole and gathered everything in a separate article. After such an immersion, you start looking at cloud bills and home builds as parts of the same equation: the cost of intelligence starts not in API rates, but in how many megabytes, gigabytes, and watts you are willing to give to that intelligence.
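The sizing arithmetic behind those budgeting decisions is simple. This sketch assumes a flat 20% overhead for KV cache and runtime buffers, which is a rough rule of thumb rather than a measurement:

```python
# Napkin estimate: how much memory does a model need at a given quantization?
def model_memory_gb(params_billion: float, bits_per_weight: int, overhead: float = 0.20) -> float:
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weights_gb * (1 + overhead)                  # +20% for KV cache and buffers

for params, bits in [(8, 4), (32, 4), (70, 4), (70, 8)]:
    print(f"{params}B @ {bits}-bit ~ {model_memory_gb(params, bits):.0f} GB")
# 8B @ 4-bit ~ 5 GB, 32B @ 4-bit ~ 19 GB, 70B @ 4-bit ~ 42 GB, 70B @ 8-bit ~ 84 GB
```

It immediately shows why the jump from "fits on one consumer GPU" to "needs a workstation full of memory" happens so fast, and why RAM and SSD price spikes hurt exactly this class of builds.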

Economics and the Rules of the Game
Falling Prices: When "Intelligence" Becomes Default
Over 2025, a switch finally flipped in my head: "intelligence in a product" no longer looks like a premium feature that you need to apologize for to the CFO. The cost of calling flagship models and, most importantly, the emergence of lightweight lines made it normal for AI to run not just "on special occasions," but constantly in the background — from log summaries to automated pipeline checks.
Major players have tiered their lineups: there are heavy flagships for complex tasks and, alongside them, cheaper options with prefixes like Flash, mini, and nano, which allow for very budget-friendly handling of routine work. Chinese models enter with lower prices and decent quality from the start, and this creates real competition: when you calculate the cost of a million requests, the choice of engine stops being "religious" and turns into pure product math. As a result, multiple layers are increasingly appearing in architectures: an "expensive brain" for complex things and a "mass working class" of models that you aren't afraid to call a thousand times an hour.
For me, this all translated into a very simple practical effect: it became morally acceptable to run more background processes, more checks, and more "default" automation. You no longer think "what a waste of tokens," but rather "how to make the AI layer work quietly behind the scenes to improve product metrics rather than get in the way."
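What that two-tier layering looks like in code is almost trivial: a router in front of two tiers. A minimal sketch with illustrative model names and a deliberately naive escalation rule; in a real product the router is usually a classifier or explicit flags:

```python
# Cheap model by default, escalate to the flagship on a simple heuristic.
from openai import OpenAI

client = OpenAI()

CHEAP_MODEL = "gpt-5.1-mini"   # placeholder for your Flash/mini/nano tier
FLAGSHIP_MODEL = "gpt-5.1"     # placeholder for the expensive reasoning tier

def needs_flagship(task: str) -> bool:
    # Naive routing rule for illustration only.
    return any(word in task.lower() for word in ("architecture", "refactor", "legal", "migration"))

def run(task: str) -> str:
    model = FLAGSHIP_MODEL if needs_flagship(task) else CHEAP_MODEL
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": task}],
    )
    return response.choices[0].message.content

print(run("Summarize last night's deploy logs"))           # stays on the cheap tier
print(run("Propose a migration plan for the billing DB"))   # escalates to the flagship
```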
Regulation: From Memes about the AI Act to Engineering Routine
In 2023–2024, we mostly joked about regulation: "Europe came up with a way to ban everything again," "in the US, NIST and slide decks about responsibility will save everyone." In 2025, the picture became a bit more prosaic: regulation began to slide into the realm of engineering routine — logging, data storage, explainability, content labeling, and employee training. Below is a very brief summary across four regions where we, as a Russian-speaking audience, have to live and work, whether conceptually or physically.
- Europe (EU AI Act)
Across 2024–2027, the AI Act is being phased in as a single legal framework for all EU countries: the regulation formally entered into force in August 2024, the first bans and general provisions started applying in February 2025, and the main requirements for high-risk systems and related products stretch out to 2026–2027. For companies, this isn't one "Day X," but a series of deadlines for different classes of systems and roles in the supply chain. In parallel, the European AI Office is ramping up, and oversight of general-purpose models is being fleshed out: their obligations took effect in August 2025. On top of this, the Commission keeps adding to the "soft layer" — implementing acts, standards, and voluntary codes, such as the draft code for labeling and tagging AI-generated content published in December 2025. For product teams, this means moving into a mode of mandatory documentation, risk management, and formalized transparency, rather than "we quietly hooked up a model and told no one."
- USA
At the federal level, the center of gravity is still in the executive branch: in October 2023, the broad Executive Order 14110 on "safe, secure and trustworthy" AI development and use appeared, giving instructions to agencies and establishing a framework around safety, human rights, and national security. In 2025, the line continued with new executive orders, including the December order on national AI policy, which tries to set a common federal framework and limit discrepancies between states, but does so through executive power rather than a single law. Translation: there is still no full "American AI Act," but a layer of requirements for risk assessments, reporting, and interaction with regulators is growing quickly around it, especially in sensitive areas — from medicine and critical infrastructure to elections.
- China
China is following perhaps the most "command-and-administrative" path: on August 15, 2023, the Interim Measures for the Management of Generative AI Services came into force, detailing the duties of providers — from requirements for training data and content moderation to mandatory labeling of generated materials and oversight procedures. In 2023–2024, regulators supplemented this with a draft national standard on basic security requirements for generative AI services, which describes technical and organizational measures and is gradually taking shape as a full standard. For major players, this is no longer an "experiment in a sandbox," but a strictly licensed and audited activity where models develop in conjunction with mandatory filters, assessments, and built-in safeguards.
- Russia
Russia is currently following the path of soft, "hybrid" regulation: instead of a single strict law on AI, there is a national strategy for the development of artificial intelligence until 2030, concepts of responsible use, and targeted amendments to relevant laws (on personal data, information, healthcare, etc.). On top of this, soft tools are increasing — industry standards, codes, methodological recommendations, and elements of self-regulation, including around generative AI, which allows the regime to be formed through a combination of "hard law" and "soft law" without immediately stifling the industry. For practitioners, this means a familiar landscape: formally, the space for experimentation is wide, but uncertainty is high — rules can be fine-tuned post-factum, especially in the areas of personal data, copyright, and liability for model errors.

Forecast for 2026
If I combine my entire 2025 into one picture, what follows aren't abstract claims that "AI will change everything," but rather grounded bets — about work modes, the engineering loop, and where intelligence will simply become the background.
1. Deep Research as the Default for Complex Tasks
Everything that resembles serious investigation — from market research to technical analysis of contradictory sources — will move to Deep Research-class modes in 2026. Companies are already clearly highlighting this as a separate line of development: special deep research modes are appearing in the web interfaces of Gemini, ChatGPT, and Perplexity, which launch not a single model response, but a multi-step cycle of searching, reading, comparing, and assembling a report.
My bet is that this direction will be purposefully developed and brought to an increasingly objective and verifiable state. Users are less and less satisfied with a "smart" answer without evidence: they want to see which sources were used, where they contradict each other, and why the final conclusion looks this way and not another. Therefore, deep, transparently assembled answers with a clear basis in external materials will become the standard where money, reputation, or strategic decisions are at stake, while simple "chat with AI" will remain a tool for quick, non-risky questions.
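Under the hood, these modes are less magical than they look: a loop of searching, reading, and comparing, finished off by a report that cites its sources. A compressed sketch; the three callables are hypothetical stand-ins for whatever search API and model client you actually use, not any vendor's real pipeline:

```python
# Deep Research-style loop: search -> read -> compare, then a sourced report.
def deep_research(question, search_web, fetch_page, ask_llm, rounds=3):
    notes, sources = [], []
    query = question
    for _ in range(rounds):
        for hit in search_web(query, top_k=5):
            page = fetch_page(hit["url"])
            notes.append(ask_llm(f"Extract claims relevant to '{question}':\n{page}"))
            sources.append(hit["url"])
        # The model itself decides what is still missing or contradictory.
        query = ask_llm(
            "Name one follow-up query that would close the biggest gap or "
            f"contradiction in these notes:\n{notes}"
        )
    return ask_llm(
        "Write a report answering the question, citing each source URL and "
        f"flagging where sources disagree.\nQuestion: {question}\n"
        f"Notes: {notes}\nSources: {sources}"
    )
```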
2. Code: From a "Chat Nearby" to an Environment Where AI is a Team Member
The second bet — 2026 will be the year when tools for writing code finally shift from the "chat next to the IDE" model to a full environment where AI is a full member of the development team.
On top of this, AI-first patterns and frameworks will appear, designed from the outset for joint "human + agent" work: specifications as the source of truth, step-by-step planning, and breaking work down into tasks that an agent can perform.
GitHub Spec Kit already looks like the seed of such an approach: it’s an open set of tools where a specification and plan are formed first, and then agents act as executors. Cursor 2.0 is moving in the same direction, turning "vibe coding" into a more structured collaborative process where the AI doesn't just append pieces but lives inside the "spec → plan → tasks → implementation" cycle. In this case, the user might not interact with the code at all, communicating only with the agent in a chat.
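To make the "spec → plan → tasks → implementation" idea tangible, here is a toy sketch; the data structures and the agent wiring are my own illustration, not the actual Spec Kit or Cursor internals:

```python
# Toy model of spec-driven development: the spec is the source of truth,
# the plan is derived from it, and the agent executes tasks step by step.
from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    done: bool = False

@dataclass
class Plan:
    spec: str                      # the specification is the source of truth
    tasks: list[Task] = field(default_factory=list)

def build_plan(spec: str) -> Plan:
    # In a real tool the agent derives tasks from the spec; here they are fixed.
    return Plan(spec=spec, tasks=[
        Task("Sketch the data model"),
        Task("Implement the API endpoints"),
        Task("Add tests that restate the spec as assertions"),
    ])

def execute(plan: Plan, agent) -> None:
    for task in plan.tasks:
        agent(plan.spec, task.description)  # the agent sees the spec on every step
        task.done = True

execute(build_plan("Users can export their data as CSV within 24 hours."),
        agent=lambda spec, step: print(f"[agent] {step} (per spec: {spec})"))
```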
3. Dubbing and Video Translation as a Factory Operation
The third bet — in 2026, dubbing and video translation will finally turn from a "wow" demo into a mundane conveyor operation for creators, primarily on platforms like YouTube. The platform is already capable of automatically dubbing a video into other languages: you upload a video in Russian, and the system itself adds English and Spanish tracks, expanding the audience with almost no extra effort.
But in its current form, it’s more of an "automatic bonus" than a professional tool: not your voice, not your intonation, artifacts in places, and little control over quality. My bet is that in 2026, the focus will shift from just "auto-dubbing" to accessible tools for the authors themselves: so that in a couple of clicks, you can get dubbing in several languages in your own voice, preserving speech mannerisms, tempo, and the ability to edit it all. Then "making a video" and "scaling it to new languages and markets" will finally merge into one normal production pipeline, and reach growth will become as mandatory a step as exporting a project from the editing software is now.
4. The Engineering Loop Wins, Not the Secret Prompt
The fourth bet sounds almost boring, but it is honest: in 2026, the winners will not be those who find a "secret prompt," but those who have built a proper engineering loop around the models.
Context, memory, instructions, constraints, checks, repeatability — all the tedious but grown-up work that I wrote about separately in the article on context engineering.
This is the most honest way to improve quality without waiting for an "even smarter model": you improve the input and the process, automate checks, and normalize data and logging, rather than sitting in a chat looking for a magic formulation. At some point, it will simply become embarrassing to reduce "working with AI" to exchanging prompts in a Telegram bot when full practices of prompt engineering, memory, and validation already exist around us.
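A minimal sketch of what that loop means in practice: context assembled from explicit pieces, a hard output check, a bounded retry. The memory format, constraints, and model name are assumptions for illustration:

```python
# The quality lever is how you assemble context and validate output,
# not a magic prompt.
import json
from openai import OpenAI

client = OpenAI()

def build_context(task: str, memory: list[str], constraints: list[str]) -> list[dict]:
    system = (
        "Follow the constraints strictly.\n"
        "Constraints:\n- " + "\n- ".join(constraints) + "\n"
        "Relevant notes from previous sessions:\n- " + "\n- ".join(memory)
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": task}]

def run_checked(task: str, memory: list[str], constraints: list[str]) -> dict:
    messages = build_context(task, memory, constraints)
    for attempt in range(3):  # repeatability: bounded retries over the same context
        raw = client.chat.completions.create(model="gpt-5.1", messages=messages)
        text = raw.choices[0].message.content
        try:
            return json.loads(text)        # the check: output must be valid JSON
        except json.JSONDecodeError:
            messages.append({"role": "user", "content": "Return valid JSON only."})
    raise ValueError("Model never produced valid JSON")
```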
5. Sorting Tasks into Layers and Intelligence Hidden in Devices
The fifth bet — the final stratification of tasks by model "caliber."
Large models will remain tools for complex, rare, and high-stakes tasks where reasoning quality and complex multimodality are important. Local models with 1–3B parameters will quietly take over daily, private, cheap, and fast scenarios — from on-device suggestions to local agents that the user might not even know exist.
The market will increasingly hide intelligence "inside the hardware": in phones, laptops, routers, smart speakers, and smart home hubs. A logical step is the emergence of home control centers with a built-in local model: a hub that can coordinate devices itself, remember family habits, and yet doesn't leak every action to the cloud.
Counter-Bet: Fully Autonomous AI Will Not Go Mainstream
And separately — a counter-bet that I think is more important than all the previous ones.
Fully autonomous systems without human control will not become mainstream in 2026, not because "models are weak," but because the cost of error and legal/reputational liability is growing faster than convenience.
Hence the main vector: the key market for 2026 is not "removing the human from the loop," but "making human control cheap, built-in, and whenever possible, automated." AI takes on the volume and routine, while the human remains the one who sets the boundaries, accepts the risk, and can press stop at the right moment.

Conclusion
For me, everything that happened with models, hardware, and regulation in 2025 is not a race for superintelligence, but a shift in gravity: experimental toys for geeks have turned into normal infrastructure on which work and daily life already rest. Many of the topics covered in this article I have already worked through as separate texts — on local models, multimodality, agents, and memory — and each time I caught the same feeling: we live in an era of small but reproducible improvements that can be rolled out again and again.
The main lesson of 2025 for me is simple: the winner is not the one with the "smarter" model, but the one who turns that smartness into sustainable processes and knows how to pay honestly for mistakes. In a world where artificial intelligence is gradually becoming the new electricity, the most interesting question for 2026 sounds like this: who do we choose to be in a system where everything around us becomes smart by default, except us.
Stay curious.

