Multimodal AI in 2025: How GPT-5.1, Gemini, Claude, and Grok Learned to Understand Everything

In 2020, AI was an archipelago of isolated models, each built for a single modality. By 2025, that picture has changed completely. This article is a deep dive into the unified next-token prediction paradigm that enabled GPT-5.1, Gemini, Claude, and Grok to understand text, images, and video together. We break down how it works and what these flagship models can do today.

November 18, 2025

An indie hacker's take on AI and development: a deep dive into language models, gadgets, and self-hosting through hands-on experience.
© 2025 Gotacat Team