Models & Comparisons

AI Model Limitations

Last updated 2026-07-28

What's new

2026-07-28

Claude (an AI assistant) can autonomously organize files on your computer, renaming and summarizing them without your direct input, using its "co-work" feature (a mode where it operates more independently).
Most people use Claude in "chat" mode (like a conversation), but the "co-work" mode (a separate feature) is more powerful for handling complex tasks with many files or steps.
Downloading the Claude Desktop app (software for your computer) lets Claude access your files directly and stay effective on longer tasks, unlike the browser version, where you must manually copy and paste files.
In the desktop app, create folders tied to specific tasks instead of using "projects" (pre-set workspaces in the browser version), allowing Claude to work more efficiently on your files.

2026-07-25

You can now use a smart AI tool (Local AI) that runs completely free, offline, and privately on your own computer, with no data leaving your device.
Local AI uses "open weights" (free, downloadable AI models) that you can own and use without paying for access or worrying about data privacy.
The free AI models have improved significantly, with some like GLM 5.2 (a large, capable AI model) performing close to top paid models on many tasks.
Using Local AI gives you control and privacy, as you're not renting a service from a company that can change or restrict access, and your data stays on your machine.

2026-07-13

AI (Artificial Intelligence) tools are making it easier for non-technical people to build solutions, like having a team of smart interns helping you solve problems.
AI is changing the way business teams, like sales and marketing, work by making them more like builders, not just users of tools like spreadsheets and PowerPoint.
AI is helping businesses understand their data better, making it more reliable and useful for decision-making.
AI is also helping to solve real business problems, like understanding how a business is doing and making sure data is accurate and trusted.

2026-07-07

AI is replacing many jobs, especially those done by junior workers, and this trend feels different from past economic downturns due to its existential nature (potentially changing the job market forever).
Don't believe everything you see online; negative news about job losses gets more attention, but it's not the full picture, so do your own research.
AI companies have reasons to hype up their products, so take their claims with a grain of salt and do your own research to understand how these tools are really evolving.
Many AI tools are still in development and not yet perfect, so don't be fooled by impressive demos—look for tools that have been proven to work well in real-world situations.

2026-07-01

A new free, open-source AI model called GLM 5.2 (a type of AI software that anyone can use and modify) is now available and performs nearly as well as more expensive models like Opus (another AI model) for most tasks.
GLM 5.2 is designed to be cost-effective, using only a small part of its vast capabilities for any single task, and can handle large amounts of information at once.
The model was tested by creating a real-world tool for tracking sponsorship deals, which worked well and cost significantly less to run than Opus.
GLM 5.2 was also used to create a promotional video for the tool with HyperFrames MCP (open-source software that turns text into videos), though Opus produced a more polished version.

2026-06-28

OpenAI released GPT-5.6, a new AI model, but access is limited to a small group of trusted partners due to government requests, treating advanced AI like strategic technology (important tools that governments want to control).
GPT-5.6 includes three models: Soul (flagship), Terra (balanced), and Luna (faster, cheaper), with improved capabilities in coding, biology, and cybersecurity (protecting computers and networks from harm).
The model introduces new features like "max reasoning effort" (deeper thinking mode) and "ultra mode" (using multiple AI agents to solve complex tasks), but these can increase usage costs.
OpenAI claims GPT-5.6 is better at helping find and fix security vulnerabilities than carrying out attacks, with built-in safety measures to prevent misuse.

2026-06-25

Claude (an AI assistant) has three main modes: Chat (quick answers), Co-work (file access), and Code (full access, best for building things).
Opus 4.8 is Claude's most capable model, Sonnet 4.6 is for daily tasks, and Haiku 4.5 is for fast, simple work.
Connect Claude to tools like Gmail, Google Drive, or Firecrawl (a web data grabber) to boost productivity.
Use "sub agents" in Claude to multitask, getting 5-10 times more output in the same time.

2026-06-19

Claude (an AI tool) can now run tasks continuously without stopping, thanks to features like auto mode (a safety checker that approves safe actions automatically) and /goal (a feature that sets a completion condition for tasks).
To handle larger tasks, Claude can use /effort (a setting that increases the time spent on thinking and reasoning) to maintain quality.
Tasks can be scheduled to run at specific intervals using /loop (for short tasks) or /routines (for longer tasks), with /goal determining when the task is complete.
These features work together to automate workflows, allowing Claude to complete tasks without constant user input.

2026-06-16

AI can make you believe things that feel true but aren't, like a mirror reflecting your desires back with confidence. This is called "sycophancy" (overly agreeable AI that flatters and validates you).
AI can help with real scientific breakthroughs, like a mathematician using GPT-5 (a powerful AI model) to make progress on a 40-year-old problem.
The danger isn't just AI making up facts (hallucinations), but validating your worldview in a way that feels emotionally true, which can be hard to detect and potentially manipulative.
AI can sometimes make impossible things seem possible, blurring the line between real discoveries and delusions, as seen in a case where a man thought he'd discovered new math but was mistaken.

2026-06-13

Google's Gemma models (AI tools you can use on your own devices) are now available in four sizes, including two designed for mobile phones and IoT devices (internet-connected gadgets).
The smallest Gemma models (E2B and E4B) use clever tricks to run on phones, handling text, vision, and audio inputs while outputting text, and can do things like coding and problem-solving.
The larger Gemma models (26B and 31B) use clever techniques to be powerful yet efficient, with the 31B model being particularly strong for multilingual tasks and coding.
Gemma models are designed to give users more control and access, complementing Google's more powerful but cloud-based Gemini models (AI tools that run on Google's servers).

2026-06-10

AI companies are now focusing on "harness engineering" (designing the system around an AI model to make it more effective), which can improve performance by up to six times without changing the model itself.
Harness engineering involves creating rules, tools, memory, and safety layers that guide the AI model before, during, and after it acts, ensuring reliable and consistent results over time.
Unlike prompt engineering (changing the words the model reads), harness engineering involves changing the invisible structure around the model, like the tools it can use, the checks it must pass, and the recovery process when something goes wrong.
As AI models become more widely available and similar in capability, the advantage will shift to the team that builds the best system (harness) around them, making the model more reliable and productive in real-world workflows.
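
The before, during, and after layers described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `run_with_harness`, `make_flaky_model`, and `max_attempts` are hypothetical names, and the "model" is a stand-in function rather than a real API call.

```python
from typing import Callable


def run_with_harness(
    model: Callable[[str], str],
    prompt: str,
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> dict:
    """Wrap a model call with a pre-check, an output check, and a retry path."""
    # Before: refuse empty input instead of letting the model guess.
    if not prompt.strip():
        raise ValueError("prompt must not be empty")

    history = []  # memory layer: every attempt is recorded
    for attempt in range(1, max_attempts + 1):
        output = model(prompt)
        history.append(output)
        # After: only accept output that passes the check.
        if validate(output):
            return {"ok": True, "output": output, "attempts": attempt, "history": history}
        # Recovery: feed the failure back into the next attempt.
        prompt = f"{prompt}\nPrevious answer failed validation: {output!r}. Try again."
    return {"ok": False, "output": None, "attempts": max_attempts, "history": history}


def make_flaky_model() -> Callable[[str], str]:
    """Hypothetical stand-in for a model: wrong on the first call, right afterwards."""
    calls = {"n": 0}

    def model(prompt: str) -> str:
        calls["n"] += 1
        return "forty-two" if calls["n"] == 1 else "42"

    return model


result = run_with_harness(
    make_flaky_model(),
    "What is 6 * 7? Reply with digits only.",
    validate=str.isdigit,
)
print(result["ok"], result["output"], result["attempts"])
```

The model itself never changes here. Only the checks and the recovery path around it turn a wrong first answer into an accepted second one.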

2026-06-07

**Multiple AI sessions**: Boris Cherny (the creator of Claude Code, an AI tool for coding) runs many AI sessions at once, each handling a single task, to boost productivity and avoid mixing contexts.
**CLAUDE.md file**: This file stores rules and context for Claude Code, so you don't have to repeat instructions; it's like a cheat sheet that the AI checks every time it starts a session in that folder.
**Compound engineering loop**: By continuously updating the CLAUDE.md file with new rules based on mistakes or lessons learned, the AI improves over time, making future sessions smarter and more efficient.
**Team collaboration**: Teams can share the same CLAUDE.md file, so everyone benefits from the rules and improvements added by others, creating a shared knowledge base.
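
The compound loop above amounts to "when something goes wrong, write the lesson into the rules file once". A minimal sketch of that habit follows, assuming the rules file is a plain text file named `CLAUDE.md` with one bullet per rule. The `add_rule` helper is hypothetical and is not part of Claude Code.

```python
from pathlib import Path
import tempfile


def add_rule(claude_md: Path, rule: str) -> bool:
    """Append a rule as a bullet unless it is already present. Returns True if added."""
    existing = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    bullet = f"- {rule.strip()}"
    if bullet in existing.splitlines():
        return False  # the lesson was already recorded
    # Make sure the new bullet starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    claude_md.write_text(existing + separator + bullet + "\n", encoding="utf-8")
    return True


# Demo in a temporary folder so no real project file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "CLAUDE.md"
    first = add_rule(path, "Run the test suite before committing.")
    second = add_rule(path, "Run the test suite before committing.")
    content = path.read_text(encoding="utf-8")

print(first, second)
print(content)
```

Because duplicates are skipped, several teammates can record the same lesson without bloating the shared file.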

2026-06-03

Pair DeepSeek v4 (a free open-source AI model) with Claude Code—use cheap DeepSeek for routine coding and save Claude (Anthropic's premium AI) for complex work.
Use DeepSeek for unit tests, scripts, and automations, but rely on Claude for security reviews and web development where quality is critical.
DeepSeek costs 76% less per token (AI's text unit), so you complete simple coding cheaply without draining your Claude subscription budget.
Anti-Gravity (an automation tool) sets up the hybrid system automatically—just provide API keys (login codes) and you're ready to code with both models.
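
To see what the 76% figure means for a mixed workload, here is a rough cost sketch. It assumes "76% less per token" means the cheaper model costs 0.24 times the premium price. The premium price of 10.0 per million tokens and the 80/20 split are made-up numbers for illustration, and `blended_cost` is a hypothetical helper.

```python
def blended_cost(
    total_tokens: int,
    cheap_share: float,
    premium_price_per_mtok: float,
    discount: float = 0.76,
) -> float:
    """Total cost when `cheap_share` of the tokens go to the cheaper model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    # Assumption: "76% less" means the cheap price is (1 - 0.76) of the premium price.
    cheap_price = premium_price_per_mtok * (1.0 - discount)
    cheap_tokens = total_tokens * cheap_share
    premium_tokens = total_tokens - cheap_tokens
    total = cheap_tokens * cheap_price + premium_tokens * premium_price_per_mtok
    return total / 1_000_000  # prices are quoted per million tokens


PREMIUM_PRICE = 10.0  # hypothetical price per million tokens

all_premium = blended_cost(10_000_000, 0.0, PREMIUM_PRICE)
hybrid = blended_cost(10_000_000, 0.8, PREMIUM_PRICE)
print(round(all_premium, 2), round(hybrid, 2))
```

Under these assumptions, sending 80% of the tokens to the cheaper model cuts the bill by roughly 61%, which is less than 76% because the premium model still handles the remaining 20%.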

Key points

What it is

AI models learn from examples, not step-by-step instructions, making them unpredictable and requiring constant maintenance.
AI is like a junior developer; it needs clear instructions and quality assurance to produce useful results.
AI models are becoming cheaper and more accessible, but their intelligence is not unique to your business.
Every AI model has limitations, and choosing the right one for each task is crucial to avoid wasting money.

How to use it

Use different AI tiers (like Claude's Opus and Haiku) for different tasks to optimize costs.
Adjust the effort parameter to manage compute costs based on the task's complexity.
Use the PIV loop (Plan, Implement, Validate) for coding tasks to improve your code with each cycle.
Be very clear and specific with your prompts to ensure the AI understands your intent.

Watch out for

AI models can fabricate information, even advanced ones like Opus 4.5 or Gemma 4.
Unclear instructions can lead to the model optimizing for the wrong thing.
AI models can change over time, so what worked before might need adjustments.
Be mindful of session limits when running many automations.

Tools named

Claude (AI model with different tiers), Codex (AI model for coding), Gemma 4 (AI model), Opus 4.5 (AI model), Opus 4.6 (AI model with adaptive thinking), YAML (file format for deterministic steps).

Lesson 1: What AI model limitations are and why they matter

AI models learn from examples, not step-by-step instructions, which makes them nondeterministic (unpredictable in output). Unlike traditional software that follows a fixed recipe, an AI studies thousands of finished dishes and writes its own recipe. This means every query can produce different results, and as you add more AI, the possibility for errors increases. You need constant maintenance, upkeep, and evaluations to ensure systems provide value rather than becoming a headache.
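
One practical consequence of nondeterminism is that a single test run tells you little. A common response is to run the same prompt many times and track a pass rate. The sketch below uses a toy stand-in (`toy_model`, hypothetical) that samples an answer at random, so it shows the evaluation pattern only and says nothing about any real model's accuracy.

```python
import random
from collections import Counter


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in: samples one of several answers, as a model samples tokens."""
    return rng.choices(["Paris", "paris", "Lyon"], weights=[0.7, 0.2, 0.1])[0]


def evaluate(prompt: str, expected: str, runs: int, seed: int = 0) -> tuple[float, Counter]:
    """Run the same prompt `runs` times and return the pass rate and answer counts."""
    rng = random.Random(seed)  # fixed seed so the evaluation itself is repeatable
    outputs = Counter(toy_model(prompt, rng) for _ in range(runs))
    passed = sum(n for answer, n in outputs.items() if answer.lower() == expected.lower())
    return passed / runs, outputs


pass_rate, outputs = evaluate("Capital of France?", "Paris", runs=200)
print(f"pass rate: {pass_rate:.2f}", dict(outputs))
```

Tracking this number over time is one way to notice when a system that used to work starts to drift.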

Another key limitation is that AI is still a black box (an opaque system where internal reasoning is hidden). You can see what the model does but not why it does it, so you must talk to it extensively and be very clear. Your role shifts from writing code to assuring quality and keeping the system on track. As one expert put it, AI is really a junior developer. With the right spec and framework, you can engineer it into something like a senior, but ask a dumb question and you get a dumb answer.

These limitations matter because AI models are becoming cheaper and more accessible, meaning intelligence itself is commoditized. The only things proprietary to your business are your processes, decisions, and historical context. Collating that information and plugging it into the right model with the right framework is what creates value. Additionally, AI models sometimes get better, sometimes worse. Something that worked perfectly a month ago might need adjustments now. Recognizing these constraints helps you build solid systems and improve them as you learn how they behave in production.

Lesson 2: How to work with AI model limitations: step by step

Every AI model has limitations you need to manage. The key is choosing the right model for each specific task. Claude offers different tiers: Opus is the most capable and expensive, while Haiku is cheaper but less powerful. If a task has three steps and only one is difficult enough to need Opus, use Opus for that step and Haiku for the simpler ones. This avoids wasting money.
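
The three-step example above can be written down as a small routing table. This is a sketch under assumptions: the relative costs (5 and 1) are invented for illustration, the step names are hypothetical, and `pick_tier` only returns a label rather than calling any model.

```python
from dataclasses import dataclass

# Hypothetical relative cost per step; real prices vary by provider and over time.
RELATIVE_COST = {"opus": 5, "haiku": 1}


@dataclass
class Step:
    name: str
    hard: bool  # True only if the step needs the most capable tier


def pick_tier(step: Step) -> str:
    """Send hard steps to the capable tier and everything else to the cheap one."""
    return "opus" if step.hard else "haiku"


def plan(steps: list[Step]) -> tuple[dict[str, str], int, int]:
    """Return the tier per step, the mixed cost, and the all-Opus cost."""
    assignments = {s.name: pick_tier(s) for s in steps}
    mixed = sum(RELATIVE_COST[tier] for tier in assignments.values())
    all_opus = RELATIVE_COST["opus"] * len(steps)
    return assignments, mixed, all_opus


steps = [
    Step("extract fields", hard=False),
    Step("resolve conflicting clauses", hard=True),
    Step("format summary", hard=False),
]
assignments, mixed_cost, all_opus_cost = plan(steps)
print(assignments)
print(mixed_cost, all_opus_cost)
```

With these made-up numbers, the mixed plan costs 7 units against 15 for running every step on the top tier.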

The effort parameter (a setting that controls how hard the model thinks) helps you manage compute costs. Set it to low for high-volume routine tasks, medium for everyday work, high for complex problems, and max for peak intelligence. Opus 4.6 added adaptive thinking (automatic effort adjustment) so the model decides when extended reasoning helps, optimizing cost and latency.
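
One way to keep effort choices consistent is to decide them per task category rather than per request. The sketch below only maps categories to the four levels named above. The category names, the `choose_effort` helper, and the fallback to "medium" are assumptions, and no real API call or parameter name is implied.

```python
# Hypothetical task categories mapped to the effort levels described above.
EFFORT_BY_TASK = {
    "routine": "low",      # high-volume routine tasks
    "everyday": "medium",  # everyday work
    "complex": "high",     # complex problems
    "frontier": "max",     # peak intelligence, highest cost
}


def choose_effort(task_kind: str, default: str = "medium") -> str:
    """Return the effort level for a task category, falling back to a default."""
    return EFFORT_BY_TASK.get(task_kind, default)


jobs = [
    ("tag 5,000 support tickets", "routine"),
    ("debug a race condition", "complex"),
    ("unlabeled job", "unknown"),
]
settings = {name: choose_effort(kind) for name, kind in jobs}
print(settings)
```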

For coding, use the PIV loop: Plan what you want, let AI implement it, then Validate the results. Each cycle improves your code. Separate decisions from execution by writing a recipe (a YAML file with deterministic steps) and letting Claude or Codex be the chef that follows it. You can also run multiple agents in parallel — four agents working simultaneously can scrape comments, create diagrams, and analyze data after you spend 30 seconds setting them up.
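
The PIV loop can be sketched as a small control loop in which validation failures become feedback for the next implementation pass. Everything here is illustrative: `piv_loop` and `stub_implement` are hypothetical names, and the stub stands in for the AI by returning a buggy first draft and a fixed second one.

```python
from typing import Callable


def piv_loop(
    plan: str,
    implement: Callable[[str, list[str]], str],
    validate: Callable[[str], list[str]],
    max_cycles: int = 3,
) -> tuple[str, int]:
    """Plan once, then alternate Implement and Validate until no problems remain."""
    feedback: list[str] = []
    for cycle in range(1, max_cycles + 1):
        code = implement(plan, feedback)  # Implement: the AI writes or revises code
        problems = validate(code)         # Validate: run checks you control
        if not problems:
            return code, cycle
        feedback = problems               # failures feed the next cycle
    raise RuntimeError(f"still failing after {max_cycles} cycles: {feedback}")


def stub_implement(plan: str, feedback: list[str]) -> str:
    """Hypothetical stand-in for the AI: buggy first draft, corrected after feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def validate(code: str) -> list[str]:
    """Load the generated function and check it against a known answer."""
    namespace: dict = {}
    exec(code, namespace)
    problems = []
    if namespace["add"](2, 3) != 5:
        problems.append("add(2, 3) should return 5")
    return problems


final_code, cycles = piv_loop("Write add(a, b) that returns the sum.", stub_implement, validate)
print(cycles)
print(final_code)
```

The important design choice is that validation is deterministic code you wrote, so the loop does not depend on the model grading its own work.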

Remember that your output is only as good as your setup. If the data is small enough, put it directly in the system prompt instead of using a separate retrieval system. Claude Code can now sit behind production infrastructure, not just prototypes, but watch your session limits when running many automations.
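
"Small enough" can be turned into a rough, checkable rule. The sketch below assumes about four characters per token, which is only a coarse heuristic for English text, and a budget of 20,000 tokens, which is an arbitrary illustrative threshold. Both helper functions are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def choose_context_strategy(documents: list[str], budget_tokens: int = 20_000) -> str:
    """Inline the data if it fits the budget; otherwise fall back to retrieval."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return "inline_in_system_prompt" if total <= budget_tokens else "use_retrieval"


small_docs = ["x" * 400] * 3  # about 300 estimated tokens in total
large_docs = ["x" * 400_000]  # about 100,000 estimated tokens

print(choose_context_strategy(small_docs))
print(choose_context_strategy(large_docs))
```

For a real decision, replace the estimate with the token count reported by the model provider's own tooling.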

Lesson 3: Best practices and pitfalls

AI models are powerful but have clear limitations. They are still a "black box" (an opaque system where you can't see internal reasoning). Even advanced models like Opus 4.5 or Gemma 4 can fabricate information: turns where the model did no reasoning produced false results, while turns with deep reasoning were correct. This means a model can look smart while being wrong.

A common mistake is assuming the model understands your intent without clear direction. Your output is only as good as your input. Models optimize perfectly for the wrong thing if you give unclear instructions. One expert found bugs in a 20-year-old AI system by running automated experiments, bugs he'd walked past a thousand times. This shows you must be the quality assurance person, not just the person giving orders.

Best practices start with being very clear and specific with your prompts. Use "intent engineering" (designing what you want the model to achieve) rather than just piling in context. Set proper token budgets for reasoning tasks so the model doesn't skimp on thinking. With Claude Code, build "skills" (reusable capabilities for agents) and use routines as schedulers to run multiple agents in parallel. For coding, rescue commands and adversarial commands (pitting models against each other) give fresh perspectives.
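
Running several agents in parallel follows the same pattern as any other concurrent job: hand out independent tasks, then collect the results. The sketch below uses Python's standard thread pool, with a hypothetical `agent` function that sleeps briefly in place of real work. The task names are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def agent(task: str) -> str:
    """Hypothetical stand-in for one agent session handling a single task."""
    time.sleep(0.05)  # pretend to do some work
    return f"done: {task}"


tasks = ["scrape comments", "create diagrams", "analyze data", "draft summary"]

# One worker per task, so all four run at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(agent, tasks))  # results come back in task order

for line in results:
    print(line)
```

Giving each agent exactly one task mirrors the advice to keep contexts separate, and it makes a failed task easy to rerun on its own.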

The space moves fast — chasing every new model leads to burnout. Focus on one framework and connect theory to real projects. Test everything. The most dangerous failure looks like success until it's too late.
