Claude Code

Gemini RAG File Search

Last updated 2026-07-16

What's new

2026-07-16

**Skills need evaluation (evals)**: Skills are like instructions for AI to do specific tasks, and they need testing (evals) to ensure they work well, but many people skip this step.
**Skills improve AI performance**: Skills can boost AI performance by about 15%, but AI-generated skills can sometimes make things worse, so it's important to keep skill instructions under 500 words.
**Two types of skills**: Capability skills teach AI new tasks, while preference skills encode specific workflows or styles, and both need proper evaluation to work well.
**User-invoked skills are powerful**: Unlike model-triggered skills, user-invoked skills let users directly ask the AI to perform specific tasks, which can be very useful for repetitive jobs.

2026-07-13

Google launched a new smart speaker (a device that responds to voice commands) called Google Home, which is designed to work with Gemini AI (a new, more advanced AI assistant).
Unlike older versions, Gemini AI can handle complex tasks, creative brainstorming, and multi-turn conversations, making it feel more like a genuine assistant than a simple command tool.
The speaker can search camera history (recordings from security cameras) to answer specific questions, like "Is my back gate open?" or "Did the dog get on the couch today?"
Google Home can also provide a summary of what happened at home while you were out, using a feature called home briefs that combines notifications and camera events into a spoken recap.

2026-06-28

A new AI tool called Jarvis (an AI assistant) helps manage and summarize team activities, ensuring security and control within a company's own AWS (Amazon Web Services, a cloud computing platform) account.
This setup is designed for larger companies, non-profits, or organizations with strict guidelines, allowing them to securely use tools like Salesforce (a customer relationship management platform) or Slack (a communication tool) on mobile devices.
The platform built on AWS Bedrock (a service for building and scaling generative AI applications) can be emulated in other cloud environments like Azure or GCP (Google Cloud Platform, a suite of cloud computing services).
Users can create and manage multiple AI agents, set their roles, and connect them to communication tools like Telegram (a messaging app) or Slack, with all data and interactions secured within the AWS environment.

2026-06-19

Ace is a new AI tool (software that uses artificial intelligence) that lets you create "loops" (AI agents that build on their own work) to automate tasks.
Ace can also build simple apps (like a weather app) quickly by asking you questions to understand what you need.
You can customize the AI team (group of AI agents) working on your project and even replace default AIs like Gemini with others like Codex.
Ace has just launched, and the creator is offering a 30% lifetime discount to encourage user feedback.

2026-06-16

Some AI models (like Claude 5) can be taken away without warning, so running models locally (on your own computer) ensures you always have access and saves money.
Local models (AI software running on your computer) are private, work offline, and can't be shut down or restricted by others, though they may not be as powerful as the latest cloud-based models.
You can use local models to run software like Notebook LM (a tool for working with AI models) without paying for subscriptions, and even build your own custom AI-powered applications.
To run a local model, you need to check your computer's capacity (like memory and storage), download a suitable model (like Qwen 3), and connect to it using an AI assistant (like Claude).

2026-06-13

Web MCP (Model Context Protocol) is a new web standard that lets websites offer AI agents (software that acts on your behalf) a clear menu of tools and actions, making it easier for them to navigate and interact with the site.
To prepare for AI agents, focus on making your website accessible to everyone, using clear HTML, strong accessibility standards, and fast loading times, as this also benefits AI agents.
Web MCP improves the performance and reliability of AI agents by allowing websites to define their capabilities as structured tools, reducing the need for agents to guess or work around site features.
The Model Context Tool Inspector is a Chrome extension that shows the tools available on a website for AI agents to use, helping developers see and test how their site interacts with AI.

2026-06-10

Learning one AI tool like Claude (a popular AI assistant) isn't wasted time because the skills you gain can transfer to other tools like Codex (a newer AI assistant).
AI tools like Claude, Codex, and Open Claw (different AI assistants) work similarly, using folders and context files on your computer, making it easy to switch between them.
Focus on understanding the fundamentals of AI tools, not just the specific tool, to avoid feeling overwhelmed by new releases and stay adaptable.
Your work in one AI tool can often be used in another, as they share similar structures and can access the same files and connected tools (like Gmail or Slack).

2026-06-07

AI tools often start from scratch each time you use them, causing a "friction tax" that wastes your time and energy, like reexplaining your business or style repeatedly.
The "information hierarchy" is a fix for this tax, organizing your business details once so any AI can access and learn from it, making them smarter and saving you time.
An AI "agent" (a tool that uses AI to perform tasks) is only as good as the information it can access and how clearly its job is described, similar to onboarding a new team member.
The information hierarchy has two tiers: a "my business folder" with details about you, your business, voice, and offers, and a second tier with project-specific folders for easy AI access.

2026-06-03

Gemini CLI v0.40 runs Gemma (a lightweight AI model) locally on your computer, keeping your data private without uploading to the cloud.
Auto memory (AI learning from your past conversations) now extracts useful patterns from your old sessions, so the AI gets smarter about how you work.
You can now run a team of AI agents (separate assistants working together) with shared memory, coordinating their work through a single interface.
Shell validation (checking commands for security risks) now blocks injection attacks before they execute, making it safer to run AI-suggested commands.

Key points

What it is

Gemini RAG File Search is a tool that lets you upload documents to Google’s AI and ask questions about them, with the AI pulling specific facts from your files to answer.
RAG (retrieval-augmented generation) means the AI uses your files to answer questions, rather than guessing.
Gemini automatically splits (chunks) and converts your files into a searchable format, handling storage and searching for you.
It provides citations, showing exactly which part of your document it used to answer, reducing made-up (hallucinated) facts.

How to use it

Upload a file using the media upload method, create a tool in the Gemini API, and point it to your file set.
Start a chat session and ask questions; the model will decide if it needs to look at your files and answer with citations.
Pre-process files to ensure they are clean and well-scanned before uploading to avoid poor results.
Be intentional about what you upload to avoid duplicate data, which can lower response quality.

Watch out for

Garbage in is garbage out: messy or poorly scanned files will result in poor answers.
Duplicate data can explode your database with redundant information, lowering response quality.
For ultra-specific chunking strategies or billions of documents, a traditional RAG pipeline may be better.
While cost-effective, chunk-based retrieval may not be ideal for all tasks.

Tools named

Gemini RAG File Search (Google’s AI tool for searching uploaded documents), Claude Code (AI assistant for code analysis)

Lesson 1: What is Gemini RAG File Search and why it matters

Gemini RAG File Search is a tool that lets you upload documents to Google’s AI and immediately ask questions about them. RAG (retrieval-augmented generation) means the AI pulls specific facts from your files to answer, rather than guessing. When you upload a file, Gemini automatically chunks (splits it into pieces) and embeds (converts those pieces into vectors for search). It then indexes them for high-speed retrieval. This matters for AI development because you don’t have to build your own search system or database—Gemini handles storage and searching. The setup is simple: upload a file using the media upload method, create a tool in the Gemini API (file search is just a tool you enable like function calling), point it to your file set, and start a chat session. The model decides if it needs to look at the files, performs the search, and answers with citations. Unlike a standard LLM that might hallucinate, Gemini file search gives you receipts—it tells you exactly which part of your document it used. It also handles huge files up to 2 million tokens of context. However, garbage in is garbage out: if your scanned files are messy, the results will be poor. Gemini charges only for uploading files, making it very cost-effective. It abstracts away the boring parts of RAG, letting you focus on building applications instead of managing infrastructure. If you need ultra-specific chunking strategies or an on-premise solution, this may not fit, but for most use cases it removes the hardest parts of creating a chat-with-PDF feature.

Sources

Lesson 2: How to use Gemini RAG File Search: step-by-step

Gemini’s File Search API lets you build a RAG agent (a system that retrieves relevant document pieces to answer questions) without setting up a complex data pipeline. First, upload your file using the media upload method. Gemini automatically chunks and embeds your document (converts pieces into vector representations) and indexes them for high-speed retrieval. The API handles huge files — up to 2 million tokens of context in the model, but the store can hold much more.

Next, create a tool. File search is a tool you enable, just like function calling (a way for the model to run specific actions). Point it to your uploaded file set. You don’t need to manually retrieve context. Finally, start a chat session. You send a user message like "summarize the Q3 financial results." The model decides if it needs to look at your files, performs the search, and generates an answer with citations — it shows you exactly which part of the document it used. This reduces hallucination (the model making up facts) compared to a standard chat.

Remember, this is not magic. Garbage in is garbage out — if your document is poorly scanned or messy, the output quality will suffer. You may need to pre-process files. Also, avoid duplicate data that lowers response quality. The API is very cheap, making it fast and cost-effective for finding a needle in a haystack, though chunk-based retrieval may still be better for some tasks.

To use this with Claude Code, you can upload API documentation as a file, then ask it to analyze the content and create a resource guide. Claude can also use RAG with a local vector database, but for code-specific questions, direct terminal navigation tools sometimes work better than retrieval.

Sources

Lesson 3: Best practices and pitfalls

Gemini RAG File Search Pitfalls, Mistakes, and Best Practices

A common mistake when using Gemini’s file search is treating it like magic. The tool automatically chunks and embeds (converts text into searchable vectors) your files, but garbage in is garbage out. If your source documents are scanned poorly or contain messy data, the retrieval quality drops. Pre-processing files before uploading is essential—don't skip cleanup.

Another pitfall is duplicate data. If you upload overlapping files, you explode your database with redundant information, which lowers response quality. Be intentional about what you upload.

For best practices, remember that semantic search (meaning-based search) excels at finding a needle in a haystack quickly and cheaply. But if you need ultra-specific chunking strategies (custom ways of splitting text) or have billions of documents, a traditional RAG pipeline with a vector database like Pinecone may still be better. Gemini file search shines for rapid prototyping, internal tools, and apps that chat with user-uploaded documents on the fly. It handles files up to 2 million tokens of context.

Gemini’s file search also gives you "receipts"—it tells you exactly which part of your document was used to answer. Enable file search as a tool (like function calling) and you’re done; the model decides when to search. Finally, use a master system prompt (like a `gemini.md` file) to make agent output more predictable.

Sources