Claude Code + RAG-Anything = LIMITLESS

Chase AI

Full Transcript

Almost every RAG system suffers from the exact same problem: it can only handle text documents. So if you try to give it images, charts, graphs, whatever, most RAG systems just can't handle it. And when I showed you LightRAG yesterday, it suffered from the exact same problem. But today, I'm going to show you the fix. And that fix is RAG-Anything. RAG-Anything solves this document problem for us. It can handle images. It can handle charts. It can handle graphs. And it allows us to create a RAG system that actually deals with the documents you use. RAG-Anything is from the same team that built LightRAG. It plugs directly into the LightRAG system we already built yesterday, so it's really easy to introduce into our stack. Today I'm going to show you exactly how to set it up and how it works under the hood, so you can begin using one of the most powerful RAG systems out there. In case it wasn't obvious from the opener, I'm going to assume you've already watched yesterday's LightRAG video. I'll put a link above if you haven't done that already, because today I'm going to assume you've already set up your LightRAG server, you understand how RAG works, and you understand this whole knowledge graph thing, because RAG-Anything is essentially going to be a wrapper around LightRAG. We're still going to have the same LightRAG web UI, with some differences, but everything that gets pushed into RAG-Anything, all these non-text documents, eventually finds its way into the same knowledge graph. We're going to be asking it the same questions, and we're going to be using the same API to query it through Claude Code that we did yesterday. And the functionality we're adding today is significant. It's not enough to build a RAG system that is purely text. We don't operate in a world that's purely text. How many of you have been given a PDF document that isn't even technically text? It's just scanned. LightRAG can't really handle that.
RAG-Anything can. Now, we will get a little technical today. We'll go under the hood and I'll explain exactly how this whole system works. But big picture, what is it doing? RAG-Anything looks at the documents that aren't text and basically does exactly what LightRAG does, except to these non-text documents. And after it creates its own knowledge graph and its own vector database, it merges them with the LightRAG ones, which is why everything ends up in one nice, neat place for us to ask questions about. Now, the only downsides to RAG-Anything are that it's a bit heavier: we have to download some models that live on our computer and help parse these non-text documents. And when it comes to actually ingesting non-text documents, we can't really do it through the LightRAG UI; we have to use a script. Luckily, this is where Claude Code comes in. So for you, the user, after you set all this up, all you have to do to ingest non-text documents is tell Claude Code: hey, go ahead, use the RAG-Anything skill and ingest this document. It's that simple. And you ask questions the same way you did before. So really not too bad, and you get all this functionality just by doing that. Now, before we go into how RAG-Anything actually works, I just want to give a quick plug for my Claude Code Masterclass. It came out a couple of weeks ago, and it's the number one place to go from zero to AI dev, especially if you don't come from a technical background. I update it literally every week; there's a new update coming tomorrow. So if you're someone who's really trying to master Claude Code and has no idea where to start, this is for you. There's a link to it in the comments; it's inside Chase AI Plus. I also have the free Chase AI community, if all this is just too much and you're just getting started; the link to that is in the description. That is also where you'll find the prompts and the skills I'm going to talk about today.
So, make sure you check that out regardless. Now, let's talk about RAG-Anything and how this thing actually works. To be honest, it's pretty simple, pretty self-explanatory. So, not to waste your time, I'm just going to keep this image up for like ten seconds and then we'll move on to the next thing. All right, pretty good, right? Let's move on. I'm just kidding. [laughter] There's actually a bit going on here; this image makes it look more confusing than it actually is. If you understood what we did the other day with LightRAG, all that conversation, you're going to be fine. RAG-Anything operates in a similar fashion, just with a few extra steps. And I want to go through it, because I think it's important to understand how these things work. In AI in general, it's easy to become super practically focused: I just want to know how to install it, Chase, and then how to use it. That's fine; you can skip ahead if that's you. But if you want to become a more mature AI dev, and you want to separate yourself from the monkey I could replace you with that just hits accept, accept, accept and copies prompts and skills, then I think it's important to have some understanding of the architecture. This is what's going to separate you from other people, not just in how you use this RAG system, but at a higher level, in bigger projects. This is how you begin to create your own skills and actually become good at this stuff. So let's talk about it. RAG-Anything. Let's talk about the problem, right? The problem is I have a scanned PDF, it's not really text, and yet I need to put it into my RAG system. LightRAG can't handle it. So in comes RAG-Anything, right? It's got the cool llama with the shades. So the first thing that happens is I'm going to ingest this document into RAG-Anything.
And the first thing it's going to do is use a program called MinerU, which runs on your computer, completely locally, for free. It's going to essentially break this document down into its component parts. MinerU is an open-source project; it's essentially a document parser that includes a bunch of miniature specialized models. All you need to know is, if you're scared of this, it's open source; I'll put a link down below. And this is what's going to be running and doing most of the work for us today. So MinerU looks at the document and says: okay, this is a header, and it draws a box around the header. It says: this is text. It says: this is a chart. It says: this is an image of a bar graph. And it says: this is an equation written in LaTeX. What it's done is look at the document and break it out into its separate parts. MinerU doesn't understand what's inside those parts. MinerU isn't reading the text; it doesn't get the text; it doesn't understand what the chart is about. It just knows: chart, text, image. From there, it sends these component parts to individual specialized models that are part of MinerU. This is all invisible to you; it's all happening automatically under the hood. One of those models is called PaddleOCR. That's what looks at the text. So MinerU sends a text block to PaddleOCR on your computer, and it pulls out the text. So now, instead of being scanned text, it's actual text that reads: Company X reported strong Q3 2023 results with revenue growth, blah blah blah, right? Same for the other text blocks. Same for the chart: it also gets turned into text, something an LLM can handle. Same thing with LaTeX equations; there's a whole model that handles those, so the equation is no longer LaTeX, it's actual text. Except for images.
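To make the parsing step concrete, here's a rough sketch of the kind of block list a layout parser like MinerU produces. The field names and values here are illustrative, not MinerU's actual output schema; the point is that each region gets a type label, without any interpretation of what the region means.

```python
# Illustrative sketch of a parsed-document structure (field names are
# hypothetical, not MinerU's real schema). The parser only *labels* each
# region of the page; it never interprets the contents.
parsed_blocks = [
    {"type": "header",   "content": "Q3 2023 Results"},
    {"type": "text",     "content": "Company X reported strong Q3 2023 results..."},
    {"type": "equation", "content": "growth = (R_t - R_prev) / R_prev"},  # transcribed from LaTeX
    {"type": "image",    "content": "page1_bar_chart.png"},  # kept as a screenshot
]

# Downstream steps only need the type labels to decide what to do next.
block_types = [b["type"] for b in parsed_blocks]
print(block_types)
```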
Whether it's a bar chart or really anything else it can't transform into text, what it's going to do instead is take a screenshot of it. And this is important. So now this is a screenshot, an image. Screenshot. Love that. So what do we have? We inserted a non-text document. It's been broken into its component parts, and we've sorted those component parts into two buckets: the text bucket and the image bucket. It's important to realize this. There are two paths it can go down: image or text. All right, you with me? Now we're done using these internal models, and we need to bring in the big boys, something like GPT-5 mini. Of note, that isn't strictly required; you could keep this all local if you wanted to, using something like Ollama. So now I take the text bucket and push it to GPT-5 mini, and I include a prompt that says: I want you to break this text out into two things. First, break it out into entities and relationships. Remember entities and relationships from our knowledge graph: an entity, another entity, and the relationship between them. And second, break it out into what will become embeddings for a vector database. So: embeddings, plus entities and relationships. Now, thinking ahead, what happens from there? Well, the embeddings become embeddings in a vector database, and the entities and relationships become a knowledge graph, just like we did with LightRAG, right? Same thing, except now it's from the text bucket. But what about those images we had? What do we do with them? Same thing. They get pushed to GPT-5 as well, but as a screenshot, OCR-style.
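The two-path split described above can be sketched in a few lines. This is my own simplified illustration of the routing, not RAG-Anything's actual code, and the prompt wording is invented for the example.

```python
# Simplified sketch of the text-vs-image routing described above.
# Illustrative only, not RAG-Anything's actual implementation.

EXTRACTION_PROMPT = (
    "Break this content out into (1) entities and the relationships "
    "between them, and (2) chunks to embed for a vector database:\n\n{content}"
)

def route_block(block: dict) -> dict:
    """Decide which path a parsed block takes to the LLM."""
    if block["type"] == "image":
        # Images go to a vision-capable model as a raw screenshot.
        return {"path": "image", "payload": block["content"]}
    # Everything else is already text and goes in as a plain prompt.
    return {"path": "text", "payload": EXTRACTION_PROMPT.format(content=block["content"])}

routed = [route_block(b) for b in [
    {"type": "text",  "content": "Company X reported strong Q3 2023 results."},
    {"type": "image", "content": "revenue_chart.png"},
]]
print([r["path"] for r in routed])
```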
So we're telling GPT-5: take a look at this screenshot and break it out into two things, right? Embeddings, and also entities plus relationships. Now, why do we do it this way? Why don't we just shove the whole document into the same prompt and have the model OCR the entire thing, treating the whole document as a screenshot? Because that's expensive and slow. What RAG-Anything decided to do, and I think it's kind of smart, is take a scalpel to the document locally, on your computer, breaking it into text and into screenshots. By going down these two paths, you save a ton of money and time. Imagine trying to have ChatGPT look at 10,000 screenshots, pull out all the text, and then break that text into embeddings, entities, and relationships. It would take a lot of time and money. This is smarter. So the entities and relationships from the image side get the same treatment: they also go into a vector database and a knowledge graph. So what does that mean? It means from one document we've now created four things, right? Two vector databases and two knowledge graphs from our single non-text document. You with me? Now what do we have to do? Well, it's kind of obvious: we need to merge these. So it takes those four things and pushes them together; they pretty much overlay on top of one another, matched essentially by entity. At the end you get one vector database and one knowledge graph, pretty much exactly what we did up here with LightRAG. Simple enough. If we were just using RAG-Anything, that would be the extent of it. However, remember, we're layering RAG-Anything on top of LightRAG. I want all the power of LightRAG and all the power of RAG-Anything. So what happens now? Well, what happens is just a repeat of what you just saw.
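The merge step is essentially an entity-keyed union of the two graphs. A toy version, with made-up data, might look like this (a real merge would also deduplicate relationships and reconcile conflicting entity descriptions):

```python
def merge_graphs(graph_a: dict, graph_b: dict) -> dict:
    """Merge two {entities, relationships} graphs, keyed on entity name.

    Toy illustration of the overlay-and-match-by-entity step described
    above; not RAG-Anything's actual merge code.
    """
    entities = {**graph_a["entities"], **graph_b["entities"]}  # same name -> one node
    relationships = graph_a["relationships"] + graph_b["relationships"]
    return {"entities": entities, "relationships": relationships}

# Hypothetical outputs of the text path and the image path for one document.
text_graph  = {"entities": {"Company X": "A SaaS firm"},
               "relationships": [("Company X", "reported", "Q3 2023 revenue")]}
image_graph = {"entities": {"Company X": "A SaaS firm",
                            "Q3 2023 revenue": "$5.4M"},
               "relationships": [("Q3 2023 revenue", "shown in", "bar chart")]}

merged = merge_graphs(text_graph, image_graph)
print(len(merged["entities"]), len(merged["relationships"]))
```

Because "Company X" appears on both paths, the two graphs overlay into a single node for it, which is exactly why everything ends up in one connected knowledge graph.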
So let's bring this down. Now we have our RAG-Anything set, with a vector database and a knowledge graph, and we have our LightRAG set. So what do we do? We just merge those together. Then we get the RAG-Anything and LightRAG results combined, which finally gives us one vector database and one knowledge graph. From there, it's just like it was before with LightRAG on its own: you ask a question, that question gets turned into a vector, it pulls the relevant vectors up here, and then it also goes down here, finds the correct entity, and takes a look at what's nearby. Okay, maybe that was a little confusing; I hope I explained it well. A quick recap, to confuse you even more. What happens when I add a document that isn't text? It goes into RAG-Anything. RAG-Anything breaks out what text it can, and breaks out the images as well. It sends both of those to ChatGPT, or whatever AI system you want, which breaks them out into embeddings, entities, and relationships. Those get turned into knowledge graphs and vector databases. We merge those together, and we now have one vector database and one knowledge graph for RAG-Anything. And since we've already been running LightRAG, plus whatever other documents you've added on top of that, you have an existing vector database and an existing knowledge graph. To reconcile that, we simply merge them. And in the end, you didn't notice a dang thing. As the user, all of this is invisible to you. None of it really matters to you. The only thing that might matter is what's happening over here with GPT-5, because it's going to cost you some money. But for educational purposes, that is how the RAG-Anything system integrates with the LightRAG system. And at the end of the day, it just means you have a RAG system that can handle non-text documents.
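The two-sided query flow at the end, vector lookup plus graph-neighborhood lookup, can be sketched with toy numbers. A real system would use an embedding model here; the two-dimensional vectors below are hand-made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vector store and knowledge graph (hand-made vectors, fake data).
vector_store = {
    "January revenue was $4.6M":  [0.9, 0.1],
    "The office moved to Austin": [0.1, 0.9],
}
graph_edges = [("Novatech Inc.", "reported", "January revenue")]

query_vec = [0.8, 0.2]  # pretend embedding of "What was January revenue?"

# Path 1 (vector database): pull the most similar chunk.
best_chunk = max(vector_store, key=lambda t: cosine(vector_store[t], query_vec))

# Path 2 (knowledge graph): find the matched entity and look at what's nearby.
neighbors = [(s, r, o) for (s, r, o) in graph_edges if "revenue" in o]

print(best_chunk, neighbors)
```

Both results then get handed to the LLM as context, which is why answers can draw on the chart-derived facts and the plain text at the same time.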
And if you're still around after all that, now we can go into how you actually install this thing and use it, plus a couple of things to watch out for. I created a one-shot prompt that you can give Claude Code that will install everything for you, update the proper models, and all of that. All you need to do is make sure you're in your LightRAG directory when you run it. There are really three things it's going to do. First, it makes sure we update the correct storage path, since you already have a Dockerized LightRAG instance running. Second, we want to update the model, because the GitHub repo was created a little while ago, so all the example scripts use things like GPT-4o mini. I have it use GPT-5 nano instead; understand you can change that if you want to, but I had it use GPT-5 nano and keep text-embedding-3-large, so we can just use OpenAI for everything. It keeps things simple; play with it as you wish. Lastly, since we're using RAG-Anything as essentially a wrapper on top of LightRAG, some of the example scripts in the GitHub repo are kind of wrong: there's this embedding double-wrap bug, which we just tell Claude Code to fix, and it will fix it. So you're just going to use this prompt. It's inside the free community; the link is in the description. Just look up RAG-Anything and you'll find it there. Once you run that prompt, it will begin downloading everything; understand it's a little heavy, because it needs to download MinerU and all of its dependencies as well. Now, let's talk about ingesting documents, because this is kind of annoying and a pain in the butt. In a perfect world, the LightRAG plus RAG-Anything situation would be streamlined, and I could dump whatever I wanted into LightRAG/RAG-Anything through a single interface.
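To picture the kind of edits that one-shot prompt makes, here's a hypothetical settings fragment. The key names are illustrative, not RAG-Anything's actual config schema; only the model identifiers reflect what's described above.

```python
# Hypothetical settings showing the model swap the one-shot prompt makes
# in the repo's example scripts (key names are illustrative, not
# RAG-Anything's actual config schema).
settings = {
    "llm_model": "gpt-5-nano",                    # repo examples shipped with gpt-4o-mini
    "vision_model": "gpt-5-mini",                 # handles the image/screenshot path
    "embedding_model": "text-embedding-3-large",  # kept, so OpenAI covers everything
    "working_dir": "./rag_storage",               # must match your LightRAG Docker volume
}
print(sorted(settings))
```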
I could come into the UI, go to upload, and do it there. With RAG-Anything on top of LightRAG, you really can't. You can still do this for text documents, so the normal workflow I showed in the previous video, where you go to the UI or use the LightRAG skill to upload documents, still works. You just can't do that with RAG-Anything. It has to go down a different tunnel, a different pathway, and that pathway is a Python script. There's no UI, there's no button to press; it's literally a script, code you have to run. Now, luckily, this is where Claude Code comes in and makes it very simple, because we're just going to turn that script inside the repo into a skill. So for you, once that skill is created, all you have to do is say: Claude Code, use the RAG-Anything skill to upload all these non-text documents. When it does that, it will go through the MinerU process. It will take some time, because it has to do all the things we covered in the technical section, but it will upload everything to LightRAG, and it will show up inside your documents and inside your knowledge graph. Okay, that's the only weird part you need to know. The other weird part, to be honest, is that once you do that, it also requires you to restart the Docker container, but that happens automatically as part of the skill. So again, from your point of view as the user, the only difference is that you need to invoke the skill. This skill, the RAG-Anything upload skill, is also inside the free community; just download it, put it in your .claude folder, and it will work just fine. Now, one note on MinerU taking a while: that's because, the way RAG-Anything works when you download it, it's going to run on your CPU. If you want it to run on your GPU, you have to have a different version of PyTorch.
If that all went over your head, and it's too slow for you, just tell Claude Code: hey, can we run PyTorch, can we run MinerU, on our GPU? And it will walk you through it, or in fact just do it all on its own. But by default, it's going to run on your CPU, so just know that. Now let's see an example of this in action. One of the documents we ingested was this PDF from Novatech, right? A SaaS revenue analysis. It's totally fake, but the point is we ingested something that has this sorted bar chart. This is something that obviously would have been pulled out as an image and sent to ChatGPT, yada yada yada. Normally, LightRAG wouldn't be able to handle this, because it's just an image, it's charts; it's hard for it to break that out. But since we ran it through RAG-Anything, we can now ask a question about it via Claude Code. So I asked Claude Code: can we query our LightRAG database about the monthly revenue trend for Novatech Inc. for January through September 2025? You can see here it actually didn't even use the skill; it just straight up made the API request with the query: what was the monthly revenue trend for Novatech Inc. from blah blah blah. It gave a full response, so I could look at the raw response if I wanted to. But what did it do? It came back with the full monthly breakdowns: January 4.6, February 4.9, March 5.4, on and on. So in terms of asking questions about these new documents, it's the same as before. The only difference is the upload. All you need to do is invoke that skill I'm giving you and tell Claude Code what you want to put in there. You can point it at a whole folder or at a specific download. It's just that easy. The only really weird thing you have to get used to is these two upload paths. But the actual question and answer? It's just plain language. Plain language.
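When Claude Code "straight up made the API request," it was presumably hitting the LightRAG server's query endpoint. Here's a minimal sketch, assuming the server is on localhost port 9621 and exposes a POST /query route taking a query string and a retrieval mode; verify the exact path, port, and payload shape against your own running instance.

```python
import json
import urllib.request

def build_query_request(question: str, mode: str = "hybrid") -> urllib.request.Request:
    """Build a request against the LightRAG server's query endpoint.

    The endpoint path, port, and payload shape are assumptions based on a
    typical LightRAG server setup; check your server's API docs.
    """
    payload = json.dumps({"query": question, "mode": mode}).encode()
    return urllib.request.Request(
        "http://localhost:9621/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request(
    "What was the monthly revenue trend for Novatech Inc. "
    "from January through September 2025?"
)
print(req.full_url, req.get_method())
# response = urllib.request.urlopen(req)  # only works with the server running
```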
Even if you have the skills as well, which I also gave out in the last video, Claude Code is also smart enough to understand the API structure of this whole thing, because it's local; it's on your computer. So that's really it when it comes to RAG-Anything. I know the majority of this video was focused on the technical aspects, but as you can see, once we built that LightRAG foundation, actually adding RAG-Anything on top of it isn't too hard, especially if we just use that one-shot prompt I gave you. There are some things you can tweak around the edges, like anything, when it comes to querying it, but really, with Claude Code, it's kind of in charge of all the weights you can tune inside of LightRAG. By that I mean, if we go to the retrieval section, all the parameters there on the right. Again, Claude Code knows which ones tend to be best for you. So overall, I hope this explained how easy it is to set up RAG-Anything, and how easy it is to add this level of functionality to your RAG systems, which in many RAG systems just isn't possible, or is very expensive. This is relatively cheap, especially with that whole MinerU local parsing system we were able to set up. So as always, let me know what you thought, make sure to check out Chase AI Plus if you want to get your hands on that Claude Code Masterclass, and I'll see you in the next one.
