At this point there are multiple home RAG systems, pretty much all using the same components, so does anyone know how this compares to the others? I see that it imports GraphRAG, which most others don't.
I can't speak for the internals, but I've found it to be dead simple to spin up locally and use, with decent results on the docs I've tested it with.
That said, I think a lot of AI chat services have recognized that document search is table stakes and are building this functionality into their tools, so I don't know whether Kotaemon as a standalone tool will be needed for much longer.
For example, my company was originally going to push out Kotaemon for private document search, but we have now put that on pause because we're exploring whether we can get the same results through our primary AI chat service, without having to point users to a separate tool.
> whether Kotaemon as a standalone tool will be needed for much longer.
I think a lot of AI startups will find themselves in this situation. Searching and summarizing docs is a no-brainer for OpenAI, Anthropic, etc. The only issue they have right now is that their models might not be reliable enough due to the non-deterministic nature of LLMs. In the long term, I believe Google, Amazon and Microsoft will probably be the big winners in this area since they can offer multiple models from major providers to de-risk things.
Unless the AI complements an existing solution that is unique and/or already done well, it will be very difficult to compete.
How well do these kinds of pre-built systems work? In my experience, RAG usually requires a decent amount of customization to your input data, chunk formatting, and other things to work well. Are these systems flexible enough for that?
And what if you want to integrate it into an existing system? Are these for local use only, so that everyone on e.g. a team has to set it up themselves?
Kinda curious what the exact use case is. I have seen a few of these repos with 10k+ stars but don't really get what they're used for.
I share some of your concerns about generalised/pre-built RAG, but on your local-use-only question, this is in the readme:
> Host your own document QA (RAG) web-UI: Support multi-user login, organize your files in private/public collections, collaborate and share your favorite chat with others.
Kotaemon does support customisation, so maybe that should allay your concerns, but I do wonder how tricky it would be to implement and then maintain.
> RAG in my experience usually requires a decent amount of customization
Like what? I'm curious because I just upload the documents to OpenAI and make it available to the Assistant, and it seems to work fine as a generic solution. Are they doing anything magical?
> Are they doing anything magical?
It seems to me like they're doing a ton of magical stuff, but it's really hard to know exactly what without seeing the actual source, since sadly their company name is a bit of a misnomer. Judging by the results of using it, they seem to be doing some pre/post-processing to make it work better with various formats, etc.
>I just upload the documents to OpenAI and make it available to the Assistant, and it seems to work fine
That is also my experience; OpenAI Assistant attached docs do seem to have a good amount of magic. I migrated over from an admittedly basic/naive custom RAG solution and the results are similar or better, but now I just work with a doc instead of dealing with RAG. One thing I found is that I have to add a strong instruction to the prompt to force it to always check the doc; apart from that it works great.
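For reference, the "strong instruction" I mean is just something like the below, sketched with OpenAI's Assistants API. The instruction wording is the point; the name and model are placeholders, and I've left out attaching the files via a vector store.

```python
# Sketch only: forcing the Assistant to consult the attached docs.
# The instructions string is the point; everything else is illustrative.
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="doc-qa",            # placeholder name
    model="gpt-4o",           # any Assistants-capable model
    tools=[{"type": "file_search"}],
    instructions=(
        "Always search the attached documents before answering. "
        "If they do not cover the question, say so explicitly instead "
        "of answering from general knowledge."
    ),
)
```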
When you do that, you are just putting the whole book into the context for OpenAI to reason about. That works if the work is smaller than the context window.
For longer documents, or for groups of documents, you need some kind of search to extract the most relevant passages and put those in the context.
That search is RAG.
It's usually a semantic search: a specialized model creates embeddings of the data, and a chunking algorithm splits the document into smaller pieces so each embedding captures a coherent unit of meaning.
So you have multiple pieces involved in the job: a chunker, an embedding model, a vector database, etc.
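A minimal sketch of those pieces, assuming sentence-transformers for the embedding model and an in-memory store instead of a real vector database (the chunk sizes and model name are just examples):

```python
# A naive chunker, an embedding model, and an in-memory "vector store".
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    # Fixed-size character windows; real systems split on structure instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def build_index(document: str) -> tuple[list[str], np.ndarray]:
    chunks = chunk(document)
    # Normalized embeddings make a dot product equal to cosine similarity.
    vectors = model.encode(chunks, normalize_embeddings=True)
    return chunks, np.asarray(vectors)

def retrieve(query: str, chunks: list[str], vectors: np.ndarray, k: int = 3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(vectors @ q)[::-1][:k]
    return [chunks[i] for i in top]  # the passages you place in the LLM context
```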
I have a question about RAG in general (I am quite ignorant regarding LLMs and am trying to reason about possible hurdles before I start experimenting).
I would like to "train" an LLM with a few thousand analysis documents that detail the various enhancements applied to an in-house app over the last twenty years.
My question is: some of the modules that are part of my app have been totally revamped, sometimes more than once. So while the general requirements for module Foo are more or less consistent, documents talking about it from 2005 to, say, 2018 describe either bug fixes or small enhancements. In 2019 our main Foo provider completely changed their product, and therefore the interface, so the 2 docs talking about Foo in 2019 are "more authoritative" than anything before that date... but then COVID happened, so we now have Foo 3.0, which was implemented in late 2022 and is now being idly maintained with, again, small enhancements and fixes.
Documents have IDs which include an always-increasing number (they start their life as Jira issues), so just saying "newer = more accurate/valid/authoritative" could help, but I hope we do not need to rank/tag/grade every single document manually in order to assess how much weight it has on any specific topic.
Is this something that needs special treatment or will it just "work"?
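To make the question concrete, the kind of thing I'm imagining is a re-rank step that folds the issue number into the retrieval score, so newer docs win on the same topic without manual grading. All names and constants below are made up, and I don't know whether tools like this expose such a hook:

```python
# Hypothetical sketch: blend semantic similarity with document recency.
# The half-life and weighting are illustrative, not recommendations.
import math
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    similarity: float   # cosine similarity from the vector search
    doc_seq: int        # e.g. the always-increasing Jira issue number

def recency_score(seq: int, newest_seq: int, half_life: float = 500.0) -> float:
    # Exponential decay: a doc half_life issues older scores half as much.
    age = newest_seq - seq
    return math.exp(-math.log(2) * age / half_life)

def rerank(hits: list[Hit], alpha: float = 0.7) -> list[Hit]:
    newest = max(h.doc_seq for h in hits)
    return sorted(
        hits,
        key=lambda h: alpha * h.similarity
                      + (1 - alpha) * recency_score(h.doc_seq, newest),
        reverse=True,
    )
```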
This looks great. The integration with the GraphRAG framework is helpful.
Are you able to monitor token usage and cost per user and per session?
Also, a lot of the time users have the same question worded differently. Is there a cost-effective way of answering those from a cache?
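What I have in mind is a semantic cache: embed each incoming question and reuse a stored answer when a previous question is close enough. A rough sketch, where `call_llm` stands in for the normal RAG pipeline and the threshold is a made-up starting point (too low and users get stale or wrong answers):

```python
# Hypothetical semantic cache keyed on question embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
cache: list[tuple[np.ndarray, str]] = []  # (question embedding, answer)

def answer(question: str, threshold: float = 0.92) -> str:
    q = model.encode([question], normalize_embeddings=True)[0]
    for vec, cached_answer in cache:
        if float(vec @ q) >= threshold:
            return cached_answer  # cache hit: skip the LLM call entirely
    result = call_llm(question)   # hypothetical: your normal RAG pipeline
    cache.append((q, result))
    return result
```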
not this again, we've already seen hundreds of such things...
I don't comment on here much, but why even post a response like this? This is a nice piece of work.
this is a very thoughtful comment, thank you for sharing it!