AI augmentation doesn’t work without the right databases and data infrastructure. Each approach (RAG, CAG, KAG) relies on different types of databases to make information accessible, reliable, and actionable.
RAG – Retrieval-Augmented Generation
Databases commonly used
- Pinecone – Vector Database | Cloud SaaS | Proprietary license
- Weaviate – Vector Database | v1.26+ | BSD 3-Clause License
- Milvus – Vector Database | v2.4+ | Apache 2.0 License
- FAISS (Meta AI) – Vector Store Library | v1.8+ | MIT License
How it works:
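- Splits text and documents into chunks, converts each chunk to an embedding, and stores the embeddings in a vector database.
- At query time, embeds the question and retrieves the most similar chunks.
- Passes the retrieved chunks to the model as context for the generated answer.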
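
The retrieval flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Pinecone, Weaviate, Milvus, or FAISS work internally: the `ToyVectorStore` class, the word-count `embed` function, and the sample chunks are all hypothetical stand-ins for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical, for illustration)."""

    def __init__(self):
        self._rows = []  # list of (chunk_text, embedding) pairs

    def add(self, chunk):
        self._rows.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        # Rank every stored chunk by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Place the retrieved chunks in front of the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


store = ToyVectorStore()
for chunk in [
    "Our refund policy accepts returns within 30 days.",
    "Standard shipping takes five business days.",
    "Support is available by email on weekdays.",
]:
    store.add(chunk)

question = "What is the refund policy?"
top_chunks = store.search(question, k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

A production vector database replaces the linear scan in `search` with an approximate nearest-neighbor index, but the shape of the pipeline (embed, store, search, build the prompt) stays the same.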
Real-World Examples & Applications
- Perplexity AI → Uses retrieval pipelines over web-scale data.
- ChatGPT Enterprise with RAG → Connects company knowledge bases like Confluence, Slack, Google Drive.
- Thomson Reuters Legal → Uses RAG pipelines to deliver compliance-ready legal insights.
CAG – Context-Augmented Generation
Databases commonly used
- PostgreSQL / MySQL – Relational DBs for session history | Open Source (Postgres: PostgreSQL License, MySQL: GPLv2 with exceptions)
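- Redis – In-Memory DB for context caching | v7.2 | BSD 3-Clause License through v7.2; later releases use RSALv2 / SSPLv1, with AGPLv3 added as an option in v8
- MongoDB Atlas – Managed document DB for user/session data | Proprietary cloud service; MongoDB Community Server is under the Server-Side Public License (SSPL)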
- ChromaDB – Contextual vector store | v0.5+ | Apache 2.0 License
How it works:
- Stores user session history, preferences, and metadata.
- Before generating a response, the application loads this context for the current user or session and adds it to the prompt.
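
A minimal sketch of this context-loading step follows, assuming Python's built-in `sqlite3` as a stand-in for PostgreSQL or MySQL. The table names, column names, and helper functions are hypothetical and chosen only for illustration. A cache such as Redis would typically sit in front of `load_context`, which is omitted here to keep the example short.

```python
import sqlite3

# SQLite (stdlib) stands in for a relational store; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT, role TEXT, content TEXT)"
)
conn.execute(
    "CREATE TABLE preferences ("
    "session_id TEXT, pref_name TEXT, pref_value TEXT, "
    "PRIMARY KEY (session_id, pref_name))"
)


def remember(session_id, role, content):
    """Append one message to the session history."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )


def set_preference(session_id, name, value):
    """Insert or overwrite a single user preference."""
    conn.execute(
        "INSERT OR REPLACE INTO preferences VALUES (?, ?, ?)",
        (session_id, name, value),
    )


def load_context(session_id, last_n=4):
    """Return the last_n messages (oldest first) and all preferences for a session."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? "
        "ORDER BY id DESC LIMIT ?",
        (session_id, last_n),
    ).fetchall()
    prefs = dict(
        conn.execute(
            "SELECT pref_name, pref_value FROM preferences WHERE session_id = ?",
            (session_id,),
        ).fetchall()
    )
    return {"history": rows[::-1], "preferences": prefs}


def build_prompt(session_id, user_message):
    """Prepend stored preferences and recent history to the new message."""
    ctx = load_context(session_id)
    prefs = ", ".join(f"{k}={v}" for k, v in sorted(ctx["preferences"].items()))
    history = "\n".join(f"{role}: {content}" for role, content in ctx["history"])
    return (
        f"User preferences: {prefs}\n"
        f"Recent conversation:\n{history}\n"
        f"user: {user_message}"
    )


remember("s1", "user", "I am learning Spanish.")
remember("s1", "assistant", "Great, we can start with greetings.")
remember("s2", "user", "Unrelated session.")
set_preference("s1", "level", "beginner")
set_preference("s1", "level", "intermediate")  # overwrites the earlier value

cag_prompt = build_prompt("s1", "Give me a practice sentence.")
print(cag_prompt)
```

Filtering by `session_id` is what keeps one user's context out of another user's prompt, so the same pattern applies regardless of which store holds the data.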
Real-World Examples & Applications
- Notion AI → Reads project databases (PostgreSQL + Redis caching).
- Duolingo Max → Uses MongoDB-like stores for learner history to adapt lessons.
- GitHub Copilot → Context layer powered by user repo data + embeddings.
- Customer Support AI Agents → Redis + MongoDB for multi-session conversations.
KAG – Knowledge-Augmented Generation
Databases commonly used
- Neo4j – Graph Database | v5.x | GPLv3 (Community Edition) / Commercial License (Enterprise Edition)
- TigerGraph – Enterprise Graph DB | Proprietary
- ArangoDB – Multi-Model DB (Graph + Doc) | v3.11 | Apache 2.0 License (v3.12+ uses the Business Source License 1.1)
- Amazon Neptune – Managed Graph DB | AWS Proprietary
- Wikidata / RDF Triple Stores (Blazegraph, Virtuoso) – Knowledge graph databases | Wikidata data: CC0; Blazegraph and Virtuoso Open-Source Edition: GPLv2
How it works:
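- Uses knowledge graphs (nodes + edges) to store entities and the structured relationships between them.
- The system queries the graph (for example with Cypher or SPARQL) and grounds its answer in the returned facts, including relationships that span several hops.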
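
To make the graph lookup concrete, here is a minimal in-memory sketch of storing subject-predicate-object triples and following a multi-hop path. The `ToyKnowledgeGraph` class and the gene, protein, and molecule names are hypothetical placeholders. A real deployment would issue Cypher or SPARQL queries against a graph database instead of walking Python lists.

```python
from collections import defaultdict


class ToyKnowledgeGraph:
    """In-memory triple store (hypothetical stand-in for a graph database)."""

    def __init__(self):
        self._edges = defaultdict(list)  # subject -> [(predicate, object), ...]

    def add(self, subject, predicate, obj):
        self._edges[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked from subject by the given predicate."""
        return [o for p, o in self._edges.get(subject, []) if p == predicate]

    def follow(self, start, path):
        """Follow a sequence of predicates from start (a multi-hop query)."""
        frontier = [start]
        for predicate in path:
            frontier = [o for node in frontier for o in self.objects(node, predicate)]
        return frontier


kg = ToyKnowledgeGraph()
kg.add("GeneA", "encodes", "ProteinX")        # placeholder entities
kg.add("ProteinX", "targeted_by", "MoleculeY")
kg.add("GeneB", "encodes", "ProteinZ")

# Two-hop question: which molecules target the protein encoded by GeneA?
molecules = kg.follow("GeneA", ["encodes", "targeted_by"])

# The returned facts are then handed to the model as grounding for its answer.
protein = kg.objects("GeneA", "encodes")[0]
facts = f"GeneA encodes {protein}, which is targeted by {molecules[0]}."
print(facts)
```

Because the answer is assembled from explicit edges, each statement in `facts` can be traced back to a specific triple, which is the main appeal of this approach for factual queries.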
Real-World Examples & Applications
- Google Bard (now Gemini) → Uses Google’s Knowledge Graph (billions of triples).
- Siemens Digital Twins → Neo4j knowledge graph powering industrial asset reasoning.
- AstraZeneca Drug Discovery → Neo4j + custom biomedical KGs for linking genes, proteins, and molecules.
- JP Morgan Risk Engine → Uses proprietary graph DB for compliance reporting.
Summary Table
Approach | Database Types | Providers / Examples | License | Real-World Use |
---|---|---|---|---|
RAG | Vector DBs | Pinecone (Proprietary), Weaviate (BSD 3-Clause), Milvus (Apache 2.0), FAISS (MIT) | Mixed | Perplexity AI, ChatGPT Enterprise, Thomson Reuters |
CAG | Relational / In-Memory / NoSQL | PostgreSQL (PostgreSQL License), MySQL (GPLv2), Redis (BSD 3-Clause through v7.2), MongoDB Atlas (managed service; server is SSPL), ChromaDB (Apache 2.0) | Mixed | Notion AI, Duolingo Max, GitHub Copilot |
KAG | Graph / Knowledge DBs | Neo4j (GPLv3/Commercial), TigerGraph (Proprietary), ArangoDB (Apache 2.0 through v3.11), Amazon Neptune (AWS Proprietary), Wikidata (CC0) | Mixed | Google Bard (now Gemini), Siemens Digital Twins, AstraZeneca, JP Morgan |