# Quick Start
OpenRAG is an open-source Retrieval-Augmented Generation (RAG) solution. This guide is a step-by-step walkthrough to help you get started with OpenRAG.
## Docker

### Prerequisites

- Docker and Docker Compose
- Your hardware should meet these specifications:
  - CPU deployment: minimum 13 GiB RAM for light PDF parsers (`PyMuPDF4LLMLoader`, `PyMuPDFLoader`), or 23 GiB RAM for heavier parsers like `MarkerLoader` (refer to this section for details)
  - GPU deployment: 16 GB GPU memory recommended (for systems with separate CPU and GPU memory)
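Before launching, you can sanity-check the host against these requirements. The sketch below is a hypothetical pre-flight check, not part of OpenRAG; the `/proc/meminfo` read is Linux-specific (on macOS, use `sysctl hw.memsize` instead).

```shell
# Hypothetical pre-flight check (not shipped with OpenRAG): verify Docker is
# installed and that total RAM meets the guidance above.

if command -v docker >/dev/null 2>&1; then
  echo "docker: found"
else
  echo "docker: MISSING - install Docker and Docker Compose first"
fi

# Total RAM in GiB (/proc/meminfo reports KiB; Linux only)
mem_gib=$(awk '/MemTotal/ {printf "%d", $2 / 1048576}' /proc/meminfo 2>/dev/null || echo 0)
echo "total RAM: ${mem_gib} GiB"

if [ "$mem_gib" -ge 23 ]; then
  echo "RAM: enough for heavy parsers like MarkerLoader"
elif [ "$mem_gib" -ge 13 ]; then
  echo "RAM: enough for light PDF parsers only"
else
  echo "RAM: below the 13 GiB minimum for CPU deployment"
fi
```

The thresholds (13 and 23 GiB) mirror the prerequisites list above; adjust them if the requirements change.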
### Installation and Configuration

#### 1. Clone the repository

```shell
git clone --recurse-submodules git@github.com:linagora/openrag.git
cd openrag/
git checkout main # or a given release
```

#### 2. Create a .env File
Create a `.env` file at the root of the project, mirroring the structure of `.env.example`, and fill in the blank environment variables to configure your environment.
```shell
cp .env.example .env
```

Here is a brief overview of key environment variables to configure:
```shell
# LLM
BASE_URL=
API_KEY=
MODEL=

# VLM (Visual Language Model); you can set it to the same as the LLM if your LLM supports images
VLM_BASE_URL=
VLM_API_KEY=
VLM_MODEL=

## FastAPI app (no need to change it)
# APP_PORT=8080 # this is the forwarded port
# API_NUM_WORKERS=1 # number of uvicorn workers for the FastAPI app

## To enable API HTTP authentication via HTTPBearer
# AUTH_TOKEN=sk-openrag-1234

# SAVE_UPLOADED_FILES=true # useful for Chainlit (chat interface) source viewing

# Set to true to mount the Chainlit chat UI on the FastAPI app (default: true)
## WITH_CHAINLIT_UI=true

# EMBEDDER
EMBEDDER_MODEL_NAME=jinaai/jina-embeddings-v3 # or another Hugging Face embedder compatible with vLLM
# EMBEDDER_BASE_URL=http://vllm:8000/v1
# EMBEDDER_API_KEY=EMPTY

# RETRIEVER
# RETRIEVER_TOP_K=20 # number of top documents to retrieve before reranking (lower (~10) is faster on CPU; on GPU you can try increasing it (~40))

# RERANKER
RERANKER_ENABLED=true # deactivate the reranker if your CPU is not powerful enough
RERANKER_MODEL=Alibaba-NLP/gte-multilingual-reranker-base # or jinaai/jina-reranker-v2-base-multilingual

# Prompts
PROMPTS_DIR=../prompts/example3_en # change to ../prompts/example3 for French prompts

# Ray
RAY_DEDUP_LOGS=0 # turns off Ray's deduplication of logs that appear across multiple processes
RAY_ENABLE_RECORD_ACTOR_TASK_LOGGING=1 # enable task-level logs in the Ray dashboard
RAY_task_retry_delay_ms=3000
RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 # critical with the newest version of uv

# Indexer UI
## 1. Replace X.X.X.X with localhost if launching locally, or with your server IP
## 2. Used by the frontend: replace APP_PORT (8080 by default) with the actual port number of your FastAPI backend
## 3. Replace INDEXERUI_PORT with its value in the INDEXERUI_URL variable
INCLUDE_CREDENTIALS=false # set to true if FastAPI authentication is enabled, i.e. AUTH_TOKEN is set
INDEXERUI_PORT=3042 # port to expose the Indexer UI (default is 3042)
INDEXERUI_URL='http://X.X.X.X:INDEXERUI_PORT'
API_BASE_URL='http://X.X.X.X:APP_PORT' # base URL of your FastAPI backend
```
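It is easy to launch the stack with a required variable still blank. The snippet below is a hypothetical sanity-check sketch (not part of OpenRAG); `check_env` and the demo file path are names invented for illustration.

```shell
# Hypothetical helper (not shipped with OpenRAG): report any variables in a
# .env-style file that are present but left empty.
check_env() {
  file="$1"; shift
  missing=0
  for var in "$@"; do
    val=$(grep -E "^${var}=" "$file" | head -n1 | cut -d= -f2-)
    if [ -z "$val" ]; then
      echo "missing: $var"
      missing=1
    fi
  done
  return $missing
}

# Demo: a throwaway .env with MODEL left blank
printf 'BASE_URL=https://api.example.com/v1\nAPI_KEY=sk-test\nMODEL=\n' > /tmp/demo.env
check_env /tmp/demo.env BASE_URL API_KEY MODEL || echo "fill in the blanks before launching"
```

Run it against your real `.env` with the variables you rely on (e.g. `check_env .env BASE_URL API_KEY MODEL EMBEDDER_MODEL_NAME`) before starting the containers.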
#### 3. File Parser configuration

All supported file format parsers are pre-configured. For PDF processing, `MarkerLoader` serves as the default parser, offering comprehensive support for OCR-scanned documents, complex layouts, tables, and embedded images. `MarkerLoader` operates efficiently in both GPU and CPU environments.
### Deployment

If the Indexer UI (a web interface for intuitive document ingestion, indexing, and management) is not already configured in your `.env`, follow the dedicated guide:

➡ Deploy with Indexer UI
#### Simple and quick launch for testing

The OpenRAG repository contains a ready-to-use `docker-compose.yml` file in the `quick_start` folder. This setup is ideal for local testing and quick deployments.

```
quick_start/
├── extern/              # reranker and embedder utils
│   ├── vllm/            # CPU Dockerfiles for different architectures
│   │   └── Dockerfile.cpu   # for x86 CPUs
│   └── infinity.yaml    # reranker service
├── vdb/
│   └── milvus.yaml
├── docker-compose.yml
└── .env                 # the configured .env file
```
1. Navigate to the `quick_start` directory, or copy it elsewhere
2. Place your `.env` file in the `quick_start` folder
3. Run the appropriate command for your system:
**GPU deployment (recommended for optimal performance):**

```shell
docker compose up -d

# run the following command to stop the application
# docker compose down
```

**CPU deployment:**

```shell
docker compose --profile cpu up -d

# to stop the application
# docker compose --profile cpu down
```
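After `docker compose up -d` returns, the FastAPI backend may still take a moment to start accepting requests. Below is a hedged helper sketch (not part of OpenRAG; `wait_for_port` is a name invented here) that polls a port until it answers over HTTP:

```shell
# Hypothetical helper (not shipped with OpenRAG): poll an HTTP port until it
# answers or the retries run out. Any HTTP response (even 404) counts as "up".
wait_for_port() {
  # usage: wait_for_port <retries> <host> <port>
  retries=$1; host=$2; port=$3
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -s -o /dev/null --max-time 2 "http://$host:$port/"; then
      echo "port $port is answering"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "port $port did not answer after $retries attempts"
  return 1
}

# Example: wait up to ~30s for the FastAPI app on APP_PORT (8080 by default)
# wait_for_port 30 localhost 8080
```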
#### Development Environment

For development builds, use the `--build` flag to rebuild images. Execute these commands from the root directory of the cloned repository:
```
openrag/
├── .github/
│   └── …
├── .hydra_config/
│   └── …
├── extern/
│   └── …
├── vdb/
│   └── …
├── …
├── docker-compose.yml
├── README.md
├── .env.example
├── pyproject.toml
├── uv.lock
└── .env    # the configured .env file
```
**GPU deployment:**

```shell
docker compose up -d --build

# run the following command to stop the application
# docker compose down
```

**CPU deployment:**

```shell
docker compose --profile cpu up -d --build

# to stop the application
# docker compose --profile cpu down
```

Once the app is up and running, you can access the provided services. See the next section.
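As a quick smoke test, you can probe the backend from the host. This assumes the default `APP_PORT=8080` from `.env`, and relies on the fact that FastAPI serves interactive API docs at `/docs` by default; adjust host and port to your deployment.

```shell
# Print the HTTP status of the FastAPI interactive docs page.
# Assumes APP_PORT=8080 (the default in .env).
# "000" means the server is not reachable (yet).
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/docs

# The Indexer UI, if enabled, listens on INDEXERUI_PORT (3042 by default).
```

A `200` confirms the API is serving; anything else suggests the containers are still starting or the port mapping differs from the defaults.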
## Ansible

Clone the OpenRAG repository:

```shell
git clone https://github.com/linagora/openrag.git
cd openrag
```

Run the provided deployment script and follow the instructions:

```shell
./ansible/deploy.sh
```