

⚡ Distributed Deployment in a Ray Cluster


This guide explains how to deploy OpenRAG across multiple machines using Ray for distributed indexing and processing.


Ensure your .env file includes the standard app variables plus the Ray-specific ones listed below:

.env
# Ray
# Resources for all files
RAY_NUM_GPUS=0.1
RAY_POOL_SIZE=1
RAY_MAX_TASKS_PER_WORKER=5
# PDF specific resources when using marker
MARKER_MAX_TASKS_PER_CHILD=10
MARKER_MAX_PROCESSES=5 # Number of subprocesses = number of concurrent PDFs per worker
MARKER_MIN_PROCESSES=3 # Minimum number of subprocesses available before triggering a process pool reset.
MARKER_POOL_SIZE=1 # Number of workers (typically 1 worker per cluster node)
MARKER_NUM_GPUS=0.6
SHARED_ENV=/ray_mount/.env
RAY_DASHBOARD_PORT=8265
RAY_ADDRESS=ray://X.X.X.X:10001
HEAD_NODE_IP=X.X.X.X
RAY_HEAD_ADDRESS=X.X.X.X:6379
# RAY_ENABLE_RECORD_ACTOR_TASK_LOGGING=1 # to enable logs at task level in ray dashboard
RAY_task_retry_delay_ms=3000
# Ray volumes
DATA_VOLUME=/ray_mount/data
MODEL_WEIGHTS_VOLUME=/ray_mount/model_weights
CONFIG_VOLUME=/ray_mount/.hydra_config
UV_LINK_MODE=copy
UV_CACHE_DIR=/tmp/uv-cache
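As a rough sanity check on these values: each marker worker runs up to MARKER_MAX_PROCESSES subprocesses, so a node handles about MARKER_POOL_SIZE × MARKER_MAX_PROCESSES PDFs at once, while RAY_NUM_GPUS=0.1 lets Ray co-schedule up to 10 regular tasks per GPU. Assuming your .env contains only plain KEY=VALUE pairs (as above), you can compute this quickly:

Terminal window
# Back-of-envelope concurrency check (not an OpenRAG command)
source /ray_mount/.env
echo "Concurrent PDFs per node: $((MARKER_POOL_SIZE * MARKER_MAX_PROCESSES))" # → 5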

✅ Use host IPs instead of Docker service names, since containers on other nodes cannot resolve Compose service names:

.env
EMBEDDER_BASE_URL=http://vllm:8000/v1 # ❌ Docker service name
EMBEDDER_BASE_URL=http://<HOST-IP>:8000/v1 # ✅ host IP
VDB_HOST=milvus # ❌ Docker service name
VDB_HOST=<HOST-IP> # ✅ host IP
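You can verify that both services are reachable by IP from another node. This quick check assumes vLLM's OpenAI-compatible API on port 8000 and Milvus on its default port 19530:

Terminal window
# Run from a worker node; replace <HOST-IP> with the actual host IP
curl http://<HOST-IP>:8000/v1/models # vLLM should list the embedding model
nc -zv <HOST-IP> 19530 # Milvus default gRPC port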

All nodes need to access shared configuration and data folders.
We recommend using GlusterFS for this.

➡ Follow the GlusterFS Setup Guide to configure:

  • Shared access to:
    • .env
    • .hydra_config
    • /data (uploaded files)
    • /model_weights (embedding model cache)
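
Once the shared mount is in place, a quick way to confirm every node sees the same tree (using the example node IPs and ssh_user from the cluster.yaml below):

Terminal window
for host in 10.0.0.1 10.0.0.2; do
  ssh ubuntu@$host 'ls /ray_mount'
done
# Each node should list .env, .hydra_config, data and model_weights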

First, prepare your cluster.yaml file. Here’s an example for a local provider:

cluster.yaml
cluster_name: rag-cluster

provider:
  type: local
  head_ip: 10.0.0.1
  worker_ips: [10.0.0.2] # Static IPs of other nodes (does not auto-start workers)

docker:
  image: ghcr.io/linagora/openrag-ray
  pull_before_run: true
  container_name: ray_node
  run_options:
    - --gpus all
    - -v /ray_mount/model_weights:/app/model_weights
    - -v /ray_mount/data:/app/data
    - -v /ray_mount/.hydra_config:/app/.hydra_config
    - -v /ray_mount/logs:/app/logs
    - --env-file /ray_mount/.env

auth:
  ssh_user: ubuntu
  ssh_private_key: path/to/private/key # Replace with your actual SSH key path

head_start_ray_commands:
  - uv run ray stop
  - uv run ray start --head --dashboard-host 0.0.0.0 --dashboard-port ${RAY_DASHBOARD_PORT:-8265} --node-ip-address ${HEAD_NODE_IP} --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
  - uv run ray stop
  - uv run ray start --address ${HEAD_NODE_IP:-10.0.0.1}:6379

🛠️ The base image (ghcr.io/linagora/openrag-ray) must be built from Dockerfile.ray and pushed to a container registry before use.
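For example (image name taken from the cluster.yaml above; adjust the tag and registry to your setup):

Terminal window
docker build -f Dockerfile.ray -t ghcr.io/linagora/openrag-ray .
docker push ghcr.io/linagora/openrag-ray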

Then launch the cluster from a machine that has SSH access to all nodes:

Terminal window
uv run ray up -y cluster.yaml
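Once it completes, you can check that the head and workers joined. Since ray exec runs its command inside the node's container, the uv run prefix mirrors the start commands above:

Terminal window
uv run ray exec cluster.yaml 'uv run ray status' # should list the head and all worker nodes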

Then start OpenRAG itself using the Docker Compose setup:

Terminal window
docker compose up -d

Once running, OpenRAG will auto-connect to the Ray cluster using RAY_ADDRESS from .env.
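
To double-check from the Ray side, the dashboard on the head node should now be reachable (assuming the dashboard port configured above; /api/version works as a simple liveness probe):

Terminal window
curl http://${HEAD_NODE_IP}:${RAY_DASHBOARD_PORT:-8265}/api/version # or open the dashboard in a browser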


With this setup, your app is now fully distributed and ready to handle concurrent tasks across your Ray cluster.

If you encounter errors like Permission denied when Ray or Docker tries to access shared folders (SQL database, model files, …), it’s likely due to insufficient permissions on the host system.

👉 To resolve this, you can set full read/write/execute permissions on the shared directory:

Terminal window
sudo chmod -R 777 /ray_mount