
Quickstart on macOS

To get started you will need:

  • Docker and Docker Compose
  • Hardware meeting these specifications:
    • An Apple Silicon-based Mac
    • A minimum of 24 GB of unified memory (32 GB recommended); 16 GB may work with varying degrees of success

We provide precompiled Docker images for OpenRAG and its dashboard companion, Indexer-UI.

You will need the following docker-compose.yaml and .env files to get started:

docker-compose.yaml

```yaml
x-openrag: &openrag_template
  image: linagoraai/openrag:macOS_poc
  volumes:
    - ./data:/app/data
    - ./.cache/huggingface:/app/model_weights # Model weights for RAG
    - ./ray_mount/.env:/ray_mount/.env # Shared environment variables
    - ./ray_mount/logs:/app/logs
  ports:
    - 8090:8080
    - 8265:8265 # Disable when in cluster mode
  networks:
    default:
      aliases:
        - openrag
  env_file:
    - .env
  environment:
    - APP_PORT=8090
    - AUTH_TOKEN=OpenRAG
    - RERANKER_ENABLED=false
    - MARKER_MAX_PROCESSES=1
    - INDEXERUI_COMPOSE_FILE=true # Does not serve any purpose but needs to be enabled until PR is merged
    - INDEXERUI_PORT=8067 # Here as well
    - INDEXERUI_URL=http://localhost:8067 # Here as well
    - RAY_DEDUP_LOGS=0
    - RAY_ENABLE_UV_RUN_RUNTIME_ENV=0
    - RAY_memory_monitor_refresh_ms=0
  shm_size: 10.24gb

services:
  openrag:
    <<: *openrag_template
    deploy: {}
    depends_on:
      - milvus
      - ollama

  rdb:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=root
      - POSTGRES_USER=root
    volumes:
      - ./db:/var/lib/postgresql/data

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ./volumes/ollama:/root/.ollama
      - ./ollama-entrypoint.sh:/entrypoint.sh
    restart: unless-stopped
    entrypoint: ["/usr/bin/bash", "/entrypoint.sh"]

  etcd:
    image: quay.io/coreos/etcd:v3.5.16
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ./volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ./volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  milvus:
    image: milvusdb/milvus:v2.5.4
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ./volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
    depends_on:
      - "etcd"
      - "minio"

  indexer-ui:
    image: linagoraai/indexer-ui:latest
    ports:
      - "8067:3000"
    environment:
      - API_BASE_URL=http://localhost:8090
      - INCLUDE_CREDENTIALS=true
    restart: unless-stopped
```
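After starting the stack with `docker compose up -d`, a quick way to confirm the services came up is to probe the host-mapped ports from the compose file. This is an illustrative sketch, not part of OpenRAG itself; the port list simply mirrors the mappings above.

```python
# Probe the host-side ports mapped in docker-compose.yaml.
# The service names and ports below mirror the compose file above;
# adjust them if you changed the mappings.
import socket

SERVICES = {
    "openrag": 8090,         # APP_PORT (mapped from container port 8080)
    "ray-dashboard": 8265,
    "ollama": 11434,
    "milvus": 19530,
    "indexer-ui": 8067,
}

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if port_open("127.0.0.1", port) else "down"
        print(f"{name:14s} :{port}  {status}")
```

A port reported "down" usually means the container is still starting (Milvus in particular has a 90 s health-check grace period) or failed; check `docker compose logs <service>` in that case.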

By default, the only necessary configuration change is to set the model settings in the .env file. Make sure all three models are set (they can be the same model if it supports vision, language, and embedding). If using ollama, pull the desired models locally with the ollama CLI (keep in mind that ollama needs to be running to pull models):

Pulling models with ollama

```shell
ollama pull qwen3:0.6b
```

.env

```
BASE_URL=http://ollama:11434
API_KEY=EMPTY
MODEL=qwen3:0.6b
```
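Since a missing model setting is the most common startup problem, it can help to sanity-check the .env before bringing the stack up. The sketch below is a hypothetical helper, not part of OpenRAG; it assumes the three variables named above (BASE_URL, API_KEY, MODEL) and a simple KEY=VALUE file format.

```python
# Sanity-check that .env defines the three model settings before startup.
# REQUIRED mirrors the variables shown in the .env example above.
from pathlib import Path

REQUIRED = ("BASE_URL", "API_KEY", "MODEL")

def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty."""
    env = parse_env(text)
    return [k for k in REQUIRED if not env.get(k)]

if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        problems = missing_keys(env_path.read_text())
        print("OK" if not problems else f"missing: {problems}")
    else:
        print("no .env found in the current directory")
```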

As stated earlier, Docker for macOS does not support GPU acceleration. To maximize performance, we therefore recommend a non-dockerized installation of ollama or llama.cpp, or running models on an external server. For simplicity, we still provide a dockerized setup here.
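If you do run ollama natively on the host instead of in the container, the other containers can usually reach it through Docker Desktop's host.docker.internal alias. A sketch of the corresponding .env change, assuming ollama listens on its default port 11434:

```
# .env when ollama runs directly on the host (example values)
BASE_URL=http://host.docker.internal:11434
API_KEY=EMPTY
MODEL=qwen3:0.6b
```

In that setup you can also drop the `ollama` service (and the corresponding `depends_on` entry) from the compose file.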