🌟 API Documentation Overview
The FastAPI-powered backend provides a comprehensive document-based question answering system using Retrieval-Augmented Generation (RAG). The API supports semantic search, document indexing, and chat completions across multiple data partitions with full OpenAI compatibility.
🔐 Authentication
All endpoints require authentication when enabled (by setting an authorization token AUTH_TOKEN in your .env). Include your AUTH_TOKEN in the HTTP request header:

```
Authorization: Bearer YOUR_AUTH_TOKEN
```

For OpenAI-compatible endpoints, AUTH_TOKEN serves as the api_key parameter. Use a placeholder like 'sk-1234' when authentication is disabled (the OpenAI client requires a non-empty key).
📡 API Serving Modes
This API can be served using Uvicorn (default) or Ray Serve for distributed deployments.
By default, the backend uses uvicorn to serve the FastAPI app.
To enable Ray Serve, set the following environment variable:
```
ENABLE_RAY_SERVE=true
```

Additional optional environment variables for configuring Ray Serve:

```
RAY_SERVE_NUM_REPLICAS=1   # Number of deployment replicas
RAY_SERVE_HOST=0.0.0.0     # Host address for Ray Serve HTTP proxy
RAY_SERVE_PORT=8080        # Port for Ray Serve HTTP proxy
```

When using Ray Serve with a remote cluster, the HTTP server is started on the head node of the cluster.
🚀 API Endpoints
ℹ️ System Health

Verify server status and availability.

```
GET /health_check
```

ℹ️ openRAG version

Get the openRAG version.

```
GET /version
```

📦 Document Indexing
Section titled “📦 Document Indexing”Upload New File
```
POST /indexer/partition/{partition}/file/{file_id}
```

Upload a new file to a specific partition for indexing.

Parameters:

- partition (path): Target partition name
- file_id (path): Unique identifier for the file

Request Body (form-data):

- file (binary): File to upload
- metadata (JSON string): File metadata (e.g., {"owner": "user1"})

Responses:

- 201 Created: Returns task status URL
- 409 Conflict: File already exists in partition
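As a sketch, the upload can be scripted from Python with the requests library; the helper names and the AUTH_TOKEN value below are illustrative, not part of the API:

```python
import json
import requests  # third-party: pip install requests

API_BASE = "http://localhost:8080"  # adjust to your deployment
AUTH_TOKEN = "YOUR_AUTH_TOKEN"

def upload_url(partition: str, file_id: str) -> str:
    """Build the indexing endpoint URL for a given partition and file id."""
    return f"{API_BASE}/indexer/partition/{partition}/file/{file_id}"

def upload_file(partition: str, file_id: str, path: str, metadata: dict) -> int:
    """POST a file for indexing; returns the HTTP status (201 created, 409 conflict)."""
    with open(path, "rb") as f:
        resp = requests.post(
            upload_url(partition, file_id),
            headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
            files={"file": f},
            data={"metadata": json.dumps(metadata)},
        )
    return resp.status_code

# upload_file("my_partition", "report-2024", "report.pdf", {"owner": "user1"})
```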
Upload files while modeling relations between them
OpenRAG supports document relationships to enable context-aware retrieval.
You can model relationships between files using the metadata field during upload. Different relationship types can be represented using the relationship_id and parent_id metadata fields, depending on the use case: folder-based relationships, email threads, etc. (see Document Relationships documentation for more details).
- To represent simple folder-based relationships between files, rely exclusively on relationship_id:

```
POST /indexer/partition/{partition}/file/{file_id}
Authorization: Bearer YOUR_AUTH_TOKEN
Content-Type: multipart/form-data

file: <binary data>
metadata: { "relationship_id": "documents/projects/2024/q1", ... }
```

- For email threads, one can rely on both relationship_id (to group emails in the same thread) and parent_id (to model reply hierarchies within the thread). See the Document Relationships documentation for more details and examples.
Example: Original Email (Root)
```
POST /indexer/partition/emails/file/email_a_id
Authorization: Bearer YOUR_AUTH_TOKEN
Content-Type: multipart/form-data

file: <email binary data>
metadata: { "relationship_id": "thread-123", "parent_id": null, ... }
```

Example: Reply Email (Child)

```
POST /indexer/partition/emails/file/email_b_id
Authorization: Bearer YOUR_AUTH_TOKEN
Content-Type: multipart/form-data

file: <email binary data>
metadata: { "relationship_id": "thread-123", "parent_id": "email_a_id", ... }
```

For context-aware search, see search endpoints and relationship-based file fetching.
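The two thread requests above can also be scripted; a minimal sketch with the requests library (file paths and helper names are illustrative):

```python
import json
import requests  # third-party: pip install requests

API_BASE = "http://localhost:8080"  # adjust to your deployment
HEADERS = {"Authorization": "Bearer YOUR_AUTH_TOKEN"}

def thread_metadata(thread_id: str, parent_id=None) -> dict:
    """Relationship metadata for one email in a thread."""
    return {"relationship_id": thread_id, "parent_id": parent_id}

def index_email(file_id: str, path: str, thread_id: str, parent_id=None) -> int:
    url = f"{API_BASE}/indexer/partition/emails/file/{file_id}"
    with open(path, "rb") as f:
        resp = requests.post(
            url,
            headers=HEADERS,
            files={"file": f},
            data={"metadata": json.dumps(thread_metadata(thread_id, parent_id))},
        )
    return resp.status_code

# Root email first, then the reply pointing back at it:
# index_email("email_a_id", "original.eml", "thread-123")
# index_email("email_b_id", "reply.eml", "thread-123", parent_id="email_a_id")
```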
Replace Existing File
```
PUT /indexer/partition/{partition}/file/{file_id}
```

Replace an existing file in the partition. Deletes the current entry and creates a new indexing task.
Parameters: Same as POST endpoint
Request Body: Same as POST endpoint
Response: 202 Accepted with task status URL
Update File Metadata
```
PATCH /indexer/partition/{partition}/file/{file_id}
```

Update file metadata without reindexing the document.
Request Body (form-data):
- metadata (JSON string): Updated metadata
Response: 200 OK on successful update
Delete File
```
DELETE /indexer/partition/{partition}/file/{file_id}
```

Remove a file from the specified partition.

Responses:

- 204 No Content: Successfully deleted
- 404 Not Found: File not found in partition
Check Indexing Status
```
GET /indexer/task/{task_id}
```

Monitor the progress of an asynchronous indexing task.

Response: Task status information
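Since indexing runs in the background, a client typically polls this endpoint until the task settles. A sketch with the requests library; the status field name and values here are assumptions, so inspect your deployment's actual task-status payload and adjust:

```python
import time
import requests  # third-party: pip install requests

API_BASE = "http://localhost:8080"  # adjust to your deployment
HEADERS = {"Authorization": "Bearer YOUR_AUTH_TOKEN"}

def task_url(task_id: str) -> str:
    """Build the task status URL documented above."""
    return f"{API_BASE}/indexer/task/{task_id}"

def wait_for_task(task_id: str, interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll the task status endpoint until the task settles or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = requests.get(task_url(task_id), headers=HEADERS).json()
        # The "state" field and these values are assumptions; adjust to the
        # actual payload returned by your deployment.
        if status.get("state") in ("SUCCESS", "FAILED"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```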
See logs of a given task
Section titled “See logs of a given task”GET /indexer/task/{task_id}/logsGet error details of a failed task
Section titled “Get error details of a failed task”GET /indexer/task/{task_id}/error🔍 Semantic Search
- Search Across Multiple Partitions

```
GET /search/
```

Perform semantic search across specified partitions.

Query Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| partitions (optional) | array | ["all"] | Partitions to search |
| text | string | required | Search query |
| top_k (optional) | integer | 5 | Number of initial results |
| include_related (optional) | boolean | false | Include chunks from files with the same relationship_id |
| include_ancestors (optional) | boolean | false | Include chunks from ancestor files (via the parent_id chain) |
| related_limit (optional) | integer | 20 | Max related/ancestor chunks to fetch per result (used when include_related or include_ancestors is true) |
Responses:
- 200 OK: JSON list of document links (HATEOAS format)
- 400 Bad Request: Invalid partitions parameter
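A minimal Python sketch of a multi-partition search using the parameters above (the helper names are illustrative):

```python
import requests  # third-party: pip install requests

API_BASE = "http://localhost:8080"  # adjust to your deployment
HEADERS = {"Authorization": "Bearer YOUR_AUTH_TOKEN"}

def search_params(text: str, partitions=("all",), top_k=5, include_related=False) -> dict:
    """Assemble the query parameters documented in the table above."""
    return {
        "text": text,
        "partitions": list(partitions),
        "top_k": top_k,
        "include_related": include_related,
    }

def search(text: str, **kwargs) -> list:
    resp = requests.get(f"{API_BASE}/search/", headers=HEADERS,
                        params=search_params(text, **kwargs))
    resp.raise_for_status()
    return resp.json()  # HATEOAS-style list of document links

# results = search("quarterly revenue", partitions=["finance"], top_k=10)
```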
- Search Within Single Partition

```
GET /search/partition/{partition}
```

Search within a specific partition only.

Query Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | required | Search query |
| top_k (optional) | integer | 5 | Number of initial results |
| include_related (optional) | boolean | false | Include chunks from files with the same relationship_id |
| include_ancestors (optional) | boolean | false | Include chunks from ancestor files (via the parent_id chain) |
| related_limit (optional) | integer | 20 | Max related/ancestor chunks to fetch per result (used when include_related or include_ancestors is true) |
Response: Same as multi-partition search
- Search Within Specific File

```
GET /search/partition/{partition}/file/{file_id}
```

Search within a particular file in a partition.

Query Parameters: Same as partition search

Response: Same as other search endpoints
📄 Document Extraction
Get Extract Details

```
GET /extract/{extract_id}
```

Retrieve a specific document extract (chunk) by ID.
Response: JSON containing extract content and metadata
Partitions & Files Management

- Get Files by Relationship

```
GET /{partition}/relationships/{relationship_id}
```

Returns all files sharing the same relationship_id within a partition.

Parameters:

- partition: partition name
- relationship_id: the relationship group identifier (client-defined)

Response:

```json
{
  "files": [
    { "file_id": "doc-a-id", "filename": "Document A", "relationship_id": "group-123", "parent_id": null },
    { "file_id": "doc-b-id", "filename": "Document B", "relationship_id": "group-123", "parent_id": "doc-a-id" }
  ]
}
```

- Get File Ancestors
```
GET /{partition}/file/{file_id}/ancestors
```

Returns the complete ancestor path from root to the specified file.

Parameters:

- partition: partition name
- file_id: the file to trace ancestors for
- max_ancestor_depth (optional): limit on ancestor depth to return. None means unlimited.

Response:

```json
{
  "ancestors": [
    { "file_id": "email-a-id", "filename": "Original Email", "parent_id": null },
    { "file_id": "email-b-id", "filename": "First Reply", "parent_id": "email-a-id" },
    { "file_id": "email-c-id", "filename": "Second Reply", "parent_id": "email-b-id" }
  ]
}
```

💬 OpenAI-Compatible Chat
These endpoints provide full OpenAI API compatibility for seamless integration with existing tools and workflows. For detailed examples of OpenAI client usage, see the Usage Examples section below.
- List Available Models
```
GET /v1/models
```

List all available RAG models (partitions).
Model Naming Convention:

- Pattern: openrag-{partition_name} => this model lets you chat specifically with the partition {partition_name}
- Special model: partition-all (queries the entire vector database)
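For example, you can list the models over HTTP and filter down to the per-partition ones; a sketch with the requests library (the filter helper is illustrative, not part of the API):

```python
import requests  # third-party: pip install requests

API_BASE = "http://localhost:8080"  # adjust to your deployment

def partition_models(models: list) -> list:
    """Keep only the per-partition RAG model ids (openrag-{partition_name})."""
    return [m["id"] for m in models if m["id"].startswith("openrag-")]

# OpenAI-style response: {"object": "list", "data": [{"id": "...", ...}, ...]}
# data = requests.get(f"{API_BASE}/v1/models",
#                     headers={"Authorization": "Bearer YOUR_AUTH_TOKEN"}).json()["data"]
# print(partition_models(data))
```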
- Chat Completions
```
POST /v1/chat/completions
```

OpenAI-compatible chat completion using the RAG pipeline.

Request Body:

```
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -d '{
    "model": "openrag-{partition_name}",
    "messages": [
      { "role": "user", "content": "Your question here" }
    ],
    "temperature": 0.1,
    "stream": false
  }'
```

You can also use this endpoint directly with no RAG pipeline, i.e. to query the LLM directly.

For that, instead of using the openrag prefix for the model, you can:

- Specify no model
- Specify an empty model
- Specify the openRAG configured model, e.g. Mistral-Small-3.1-24B-Instruct-2503

Request Body:

```
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -d '{
    "model": "",
    "messages": [
      { "role": "user", "content": "Your question here" }
    ],
    "temperature": 0.1,
    "stream": false
  }'
```

- Text Completions
```
POST /v1/completions
```

OpenAI-compatible text completion endpoint.
Extra arguments
When using the OpenAI endpoint /v1/chat/completions, you can provide extra arguments in the request body to customize the RAG behavior:

- spoken_style_answer: boolean (default: false). If true, the model generates a succinct, spoken-style conversational answer based on the retrieved documents.
- use_map_reduce: boolean (default: false). If true, the model uses a map-reduce strategy to aggregate information from multiple documents. For more information, see the map-reduce documentation.
- llm_override: object (optional). Routes the request to a different LLM endpoint while still using OpenRAG's RAG pipeline (retrieval, reranking, prompt construction). Accepts the following fields:
  - base_url: string - Base URL of the target LLM API (e.g. https://api.openai.com/v1)
  - api_key: string - API key for the target LLM
  - model: string - Model name to use on the target endpoint

  Any field not provided falls back to the default OpenRAG LLM configuration.

These arguments are supplied via the metadata field of the OpenAI request body. Examples:

```
curl -X 'POST' 'http://localhost:8080/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer YOUR_AUTH_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openrag-{partition_name}",
    "messages": [
      { "role": "user", "content": "your_query" }
    ],
    "temperature": 0.3,
    "stream": false,
    "metadata": { "spoken_style_answer": true }
  }'
```

```
curl -X 'POST' 'http://localhost:8080/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer YOUR_AUTH_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openrag-{partition_name}",
    "messages": [
      { "role": "user", "content": "your_query" }
    ],
    "stream": false,
    "metadata": {
      "llm_override": {
        "base_url": "https://api.openai.com/v1",
        "api_key": "sk-your-openai-key",
        "model": "gpt-4o"
      }
    }
  }'
```

🔧 Tools
Tools are useful features that can be called directly by the client.
- List available tools
```
GET /v1/tools
```

Request:

```
curl http://localhost:8080/v1/tools
```

Response:

```json
[
  { "name": "Tool name", "description": "Tool description" }
]
```

- Execute a tool

```
POST /v1/tools/execute
```

The parameters are passed as multipart form data.

Request:

```
curl -X POST http://localhost:8080/v1/tools/execute \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -F "file=@file.pdf" \
  -F 'tool={"name":"extractText"}' \
  -F 'metadata={"mime":"application/pdf","name":"test.pdf"}'
```

Response:

```json
{ "message": "File content" }
```

💡 Usage Examples
Section titled “💡 Usage Examples”Bulk File Indexing
For indexing multiple files programmatically, you can use the data_indexer.py utility script in the 📁 utility folder, or simply use the indexer UI.
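If you prefer rolling your own loop, a minimal sketch over the upload endpoint documented above; the glob pattern, file_id scheme, and metadata are illustrative assumptions:

```python
import json
from pathlib import Path
import requests  # third-party: pip install requests

API_BASE = "http://localhost:8080"  # adjust to your deployment
HEADERS = {"Authorization": "Bearer YOUR_AUTH_TOKEN"}

def file_id_for(path: Path) -> str:
    """Derive a file_id from the file name (assumes names are unique in the folder)."""
    return path.stem

def bulk_index(folder: str, partition: str) -> None:
    for path in sorted(Path(folder).glob("*.pdf")):
        url = f"{API_BASE}/indexer/partition/{partition}/file/{file_id_for(path)}"
        with path.open("rb") as f:
            resp = requests.post(
                url,
                headers=HEADERS,
                files={"file": f},
                data={"metadata": json.dumps({"source": str(path)})},
            )
        # 201 = accepted for indexing, 409 = file already exists in the partition
        print(path.name, resp.status_code)

# bulk_index("docs/", "my_partition")
```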
Example OpenAI Client Usage
Section titled “Example OpenAI Client Usage”from openai import OpenAI, AsyncOpenAI
api_base_url = "http://localhost:8080" # fastapi base url of 'openrag'base_url = f"{api_base_url}/v1"
auth_key = ... # your api authentification key AUTH_TOKEN in your .env. Is authentification is disabled, use a placeholder like 'sk-1234'client = OpenAI(api_key=auth_key, base_url=base_url)
your_partition= 'my_partition' # name of your partitionmodel = f"openrag-{your_partition}"settings = { 'model': model, 'temperature': 0.3, 'stream': False}
response = client.chat.completions.create( **settings, messages=[ {"role": "user", "content": "What information do you have about...?"} ])⚠️ Error Handling
The API uses standard HTTP status codes:
- 200 OK: Successful request
- 201 Created: Resource created successfully
- 202 Accepted: Request accepted for processing
- 204 No Content: Successful deletion
- 400 Bad Request: Invalid request parameters
- 404 Not Found: Resource not found
- 409 Conflict: Resource already exists
Error responses include detailed JSON messages to help with debugging and integration.
Sequence diagrams
External Indexing Flow
An external indexer uses an admin token to create a user, then uses the returned user token for all subsequent operations.
```mermaid
sequenceDiagram
    participant Indexer
    participant OpenRAG
    Note over Indexer, OpenRAG: 1. Create User (admin token)
    Indexer->>OpenRAG: POST /users<br/>Authorization: Bearer {admin_token}<br/>{display_name: "alice"}
    OpenRAG-->>Indexer: 201 {id: 2, token: "or-xxx..."}
    Indexer->>Indexer: Store user token "or-xxx..."
    Note over Indexer, OpenRAG: 2. Create Partition (user token). Automatically grants owner rights to the user
    Indexer->>OpenRAG: POST /partition/{partition}<br/>Authorization: Bearer or-xxx...
    OpenRAG-->>Indexer: 201 Created
    Note over Indexer, OpenRAG: 3. Index a File (user token)
    Indexer->>Indexer: New data
    Indexer->>OpenRAG: POST /indexer/partition/{partition}/file/{file_id}<br/>Authorization: Bearer or-xxx...<br/>Body: multipart (file + metadata)
    OpenRAG->>OpenRAG: Validate token, check file quota
    OpenRAG-->>Indexer: 201 {task_status_url: "{task_url}"}
    Note over OpenRAG: Background: serialize → chunk → embed → store
```
Chat Completion Flow
Query indexed documents via the OpenAI-compatible chat completions endpoint.
```mermaid
sequenceDiagram
    participant Client
    participant OpenRAG
    participant LLM
    Client->>OpenRAG: POST /v1/chat/completions<br/>Authorization: Bearer or-xxx...<br/>{model: "openrag-{partition}",<br/>messages: [...], stream: true}
    OpenRAG->>OpenRAG: Authenticate user
    OpenRAG->>OpenRAG: Resolve partition from model name
    OpenRAG->>OpenRAG: Check user has access to partition
    OpenRAG->>OpenRAG: Retrieve relevant documents
    OpenRAG->>OpenRAG: Build prompt with retrieved context
    OpenRAG->>LLM: Query LLM
    loop SSE Stream
        OpenRAG-->>Client: data: {delta: {content: "..."},<br/>sources: [...]}
    end
    OpenRAG-->>Client: data: [DONE]
```