
🌟 API Documentation Overview

The FastAPI-powered backend provides a comprehensive document-based question answering system using Retrieval-Augmented Generation (RAG). The API supports semantic search, document indexing, and chat completions across multiple data partitions with full OpenAI compatibility.

All endpoints require authentication when enabled (by setting an authorization token AUTH_TOKEN in your .env). Include your AUTH_TOKEN in the HTTP request header:

Authorization: Bearer YOUR_AUTH_TOKEN

For OpenAI-compatible endpoints, AUTH_TOKEN serves as the api_key parameter. When authentication is disabled, use a placeholder like 'sk-1234' (the OpenAI client requires a non-empty key).
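As a sketch, the header and the placeholder fallback can be built like this (the auth_headers helper is illustrative, not part of the API):

```python
import os

def auth_headers(token=None):
    """Build the Authorization header for openRAG requests.

    Falls back to the placeholder key 'sk-1234' when authentication
    is disabled, as the OpenAI client requires a non-empty key.
    """
    return {"Authorization": f"Bearer {token or 'sk-1234'}"}

# Read AUTH_TOKEN from the environment (mirroring the .env setting)
headers = auth_headers(os.environ.get("AUTH_TOKEN"))
```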


This API can be served using Uvicorn (default) or Ray Serve for distributed deployments.

By default, the backend uses uvicorn to serve the FastAPI app.

To enable Ray Serve, set the following environment variable:

.env
ENABLE_RAY_SERVE=true

Additional optional environment variables for configuring Ray Serve:

.env
RAY_SERVE_NUM_REPLICAS=1 # Number of deployment replicas
RAY_SERVE_HOST=0.0.0.0 # Host address for Ray Serve HTTP proxy
RAY_SERVE_PORT=8080 # Port for Ray Serve HTTP proxy

When using Ray Serve with a remote cluster, the HTTP server will be started on the head node of the cluster.

Verify server status and availability.

GET /health_check

Get openRAG version

GET /version

POST /indexer/partition/{partition}/file/{file_id}

Upload a new file to a specific partition for indexing.

Parameters:

  • partition (path): Target partition name
  • file_id (path): Unique identifier for the file

Request Body (form-data):

  • file (binary): File to upload
  • metadata (JSON string): File metadata (e.g., {"owner": "user1"})

Responses:

  • 201 Created: Returns task status URL
  • 409 Conflict: File already exists in partition
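A minimal upload sketch, assuming the third-party requests library is installed; the base URL, token, and file path are placeholders:

```python
import json

API = "http://localhost:8080"  # assumed openRAG base URL

def indexer_url(partition, file_id):
    """Build the indexer endpoint URL for a partition/file pair."""
    return f"{API}/indexer/partition/{partition}/file/{file_id}"

def upload_file(partition, file_id, path, metadata, token):
    """POST a file plus its JSON metadata as multipart form-data.

    Returns the 201 response body (task status URL); a 409 means
    the file already exists in the partition.
    """
    import requests  # third-party: pip install requests

    with open(path, "rb") as f:
        resp = requests.post(
            indexer_url(partition, file_id),
            headers={"Authorization": f"Bearer {token}"},
            files={"file": f},
            data={"metadata": json.dumps(metadata)},
        )
    resp.raise_for_status()
    return resp.json()
```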
Upload files while modeling relations between them

OpenRAG supports document relationships to enable context-aware retrieval. You can model relationships between files using the metadata field during upload. Different relationship types can be represented using the relationship_id and parent_id metadata fields, depending on the use case: folder-based relationships, email threads, etc. (see Document Relationships documentation for more details).

POST /indexer/partition/{partition}/file/{file_id}
Authorization: Bearer YOUR_AUTH_TOKEN
Content-Type: multipart/form-data
file: <binary data>
metadata: {
  "relationship_id": "documents/projects/2024/q1",
  ...
}
  • For email threads, use both relationship_id (to group emails in the same thread) and parent_id (to model reply hierarchies within the thread). See the Document Relationships documentation for more details and examples.

Example: Original Email (Root)

POST /indexer/partition/emails/file/email_a_id
Authorization: Bearer YOUR_AUTH_TOKEN
Content-Type: multipart/form-data
file: <email binary data>
metadata: {
  "relationship_id": "thread-123",
  "parent_id": null,
  ...
}

Example: Reply Email (Child)

POST /indexer/partition/emails/file/email_b_id
Authorization: Bearer YOUR_AUTH_TOKEN
Content-Type: multipart/form-data
file: <email binary data>
metadata: {
  "relationship_id": "thread-123",
  "parent_id": "email_a_id",
  ...
}

For context-aware search, see search endpoints and relationship-based file fetching.
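The two email examples above reduce to two metadata payloads; this small helper (hypothetical, for illustration only) makes the convention explicit:

```python
import json

def email_metadata(thread_id, parent_id=None, **extra):
    """Metadata JSON string for indexing one email of a thread.

    relationship_id groups all emails in the thread; parent_id is
    None for the root email and the parent's file_id for replies.
    """
    return json.dumps({"relationship_id": thread_id, "parent_id": parent_id, **extra})

root_meta = email_metadata("thread-123")                           # original email
reply_meta = email_metadata("thread-123", parent_id="email_a_id")  # reply
```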

PUT /indexer/partition/{partition}/file/{file_id}

Replace an existing file in the partition. Deletes the current entry and creates a new indexing task.

Parameters: Same as POST endpoint
Request Body: Same as POST endpoint
Response: 202 Accepted with task status URL

PATCH /indexer/partition/{partition}/file/{file_id}

Update file metadata without reindexing the document.

Request Body (form-data):

  • metadata (JSON string): Updated metadata

Response: 200 OK on successful update

DELETE /indexer/partition/{partition}/file/{file_id}

Remove a file from the specified partition.

Responses:

  • 204 No Content: Successfully deleted
  • 404 Not Found: File not found in partition
GET /indexer/task/{task_id}

Monitor the progress of an asynchronous indexing task.

Response: Task status information


GET /indexer/task/{task_id}/logs

Retrieve the logs of an indexing task.

GET /indexer/task/{task_id}/error

Retrieve error details for a failed indexing task.
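A polling sketch for asynchronous tasks; the 'state' field name and terminal values are assumptions, so adjust them to a real task-status response:

```python
import json
import time
from urllib.request import Request, urlopen

API = "http://localhost:8080"  # assumed openRAG base URL

def task_url(task_id):
    """URL of the task-status endpoint."""
    return f"{API}/indexer/task/{task_id}"

def wait_for_task(task_id, token, timeout=120.0, interval=2.0):
    """Poll GET /indexer/task/{task_id} until the task finishes or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        req = Request(task_url(task_id),
                      headers={"Authorization": f"Bearer {token}"})
        with urlopen(req) as resp:
            status = json.load(resp)
        # 'state' and its values are assumed names; inspect a real response
        if status.get("state") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```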
  • Search Across Multiple Partitions
GET /search/

Perform semantic search across specified partitions.

Query Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| partitions (optional) | array | ["all"] | Partitions to search |
| text (required) | string | — | Search query |
| top_k (optional) | integer | 5 | Number of initial results |
| include_related (optional) | boolean | false | Include chunks from files with the same relationship_id |
| include_ancestors (optional) | boolean | false | Include chunks from ancestor files (via the parent_id chain) |
| related_limit (optional) | integer | 20 | Max related/ancestor chunks to fetch per result (used when include_related or include_ancestors is true) |

Responses:

  • 200 OK: JSON list of document links (HATEOAS format)
  • 400 Bad Request: Invalid partitions parameter
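The query parameters above can be assembled as follows; the parameter names match the table, while the helper itself and the repeated-parameter encoding for partitions are illustrative assumptions:

```python
from urllib.parse import urlencode

API = "http://localhost:8080"  # assumed openRAG base URL

def search_url(text, partitions=("all",), top_k=5,
               include_related=False, include_ancestors=False,
               related_limit=20):
    """Build the GET /search/ URL from the documented query parameters."""
    params = [("text", text), ("top_k", top_k)]
    params += [("partitions", p) for p in partitions]
    if include_related:
        params.append(("include_related", "true"))
    if include_ancestors:
        params.append(("include_ancestors", "true"))
    if include_related or include_ancestors:
        params.append(("related_limit", related_limit))
    return f"{API}/search/?{urlencode(params)}"
```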
  • Search Within Single Partition
GET /search/partition/{partition}

Search within a specific partition only.

Query Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text (required) | string | — | Search query |
| top_k (optional) | integer | 5 | Number of initial results |
| include_related (optional) | boolean | false | Include chunks from files with the same relationship_id |
| include_ancestors (optional) | boolean | false | Include chunks from ancestor files (via the parent_id chain) |
| related_limit (optional) | integer | 20 | Max related/ancestor chunks to fetch per result (used when include_related or include_ancestors is true) |

Response: Same as multi-partition search

  • Search Within Specific File
GET /search/partition/{partition}/file/{file_id}

Search within a particular file in a partition.

Query Parameters: Same as partition search
Response: Same as other search endpoints


GET /extract/{extract_id}

Retrieve specific document extract (chunk) by ID.

Response: JSON containing extract content and metadata


  • Get Files by Relationship
GET /{partition}/relationships/{relationship_id}

Returns all files sharing the same relationship_id within a partition.

Parameters:

  • partition — partition name
  • relationship_id — the relationship group identifier (client-defined)

Response:

{
  "files": [
    {
      "file_id": "doc-a-id",
      "filename": "Document A",
      "relationship_id": "group-123",
      "parent_id": null
    },
    {
      "file_id": "doc-b-id",
      "filename": "Document B",
      "relationship_id": "group-123",
      "parent_id": "doc-a-id"
    }
  ]
}
  • Get File Ancestors
GET /{partition}/file/{file_id}/ancestors

Returns the complete ancestor path from root to the specified file.

Parameters:

  • partition — partition name
  • file_id — the file to trace ancestors for
  • max_ancestor_depth (optional) — limit on ancestor depth to return. None means unlimited.

Response:

{
  "ancestors": [
    {
      "file_id": "email-a-id",
      "filename": "Original Email",
      "parent_id": null
    },
    {
      "file_id": "email-b-id",
      "filename": "First Reply",
      "parent_id": "email-a-id"
    },
    {
      "file_id": "email-c-id",
      "filename": "Second Reply",
      "parent_id": "email-b-id"
    }
  ]
}

These endpoints provide full OpenAI API compatibility for seamless integration with existing tools and workflows. For a detailed example of OpenAI client usage, see this section.

  • List Available Models
GET /v1/models

List all available RAG models (partitions).

Model Naming Convention:

  • Pattern: openrag-{partition_name} — this model chats specifically with the partition {partition_name}
  • Special model: partition-all (queries the entire vector database)
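The naming convention maps directly to a one-line helper (illustrative, not part of the API):

```python
def rag_model(partition):
    """Model name that scopes chat completions to one partition."""
    return f"openrag-{partition}"

rag_model("my_partition")  # 'openrag-my_partition'
```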
  • Chat Completions
POST /v1/chat/completions

OpenAI-compatible chat completion using RAG pipeline.

Request Body:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -d '{
    "model": "openrag-{partition_name}",
    "messages": [
      {
        "role": "user",
        "content": "Your question here"
      }
    ],
    "temperature": 0.1,
    "stream": false
  }'

You can also use this endpoint directly, with no RAG pipeline, to query the LLM itself. To do so, instead of using the openrag prefix for the model, you can:

  • Specify no model
  • Specify an empty model
  • Specify the configured openRAG model, e.g. Mistral-Small-3.1-24B-Instruct-2503.

Request Body:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -d '{
    "model": "",
    "messages": [
      {
        "role": "user",
        "content": "Your question here"
      }
    ],
    "temperature": 0.1,
    "stream": false
  }'
  • Text Completions
POST /v1/completions

OpenAI-compatible text completion endpoint.

  • When using the OpenAI endpoint /v1/chat/completions, you can provide extra arguments in the request body to customize the RAG behavior:
  • spoken_style_answer: boolean (default: false) - If true, the model will generate a succinct, spoken-style conversational answer based on the retrieved documents.

  • use_map_reduce: boolean (default: false) - If true, the model will use a map-reduce strategy to aggregate information from multiple documents. For more information see the map-reduce documentation.

  • llm_override: object (optional) - Route the request to a different LLM endpoint while still using OpenRAG’s RAG pipeline (retrieval, reranking, prompt construction). Accepts the following fields:

    • base_url: string - Base URL of the target LLM API (e.g. https://api.openai.com/v1)
    • api_key: string - API key for the target LLM
    • model: string - Model name to use on the target endpoint

    Any field not provided falls back to the default OpenRAG LLM configuration.

These arguments are supplied via the metadata field of the OpenAI request body. Example:

Enabling conversational answer with openai chat completions endpoint
curl -X 'POST' 'http://localhost:8080/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer YOUR_AUTH_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openrag-{partition_name}",
    "messages": [
      {
        "role": "user",
        "content": "your_query"
      }
    ],
    "temperature": 0.3,
    "stream": false,
    "metadata": {
      "spoken_style_answer": true
    }
  }'
Using a custom LLM endpoint with OpenRAG's RAG pipeline
curl -X 'POST' 'http://localhost:8080/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer YOUR_AUTH_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openrag-{partition_name}",
    "messages": [
      {
        "role": "user",
        "content": "your_query"
      }
    ],
    "stream": false,
    "metadata": {
      "llm_override": {
        "base_url": "https://api.openai.com/v1",
        "api_key": "sk-your-openai-key",
        "model": "gpt-4o"
      }
    }
  }'

Tools are standalone utilities that the client can invoke directly.

  • List available tools
GET /v1/tools

Request:

curl http://localhost:8080/v1/tools

Response:

[
  {
    "name": "Tool name",
    "description": "Tool description"
  }
]
  • Execute a tool
POST /v1/tools/execute

Parameters are supplied as multipart form-data.

Request:

curl -X POST http://localhost:8080/v1/tools/execute \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -F "file=@file.pdf" \
  -F 'tool={"name":"extractText"}' \
  -F 'metadata={"mime":"application/pdf","name":"test.pdf"}'

Response:

{
  "message": "File content"
}

For indexing multiple files programmatically, you can use the data_indexer.py utility script in the 📁 utility folder, or simply use the indexer UI.

from openai import OpenAI

api_base_url = "http://localhost:8080"  # FastAPI base URL of 'openrag'
base_url = f"{api_base_url}/v1"
auth_key = ...  # your AUTH_TOKEN from your .env. If authentication is disabled, use a placeholder like 'sk-1234'
client = OpenAI(api_key=auth_key, base_url=base_url)

your_partition = 'my_partition'  # name of your partition
model = f"openrag-{your_partition}"
settings = {
    'model': model,
    'temperature': 0.3,
    'stream': False,
}
response = client.chat.completions.create(
    **settings,
    messages=[
        {"role": "user", "content": "What information do you have about...?"}
    ]
)

The API uses standard HTTP status codes:

  • 200 OK: Successful request
  • 201 Created: Resource created successfully
  • 202 Accepted: Request accepted for processing
  • 204 No Content: Successful deletion
  • 400 Bad Request: Invalid request parameters
  • 404 Not Found: Resource not found
  • 409 Conflict: Resource already exists

Error responses include detailed JSON messages to help with debugging and integration.

An external indexer uses an admin token to create a user, then uses the returned user token for all subsequent operations.

sequenceDiagram
    participant Indexer
    participant OpenRAG

    Note over Indexer, OpenRAG: 1. Create User (admin token)

    Indexer->>OpenRAG: POST /users<br/>Authorization: Bearer {admin_token}<br/>{display_name: "alice"}
    OpenRAG-->>Indexer: 201 {id: 2, token: "or-xxx..."}
    Indexer->>Indexer: Store user token "or-xxx..."

    Note over Indexer, OpenRAG: 2. Create Partition (user token). Automatically grants owner rights to the user

    Indexer->>OpenRAG: POST /partition/{partition}<br/>Authorization: Bearer or-xxx...
    OpenRAG-->>Indexer: 201 Created

    Note over Indexer, OpenRAG: 3. Index a File (user token)

    Indexer->>Indexer: New data

    Indexer->>OpenRAG: POST /indexer/partition/{partition}/file/{file_id}<br/>Authorization: Bearer or-xxx...<br/>Body: multipart (file + metadata)
    OpenRAG->>OpenRAG: Validate token, check file quota
    OpenRAG-->>Indexer: 201 {task_status_url: "{task_url}"}

    Note over OpenRAG: Background: serialize → chunk → embed → store
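The three steps in the diagram can be sketched end to end, assuming the third-party requests library; the endpoint paths follow the diagram, while the response field names (e.g. "token") are assumptions taken from the diagram's annotations:

```python
import json

API = "http://localhost:8080"  # assumed openRAG base URL

def partition_url(partition):
    """URL of the partition-creation endpoint."""
    return f"{API}/partition/{partition}"

def bootstrap_and_index(admin_token, partition, file_path, file_id, metadata):
    """Create a user (admin token), then create a partition and index
    one file with the returned user token (mirrors the diagram)."""
    import requests  # third-party: pip install requests

    # 1. Create user with the admin token; the response carries the user token
    r = requests.post(f"{API}/users",
                      headers={"Authorization": f"Bearer {admin_token}"},
                      json={"display_name": "alice"})
    r.raise_for_status()
    auth = {"Authorization": f"Bearer {r.json()['token']}"}

    # 2. Create partition with the user token; the user becomes its owner
    requests.post(partition_url(partition), headers=auth).raise_for_status()

    # 3. Index a file (multipart: file + metadata)
    with open(file_path, "rb") as f:
        r = requests.post(f"{API}/indexer/partition/{partition}/file/{file_id}",
                          headers=auth,
                          files={"file": f},
                          data={"metadata": json.dumps(metadata)})
    r.raise_for_status()
    return r.json()  # contains the task status URL
```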

Query indexed documents via the OpenAI-compatible chat completions endpoint.

sequenceDiagram
    participant Client
    participant OpenRAG
    participant LLM

    Client->>OpenRAG: POST /v1/chat/completions<br/>Authorization: Bearer or-xxx...<br/>{model: "openrag-{partition}",<br/>messages: [...], stream: true}

    OpenRAG->>OpenRAG: Authenticate user
    OpenRAG->>OpenRAG: Resolve partition from model name
    OpenRAG->>OpenRAG: Check user has access to partition
    OpenRAG->>OpenRAG: Retrieve relevant documents
    OpenRAG->>OpenRAG: Build prompt with retrieved context
    OpenRAG->>LLM: Query LLM

    loop SSE Stream
        OpenRAG-->>Client: data: {delta: {content: "..."},<br/>sources: [...]}
    end
    OpenRAG-->>Client: data: [DONE]
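Each line of the SSE stream above carries a JSON payload behind a "data: " prefix, terminated by the "data: [DONE]" sentinel; a minimal parser sketch for client code:

```python
import json

def parse_sse_line(line):
    """Parse one Server-Sent-Events line from the streaming response.

    Returns the decoded JSON payload, or None for non-data lines and
    the terminal 'data: [DONE]' sentinel.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    return json.loads(payload)
```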