Skip to content

🌟 API Documentation Overview

The FastAPI-powered backend provides a comprehensive document-based question answering system using Retrieval-Augmented Generation (RAG). The API supports semantic search, document indexing, and chat completions across multiple data partitions with full OpenAI compatibility.

All endpoints require authentication when enabled (by adding a authorization token AUTH_TOKEN in your .env). Include your AUTH_TOKEN in the HTTP request header:

Authorization: Bearer YOUR_AUTH_TOKEN

For OpenAI-compatible endpoints, AUTH_TOKEN serves as the api_key parameter. Use a placeholder like 'sk-1234' when authentication is disabled (necessary for when using OpenAI client).


This API can be served using Uvicorn (default) or Ray Serve for distributed deployments.

By default, the backend uses uvicorn to serve the FastAPI app.

To enable Ray Serve, set the following environment variable:

.env
ENABLE_RAY_SERVE=true

Additional optional environment variables for configuring Ray Serve:

.env
RAY_SERVE_NUM_REPLICAS=1 # Number of deployment replicas
RAY_SERVE_HOST=0.0.0.0 # Host address for Ray Serve HTTP proxy
RAY_SERVE_PORT=8080 # Port for Ray Serve HTTP proxy

When using Ray Serve with a remote cluster, the HTTP server will be started on the head node of the cluster.

Verify server status and availability.

GET /health_check

POST /indexer/partition/{partition}/file/{file_id}

Upload a new file to a specific partition for indexing.

Parameters:

  • partition (path): Target partition name
  • file_id (path): Unique identifier for the file

Request Body (form-data):

  • file (binary): File to upload
  • metadata (JSON string): File metadata (e.g., {"owner": "user1"})

Responses:

  • 201 Created: Returns task status URL
  • 409 Conflict: File already exists in partition
PUT /indexer/partition/{partition}/file/{file_id}

Replace an existing file in the partition. Deletes the current entry and creates a new indexing task.

Parameters: Same as POST endpoint Request Body: Same as POST endpoint Response: 202 Accepted with task status URL

PATCH /indexer/partition/{partition}/file/{file_id}

Update file metadata without reindexing the document.

Request Body (form-data):

  • metadata (JSON string): Updated metadata

Response: 200 OK on successful update

DELETE /indexer/partition/{partition}/file/{file_id}

Remove a file from the specified partition.

Responses:

  • 204 No Content: Successfully deleted
  • 404 Not Found: File not found in partition
GET /indexer/task/{task_id}

Monitor the progress of an asynchronous indexing task.

Response: Task status information


GET /indexer/task/{task_id}/logs
GET /indexer/task/{task_id}/error
GET /search/

Perform semantic search across specified partitions.

Query Parameters:

  • partitions (optional): List of partition names (default: ["all"])
  • text (required): Search query text
  • top_k (optional): Number of results to return (default: 5)

Responses:

  • 200 OK: JSON list of document links (HATEOAS format)
  • 400 Bad Request: Invalid partitions parameter
GET /search/partition/{partition}

Search within a specific partition only.

Query Parameters:

  • text (required): Search query text
  • top_k (optional): Number of results (default: 5)

Response: Same as multi-partition search

GET /search/partition/{partition}/file/{file_id}

Search within a particular file in a partition.

Query Parameters: Same as partition search Response: Same as other search endpoints


GET /extract/{extract_id}

Retrieve specific document extract (chunk) by ID.

Response: JSON containing extract content and metadata


These endpoints provide full OpenAI API compatibility for seamless integration with existing tools and workflows. For detailed example of openai usage see this section

  • List Available Models
GET /v1/models

List all available RAG models (partitions).

Model Naming Convention:

  • Pattern: openrag-{partition_name} => This model allows to chat specifically with the partition {partition_name}
  • Special model: partition-all (queries entire vector database)
  • Chat Completions
POST /v1/chat/completions

OpenAI-compatible chat completion using RAG pipeline.

Request Body:

curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_AUTH_TOKEN" \
-d '{
"model": "openrag-{partition_name}",
"messages": [
{
"role": "user",
"content": "Your question here"
}
],
"temperature": 0.7,
"stream": false
}'
  • Text Completions
POST /v1/completions

OpenAI-compatible text completion endpoint.

For indexing multiple files programmatically, you can use this script data_indexer.py utility script in the 📁 utility folder or simply use indexer ui.

from openai import OpenAI, AsyncOpenAI
api_base_url = "http://localhost:8080" # fastapi base url of 'openrag'
base_url = f"{api_base_url}/v1"
auth_key = ... # your api authentification key AUTH_TOKEN in your .env. Is authentification is disabled, use a placeholder like 'sk-1234'
client = OpenAI(api_key=auth_key, base_url=base_url)
your_partition= 'my_partition' # name of your partition
model = f"openrag-{your_partition}"
settings = {
'model': model,
'temperature': 0.3,
'stream': False
}
response = client.chat.completions.create(
**settings,
messages=[
{"role": "user", "content": "What information do you have about...?"}
]
)

The API uses standard HTTP status codes:

  • 200 OK: Successful request
  • 201 Created: Resource created successfully
  • 202 Accepted: Request accepted for processing
  • 204 No Content: Successful deletion
  • 400 Bad Request: Invalid request parameters
  • 404 Not Found: Resource not found
  • 409 Conflict: Resource already exists

Error responses include detailed JSON messages to help with debugging and integration.