Files Index

Query and analyze indexes of files using Cosmos.

About Index

An Index is a set of documents that are processed and analyzed together, so that you can ask questions and retrieve information contained in the Index files.

Each Index is identified by a unique index_uuid and a name of type string. You can find all the details in the API Reference, but for a quick overview, these are the Index properties:

| Index attribute | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index | "your-index-uuid" |
| name | Name of the index | "Financial Reports 2023" |
| status | Index is active or in countdown (scheduled for deletion) | active |
| vectorized | File contents are embedded and ready for querying | false |
| created_at | Creation timestamp | "2024-11-15T15:03:00.219676+00:00" |
| updated_at | Update timestamp | "2024-11-15T15:03:00.219681+00:00" |
| expires_at | Expiration date of the index, if scheduled for deletion | None |
| files | Files linked to the index, and storage | "3 files: 147086 bytes" |

Storage Limits

⚠️ Warning:

A total storage limit applies to all files across all indexes. Once it is reached, no new files can be uploaded to existing indexes and no new indexes can be created.

Existing indexes can still be queried or deleted, and files can be removed from them to free space. You can also manage storage through the Dashboard.

Supported File Formats

  • Documents: .pdf, .docx, .doc, .odt, .txt, .md
  • Spreadsheets: .xlsx, .xls, .ods, .csv
  • Presentations: .pptx, .ppt, .odp
  • Notebooks: .ipynb
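A quick local check can catch unsupported files before a request is sent. The sketch below is only an illustration: `SUPPORTED_EXTENSIONS` and `split_supported` are hypothetical names that are not part of the Cosmos client, and it assumes extensions are compared case-insensitively, which this page does not state.

```python
from pathlib import Path

# Extensions copied from the list of supported formats above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".odt", ".txt", ".md",  # documents
    ".xlsx", ".xls", ".ods", ".csv",                 # spreadsheets
    ".pptx", ".ppt", ".odp",                         # presentations
    ".ipynb",                                        # notebooks
}


def split_supported(filepaths):
    """Split paths into (supported, unsupported) lists by file extension."""
    supported, unsupported = [], []
    for path in filepaths:
        # Assumption: extensions are matched case-insensitively.
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ok, rejected = split_supported(["report.PDF", "notes.md", "photo.png"])
print(ok)        # ['report.PDF', 'notes.md']
print(rejected)  # ['photo.png']
```

Only the first list would then be passed on as file paths for an Index.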

Step-by-step Tutorial

In this guide:

  • Section 1: Prerequisites.
  • Section 2: Setup Cosmos Python Client.
  • Section 3: Index operations. Parameters and examples.
    • Section 3.1: New Index + details: Create index and fetch index details.
    • Section 3.2: Modify Index: Rename an index, add files, delete files.
    • Section 3.3: Index Contents: Embedding & Querying: Embed an index, ask index.
    • Section 3.4: Index Management: List all indexes, delete index, restore index.

1. Prerequisites

Before you begin, ensure you have a Cosmos API key and a Python environment with pip available.

2. Setup Cosmos Python Client

The Cosmos Python client gives you a convenient way to perform API requests.

2.1. Install Cosmos Python Client:

Get the Cosmos Python client with pip:

pip install delos-cosmos

2.2. Authenticate Requests:

Initialize the client with your API key:

from cosmos import CosmosClient

# Replace the placeholder with your own API key.
cosmos_client = CosmosClient(api_key="your-cosmos-api-key")
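To avoid hard-coding the key in scripts, it can be read from the environment. This is a minimal sketch, not something the client requires: `load_api_key` and the variable name `COSMOS_API_KEY` are hypothetical choices made for this example.

```python
import os


def load_api_key(env_var="COSMOS_API_KEY"):
    """Return the API key stored in an environment variable.

    `COSMOS_API_KEY` is a hypothetical variable name chosen for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return api_key


# Usage with the client shown above (commented out so the snippet runs
# without the package installed):
# from cosmos import CosmosClient
# cosmos_client = CosmosClient(api_key=load_api_key())
```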

2.3. Call API:

You can start invoking any Cosmos endpoints. For example, let's try the /health endpoint to check the validity of your API key and the availability of the client services:

response = cosmos_client.status_health()
print(response)

3. Index Operations

          

An Index names a group of documents that are analyzed and processed together. They may concern the same topic or share a common structure.

When you ask the Index a question, the Model processes these documents together and retrieves the most relevant information to answer it.

These are the Index requests available in the Python cosmos_client:

| Group | Client method | Used for |
| --- | --- | --- |
| Index Management | .files_index_create | Create a new Index |
| Index Management | .files_index_delete | Delete a specific Index |
| Index Management | .files_index_restore | Restore a specific deleted Index |
| Index Contents | .files_index_files_add | Add files to an existing Index |
| Index Contents | .files_index_files_delete | Delete files from an existing Index |
| Index Contents | .files_index_rename | Rename an existing Index |
| Index Details | .files_index_list | List all Index |
| Index Details | .files_index_details | See the details of an Index |
| Index Querying | .files_index_embed | Embed the files in the specified Index |
| Index Querying | .files_index_ask | Query the files in a specific Index |

Let's create an Index to work with several files. We will first send the files to create the Index, then process them, and then the Index will be ready for our queries.

          

3.1. NEW INDEX + DETAILS

1. Create new Index:

To create a new index, which will be shared with your team:

response = cosmos_client.files_index_create(
    filepaths=['/path/to/document1.pdf', '/path/to/document2.docx'],
    name="my_new_index",
    read_images=False)
print(response)

Index creation makes an internal call to the /files_parse service to read the contents of all files. These are the parameters for the index creation request:

| Parameter | Description | Example |
| --- | --- | --- |
| name | Name for the new index. | TestFiles |
| filepaths | List of paths to the files to be processed. | [/path/to/file1.pdf, /path/to/file2.docx] |
| read_images (optional) | Whether to scan images (default: False). | False |

The read_images parameter controls whether images and graphic elements are scanned while processing the file contents. It is disabled by default (read_images=False). Enabling it is more costly, since it requires more complex processing.

Expected response:

{
  "request_id": "3b096969-50cb-4325-9313-d49e821090c6",
  "response_id": "45d5f45a-7f36-4ac7-af47-689ae5050597",
  "status_code": 200,
  "status": "success",
  "message": "Index created successfully.",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "name": "TestFiles",
    "created_at": "2025-02-19T14:49:50.872575+00:00",
    "updated_at": "2025-02-19T14:49:50.872599+00:00",
    "vectorized": false,
    "status": "active",
    "files": {
      "9a32bca9a2ddcdb97535aa38": {
        "file_hash": "9a32bca9a2ddcdb97535aa38",
        "filename": "financial_reports_2023.docx",
        "size": 577590
      },
      "3b09696950cb4325931003fa3": {
        "file_hash": "3b09696950cb4325931003fa3",
        "filename": "results_2024_q1.pdf",
        "size": 577590
      }
    }
  },
  "error": null,
  "timestamp": "2025-02-19T12:54:08.628050Z",
  "cost": 0.0075
}

The index_uuid received when creating the Index lets you perform any further operation on this Index.
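Since later calls need the index_uuid and file-level calls need file hashes, it can help to pull both out of the creation response right away. The sketch below assumes the client returns the parsed JSON body as a Python dict (this page only shows the printed JSON); `summarize_created_index` is a hypothetical helper, and the sample data is a trimmed copy of the response shown above.

```python
# Trimmed copy of the creation response shown above.
create_response = {
    "status": "success",
    "data": {
        "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
        "name": "TestFiles",
        "files": {
            "9a32bca9a2ddcdb97535aa38": {
                "file_hash": "9a32bca9a2ddcdb97535aa38",
                "filename": "financial_reports_2023.docx",
                "size": 577590,
            },
            "3b09696950cb4325931003fa3": {
                "file_hash": "3b09696950cb4325931003fa3",
                "filename": "results_2024_q1.pdf",
                "size": 577590,
            },
        },
    },
    "error": None,
}


def summarize_created_index(response):
    """Return (index_uuid, {filename: file_hash}) from a creation response."""
    if response.get("status") != "success":
        raise ValueError(f"Index creation failed: {response.get('error')}")
    data = response["data"]
    # In the creation response, `files` is a mapping keyed by file hash.
    hashes_by_name = {
        info["filename"]: file_hash for file_hash, info in data["files"].items()
    }
    return data["index_uuid"], hashes_by_name


index_uuid, hashes_by_name = summarize_created_index(create_response)
print(index_uuid)
print(hashes_by_name["results_2024_q1.pdf"])
```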

          

2. See Index details:

You can retrieve the details of your created index by providing the index_uuid:

response = cosmos_client.files_index_details(index_uuid=index_uuid)
print(response)

The index details request only requires the UUID of the index to query:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |

The response will be similar to the following:

{
  "request_id": "3b096969-50cb-4325-9313-d49e821090c6",
  "response_id": "45d5f45a-7f36-4ac7-af47-689ae5050597",
  "status_code": 200,
  "status": "success",
  "message": "Index `8400c6e1-a185-4960-bc8a-b24edc74411a` (named `TestFiles`) details retrieved.",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "name": "TestFiles",
    "vectorized": false,
    "status": "active",
    "expires_at": null,
    "created_at": "2025-02-19T14:49:50.872575+00:00",
    "updated_at": "2025-02-19T14:58:00.806729+00:00",
    "storage": {
      "size_bytes": 147086,
      "size_mb": 0.01,
      "num_files": 2
    },
    "files": [
      {
        "file_hash": "9a32bca9a2ddcdb97535aa38",
        "filename": "financial_reports_2023.docx",
        "size": 577590
      },
      {
        "file_hash": "3b09696950cb4325931003fa3",
        "filename": "results_2024_q1.pdf",
        "size": 577590
      }
    ]
  },
  "error": null,
  "timestamp": "2025-02-19T14:58:00.808005Z",
  "cost": 0.0
}

The details are handy for checking which files are in each index and how much storage they use:

  • The index_uuid lets you perform operations on this index, such as adding or removing files instantly. It is a unique, constant identifier for each Index and cannot be modified or customized.

  • The name can be modified, but it is expected to be unique within an organization.

  • The vectorized flag shows whether the Index contents are ready to be queried.

  • The status shows whether the Index is active or scheduled for deletion (countdown).

  • The expires_at field is set to 2 hours after the moment deletion of the Index is requested, and is therefore only set for an Index with status=countdown.

  • The storage field shows the number of files in this index and the size they occupy. Remember that storage is limited to 100 MB per organization (across all indexes). Your organization's storage can also be managed through the Dashboard.
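The status, vectorized, and expires_at fields described above can be combined into a single readable state. This is a minimal sketch under the assumption that the `data` block of a details response is available as a dict; `index_state` is a hypothetical helper, not a client method.

```python
def index_state(details):
    """Classify an index from the `data` block of a details response."""
    if details["status"] == "countdown":
        return f"scheduled for deletion at {details['expires_at']}"
    if not details["vectorized"]:
        return "active, needs embedding before it can be queried"
    return "active, ready to query"


# Fields taken from the details response shown above.
details = {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "name": "TestFiles",
    "vectorized": False,
    "status": "active",
    "expires_at": None,
}
print(index_state(details))  # active, needs embedding before it can be queried
```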

          

3.2. MODIFY INDEX

You may want to rename an index or modify the set of files that an index contains.

1. Rename Index

You can rename an index by using the /rename_index endpoint:

response = cosmos_client.files_index_rename(
    new_index_uuid,
    "New name",
)
print(response)

The index rename request expects:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
| name | New name for the index. | Financial Reports 2023-24 |

Expected response:

{
  "request_id": "3b096969-50cb-4325-9313-d49e821090c6",
  "response_id": "45d5f45a-7f36-4ac7-af47-689ae5050597",
  "status_code": 200,
  "status": "success",
  "message": "Index `8400c6e1-a185-4960-bc8a-b24edc74411a` name changed from 'TestFiles' to 'Financial Reports 2023-24'.",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "old_name": "TestFiles",
    "new_name": "Financial Reports 2023-24",
    "status": "active",
    "updated_at": "2025-02-19T15:06:16.575203+00:00"
  },
  "error": null,
  "timestamp": "2025-02-19T12:54:08.628050Z",
  "cost": 0.0
}

          

2. Add new Files to Index

To add new files, use the /add_files_to_index endpoint. You can enable the read_images parameter if graphic contents are relevant to your processing (it is disabled by default):

response = cosmos_client.files_index_add_files(
    index_uuid="your-index-uuid",
    filepaths=["path/to/document3.pdf", "path/to/document4.txt"],
    read_images=True,
)
print(response)

The request to add files expects:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
| filepaths | List of paths to the files to be processed. | [/path/to/file3.docx, /path/to/file4.pdf] |
| read_images (optional) | Whether to scan images (default: False). | False |

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Files added and processed successfully.",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "new_files": ["ba7ddb4dae0420a1fe0b5e55c3970eb1c7d27f"],
    "processed_chunks": 9
  },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.0183
}

          

3. Delete Files from Index

To delete one or more files from the index, provide their file hashes (they can be retrieved from the index details):

files_hashes = ["ba7ddb4dae0420a1fe0b5e55c3970eb1c7d27f"]

response = cosmos_client.files_index_delete_files(
    index_uuid="your-index-uuid",
    files_hashes=files_hashes,
)
print(response)

To delete files, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
| files_hashes | List of hashes of files to be removed. | [ba7ddb4dae0420a1fe0b5e55c3970eb1c7d27f] |

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "File(s) deleted from index successfully",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "remaining_files": ["9a32bca9a2ddcdb97535aa38"]
  },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.0
}

You can request the index details again to make sure the files were correctly added or removed (see section 3.1 for Index details).

The details also show how much of your storage quota the files in the index take up. The quota is limited to 100 MB per organization (across all indexes).
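Before adding files, a rough local estimate can tell whether they would fit within the 100 MB quota. The sketch below is an illustration only: `fits_in_quota` is a hypothetical helper, and it assumes 1 MB means 1024 * 1024 bytes, which this page does not define.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: the page does not define "MB"
LIMIT_MB = 100              # per-organization limit stated above


def fits_in_quota(new_file_sizes, used_bytes, limit_mb=LIMIT_MB):
    """True if files of the given sizes (in bytes) fit in the remaining quota."""
    return used_bytes + sum(new_file_sizes) <= limit_mb * BYTES_PER_MB


print(fits_in_quota([577590, 577590], used_bytes=147086))        # True
print(fits_in_quota([99 * BYTES_PER_MB], used_bytes=2_000_000))  # False
```

For local files, the sizes could be collected with `os.path.getsize(path)`; the bytes already in use would come from the storage figures reported by the API.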

          

3.3. EMBEDDING & QUERYING

1. Embed Index

In order to perform vectorized searches, you need to embed the index. This operation will calculate the embeddings of files belonging to the index:

response = cosmos_client.files_index_embed(index_uuid="your-index-uuid")
print(response)

To embed an index, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |

Expected response:

{
  "request_id": "9bc55dfd-fe35-45ac-b639-fa4296f29058",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Index `8400c6e1-a185-4960-bc8a-b24edc74411a` successfully vectorized.",
  "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a" },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.000139
}
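One way to avoid sending unnecessary embed requests is to check the vectorized flag first. The sketch below uses the method names `files_index_details` and `files_index_embed` from this page, but it assumes they return the parsed JSON body as a dict; `ensure_embedded` and `FakeClient` are hypothetical, and the fake client only stands in for the real one so the example runs offline.

```python
def ensure_embedded(client, index_uuid):
    """Embed the index only if its details report `vectorized` as false.

    Returns True if an embed request was sent, False otherwise.
    """
    details = client.files_index_details(index_uuid=index_uuid)
    # Assumption: the client returns the parsed JSON body as a dict.
    if details["data"]["vectorized"]:
        return False
    client.files_index_embed(index_uuid=index_uuid)
    return True


class FakeClient:
    """Stand-in for CosmosClient so the sketch runs without network access."""

    def __init__(self):
        self.vectorized = False
        self.embed_calls = 0

    def files_index_details(self, index_uuid):
        return {"data": {"index_uuid": index_uuid, "vectorized": self.vectorized}}

    def files_index_embed(self, index_uuid):
        self.embed_calls += 1
        self.vectorized = True
        return {"status": "success", "data": {"index_uuid": index_uuid}}


client = FakeClient()
print(ensure_embedded(client, "your-index-uuid"))  # True: embed was requested
print(ensure_embedded(client, "your-index-uuid"))  # False: already vectorized
```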

          

2. Query Index

Now that the index is vectorized, you can ask the index. The index will return the answer to the question based on the embeddings of files belonging to the index:

response = cosmos_client.files_index_ask(
    index_uuid=new_index_uuid,
    question="Where is located the bridge these articles mention?",
    output_language="en",
)

You can specify one or more file hashes to limit the files the Index analyzes when answering your question:

response = cosmos_client.files_index_ask(
    index_uuid=new_index_uuid,
    question="Where is located the bridge these articles mention?",
    output_language="en",
    active_files_hashes = [
      "26a79e1e7233ef12c763c4a0e6b3221ddba54357d4",
      "f66d2345d7c64ea7e4428f87d537927b567d8eba00"
    ]
)

To ask an index, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
| question | Question on the Index files. | Where is located the bridge these articles mention? |
| output_language (optional) | Language for the response (default: the same used in question). | en |
| active_files_hashes (optional) | List of files within this Index to access for this question, or all (default) or none. | all |
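The list passed as active_files_hashes can be built from the files reported in the index details. This is a small sketch, assuming the `files` list has the shape shown in the details response earlier; `hashes_for` is a hypothetical helper that selects files by name suffix.

```python
def hashes_for(files, suffix):
    """Return the hashes of the files whose name ends with `suffix`."""
    return [f["file_hash"] for f in files if f["filename"].lower().endswith(suffix)]


# `files` list as returned in the details response of section 3.1.
files = [
    {
        "file_hash": "9a32bca9a2ddcdb97535aa38",
        "filename": "financial_reports_2023.docx",
        "size": 577590,
    },
    {
        "file_hash": "3b09696950cb4325931003fa3",
        "filename": "results_2024_q1.pdf",
        "size": 577590,
    },
]

pdf_hashes = hashes_for(files, ".pdf")
print(pdf_hashes)  # ['3b09696950cb4325931003fa3']
```

The resulting list could then be passed as `active_files_hashes=pdf_hashes` in the ask request.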

The response contains an answer to the question, as well as the sources: the index file and page that contain the information on which the answer is based.

Expected response:

{
  "status": "success",
  "message": "Query processed successfully",
  "data": {
    "answer": "The article discusses the bridge of Brooklyn 'FILE:1 PAGE:2'.",
    "sources": {
      "1": "efcb3858b45a3edb306c9d3457820e41ec78e3492f833ad189254c02886df260"
    }
  }
}
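The citation markers in the answer can be matched back to file hashes through the sources mapping. The sketch below assumes every marker follows the `FILE:<n> PAGE:<n>` pattern seen in the single example above, which may not cover all cases; `resolve_citations` is a hypothetical helper.

```python
import re

# Assumption: citations always look like "FILE:<number> PAGE:<number>".
CITATION = re.compile(r"FILE:(\d+) PAGE:(\d+)")


def resolve_citations(data):
    """Return (file_hash, page) pairs for each citation found in the answer."""
    citations = []
    for file_key, page in CITATION.findall(data["answer"]):
        # `sources` maps the FILE number (as a string) to a file hash.
        citations.append((data["sources"].get(file_key), int(page)))
    return citations


# `data` block copied from the response shown above.
data = {
    "answer": "The article discusses the bridge of Brooklyn 'FILE:1 PAGE:2'.",
    "sources": {
        "1": "efcb3858b45a3edb306c9d3457820e41ec78e3492f833ad189254c02886df260"
    },
}
print(resolve_citations(data))
```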

          

3.4. INDEX MANAGEMENT

1. List All Indexes

You can list all the indexes in your team that are in active or countdown (scheduled for deletion) status:

response = cosmos_client.files_index_list()
print(response)

(This function takes no parameters.)

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Retrieved 2 index.",
  "data": {
    "index": [
      {
        "index_uuid": "your-index-uuid",
        "name": "my_new_index",
        "status": "active",
        "vectorized": true,
        "created_at": "2024-11-15T15:03:00.219676+00:00",
        "updated_at": "2024-11-15T15:03:00.219681+00:00",
        "expires_at": "2024-11-16T15:03:00.219682+00:00",
        "storage": {
          "size_bytes": 147086,
          "size_mb": 0.01,
          "num_files": 2
        }
      },
      {
        "index_uuid": "another-index-uuid",
        "name": "2024 Sales results",
        "status": "active",
        "vectorized": false,
        "created_at": "2024-11-15T15:03:00.219676+00:00",
        "updated_at": "2024-11-15T15:03:00.219681+00:00",
        "expires_at": "2024-11-16T15:03:00.219682+00:00",
        "storage": {
          "size_bytes": 577590,
          "size_mb": 0.55,
          "num_files": 3
        }
      }
    ],
    "total_storage": {
      "bytes": 289172,
      "mb": 0.028,
      "limit_mb": 100,
      "usage_percentage": 2.8
    }
  },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.0
}
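The listing can be condensed into a short overview of what needs attention. The sketch below is illustrative: `summarize_indexes` is a hypothetical helper, and the sample is a trimmed copy of the `data` block above in which one index has been switched to "countdown" to show that case.

```python
# Trimmed copy of the `data` block from the listing response above,
# with one index switched to "countdown" for illustration.
list_data = {
    "index": [
        {"name": "my_new_index", "status": "countdown", "vectorized": True},
        {"name": "2024 Sales results", "status": "active", "vectorized": False},
    ],
    "total_storage": {"mb": 0.028, "limit_mb": 100},
}


def summarize_indexes(data):
    """Group index names by state and report the storage left, in MB."""
    total = data["total_storage"]
    return {
        "remaining_mb": round(total["limit_mb"] - total["mb"], 3),
        "pending_deletion": [
            i["name"] for i in data["index"] if i["status"] == "countdown"
        ],
        "needs_embedding": [
            i["name"] for i in data["index"]
            if i["status"] == "active" and not i["vectorized"]
        ],
    }


summary = summarize_indexes(list_data)
print(summary["pending_deletion"])  # ['my_new_index']
print(summary["needs_embedding"])   # ['2024 Sales results']
```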

          

2. Delete an Index (⚠️ *warning*: delayed operation)

You can delete an index if you no longer need to access it. Unlike the other endpoints, which take effect immediately, this endpoint applies a safety margin: the index is deleted after 2 hours, giving you time to reverse the operation in case of error. Once the expiry date is set, an index marked for deletion has the status "countdown" instead of "active".

response = cosmos_client.files_index_delete(new_index_uuid)
print(response)

To delete an index, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Index marked for deletion",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "expires_at": "2025-02-19T17:41:52.180933+00:00"
  },
  "error": null,
  "timestamp": "2025-02-19T15:41:52.180933Z",
  "cost": 0.0
}
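The expires_at value in this response can be turned into the time left before the deletion becomes effective. This is a minimal sketch using only the standard library; `time_until_deletion` is a hypothetical helper, and it assumes expires_at is an ISO 8601 string with a UTC offset, as in the example above.

```python
from datetime import datetime, timezone


def time_until_deletion(expires_at, now=None):
    """Return the timedelta left before the index is deleted (may be negative)."""
    now = now or datetime.now(timezone.utc)
    return datetime.fromisoformat(expires_at) - now


# Moment of the deletion request in the example response above.
requested = datetime(2025, 2, 19, 15, 41, 52, 180933, tzinfo=timezone.utc)
left = time_until_deletion("2025-02-19T17:41:52.180933+00:00", now=requested)
print(left)  # 2:00:00
```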

          

3. Restore an Index scheduled for deletion

After an index is marked for deletion, but before the expiry date, you can restore it. This lets you revert the operation in case of error: it restores the "active" status and cancels the scheduled deletion. This is only possible within the 2-hour window (while the index has status=countdown).

response = cosmos_client.files_index_restore(new_index_uuid)
print(response)

To restore an index, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Index restored successfully",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "status": "active"
  },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.0
}
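Whether a restore request can still succeed follows from the two conditions described in this section: the index must be in countdown and its expiry date must not have passed. The sketch below checks both locally; `can_restore` is a hypothetical helper, and it assumes index entries are dicts with the status and expires_at fields shown on this page.

```python
from datetime import datetime, timezone


def can_restore(index, now=None):
    """True while the index is in countdown and its expiry has not passed."""
    if index.get("status") != "countdown" or not index.get("expires_at"):
        return False
    now = now or datetime.now(timezone.utc)
    return now < datetime.fromisoformat(index["expires_at"])


index = {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "status": "countdown",
    "expires_at": "2025-02-19T17:41:52.180933+00:00",
}
one_hour_in = datetime(2025, 2, 19, 16, 41, 52, tzinfo=timezone.utc)
three_hours_in = datetime(2025, 2, 19, 18, 41, 52, tzinfo=timezone.utc)
print(can_restore(index, now=one_hour_in))     # True
print(can_restore(index, now=three_hours_in))  # False
```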