Files Index
Query and analyze indexes of files using Cosmos.
About Index
An Index is a set of documents that are processed and analyzed together, so you can ask questions and retrieve information contained in its files.
Each Index is identified by a unique index_uuid and a name of type string. Full details are in the API Reference; for a quick overview, these are the Index properties:
Index attributes | Description | Example |
---|---|---|
index_uuid | Unique identifier for the index | "your-index-uuid" |
name | Name of the index | "Financial Reports 2023" |
status | Index is active or in countdown (scheduled for deletion) | active |
vectorized | File contents are embedded and ready for querying | false |
created_at | Creation timestamp | "2024-11-15T15:03:00.219676+00:00" |
updated_at | Update timestamp | "2024-11-15T15:03:00.219681+00:00" |
expires_at | Expiration date of the index, if scheduled for deletion | None |
files | Files linked to the index, and their storage. | "3 files: 147086 bytes" |
Storage Limits
⚠️ Warning:
A single storage limit applies to the total size of files across all indexes. Once it is reached, no new files can be uploaded to existing indexes and no new indexes can be created.
Existing indexes can still be queried or deleted, and files can be removed from them to free space. You can also manage storage through the Dashboard.
Supported File Formats
- Documents: .pdf, .docx, .doc, .odt, .txt, .md
- Spreadsheets: .xlsx, .xls, .ods, .csv
- Presentations: .pptx, .ppt, .odp
- Notebooks: .ipynb
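Since only the formats listed above are accepted, it can help to screen file paths locally before sending them. The following is a minimal sketch, not part of the Cosmos client: the helper name split_by_support is hypothetical, and matching extensions case-insensitively is an assumption about how the service treats names such as REPORT.PDF.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".odt", ".txt", ".md",  # documents
    ".xlsx", ".xls", ".ods", ".csv",                 # spreadsheets
    ".pptx", ".ppt", ".odp",                         # presentations
    ".ipynb",                                        # notebooks
}


def split_by_support(filepaths):
    """Return (supported, unsupported) lists based on the file extension.

    Hypothetical helper: the comparison is case-insensitive, which is an
    assumption rather than documented service behavior.
    """
    supported, unsupported = [], []
    for path in filepaths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

The supported list can then be passed as filepaths when creating an index or adding files.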
Step-by-step Tutorial
In this guide:
- Section 1: Prerequisites.
- Section 2: Setup Cosmos Python Client.
- Section 3: Index operations. Parameters and examples.
  - Section 3.1: New Index + details: Create index and fetch index details.
  - Section 3.2: Modify Index: Rename an index, add files, delete files.
  - Section 3.3: Index Contents: Embedding & Querying: Embed an index, ask index.
  - Section 3.4: Index Management: List all indexes, delete index, restore index.
1. Prerequisites
Before you begin, ensure you have:
- An active CosmosAPI account
- API key from the API keys dashboard
2. Setup Cosmos Python Client
The Cosmos Python client offers a convenient way to perform API requests.
2.1. Install Cosmos Python Client:
Install the Cosmos Python client with pip:
pip install delos-cosmos
2.2. Authenticate Requests:
Initialize the client with your API key:
from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")
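To keep the key out of source code, one option is to read it from an environment variable before creating the client. This is a sketch under stated assumptions: the variable name COSMOS_API_KEY and the helper load_api_key are hypothetical and are not defined by the Cosmos client.

```python
import os


def load_api_key(env=None, var_name="COSMOS_API_KEY"):
    """Read the API key from an environment mapping; raise if missing or blank.

    `var_name` is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the client.")
    return key


# Usage (requires the delos-cosmos package and a real key):
# from cosmos import CosmosClient
# cosmos_client = CosmosClient(api_key=load_api_key())
```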
2.3. Call API:
You can now invoke any Cosmos endpoint. For example, let's try the /health endpoint to check the validity of your API key and the availability of the client services:
response = cosmos_client.status_health()
print(response)
3. Index Operations
An Index names a group of documents that are analyzed and processed together. The documents may concern the same topic or share a common structure.
When you ask the Index a question, the model processes these documents together and retrieves the most relevant information to answer it.
These are the Index requests available in Python cosmos_client:
Group | Client method | Used for |
---|---|---|
Index Management | .files_index_create | Create a new Index |
Index Management | .files_index_delete | Delete a specific Index |
Index Management | .files_index_restore | Restore a specific deleted Index |
Index Contents | .files_index_files_add | Add files to an existing Index |
Index Contents | .files_index_files_delete | Delete files from an existing Index |
Index Contents | .files_index_rename | Rename an existing Index |
Index Details | .files_index_list | List all Index |
Index Details | .files_index_details | See the details of an Index |
Index Querying | .files_index_embed | Embed the files in the specified Index |
Index Querying | .files_index_ask | Query the files in a specific Index |
Let's create an Index to work with several files. We will first send the files to create the Index, then process them, and then the Index will be ready for our queries.
3.1. NEW INDEX + DETAILS
1. Create new Index:
To create a new index, which will be shared with your team:
response = cosmos_client.files_index_create(
    filepaths=['/path/to/document1.pdf', '/path/to/document2.docx'],
    name="my_new_index",
    read_images=False,
)
print(response)
Index creation makes an internal call to the /files_parse service to read the contents of all files. These are the parameters of the index creation request:
Parameter | Description | Example |
---|---|---|
name | Name for the new index. | TestFiles |
filepaths | List of paths to the files to be processed. | [/path/to/file1.pdf , /path/to/file2.docx ] |
read_images (optional) | Whether to scan images (disabled by default). | False |
The read_images parameter enables or disables scanning of images and graphic elements while the file contents are processed. It is disabled by default (read_images=False). Enabling it consumes more resources, since it requires more complex processing.
Expected response:
{ "request_id": "3b096969-50cb-4325-9313-d49e821090c6", "response_id": "45d5f45a-7f36-4ac7-af47-689ae5050597", "status_code": 200, "status": "success", "message": "Index created successfully.", "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a", "name": "TestFiles", "created_at": "2025-02-19T14:49:50.872575+00:00", "updated_at": "2025-02-19T14:49:50.872599+00:00", "vectorized": false, "status": "active", "files": { "9a32bca9a2ddcdb97535aa38": { "file_hash": "9a32bca9a2ddcdb97535aa38", "filename": "financial_reports_2023.docx", "size": 577590 }, "3b09696950cb4325931003fa3": { "file_hash": "3b09696950cb4325931003fa3", "filename": "results_2024_q1.pdf", "size": 577590 } } }, "error": null, "timestamp": "2025-02-19T12:54:08.628050Z", "cost": 0.0075 }
The index_uuid received when creating the Index is what you use to perform any further operation on this Index.
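Because every later call depends on the index_uuid, it is worth extracting it defensively. The sketch below assumes the client returns a parsed dict shaped like the sample response above (the actual return type of the client is not confirmed here); extract_index_uuid is a hypothetical helper name.

```python
def extract_index_uuid(response):
    """Return data.index_uuid from a successful response, else raise ValueError.

    Assumes `response` is a dict with the same keys as the sample JSON
    ("status", "error", "message", "data").
    """
    if response.get("status") != "success" or response.get("error"):
        raise ValueError(f"Request failed: {response.get('message')!r}")
    return response["data"]["index_uuid"]


# Usage with the creation response:
# index_uuid = extract_index_uuid(response)
```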
2. See Index details:
You can retrieve the details of your created index by providing the index_uuid:
response = cosmos_client.files_index_details(index_uuid=index_uuid)
print(response)
The index details request only requires the UUID of the index to query:
Parameter | Description | Example |
---|---|---|
index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
The response will be similar to the following:
{ "request_id": "3b096969-50cb-4325-9313-d49e821090c6", "response_id": "45d5f45a-7f36-4ac7-af47-689ae5050597", "status_code": 200, "status": "success", "message": "Index `8400c6e1-a185-4960-bc8a-b24edc74411a` (named `TestFiles`) details retrieved.", "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a", "name": "TestFiles", "vectorized": false, "status": "active", "expires_at": null, "created_at": "2025-02-19T14:49:50.872575+00:00", "updated_at": "2025-02-19T14:58:00.806729+00:00", "storage": { "size_bytes": 147086, "size_mb": 0.01, "num_files": 2 }, "files": [ { "file_hash": "9a32bca9a2ddcdb97535aa38", "filename": "financial_reports_2023.docx", "size": 577590 }, { "file_hash": "3b09696950cb4325931003fa3", "filename": "results_2024_q1.pdf", "size": 577590 } ] }, "error": null, "timestamp": "2025-02-19T14:58:00.808005Z", "cost": 0.0 }
The details come in handy to check which files are in each index and the storage associated with them:
- The index_uuid allows you to perform operations on this index, such as instantly adding or removing files. It is a unique UUID that cannot be modified or customized, and it remains constant for each Index.
- The name is modifiable, but it is also expected to be unique inside an organization.
- The vectorized status shows whether the Index contents are ready to be queried.
- The Index status shows whether the Index is active or scheduled for deletion (countdown).
- The expires_at date is set to 2h after the moment the deletion of the Index is requested (and is therefore only non-None for an Index with status=countdown).
- The storage field shows the number of files and the size they occupy in this index. Remember that storage is limited to 100 MB per organization (across all indexes). Your organization's storage is also manageable through the Dashboard.
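The sample responses show files as a mapping keyed by hash in the creation response and as a list in the details response. A small helper can normalize both into a filename lookup, which is handy when a file hash is needed later. This is a sketch that assumes dict responses shaped like the samples; hashes_by_filename is a hypothetical name.

```python
def hashes_by_filename(response):
    """Map each filename to its file_hash.

    Accepts `data.files` either as a dict keyed by hash (creation sample)
    or as a list of entries (details sample).
    """
    files = response["data"]["files"]
    entries = files.values() if isinstance(files, dict) else files
    return {entry["filename"]: entry["file_hash"] for entry in entries}
```

If two files in an index share a filename, only the last one survives in the lookup, so treat this as a convenience rather than a complete inventory.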
3.2. MODIFY INDEX
You may want to rename an index or modify the set of files that an index contains.
1. Rename Index
You can rename an index by using the /rename_index
endpoint:
response = cosmos_client.files_index_rename(
    new_index_uuid,
    "New name",
)
print(response)
The index rename request expects:
Parameter | Description | Example |
---|---|---|
index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
name | New name for the index. | Financial Reports 2023-24 |
Expected response:
{ "request_id": "3b096969-50cb-4325-9313-d49e821090c6", "response_id": "45d5f45a-7f36-4ac7-af47-689ae5050597", "status_code": 200, "status": "success", "message": "Index `8400c6e1-a185-4960-bc8a-b24edc74411a` name changed from 'TestFiles' to 'Financial Reports 2023-24'.", "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a", "old_name": "TestFiles", "new_name": "Financial Reports 2023-24", "status": "active", "updated_at": "2025-02-19T15:06:16.575203+00:00" }, "error": null, "timestamp": "2025-02-19T12:54:08.628050Z", "cost": 0.0 }
2. Add new Files to Index
For adding new files, use the /add_files_to_index
endpoint. You can choose to enable the read_images
parameter if graphic contents are relevant to your processing (by default it is disabled):
response = cosmos_client.files_index_add_files(
    index_uuid="your-index-uuid",
    filepaths=["path/to/document3.pdf", "path/to/document4.txt"],
    read_images=True,
)
print(response)
The request to add files expects:
Parameter | Description | Example |
---|---|---|
index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
filepaths | List of paths to the files to be processed. | [/path/to/file3.docx , /path/to/file4.pdf ] |
read_images (optional) | Whether to scan images or not (default). | False |
Expected response:
{ "request_id": "2093f52f-51ac-41ea-8b78-02367906906a", "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e", "status_code": 200, "status": "success", "message": "Files added and processed successfully.", "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a", "new_files": ["ba7ddb4dae0420a1fe0b5e55c3970eb1c7d27f"], "processed_chunks": 9 }, "error": null, "timestamp": "2025-02-19T15:07:25.479522Z", "cost": 0.0183 }
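Every sample response in this guide carries a cost field, and operations that parse files report a non-zero value. To keep track of spending across a session, the responses can be collected and summed. A minimal sketch, assuming dict responses; the unit of cost is not stated in this guide, and total_cost is a hypothetical helper.

```python
def total_cost(responses):
    """Sum the `cost` field over a list of response dicts.

    Missing or null costs count as zero. The unit is whatever the API
    reports; this guide does not specify it.
    """
    return sum(float(r.get("cost") or 0.0) for r in responses)
```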
3. Delete Files from Index
To delete one or more files from the index, provide their file hashes (they can be retrieved from the index details):
files_hashes = ["ba7ddb4dae0420a1fe0b5e55c3970eb1c7d27f"]
response = cosmos_client.files_index_delete_files(
    index_uuid="your-index-uuid",
    files_hashes=files_hashes,
)
print(response)
To delete files, the parameters are:
Parameter | Description | Example |
---|---|---|
index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
files_hashes | List of hashes of files to be removed. | [ba7ddb4dae0420a1fe0b5e55c3970eb1c7d27f ] |
Expected response:
{ "request_id": "2093f52f-51ac-41ea-8b78-02367906906a", "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e", "status_code": 200, "status": "success", "message": "File(s) deleted from index successfully", "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a", "remaining_files": ["9a32bca9a2ddcdb97535aa38"] }, "error": null, "timestamp": "2025-02-19T15:07:25.479522Z", "cost": 0.0 }
You can request the index details again to make sure the files were correctly added or removed (see section 3.1 for Index details).
The details also show how much of your storage quota the files in the index use; the quota is limited to 100 MB per organization (across all indexes).
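As a quick check that does not need a second request, the remaining_files list in the delete response can be compared with the hashes that were meant to be removed. A sketch assuming a dict response shaped like the sample above; still_present is a hypothetical helper.

```python
def still_present(delete_response, removed_hashes):
    """Return the hashes that were requested for removal but are still listed.

    An empty list means none of the removed hashes appear in
    `data.remaining_files`.
    """
    remaining = set(delete_response["data"]["remaining_files"])
    return sorted(remaining & set(removed_hashes))
```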
3.3. EMBEDDING & QUERYING
1. Embed Index
In order to perform vectorized searches, you need to embed the index. This operation will calculate the embeddings of files belonging to the index:
response = cosmos_client.files_index_embed(index_uuid="your-index-uuid")
print(response)
To embed an index, the parameters are:
Parameter | Description | Example |
---|---|---|
index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
Expected response:
{ "request_id": "9bc55dfd-fe35-45ac-b639-fa4296f29058", "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e", "status_code": 200, "status": "success", "message": "Index `8400c6e1-a185-4960-bc8a-b24edc74411a` successfully vectorized.", "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a" }, "error": null, "timestamp": "2025-02-19T15:07:25.479522Z", "cost": 0.000139 }
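To find out which indexes still need this step, the vectorized flag returned by the list request (section 3.4) can be inspected. The sketch below assumes a dict shaped like the sample list response, with entries under data.index; skipping indexes in countdown status is a choice made for this example, not documented behavior, and indexes_to_embed is a hypothetical helper.

```python
def indexes_to_embed(list_response):
    """Return UUIDs of active indexes whose `vectorized` flag is false."""
    return [
        entry["index_uuid"]
        for entry in list_response["data"]["index"]
        if entry.get("status") == "active" and not entry.get("vectorized")
    ]


# Usage (requires a configured client):
# for uuid in indexes_to_embed(cosmos_client.files_index_list()):
#     cosmos_client.files_index_embed(index_uuid=uuid)
```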
2. Query Index
Now that the index is vectorized, you can query it. The answer to your question is based on the embeddings of the files belonging to the index:
response = cosmos_client.files_index_ask(
    index_uuid=new_index_uuid,
    question="Where is located the bridge these articles mention?",
    output_language="en",
)
You can specify one or more file hashes to limit the files the Index analyzes in order to answer your question:
response = cosmos_client.files_index_ask(
    index_uuid=new_index_uuid,
    question="Where is located the bridge these articles mention?",
    output_language="en",
    active_files_hashes=[
        "26a79e1e7233ef12c763c4a0e6b3221ddba54357d4",
        "f66d2345d7c64ea7e4428f87d537927b567d8eba00",
    ],
)
To query an index, the parameters are:
Parameter | Description | Example |
---|---|---|
index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
question | Question on the Index files. | Where is located the bridge these articles mention? |
output_language (optional) | Language for the response (default: the same used in question). | en |
active_files_hashes (optional) | Hashes of the files within this Index to access for this question; accepts a list, all (default), or none | all |
The response contains an answer to the question, as well as its sources: the index files and pages containing the information on which the answer is based.
Expected response:
{ "status": "success", "message": "Query processed successfully", "data": { "answer": "The article discusses the bridge of Brooklyn 'FILE:1 PAGE:2'.", "sources": { "1": "efcb3858b45a3edb306c9d3457820e41ec78e3492f833ad189254c02886df260" } } }
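In the sample response, the answer cites its sources inline as 'FILE:1 PAGE:2', and the sources mapping links each file number to a file hash. The sketch below resolves those markers. The citation pattern is inferred from this single sample and may not cover every answer the service returns; resolve_citations is a hypothetical helper.

```python
import re

# Pattern inferred from the sample answer above, e.g. 'FILE:1 PAGE:2'.
CITATION = re.compile(r"FILE:(\d+) PAGE:(\d+)")


def resolve_citations(response):
    """Return one {"file_hash", "page"} dict per citation found in the answer.

    `file_hash` is None when the cited file number is missing from
    `data.sources`.
    """
    data = response["data"]
    sources = data.get("sources", {})
    return [
        {"file_hash": sources.get(file_no), "page": int(page)}
        for file_no, page in CITATION.findall(data["answer"])
    ]
```

The resulting hashes can be matched against the index details to recover the filenames.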
3.4. INDEX MANAGEMENT
1. List All Index
You can list all the indexes in your team that are in active or countdown (scheduled for deletion) status:
response = cosmos_client.files_index_list()
print(response)
(This function does not receive any parameter).
Expected response:
{ "request_id": "2093f52f-51ac-41ea-8b78-02367906906a", "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e", "status_code": 200, "status": "success", "message": "Retrieved 2 index.", "data": { "index": [ { "index_uuid": "your-index-uuid", "name": "my_new_index", "status": "active", "vectorized": true, "created_at": "2024-11-15T15:03:00.219676+00:00", "updated_at": "2024-11-15T15:03:00.219681+00:00", "expires_at": "2024-11-16T15:03:00.219682+00:00", "storage": { "size_bytes": 147086, "size_mb": 0.01, "num_files": 2 } }, { "index_uuid": "another-index-uuid", "name": "2024 Sales results", "status": "active", "vectorized": false, "created_at": "2024-11-15T15:03:00.219676+00:00", "updated_at": "2024-11-15T15:03:00.219681+00:00", "expires_at": "2024-11-16T15:03:00.219682+00:00", "storage": { "size_bytes": 577590, "size_mb": 0.55, "num_files": 3 } } ], "total_storage": { "bytes": 289172, "mb": 0.028, "limit_mb": 100, "usage_percentage": 2.8 } }, "error": null, "timestamp": "2025-02-19T15:07:25.479522Z", "cost": 0.0 }
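The total_storage block of the list response can be used to estimate how much room is left before the storage limit is reached. This is a sketch assuming a dict shaped like the sample above; whether the service counts a megabyte as 1024 * 1024 bytes or 1,000,000 bytes is not stated in this guide, so the conversion factor is a parameter, and storage_left_mb is a hypothetical helper.

```python
def storage_left_mb(list_response, bytes_per_mb=1024 * 1024):
    """Return the remaining storage in MB from `data.total_storage`.

    `bytes_per_mb` is an assumption; adjust it if the service uses
    decimal megabytes.
    """
    total = list_response["data"]["total_storage"]
    used_mb = total["bytes"] / bytes_per_mb
    return total["limit_mb"] - used_mb
```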
2. Delete an Index (⚠️ *warning*: delayed operation)
You can delete an index if you no longer need to access it. Unlike the other endpoints, which perform requests immediately, this endpoint applies a safety margin before taking effect: it deletes the index after 2h, giving you time to reverse the operation in case of error. Once the expiry date is set, an index marked for deletion receives the status "countdown" instead of "active".
response = cosmos_client.files_index_delete(new_index_uuid)
print(response)
To delete an index, the parameters are:
Parameter | Description | Example |
---|---|---|
index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
Expected response:
{ "request_id": "2093f52f-51ac-41ea-8b78-02367906906a", "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e", "status_code": 200, "status": "success", "message": "Index marked for deletion", "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a", "expires_at": "2025-02-19T17:41:52.180933+00:00" }, "error": null, "timestamp": "2025-02-19T15:41:52.180933Z", "cost": 0.0 }
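The expires_at timestamp in the delete response is an ISO 8601 string with a UTC offset, so the time left before the index is actually removed can be computed with the standard library. A sketch assuming a dict response shaped like the sample above; time_until_deletion is a hypothetical helper.

```python
from datetime import datetime, timezone


def time_until_deletion(delete_response, now=None):
    """Return a timedelta between `now` and `data.expires_at`.

    `now` defaults to the current UTC time; a negative result means the
    expiry date has already passed.
    """
    expires_at = datetime.fromisoformat(delete_response["data"]["expires_at"])
    now = now or datetime.now(timezone.utc)
    return expires_at - now
```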
3. Restore an Index scheduled for deletion
After an index is marked for deletion, but before its expiry date, you can restore it. This reverts the operation in case of error: it restores the "active" status and cancels the scheduled deletion. This is only possible within the 2h window (while the index has status=countdown).
response = cosmos_client.files_index_restore(new_index_uuid)
print(response)
To restore an index, the parameters are:
Parameter | Description | Example |
---|---|---|
index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
Expected response:
{ "request_id": "2093f52f-51ac-41ea-8b78-02367906906a", "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e", "status_code": 200, "status": "success", "message": "Index restored successfully", "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a", "status": "active" }, "error": null, "timestamp": "2025-02-19T15:07:25.479522Z", "cost": 0.0 }
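Since restoring only works while an index is in countdown and its expiry date has not passed, a local check on an index entry (for example, one item from the list response) can avoid a failed request. A sketch assuming entries shaped like the samples in this guide; is_restorable is a hypothetical helper, and the service remains the authority on whether a restore succeeds.

```python
from datetime import datetime, timezone


def is_restorable(index_entry, now=None):
    """Return True if the entry is in countdown and has not yet expired."""
    if index_entry.get("status") != "countdown" or not index_entry.get("expires_at"):
        return False
    now = now or datetime.now(timezone.utc)
    return datetime.fromisoformat(index_entry["expires_at"]) > now
```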