Embed with LLM

Embed plain text using Cosmos.

About LLM Embedding

The LLM Embed Service allows you to generate dense vector representations of text. These embeddings can be used for various purposes, such as semantic search, text clustering, and machine learning.

Two embedding models are currently supported in the Cosmos API.

| Embedding Model | Developer | Vector size |
| --- | --- | --- |
| ada-v2 | OpenAI | 1536 |
| text-embedding-3-large | OpenAI | 3072 |
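Given the vector sizes in the table above, a quick sanity check on a returned embedding can catch model mix-ups early. This is a minimal sketch; the `EXPECTED_DIMS` mapping and `validate_embedding` helper are illustrative, not part of the Cosmos client:

```python
# Expected vector sizes per model, taken from the table above.
EXPECTED_DIMS = {"ada-v2": 1536, "text-embedding-3-large": 3072}

def validate_embedding(model: str, embedding: list) -> None:
    """Raise if the embedding length does not match the model's documented size."""
    expected = EXPECTED_DIMS[model]
    if len(embedding) != expected:
        raise ValueError(
            f"{model} should return {expected} dimensions, got {len(embedding)}"
        )
```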

⚠️ Warning:

All embedding models produce a vector representation of the input text, but they use different algorithms and their embedding dimensions differ. As a result, embeddings obtained from different models are not interchangeable in RAG operations or other downstream processes.
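To illustrate why embeddings from different models are not interchangeable: a similarity metric such as cosine similarity is only defined for vectors of the same dimension, and even same-length vectors from different models live in unrelated spaces. A minimal sketch (the `cosine_similarity` helper is illustrative, not part of the Cosmos client):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two embeddings from the SAME model."""
    if len(a) != len(b):
        # Vectors from different models (e.g. 1536 vs 3072 dims) cannot be compared.
        raise ValueError("Embeddings have different dimensions; were they produced by the same model?")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Even when dimensions happen to match, only compare embeddings produced by the same model.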


Step-by-step Tutorial

In this guide:

  • Section 1: Prerequisites.
  • Section 2: Setup Cosmos Python Client.
  • Section 3: Embed text.
  • Section 4: Handle Errors.

1. Prerequisites

Before you begin, ensure you have:

  • A valid Cosmos API key.
  • Python installed, with access to PIP.

2. Setup Cosmos Python Client

The Cosmos Python client provides a convenient way to perform API requests.

2.1. Install Cosmos Python Client:

Get the Cosmos Python client through PIP:

pip install delos-cosmos

2.2. Authenticate Requests:

Initialize the client with your API key:

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")

2.3. Call API:

You can now invoke any Cosmos endpoint. For example, call the /health endpoint to check that your API key is valid and that the client services are available:

response = cosmos_client.status_health()
print(response)

3. Embed text

Here is an example of an LLM embedding request using the Python client (/llm/embed endpoint):

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")
response = cosmos_client.llm_embed(text="Hello, World!", model="ada-v2")
print(response)

The text that you send is split into blocks so that each block fits the context size of the embedding model. Each embedding is a vector of floating-point numbers that supports similarity search and other operations.

Each block is identified by an id (0, 1, 2, ...) and contains the text that was embedded together. This block structure is preserved in the embedded_texts field of the embedding response. This is an example of a successful response:

{
  "request_id": "efb4d2f7-6a69-46b6-bb5b-615ae7fe66c7",
  "response_id": "e9e850a3-9b45-4c67-a846-cf88c4640eb9",
  "status_code": 200,
  "status": "success",
  "message": "Text successfully embedded.",
  "data": {
    "embedded_texts": [
      {
        "id": "0",
        "text": "Hello, World!",
        "embedding": [
          -0.041947972,
          -0.010223089,
          -0.011844206,
          ... (more values)
        ]
      }
    ]
  },
  "timestamp": "2024-11-20T15:25:32.290344Z"
}
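Since the block structure is preserved in the response, you can gather the vectors by block id before storing or comparing them. A minimal sketch, assuming the response is a dict shaped like the example above (the object returned by the real client may differ); the `collect_vectors` helper is illustrative:

```python
def collect_vectors(response: dict) -> dict:
    """Map each block id to its embedding vector, assuming the example's response shape."""
    return {
        block["id"]: block["embedding"]
        for block in response["data"]["embedded_texts"]
    }
```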

3.1. Parameters:

The parameters for the embedding request are:

| Parameter | Description | Example |
| --- | --- | --- |
| text | The text to be embedded | "What is the capital of France?" |
| model | Choice of the embedding model | ada-v2 or text-embedding-3-large |

4. Handle Errors

Common errors include:

  • Missing API key
  • No text provided

Example error response:

{
  "status_code": 400,
  "status": "error",
  "message": "Validation error",
  "error": {
    "error_code": "400",
    "error_message": "No text provided for embedding."
  }
}
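In practice you can guard every call by inspecting the status field of the response before using its data. A minimal sketch, assuming responses are dicts shaped like the examples in this guide; the `check_response` helper is illustrative, not part of the Cosmos client:

```python
def check_response(response: dict) -> dict:
    """Return the data payload on success; raise RuntimeError on an error response."""
    if response.get("status") != "success":
        err = response.get("error", {})
        raise RuntimeError(
            f"Cosmos error {err.get('error_code')}: {err.get('error_message')}"
        )
    return response["data"]
```

For example, a missing-text error like the one above would raise `RuntimeError: Cosmos error 400: No text provided for embedding.`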