Chat with LLM

Chat with state-of-the-art LLMs using Cosmos.

About LLM Chat

The LLM Chat Service enables conversational interactions with an AI, providing a user-friendly interface for exchanging messages and receiving responses. It offers several state-of-the-art models, including GPT-4o, Llama-3, and others:

Large Language Model      Developer
gpt-3.5 (legacy)          OpenAI
gpt-4o                    OpenAI
gpt-4o-mini               OpenAI
command-r                 Cohere
command-r-plus            Cohere
llama-3-70b-instruct      Meta
mistral-large             Mistral AI
mistral-small             Mistral AI
claude-3.5-sonnet         Anthropic
claude-3-haiku            Anthropic

Key features of the Cosmos Platform Chat Service:

  • Natural language interaction
  • Context-aware responses
  • Low-latency AI communication
  • Transparent cost tracking

Step-by-step Tutorial

In this guide:

  • Section 1: Prerequisites.
  • Section 2: Set up the Cosmos Python Client.
  • Section 3: Chat with LLM: parameters and examples.
  • Section 4: JSON mode and predefined output structures.
  • Section 5: Chat streaming.
  • Section 6: Handle Errors.

1. Prerequisites

Before you begin, ensure you have:

  • A Cosmos API key
  • A working Python environment with pip available

2. Set Up the Cosmos Python Client

The Cosmos Python client lets you perform API requests in a convenient way.

2.1. Install Cosmos Python Client:

Get the Cosmos Python client through pip:

pip install delos-cosmos

2.2. Authenticate Requests:

Initialize the client with your API key:

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")

2.3. Call API:

You can now invoke any Cosmos endpoint. For example, let's try the /health endpoint to check that your API key is valid and the client services are available:

response = cosmos_client.status_health()
print(response)

3. Chat with LLM

Here is an example of an LLM chat request made with the Python client (the /llm/chat endpoint):

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")
response = cosmos_client.llm_chat(
              text="Hello, Cosmos!",
              model="mistral-small"
            )
print(response)

A successful response will return the AI's reply:

{
  "request_id": "4fa2fb9d-d8ac-4995-8dd9-836323f11148",
  "response_id": "48b4a03a-e406-45e7-bf06-0d44b68f48af",
  "status_code": 200,
  "status": "success",
  "message": "Chat response received.",
  "data": {
    "answer": "Hello! How can I assist you today?"
  },
  "timestamp": "2024-11-20T15:21:40.127776Z",
  "cost": "0.0023"
}
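
The generated text lives under data.answer. Assuming the client returns this response as a Python dict matching the JSON above, you can pull it out directly:

# Assumes `response` is a dict shaped like the JSON above.
answer = response["data"]["answer"]
print(answer)  # Hello! How can I assist you today?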

3.1. Parameters:

Parameter                    Description                                  Example
text                         The text to send to the LLM.                 "What is the capital of France?"
model                        Large Language Model to use.                 mistral-large, gpt-4o ...
messages (optional)          List of previous messages.                   [{"role":"assistant", "content":"Welcome! I am Cosmos."}]
temperature (optional)       Randomness of the response.                  0.7
response_format (optional)   Choice to request a JSON-parsed response.    {"type":"json_object"} or None
  • The model parameter selects the Large Language Model to chat with.

  • The temperature is a float (between 0 and 1) that controls the randomness of the LLM's responses. The default value is 0.7; the lower it is, the more deterministic the responses will be.

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")
response = cosmos_client.llm_chat(
                text="Hello, Cosmos!",
                model="gpt-4o",
                temperature=0.7
              )
print(response)
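
The optional messages parameter carries earlier turns of the conversation, using the role/content format shown in the parameter table above. Here is a sketch of a multi-turn request (assuming llm_chat accepts messages alongside text, as the parameter table indicates):

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")

# Earlier turns of the conversation, in the role/content format
# from the parameter table above.
previous_messages = [
    {"role": "assistant", "content": "Welcome! I am Cosmos."},
    {"role": "user", "content": "Please keep your answers short."},
]

response = cosmos_client.llm_chat(
    text="What did I just ask you to do?",
    model="gpt-4o",
    messages=previous_messages,
)
print(response)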

4. Specify output format

  • The response_format parameter lets you request a JSON-parsed response. For example:

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")
response = cosmos_client.llm_chat(
              text="What is the capital city and GDP of Germany? Reply in JSON",
              model="gpt-4o",
              response_format={"type":"json_object"}
            )
print(response)

The AI's response will follow the JSON format. Please note that the more precise you are in your instructions and requirements, the better the response will align with your expectations. The response may be similar to the following:

{
  "request_id": "4fa2fb9d-d8ac-4995-8dd9-836323f11148",
  "response_id": "48b4a03a-e406-45e7-bf06-0d44b68f48af",
  "status_code": 200,
  "status": "success",
  "message": "Chat response received.",
  "data": {
    "answer": {
      "capital": "Berlin",
      "GDP": "approximately 4.2 trillion USD"
    }
  },
  "timestamp": "2024-11-20T15:21:40.127776Z",
  "cost": "0.0023"
}
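
Because the answer comes back as a parsed JSON object, its fields can be read like any other dictionary (again assuming the client returns a dict matching the JSON above):

# Assumes `response` is a dict shaped like the JSON above.
answer = response["data"]["answer"]
print(answer["capital"])  # Berlin
print(answer["GDP"])      # approximately 4.2 trillion USD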

5. Streaming

Chat streaming is available through a separate endpoint, /llm/chat_stream. It accepts the same parameters, in the same format, as the /llm/chat endpoint, but the response is handled through a streaming response that delivers the LLM output word by word. Here is an example of a chat streaming call using the Python client:

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")
response = cosmos_client.llm_chat_stream(
              text="Hello, Cosmos!",
              model="mistral-small"
            )
print(response)

A successful response will return the AI's reply, streamed word by word:

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0: ""\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0:"Hello"\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0:"!"\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0:" How"\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0:" can"\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0:" I"\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0:" assist"\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0:" you"\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0:" today"\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {'content': '0:"?"\n\n'}}]}

data: {'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {}, 'finish_reason': 'stop'}], 'request_id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'response_id': 'dc1044e2-a8e8-44d5-a94d-307b2fe8c42e', 'status_code': 200, 'status': 'success', 'message': 'Chat response received.\n(No previous `messages` have been read.)', 'timestamp': '2025-02-19T09:05:10.063001+00:00', 'cost': '0.00023'}

data: [DONE]

This streaming format is harder for humans to read, but it allows the response to be printed word by word as fast as possible. The final chunk (just before data: [DONE]) carries the request metadata and the cost of the call. The full response, parsed and cleaned from the chunks above, is:

"Hello! How can I assist you today?"

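If you want to reassemble the full text programmatically, the sketch below shows one way to do it. It assumes the streamed chunks are available as Python dicts shaped like the data: lines above; the exact way the client exposes the stream may differ, so treat this as illustrative:

import re

def extract_content(chunk: dict) -> str:
    """Pull the text fragment out of one streamed chunk, if any."""
    choices = chunk.get("choices", [])
    if not choices:
        return ""
    content = choices[0].get("delta", {}).get("content", "")
    # Each fragment is wrapped as 0:"..." followed by a blank line; unwrap it.
    match = re.match(r'0:\s*"(.*)"\s*$', content, re.DOTALL)
    return match.group(1) if match else ""

# Assuming `chunks` is the sequence of dicts shown above:
# "".join(extract_content(c) for c in chunks)
# -> "Hello! How can I assist you today?"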

5.1. Parameters:

The parameters for /llm/chat_stream are the same as for /llm/chat, presented in the previous sections.

Parameter                    Description                                  Example
text                         The text to send to the LLM.                 "What is the capital of France?"
model                        Large Language Model to use.                 mistral-large, gpt-4o ...
messages (optional)          List of previous messages.                   [{"role":"assistant", "content":"Welcome! I am Cosmos."}]
temperature (optional)       Randomness of the response.                  0.7
response_format (optional)   Choice to request a JSON-parsed response.    {"type":"json_object"} or None
  • The model parameter selects the Large Language Model to chat with.

  • The temperature is a float (between 0 and 1) that controls the randomness of the LLM's responses. The default value is 0.7; the lower it is, the more deterministic the responses will be.

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")
response = cosmos_client.llm_chat_stream(
              text="Hello, Cosmos!",
              model="gpt-4o-mini",
              temperature=0.7
            )
print(response)
  • The response_format parameter lets you request a JSON-parsed response. For example:

from cosmos import CosmosClient

cosmos_client = CosmosClient(api_key="your-cosmos-api-key")
response = cosmos_client.llm_chat_stream(
              text="What is the capital city and GDP of Germany? Reply in JSON",
              model="gpt-4o",
              response_format={"type":"json_object"}
            )
print(response)

⚠️ Warning:

When response_format={"type":"json_object"} is set, the Cosmos API does ask the LLM for a JSON response.

However, for the sake of speed, in this streaming mode the Cosmos API does not post-process the LLM response to ensure it is perfectly parseable. If guaranteed parseability is a requirement in your pipeline, we recommend using the LLM Chat endpoint (/llm/chat) with the response_format parameter.
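
One way to keep streaming latency while still guarding parseability is to validate the reassembled text yourself and fall back to the non-streaming endpoint if validation fails. A sketch of that pattern (the fallback prompt and model are just the ones from the earlier example, and `cosmos_client` is assumed to be already initialized):

import json

def parse_or_fallback(streamed_text: str) -> dict:
    """Try to parse the streamed text as JSON; fall back to /llm/chat on failure."""
    try:
        return json.loads(streamed_text)
    except json.JSONDecodeError:
        # /llm/chat post-processes the answer, so its JSON is parseable.
        response = cosmos_client.llm_chat(
            text="What is the capital city and GDP of Germany? Reply in JSON",
            model="gpt-4o",
            response_format={"type": "json_object"},
        )
        return response["data"]["answer"]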

6. Handle Errors

Common errors include:

  • Missing API key
  • No text provided
  • No model provided

Example error response:

{
  "status_code": 422,
  "status": "error",
  "message": "Validation error",
  "error": {
    "error_code": "422",
    "error_message": "Validation failed for the input fields.",
    "details": "[{'loc': ('header', 'api_key'), 'msg': 'Field required', 'type': 'missing'}]"
  }
}
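
Given the response shape above, a simple guard is to check the status field before reading the answer. A minimal sketch, assuming the client returns error responses as dicts like the one shown rather than raising exceptions:

response = cosmos_client.llm_chat(text="Hello, Cosmos!", model="mistral-small")

if response.get("status") == "success":
    print(response["data"]["answer"])
else:
    # Error responses carry an `error` object with a code, message, and details.
    error = response.get("error", {})
    print(f"Request failed ({response.get('status_code')}): "
          f"{error.get('error_message', response.get('message'))}")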