POST /chat/completions
curl --request POST \
  --url https://app.empower.dev/api/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "IlyaGusev/saiga_mistral_7b_lora",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "n": 1,
  "max_tokens": 256,
  "temperature": 1,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "ignore_eos": false,
  "stream": false
}'
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello! How can I help you today?",
        "role": "assistant"
      }
    }
  ],
  "created": 1700000000,
  "id": "<string>",
  "model": "IlyaGusev/saiga_mistral_7b_lora",
  "usage": {
    "completion_tokens": 10,
    "prompt_tokens": 9,
    "total_tokens": 19
  }
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
n
integer
default: 1

How many chat completion choices to generate for the input messages.

frequency_penalty
number
default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

ignore_eos
boolean
default: false

Whether to allow the model to continue generating tokens after emitting the end-of-sequence (EOS) token. If true, the model will continue generating tokens until it reaches the max_tokens limit.

max_tokens
integer
default: 256

The maximum number of tokens that can be generated in the chat completion.

messages
object[]
required

The list of messages to generate a chat completion for. Each message is an object with a role (e.g. system, user, or assistant) and a content string.

model
string
required

Name of the model to be used, can either be the name of a LoRA or a base model.

presence_penalty
number
default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

stream
boolean
default: false

Whether to stream back partial progress.
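When stream is true, partial results are typically delivered as server-sent events. As a sketch of how a client might accumulate streamed content, assuming the common OpenAI-style wire format of `data: {json}` lines carrying `choices[0].delta.content` and a terminal `data: [DONE]` (an assumption; verify against the actual stream format):

```python
import json


def accumulate_stream(lines):
    """Join streamed content deltas into the full reply.

    ASSUMPTION: each event is a `data: {json}` line whose chunks carry
    choices[0].delta.content, ending with `data: [DONE]` -- the
    OpenAI-compatible SSE convention, not confirmed by this doc.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separators
        body = line[len("data:"):].strip()
        if body == "[DONE]":
            break
        chunk = json.loads(body)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```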

temperature
number
default: 1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p
number
default: 1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
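Taken together, the body parameters above can be assembled and sent from Python. This is a minimal stdlib-only sketch; the URL and model name are taken from the curl example, the token is a placeholder, and unspecified options fall back to the documented server-side defaults:

```python
import json
import urllib.request

API_URL = "https://app.empower.dev/api/v1/chat/completions"


def build_payload(model, messages, **options):
    """Required fields plus any optional sampling parameters.

    Omitted options use the documented defaults: n=1, max_tokens=256,
    temperature=1, top_p=1, frequency_penalty=0, presence_penalty=0,
    ignore_eos=False, stream=False.
    """
    return {"model": model, "messages": messages, **options}


def chat_completion(token, payload):
    """POST the payload with a Bearer token and return the parsed JSON."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `build_payload("IlyaGusev/saiga_mistral_7b_lora", [{"role": "user", "content": "Hello!"}], temperature=0.2)` produces a body equivalent to the curl example, overriding only the temperature.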

Response

200 - application/json
choices
object[]
required

A list of chat completion choices. Can contain more than one element if the n parameter is greater than 1.

created
integer | null
id
string
required

The id of the request.

model
string
required

The model used to generate the completions.

usage
object
required

The usage statistics for the completion request.
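As a sketch of how a client might consume the 200 response described above (field names as documented; the sample values below are illustrative, not real output):

```python
import json


def extract_reply(response):
    """Pull the first choice's message text, finish reason, and token
    usage from a chat completion response."""
    choice = response["choices"][0]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }


# Illustrative sample shaped like the documented response schema.
sample = json.loads("""{
  "id": "req-123",
  "model": "IlyaGusev/saiga_mistral_7b_lora",
  "created": 1700000000,
  "choices": [{"index": 0, "finish_reason": "stop",
               "message": {"role": "assistant", "content": "Hello!"}}],
  "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12}
}""")

print(extract_reply(sample)["content"])  # -> Hello!
```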