Optional api: The version of the API functions. Part of the path.
Optional auth
Optional cache
Optional callback: Use callbacks instead
Optional callbacks
Optional concurrency: Use maxConcurrency instead
Optional endpoint: Hostname for the API call
Optional location: Region where the LLM is stored
Optional max: The maximum number of concurrent calls that can be made.
Defaults to Infinity, which means no limit.
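To illustrate what a concurrency cap with an `Infinity` default means in practice, here is a minimal sketch. The helper `runWithLimit` is hypothetical and is not part of the library; the parameter name `maxConcurrency` is taken from the deprecation note above, and how the library enforces the limit internally is not stated in this reference.

```typescript
// Hypothetical helper (not a library API): runs the given tasks with at most
// `maxConcurrency` of them in flight at once. Infinity means no limit.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrency: number = Infinity,
): Promise<T[]> {
  if (maxConcurrency < 1) {
    throw new RangeError("maxConcurrency must be at least 1");
  }
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker repeatedly picks up the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // With Infinity, every task gets its own worker, so all start immediately.
  const workerCount = Math.min(maxConcurrency, tasks.length);
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results; // results keep the order of `tasks`
}
```

With a limit of 2, only two tasks are started up front, and a new one begins each time one of them finishes.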
Optional max: Maximum number of tokens to generate in the completion.
Optional max: The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
Optional metadata
Optional model: Model to use
Optional on: Custom handler to handle failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
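The retry limit and the failed-attempt handler work together, which a small sketch may make clearer. Everything here is an assumption for illustration: `callWithRetry`, the parameter names `maxRetries` and `onFailedAttempt`, and the `baseDelayMs` starting delay are hypothetical, and the library's actual backoff schedule is not given in this reference.

```typescript
// Hypothetical handler type: receives the thrown error and throws to stop retrying.
type FailedAttemptHandler = (error: unknown) => void;

// Hypothetical helper (not a library API): retries `fn` up to `maxRetries` times,
// doubling the wait between attempts.
async function callWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 6,
  onFailedAttempt: FailedAttemptHandler = () => {},
  baseDelayMs: number = 1000, // assumed starting delay, not from the source
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // If the handler throws, the error is treated as non-retryable
      // and propagates to the caller immediately.
      onFailedAttempt(error);
      if (attempt >= maxRetries) {
        throw error; // retries exhausted
      }
      // Exponential backoff: baseDelayMs, then 2x, 4x, and so on.
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A handler that rethrows for, say, authentication errors would make the call fail on the first attempt instead of waiting through every retry.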
Optional tags
Optional temperature: Sampling temperature to use
Optional topK: Top-k changes how the model selects tokens for output.
A top-k of 1 means the selected token is the most probable among all tokens in the model’s vocabulary (also called greedy decoding), while a top-k of 3 means that the next token is selected from among the 3 most probable tokens (using temperature).
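The top-k description can be sketched as a filter over candidate tokens. This is an illustrative sketch only: `TokenProb` and `topKFilter` are hypothetical names, the probabilities are made up, and the model's real sampling code is not part of this reference.

```typescript
// Hypothetical shape for a candidate token and its probability.
interface TokenProb {
  token: string;
  prob: number;
}

// Keep only the k most probable tokens, then renormalize so the kept
// probabilities sum to 1. Sampling (using temperature) would then pick
// among these survivors.
function topKFilter(candidates: TokenProb[], k: number): TokenProb[] {
  const kept = candidates
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.prob - a.prob)
    .slice(0, k);
  const total = kept.reduce((sum, c) => sum + c.prob, 0);
  return kept.map((c) => ({ token: c.token, prob: c.prob / total }));
}

const vocabulary: TokenProb[] = [
  { token: "B", prob: 0.3 },
  { token: "A", prob: 0.5 },
  { token: "C", prob: 0.2 },
];

// k = 1 leaves only the single most probable token (greedy decoding).
const greedy = topKFilter(vocabulary, 1);
// k = 3 leaves the three most probable tokens as candidates.
const topThree = topKFilter(vocabulary, 3);
```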
Optional topP: Top-p changes how the model selects tokens for output.
Tokens are selected from most probable to least until the sum of their probabilities equals the top-p value.
For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-p value is 0.5, then the model will select either A or B as the next token (using temperature).
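The A, B, C example can be reproduced with a short sketch of the cumulative cutoff. `TokenProb` and `topPFilter` are hypothetical names used only for illustration, and the small tolerance for floating-point rounding is an assumption of this sketch rather than anything stated in the reference.

```typescript
// Hypothetical shape for a candidate token and its probability.
interface TokenProb {
  token: string;
  prob: number;
}

// Walk the tokens from most to least probable and keep them until the
// running sum of probabilities reaches topP.
function topPFilter(candidates: TokenProb[], topP: number): TokenProb[] {
  const sorted = candidates.slice().sort((a, b) => b.prob - a.prob);
  const kept: TokenProb[] = [];
  let cumulative = 0;
  for (const candidate of sorted) {
    kept.push(candidate);
    cumulative += candidate.prob;
    // Tolerance guards against rounding error in the running sum (assumption).
    if (cumulative >= topP - 1e-9) {
      break;
    }
  }
  return kept;
}

// The example from the text: A = 0.3, B = 0.2, C = 0.1, top-p = 0.5.
const survivors = topPFilter(
  [
    { token: "A", prob: 0.3 },
    { token: "B", prob: 0.2 },
    { token: "C", prob: 0.1 },
  ],
  0.5,
);
// `survivors` holds A and B; C is cut because A and B already reach 0.5.
```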
Optional verbose
Interface representing the input to the Google Vertex AI model.