AI
What is inference cost and how do you estimate it?
Inference cost is the expense of running a trained model to generate predictions or responses in production. For LLM APIs, it is calculated per request as (input tokens × input price) + (output tokens × output price), with prices typically quoted per thousand or per million tokens. A typical GPT-4-class API call costs $0.01–$0.10 depending on prompt length and response size.
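The per-request formula above can be sketched as a small helper. The prices here are illustrative placeholders, not any provider's actual rates:

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_1k=0.0025, output_price_per_1k=0.01):
    """Dollar cost of one API request: input and output tokens
    are priced separately (prices assumed quoted per 1K tokens)."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# Example: a 1,500-token prompt with a 500-token response.
print(f"${request_cost(1500, 500):.4f}")
```

Swap in your provider's published rates; the structure — separate input and output terms — is the same across vendors.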
Key Considerations
- Track input vs. output tokens separately — output tokens typically cost 2–4× more than input tokens on most APIs
- Prompt caching can cut costs 50–90% for applications with repeated system prompts or context
- Smaller models (8B–70B parameters) running on your own GPUs break even versus API pricing at roughly 10M+ tokens/day
- Batch API endpoints (available from OpenAI, Anthropic) offer 50% discounts for non-real-time workloads
- Always estimate monthly cost before launch: multiply average tokens per request by expected request volume and the per-token price
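The pre-launch estimate in the last bullet can be sketched as below. All traffic numbers and prices are illustrative assumptions:

```python
def monthly_cost(requests_per_day, avg_input_tokens, avg_output_tokens,
                 input_price_per_1k, output_price_per_1k, days=30):
    """Projected monthly spend: per-request cost × daily volume × days."""
    per_request = (avg_input_tokens / 1000) * input_price_per_1k \
                + (avg_output_tokens / 1000) * output_price_per_1k
    return per_request * requests_per_day * days

# Example: 50,000 requests/day, averaging 1,200 input + 400 output tokens.
estimate = monthly_cost(50_000, 1200, 400, 0.0025, 0.01)
print(f"${estimate:,.2f}/month")
```

Running this kind of projection against a few traffic scenarios (launch, 10×, 100×) is also how the self-hosting break-even point in the bullets above is usually found.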