Building a cost analyzer for ChatGPT

Published on 12.01.2024

Cost optimization remains a critical factor for businesses and individuals alike. This article delves into a practical exploration of cost efficiency concerning ChatGPT, a widely used conversational AI.

Understanding the Cost Dynamics

ChatGPT, developed by OpenAI, offers two distinct pricing models: a flat-rate ChatGPT Plus subscription at $20 per month, and a pay-as-you-go API model whose cost depends on the length of the input and output. This raises an intriguing question: would it have been more economical to use the GPT-4 API for all our conversations, instead of the flat-rate ChatGPT service?

Considering the Trade-offs

Before delving into the financial comparison, it’s crucial to note certain trade-offs:

  • The ChatGPT interface, while not exceptionally innovative, provided early access to some OpenAI features not available via the API, like DALL-E 3 image generation previews. This early access could also apply to future features.
  • ChatGPT includes various built-in functionalities, such as a Python sandbox for the “Code Interpreter” plugin, enabling the execution of arbitrary Python code. Replicating these features independently would require additional effort.

Methodology for Analysis

Data Acquisition

The first step involves exporting conversation data from the ChatGPT interface. This is achievable through the “Export data” function, yielding a ZIP file containing all conversations, barring those deleted by the user.

Once the download link arrives via email, the ZIP file turns out to contain numerous files, including conversations.json, which is particularly promising. At 11.4 MB, this file encompasses all our conversations, with detailed messages and timestamps.

Development Approach

To calculate costs accurately, we must account for the fact that input and output lengths in the GPT-4 API are measured in tokens, not characters. A token can be a whole word, part of a word, punctuation, or whitespace. The following PHP libraries will support the implementation:

  • Token Counting: A PHP library for token counting (https://github.com/yethee/tiktoken-php).
  • ZIP File Handling: A PHP library for easier ZIP file manipulation (https://github.com/Ne-Lexa/php-zip).
  • Development Ease: Libraries like Laravel IDE Helper (https://github.com/barryvdh/laravel-ide-helper) will be used to enhance the development process.

Considerations and Challenges

  • Context Length: The exact number of previous messages ChatGPT includes as context is not publicly documented. This context counts against the tokens available for the output. We will assume ChatGPT uses the maximum context length that fits within the API limits, including the output.
  • Library Performance: The chosen token counting library has known performance limitations. To address this, we can either find a faster alternative or implement a chunked processing system. The latter option could also enhance the user experience in a public version of this tool.

Processing Steps

Upon importing the exported data:

  1. Read conversations.json.
  2. Iterate through all conversations, and within each:
    • Traverse the messages.
    • Calculate the token length of each message.
    • Aggregate these lengths to derive the total conversation length.

Next Steps

This analysis sets the stage for a deeper exploration into the cost-effectiveness of ChatGPT’s usage models. Stay tuned for further insights and findings in our subsequent articles.