{{current_date_full}}

Weekly concept
token noun
Token: the small piece of text a model actually reads and writes, usually a bit smaller than a word, and the unit your AI usage is measured and billed in.
Tokens are the unit and currency of AI. They are how a model measures its inputs and outputs, and you are billed on the number of tokens you consume: your message (the input) plus the AI’s response (the output). Language models don’t read words the way you do. Instead, they split text into tokens, each of which is roughly three quarters of a word in English, or around four characters. So one hundred tokens is roughly seventy-five words, and a full page of writing is around four hundred to five hundred tokens. Because the token vocabulary is learned from text that is mostly English, other languages tend to consume more tokens, so the same sentence in Mandarin or Arabic may consume two to three times as many.
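If you want to try the rule of thumb yourself, here is a minimal Python sketch. It only applies the two rough ratios above (about 0.75 words per token, about four characters per token) to English text. The function name estimate_tokens is made up for this example, and a real tokeniser will give different counts.

```python
def estimate_tokens(text: str) -> dict:
    """Rough English-only estimates; a real tokeniser will give different counts."""
    words = len(text.split())
    chars = len(text)
    return {
        "by_words": round(words / 0.75),  # assumes ~0.75 words per token
        "by_chars": round(chars / 4),     # assumes ~4 characters per token
    }


sample = "Tokens are the unit and currency of AI."
print(estimate_tokens(sample))
```

The two estimates will rarely agree exactly, which is a good reminder that these are ballpark figures rather than what you will be billed for.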
So why does AI use tokens, rather than words or letters? AI models can only work with numbers (and vectors), so text is broken up into tokens and every token is turned into a number. Each token has its own unique identification number, so the model always knows exactly which token sits in which position. This process is known as tokenisation.
Breaking words into tokens with unique identification numbers simplifies the computation for the model. If the model stored every whole word (in every language), its vocabulary would be significantly larger, and if it worked letter by letter, every message would become a much longer sequence and responses would take far too long. Tokens act like reusable building blocks. For example, the word ‘tokenisation’ might split into ‘token’ and ‘isation’, each with its own unique identification number (the exact split varies from one model to another). Since ‘isation’ has its own identification number, that number can be reused for other words, such as ‘organisation’, ‘civilisation’ and ‘realisation’. Rather than storing three more whole words, the model simply reuses the number for ‘isation’, which keeps its vocabulary small and makes reading and responding to messages much more efficient.

By breaking words up into repeatable chunks, a vocabulary of a few tens of thousands of tokens can spell most English words, including words that the model never encountered during training.
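To make the building-block idea concrete, here is a toy tokeniser in Python. Everything in it is an assumption for illustration: the vocabulary, the identification numbers and the greedy longest-match rule are all made up, and real models learn their vocabularies from data using more sophisticated methods. It still shows how one number for ‘isation’ gets reused, and how a word that is not in the vocabulary can be spelled from smaller pieces.

```python
# Hypothetical toy vocabulary: the pieces and ID numbers are invented for illustration.
VOCAB = {"token": 1, "organ": 2, "civil": 3, "real": 4, "isation": 5}
# Fall back to single letters so any lowercase word can still be spelled.
for offset, letter in enumerate("abcdefghijklmnopqrstuvwxyz"):
    VOCAB[letter] = 100 + offset


def tokenise(word: str) -> list[tuple[str, int]]:
    """Greedy longest-match split of a lowercase word into (piece, id) pairs."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                pieces.append((piece, VOCAB[piece]))
                i = j
                break
        else:
            raise ValueError(f"cannot tokenise {word[i]!r}")
    return pieces


for w in ["tokenisation", "organisation", "civilisation", "realisation", "digitisation"]:
    print(w, tokenise(w))
```

The last word, ‘digitisation’, has no matching first piece in this toy vocabulary, so it falls back to single letters before reusing ‘isation’ at the end.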
Your usage is charged per million tokens. The message that you send (the input) and the response you get back (the output) are billed separately, with output tokens often costing several times more, roughly five times as much on some models. Long conversations are therefore not only slower but much more expensive, because the model re-reads the whole conversation (everything in the context window) before every response. Keep your messages short and, in ChatGPT and Claude.ai, start new chats regularly. If you use Claude Code or Codex, the command /clear empties the context window and /compact summarises your conversation, reducing the number of tokens the model re-reads per response. The less the model has to re-read, the fewer tokens you consume, making your conversations quicker and cheaper.
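Here is a rough sketch of why long conversations cost more. The prices are hypothetical placeholders (check your provider’s price list), and the model is deliberately simple: it assumes every turn re-reads the full history at the normal input price and ignores discounts such as caching.

```python
# Hypothetical prices in dollars per million tokens, not any provider's real rates.
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00  # five times the input price, matching the ratio above


def conversation_cost(turns: int, msg_tokens: int, reply_tokens: int) -> float:
    """Cost of a chat where every turn re-reads all earlier messages and replies."""
    history = 0
    total = 0.0
    for _ in range(turns):
        input_tokens = history + msg_tokens    # whole conversation so far + new message
        total += input_tokens * INPUT_PRICE / 1_000_000
        total += reply_tokens * OUTPUT_PRICE / 1_000_000
        history = input_tokens + reply_tokens  # the reply joins the history
    return total


one_long_chat = conversation_cost(turns=20, msg_tokens=100, reply_tokens=400)
four_short_chats = 4 * conversation_cost(turns=5, msg_tokens=100, reply_tokens=400)
print(f"one long chat: ${one_long_chat:.4f}, four short chats: ${four_short_chats:.4f}")
```

Both scenarios contain the same twenty messages and twenty replies. The only difference is how much history gets re-read each time, which is the effect that starting a fresh chat, or using /clear and /compact, is meant to reduce.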
For every message, you are paying for the length of your conversation. Reduce the length of your messages, start new chats and use commands like /clear and /compact.
This is also why models can produce extremely comprehensive responses yet make very simple mistakes, such as miscounting the r’s in ‘strawberry’. The model breaks the word into tokens and sees each token as a number, so it never actually sees the individual letters, which makes counting the r’s difficult. That said, Claude Fable 5 managed to count them successfully!
If you found this helpful, please feel free to give me some feedback as I continue to release new educational topics every week. Let me know if there are any other topics that you guys want me to break down.
Stay curious,
James
Be first to know
The full course is coming soon. Register your interest and you’ll be notified when it becomes available. Organisational training is also available for a limited time.
Curiosity