What AI agents are quietly changing inside companies

May 16, 2026

Introduction

This week, I wrote a note about a small IT company that I know very well. They recently deployed Codex and just received their first invoice.

As you might imagine, after distributing licenses to all their developers with instructions to experiment freely, they were unprepared for what would inevitably happen at the end of the month. When the invoice arrived, the cost was staggering, and no one could explain or quantify the actual benefits achieved during that period.

Their immediate reaction was to halt all Codex usage and ask support to define procedures for better optimization and controls to monitor cost evolution.

This experience serves as the basis for this article, which I hope will help those currently facing similar situations or others who may encounter this problem in the future.

Why this matters now

One of the biggest mistakes people make with AI coding agents like Codex is assuming that usage is mostly driven by the human user.

It is not!

Agentic workflows can consume significantly more tokens than traditional chat interactions because the system continuously:

reads files,
revisits context,
retries operations,
expands prompts,
generates intermediate reasoning,
and loops through tasks.
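
To make the effect of looping concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every agent step re-sends the whole accumulated context as input. The function name and all the numbers are hypothetical, not Codex measurements.

```python
def cumulative_input_tokens(base_context: int, added_per_step: int, steps: int) -> int:
    """Total input tokens when every step re-sends the growing context.

    Assumption (simplified model): step k sends the base context plus
    everything that the previous steps added to it.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context           # the whole context is sent again
        context += added_per_step  # file reads, tool output, intermediate reasoning
    return total


# One chat-style question versus a 20-step agent run (illustrative numbers).
single_turn = cumulative_input_tokens(base_context=2_000, added_per_step=0, steps=1)
agent_run = cumulative_input_tokens(base_context=2_000, added_per_step=1_500, steps=20)

print(single_turn)  # 2000
print(agent_run)    # 325000
```

In this toy model the agent run costs far more than 20 separate questions would, because the context keeps growing between steps. Caching or context compaction, where available, would lower the real figure, so treat this as an upper-bound intuition rather than a billing formula.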

This becomes especially important for:

individual users paying for their own license,
teams using shared enterprise environments,
and companies trying to scale AI usage without losing cost visibility.

The goal is not to use Codex less.

The goal is to use it more intentionally!

Understanding what actually drives consumption

Most people think consumption is mainly caused by:

long answers,
large code generation,
or heavy prompting.

In reality, a large percentage of usage often comes from:

repeated file reads,
unclear task definitions,
agent exploration,
retry loops,
unnecessary context loading,
and poor workflow structure.

The more ambiguity the agent faces, the more tokens it tends to consume trying to understand the environment.

This is why structured workflows are almost always cheaper than exploratory workflows.

Practical examples for individual users

1. Avoid “open-ended” sessions

Bad approach:

“Help me improve this project.”

Why this becomes expensive:

the agent explores many files,
tries to infer architecture,
loads unnecessary context,
and keeps expanding the working set.

Better approach:

“Review only these 3 files and identify duplicated logic in the authentication flow.”

This dramatically reduces:

context size,
exploration overhead,
and retry cycles.

2. Start with constraints

One of the best ways to reduce Codex usage is to define:

objective,
scope,
files involved,
constraints,
and expected output.

Example:

“Update only the API validation logic.
Do not modify database models.
Return minimal changes.
Explain only critical decisions.”

Without constraints, the agent often over-explores.
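
As a minimal sketch of how these five elements could be captured once and reused, the Python below builds a constrained prompt from a small data class. The class name TaskBrief, its fields, and the file path are assumptions made for illustration. They are not part of Codex.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for objective, scope, files, constraints, output."""
    objective: str
    scope: str
    files: list[str]
    constraints: list[str] = field(default_factory=list)
    expected_output: str = "Return minimal changes."

    def to_prompt(self) -> str:
        # One labeled line per element, so nothing is left for the agent to guess.
        lines = [
            f"Objective: {self.objective}",
            f"Scope: {self.scope}",
            "Files: " + ", ".join(self.files),
        ]
        lines += [f"Constraint: {c}" for c in self.constraints]
        lines.append(f"Expected output: {self.expected_output}")
        return "\n".join(lines)


brief = TaskBrief(
    objective="Update the API validation logic.",
    scope="Validation layer only.",
    files=["api/validators.py"],  # hypothetical path
    constraints=["Do not modify database models.", "Explain only critical decisions."],
)
print(brief.to_prompt())
```

The point of the structure is that a missing field becomes visible before the session starts, instead of being discovered through agent exploration.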

3. Avoid feeding entire repositories unnecessarily

Many users paste or expose an entire codebase when only a small subset is relevant.

This increases:

token usage,
context complexity,
and reasoning overhead.

A good rule is to:

Only expose the minimum viable context.
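
A small sketch of what "minimum viable context" can mean in practice: estimate the size of each file and pass along only the relevant ones that fit a budget. The estimate of about 4 characters per token is a crude assumption, and the function names and file paths are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def select_context(files: dict[str, str], relevant: set[str], budget: int) -> list[str]:
    """Keep only relevant files, smallest first, until the token budget is reached."""
    candidates = sorted(
        (path for path in files if path in relevant),
        key=lambda path: (len(files[path]), path),
    )
    chosen, used = [], 0
    for path in candidates:
        cost = estimate_tokens(files[path])
        if used + cost > budget:
            break
        chosen.append(path)
        used += cost
    return chosen


# Illustrative repository: two small auth files and one large unrelated file.
repo = {
    "auth/login.py": "x" * 4_000,
    "auth/token.py": "x" * 2_000,
    "billing/invoice.py": "x" * 40_000,
}
whole_repo = sum(estimate_tokens(body) for body in repo.values())
chosen = select_context(repo, relevant={"auth/login.py", "auth/token.py"}, budget=2_000)

print(whole_repo)  # 11500
print(chosen)      # ['auth/token.py', 'auth/login.py']
```

In this made-up repository, the two relevant files cost about 1,500 estimated tokens, against 11,500 for everything.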

4. Reuse prompts and workflows

If you repeatedly perform:

code review,
refactoring,
bug analysis,
or documentation generation,
create standardized prompts or Skills.

Structured workflows reduce:

ambiguity,
retries,
and inconsistent outputs.

This improves both:

quality,
and consumption efficiency.

Practical examples for enterprise teams

Enterprise environments face a different problem.

The challenge is usually not one power user but uncontrolled scaling.

1. The hidden cost of exploratory usage

In many companies, teams begin using Codex without operational guidelines.

Typical pattern:

multiple agents running simultaneously,
duplicated work,
repeated repository scans,
long conversational debugging sessions,
and unlimited experimentation.

This creates silent token expansion.

The cost often appears suddenly.
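
One way to avoid the surprise is to project month-end spend from the spend so far and raise a flag early. The sketch below assumes a simple linear projection and a hypothetical 80% warning threshold. The function names, status labels, and figures are illustrative only.

```python
def projected_month_end_spend(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Linear projection: assumes the average daily spend so far continues."""
    return spend_to_date / day_of_month * days_in_month


def budget_status(
    spend_to_date: float,
    day_of_month: int,
    days_in_month: int,
    monthly_budget: float,
    warn_ratio: float = 0.8,  # hypothetical warning threshold
) -> str:
    projected = projected_month_end_spend(spend_to_date, day_of_month, days_in_month)
    if spend_to_date >= monthly_budget:
        return "over_budget"
    if projected >= monthly_budget:
        return "on_track_to_exceed"
    if projected >= warn_ratio * monthly_budget:
        return "warning"
    return "ok"


# Day 10 of a 30-day month with a budget of 1000 (any currency).
print(budget_status(400, 10, 30, monthly_budget=1000))  # on_track_to_exceed
print(budget_status(200, 10, 30, monthly_budget=1000))  # ok
```

Run daily, a check like this turns the end-of-month shock into a warning on day 10.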

2. Standardized operational workflows matter

The companies extracting the most value from AI are usually not the ones using AI everywhere but the ones standardizing:

how AI is used,
where AI is used,
and which workflows justify agentic execution.

Examples of high-value enterprise workflows:

meeting-to-decisions processing,
executive summaries,
code review assistance,
operational reporting,
dependency analysis,
incident summaries.

Structured workflows create predictable consumption.

3. Separate experimentation from production usage

This is one of the most important enterprise practices. Companies should distinguish between:

Exploration

open experimentation,
discovery,
ideation,
prototyping.

and:

Production workflows

standardized tasks,
repeatable operations,
controlled inputs,
reusable Skills.

Without this separation, costs become difficult to forecast.
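
A minimal sketch of how the separation could be made measurable: tag every usage record with a team and a mode, then aggregate. The record shape and the "exploration" and "production" tags are assumptions for illustration. Real usage exports will look different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    team: str
    mode: str    # "exploration" or "production" (hypothetical tag)
    tokens: int


def tokens_by_team_and_mode(records: list[UsageRecord]) -> dict[tuple[str, str], int]:
    """Sum tokens for every (team, mode) pair."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for record in records:
        totals[(record.team, record.mode)] += record.tokens
    return dict(totals)


# Illustrative records, not real data.
records = [
    UsageRecord("platform", "production", 120_000),
    UsageRecord("platform", "exploration", 900_000),
    UsageRecord("payments", "production", 300_000),
    UsageRecord("platform", "exploration", 100_000),
]
totals = tokens_by_team_and_mode(records)

print(totals[("platform", "exploration")])  # 1000000
print(totals[("platform", "production")])   # 120000
```

Once the two modes are reported separately, production usage can be forecast from task volume, while exploration can be given its own explicit allowance.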

4. Skills become a cost-control mechanism

Most people think Skills are mainly about convenience, but in enterprise environments, Skills also improve:

consistency,
governance,
output quality,
and token efficiency.

A well-designed Skill:

limits unnecessary exploration,
structures the workflow,
standardizes outputs,
and reduces repeated reasoning.

This is especially important at scale.

A Real Example: Meeting-to-Decisions Skill

A generic AI workflow often looks like this:

Upload Teams transcript → ask for summary.

The result is usually:

broad,
inconsistent,
and operationally weak.

A structured Skill is different.

Instead of generic summarization, the Skill explicitly extracts:

decisions made,
unresolved topics,
blockers,
owners,
deadlines,
dependencies,
operational risks,
and next actions.

This creates:

better outputs,
more consistency,
and lower token waste.

All of this follows from the workflow being predefined.
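
To illustrate what "predefined" can look like, the sketch below fixes the eight sections in code, builds one extraction prompt from them, and checks a reply for missing sections. This is not the actual Skill format. The key names, the JSON reply shape, and both function names are assumptions.

```python
import json

# The eight items the structured workflow extracts, as hypothetical JSON keys.
REQUIRED_SECTIONS = [
    "decisions", "unresolved_topics", "blockers", "owners",
    "deadlines", "dependencies", "operational_risks", "next_actions",
]


def build_extraction_prompt(transcript: str) -> str:
    """Build one fixed prompt instead of an open-ended 'summarize this'."""
    keys = ", ".join(REQUIRED_SECTIONS)
    return (
        "Extract the following from the meeting transcript and return one JSON "
        f"object with exactly these keys, each a list of strings: {keys}.\n"
        "Use an empty list when nothing applies. Do not add a summary.\n\n"
        f"Transcript:\n{transcript}"
    )


def missing_sections(model_output: str) -> list[str]:
    """Return the required keys that are absent from the JSON reply."""
    data = json.loads(model_output)
    return [key for key in REQUIRED_SECTIONS if key not in data]


# A made-up incomplete reply, used only to exercise the check.
reply = json.dumps({"decisions": ["Ship v2 on Friday"], "owners": ["Ana"]})
print(missing_sections(reply))
```

Because the sections are fixed, an incomplete reply can be detected mechanically and retried once, rather than being discussed in a long follow-up conversation.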

The biggest mistake companies make

The biggest mistake is trying to scale AI usage before designing AI workflows.

AI consumption is not only a technical issue. It is an operational design problem.

The organizations that will benefit most from AI are not necessarily the ones generating the most prompts. In my opinion, they are the ones redesigning workflows around:

lower cognitive friction,
reusable systems,
and structured execution.

In conclusion

The future of AI usage is probably not unlimited interaction but intentional orchestration.

The most valuable AI users will not be the people generating the most output.

They will be the people creating:

efficient workflows,
reusable operational systems,
and high-leverage processes.

That is where tools like Codex become much more than assistants.

They become operational infrastructure.

If you liked this article, please help me by subscribing to my account. Every new subscriber is an incentive to keep writing!

Mike Schlottman

May 17

Good piece. The frame that makes this click for anyone who lived through the cloud-cost reckoning of the late 2010s is that this is essentially FinOps with a new resource type. Same pattern: technology adoption outruns financial controls, the first invoice triggers panic, leadership freezes spend, and then teams have to retrofit the governance they should have built before deployment. The playbook is mature: tagging, chargeback, anomaly detection, budgets per team, showback dashboards. AI cost governance is not a new discipline, it is FinOps applied to tokens instead of compute hours.

Prompt or not - AI 4 everyone
