Building an Audit Trail for Claude Code Agents: What You Actually Need
Your security team is going to ask about your Claude Code agents. Not if, but when. If your answer is "we have Anthropic API usage logs," you're going to have a bad meeting.
API usage logs tell you tokens were consumed. They don't tell you what the agent did with those tokens. They don't show the shell commands executed, the files modified, the Git commits pushed, or the AWS resources created. The actual actions, the things auditors care about, happen downstream of the API call.
What Needs to Be in the Log
A complete agent audit trail captures five categories:
Session lifecycle. When the session started, who launched it, what project, what permissions were granted, when it ended, and why (completed, timed out, killed, budget exhausted).
Tool invocations. Every tool call: shell commands, file reads and writes, MCP server calls, API requests. For each: tool name, input parameters, output or error, timestamp.
Infrastructure operations. AWS API calls, Git operations, database queries, HTTP requests to external services. These have side effects on external systems, so they need special attention: they're the ones that cause damage.
Policy decisions. Every time an action was checked against a policy: what was requested, what rule applied, and whether it was allowed or denied. Denials are especially important because they show your controls are working.
Cost metrics. Token consumption per tool call, cumulative session cost, budget thresholds crossed. Cost events are a leading indicator: a session that suddenly spikes in cost is often stuck in a loop.
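One way to keep the five categories consistent is to give every event the same envelope and vary only the category-specific fields. The Python below is a minimal sketch, not a prescribed schema: the `eventType` names, `make_event`, and `cost_spike` are hypothetical, and the spike check is a deliberately crude ratio heuristic for the loop symptom described under cost metrics.

```python
import json
from datetime import datetime, timezone

# One eventType per category above. These names are illustrative; only
# "tool_invocation" comes from the example event format in this article.
EVENT_TYPES = {
    "session_lifecycle",
    "tool_invocation",
    "infrastructure_operation",
    "policy_decision",
    "cost_metric",
}


def make_event(session_id, developer_id, project_id, event_type, **details):
    """Build one audit event carrying the fields every category shares."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown eventType: {event_type}")
    event = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "sessionId": session_id,
        "developerId": developer_id,
        "projectId": project_id,
        "eventType": event_type,
    }
    event.update(details)  # category-specific fields: tool, input, reason, ...
    return event


def cost_spike(per_call_costs, window=5, factor=3.0):
    """Crude loop signal: the last `window` calls (tokens or dollars) average
    more than `factor` times the average of all earlier calls."""
    if len(per_call_costs) <= window:
        return False
    earlier = per_call_costs[:-window]
    baseline = sum(earlier) / len(earlier)
    recent = sum(per_call_costs[-window:]) / window
    return baseline > 0 and recent > factor * baseline


end = make_event("sess_a8f3b21c", "dev_jordan", "proj_backend",
                 "session_lifecycle", action="end", reason="budget_exhausted")
print(json.dumps(end))
```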
What Format Satisfies SOC 2 Auditors
SOC 2 auditors care about three things: completeness, immutability, and retrievability.
Completeness: Every relevant event is captured. If an agent modified a production database, that event must be in the log. Gaps where the agent was active but no logs exist will be flagged.
Immutability: Logs can't be altered after the fact. In practice that means write-once storage: S3 with Object Lock in compliance mode, CloudWatch Logs (append-only), or a dedicated log management service that enforces immutability.
Retrievability: You can find what you need. If the auditor asks for everything session X did on April 15th, you produce that in minutes, not days. This requires structured logs (JSON, not free-text), consistent session IDs across all entries, and a query mechanism.
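In production, the "everything session X did on April 15th" question becomes a query in CloudWatch Logs Insights or Elasticsearch. The sketch below shows the same lookup over JSON Lines in plain Python, assuming one event per line with a `sessionId` and an ISO 8601 UTC `timestamp`, as in the event format this article uses. `session_events` is a hypothetical helper.

```python
import json
from datetime import date


def session_events(jsonl_lines, session_id, day):
    """Return one session's events for one UTC day, oldest first."""
    prefix = day.isoformat()  # e.g. "2026-04-15"
    matches = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines in the log file
        event = json.loads(line)
        if (event.get("sessionId") == session_id
                and event.get("timestamp", "").startswith(prefix)):
            matches.append(event)
    # Fixed-width ISO 8601 UTC strings sort chronologically as plain strings.
    return sorted(matches, key=lambda e: e["timestamp"])


log = [
    '{"timestamp": "2026-04-15T14:32:07Z", "sessionId": "sess_a8f3b21c", "eventType": "tool_invocation"}',
    '{"timestamp": "2026-04-15T14:30:00Z", "sessionId": "sess_a8f3b21c", "eventType": "session_lifecycle"}',
    '{"timestamp": "2026-04-15T14:31:00Z", "sessionId": "sess_other", "eventType": "tool_invocation"}',
]
for e in session_events(log, "sess_a8f3b21c", date(2026, 4, 15)):
    print(e["timestamp"], e["eventType"])
```

Structured JSON and a consistent session ID are what make this a filter rather than a text search.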
A practical event format:
```json
{
  "timestamp": "2026-04-15T14:32:07Z",
  "sessionId": "sess_a8f3b21c",
  "developerId": "dev_jordan",
  "projectId": "proj_backend",
  "eventType": "tool_invocation",
  "tool": "bash",
  "input": "aws s3 cp config.json s3://prod-config/app/config.json",
  "output": "upload: ./config.json to s3://prod-config/app/config.json",
  "policyResult": "allowed",
  "policyRule": "s3:PutObject on prod-config/app/*"
}
```
Every event ties back to a session, developer, and project. Every event has a policy result. This is what auditors want to see.
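Those guarantees are easy to state and easy to break silently, so it can help to check each event before it is written. This is a minimal Python sketch; `audit_gaps` and its required-field list are assumptions drawn from the example event above, not a standard schema.

```python
REQUIRED_FIELDS = ("timestamp", "sessionId", "developerId", "projectId",
                   "eventType", "policyResult")


def audit_gaps(event):
    """Return the reasons an event can't be tied back to a session, developer,
    and project with a recorded policy outcome (empty list = looks complete)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not event.get(name)]
    result = event.get("policyResult")
    if result and result not in ("allowed", "denied"):
        problems.append(f"unexpected policyResult: {result}")
    if result and not event.get("policyRule"):
        problems.append("policyResult recorded without policyRule")
    return problems


example = {
    "timestamp": "2026-04-15T14:32:07Z",
    "sessionId": "sess_a8f3b21c",
    "developerId": "dev_jordan",
    "projectId": "proj_backend",
    "eventType": "tool_invocation",
    "tool": "bash",
    "policyResult": "allowed",
    "policyRule": "s3:PutObject on prod-config/app/*",
}
print(audit_gaps(example))
print(audit_gaps({"timestamp": "2026-04-15T14:32:07Z", "policyResult": "allowed"}))
```

Rejecting or quarantining events that fail this check at write time is cheaper than discovering the gap during an audit.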
What a Bad Audit Trail Looks Like
The most common failure mode is logging at the wrong level of abstraction.
Bad:
```
2026-04-15 14:32:00 INFO anthropic.api POST /v1/messages 200 1247 tokens
2026-04-15 14:32:05 INFO anthropic.api POST /v1/messages 200 893 tokens
```
This tells you nothing. What did the agent do between those API calls? What files did it read? What commands did it execute? The Anthropic API logs are the conversation. The tool execution logs are the actions. Auditors care about the actions.
Another failure: logging without session correlation. If you have logs from Claude Code, CloudTrail, and your application but they share no common identifier, correlating them during an incident is manual detective work. Every log entry needs a session ID.
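A minimal sketch of correlation in Python, assuming every source has already been normalized into dicts with a `sessionId` and an ISO 8601 UTC `timestamp`. CloudTrail records don't carry your session ID natively (their timestamp field is `eventTime`), so that mapping is an assumption here. One option is to use the session ID as the IAM role session name when the agent assumes a role, since that name appears in CloudTrail's `userIdentity`. `correlate` and the source names are hypothetical.

```python
from collections import defaultdict


def correlate(*sources):
    """Merge (source_name, events) pairs into one timeline per session.

    Events with no sessionId land under the key None, so the gap stays
    visible instead of being silently dropped."""
    timelines = defaultdict(list)
    for source_name, events in sources:
        for event in events:
            timelines[event.get("sessionId")].append({**event, "source": source_name})
    for events in timelines.values():
        events.sort(key=lambda e: e["timestamp"])  # ISO 8601 UTC sorts as text
    return dict(timelines)


claude_code = [{"timestamp": "2026-04-15T14:32:07Z", "sessionId": "sess_a8f3b21c",
                "eventType": "tool_invocation"}]
cloudtrail = [
    {"timestamp": "2026-04-15T14:32:08Z", "sessionId": "sess_a8f3b21c", "eventName": "PutObject"},
    {"timestamp": "2026-04-15T14:40:00Z", "eventName": "GetObject"},
]
timeline = correlate(("claude_code", claude_code), ("cloudtrail", cloudtrail))
for e in timeline["sess_a8f3b21c"]:
    print(e["timestamp"], e["source"])
print(len(timeline[None]), "event(s) with no session ID")
```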
Retention and Export
SOC 2 doesn't prescribe a fixed retention period, but one year is the common minimum auditors expect. HIPAA's documentation retention requirement is six years.
Practical approach: store recent logs (90 days) in a queryable system like CloudWatch Logs or Elasticsearch. Archive older logs to S3 with Object Lock for the compliance retention period. Build an index to find archived logs by session ID, developer, or date range.
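The archive step can be sketched in Python as three small pieces: deciding which tier an event belongs in, giving archived objects keys that double as a date index, and building the upload arguments for S3 Object Lock. The 90-day window comes from the text; the `audit-archive` bucket, the key layout, and the helper names are hypothetical. `ObjectLockMode`, `ObjectLockRetainUntilDate`, and `ChecksumAlgorithm` are real `put_object` parameters in boto3, but the code only builds the arguments, so it runs without AWS credentials.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta, timezone

HOT_DAYS = 90  # queryable window from the approach above


def parse_ts(ts):
    """Parse the "2026-04-15T14:32:07Z" timestamp format as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def storage_tier(event, now):
    """'hot' while the event is inside the queryable window, else 'archive'."""
    age = now - parse_ts(event["timestamp"])
    return "hot" if age <= timedelta(days=HOT_DAYS) else "archive"


def archive_key(event):
    """Hypothetical key layout: the date prefix makes date-range listing cheap."""
    day = parse_ts(event["timestamp"]).strftime("%Y/%m/%d")
    return f"audit/{day}/{event['developerId']}/{event['sessionId']}.jsonl"


def build_index(events):
    """Map session and developer IDs to the archive keys that contain them."""
    index = {"session": defaultdict(set), "developer": defaultdict(set)}
    for event in events:
        key = archive_key(event)
        index["session"][event["sessionId"]].add(key)
        index["developer"][event["developerId"]].add(key)
    return index


def locked_put_kwargs(bucket, key, body, retain_years=1):
    """Arguments for boto3's S3 put_object() under Object Lock COMPLIANCE mode.

    Assumes the bucket was created with Object Lock enabled. In compliance
    mode no user, including the root user, can delete or overwrite this
    object version before the retain-until date. Uses 365-day years."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=365 * retain_years)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": retain_until,
        # Uploads that set a retention period must carry Content-MD5 or an
        # SDK checksum; requesting SHA-256 covers that.
        "ChecksumAlgorithm": "SHA256",
    }


event = {"timestamp": "2026-04-15T14:32:07Z", "sessionId": "sess_a8f3b21c",
         "developerId": "dev_jordan"}
now = datetime(2026, 8, 1, tzinfo=timezone.utc)
if storage_tier(event, now) == "archive":
    kwargs = locked_put_kwargs("audit-archive", archive_key(event),
                               json.dumps(event).encode())
    # With boto3 and credentials: boto3.client("s3").put_object(**kwargs)
    print(kwargs["Key"])
```

For a HIPAA-scoped archive, `retain_years=6` would match the longer requirement.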
Your audit system should support three export paths:
- Real-time streaming to your SIEM for security monitoring and alerting
- Periodic export to your compliance archive (daily or weekly)
- On-demand export for incident response: when someone asks for everything related to a production incident, you produce a complete, chronological report filtered by session IDs
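The three paths can share one entry point. In this minimal Python sketch, `AuditExporter` and its `siem_send` callback are hypothetical stand-ins: a real deployment would forward to a SIEM collector and write batches to the compliance archive, and the incident report here only searches events still buffered in memory.

```python
import json
from collections import defaultdict


class AuditExporter:
    """Send each event to a real-time sink and a per-day batch."""

    def __init__(self, siem_send):
        self.siem_send = siem_send        # hypothetical SIEM forwarder
        self.batches = defaultdict(list)  # "YYYY-MM-DD" -> buffered events

    def emit(self, event):
        self.siem_send(event)                                # path 1: real time
        self.batches[event["timestamp"][:10]].append(event)  # path 2: daily batch

    def flush_day(self, day):
        """Remove one day's batch and return it as JSON Lines for the archive."""
        events = sorted(self.batches.pop(day, []), key=lambda e: e["timestamp"])
        return "".join(json.dumps(e, sort_keys=True) + "\n" for e in events)

    def incident_report(self, session_ids):
        """Path 3: buffered events for the given sessions, oldest first.
        A fuller version would also query hot storage and the archive."""
        hits = [e for batch in self.batches.values() for e in batch
                if e.get("sessionId") in session_ids]
        return sorted(hits, key=lambda e: e["timestamp"])


sent = []
exporter = AuditExporter(sent.append)
exporter.emit({"timestamp": "2026-04-15T14:32:07Z", "sessionId": "sess_a8f3b21c",
               "eventType": "tool_invocation"})
exporter.emit({"timestamp": "2026-04-15T09:00:00Z", "sessionId": "sess_other",
               "eventType": "session_lifecycle"})
report = exporter.incident_report({"sess_a8f3b21c"})
archive_batch = exporter.flush_day("2026-04-15")
print(len(sent), len(report))
print(archive_batch, end="")
```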
Where to Log
The critical decision is where in the stack to log. Logging inside the agent process is unreliable: if the agent crashes, the last few events might be lost. Logging at the gateway layer is more reliable because the logging infrastructure is independent of the agent lifecycle.
This is one reason gateway architectures exist. The gateway sees every operation the agent performs and logs it before the operation reaches the infrastructure. If the agent crashes, the logs are intact. If the agent tries something outside its policy, the denial is logged even though the operation didn't execute.
Your auditors will thank you. More importantly, the first time something goes wrong (and it will), you'll thank yourself.
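The ordering is what matters: decide, log the decision, then execute. The Python sketch below shows that ordering in-process for readability; a real gateway runs as a separate service with its own durable log sink. `AuditGateway`, the toy policy, and the list used as a sink are hypothetical.

```python
import json
from datetime import datetime, timezone


class AuditGateway:
    """Checks policy, logs, then executes. `policy(request)` returns
    (allowed, rule_name); `execute(request)` performs the real operation;
    `sink` is any object with append(). All three are hypothetical stand-ins."""

    def __init__(self, policy, execute, sink):
        self.policy, self.execute, self.sink = policy, execute, sink

    def _log(self, **fields):
        fields["timestamp"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        self.sink.append(json.dumps(fields))

    def handle(self, session_id, request):
        allowed, rule = self.policy(request)
        # Record the decision before anything reaches infrastructure, so a
        # denial is logged even though the operation never runs.
        self._log(sessionId=session_id, eventType="policy_decision", input=request,
                  policyResult="allowed" if allowed else "denied", policyRule=rule)
        if not allowed:
            raise PermissionError(f"denied by rule {rule!r}")
        try:
            output = self.execute(request)
        except Exception as exc:
            self._log(sessionId=session_id, eventType="tool_invocation",
                      input=request, error=str(exc))
            raise
        self._log(sessionId=session_id, eventType="tool_invocation",
                  input=request, output=output)
        return output


def toy_policy(request):
    # Hypothetical rule: only writes under s3://prod-config/app/ are allowed.
    if "s3://prod-config/app/" in request:
        return True, "s3:PutObject on prod-config/app/*"
    return False, "default-deny"


log = []
gateway = AuditGateway(toy_policy, lambda req: "upload: ok", log)
gateway.handle("sess_a8f3b21c", "aws s3 cp config.json s3://prod-config/app/config.json")
try:
    gateway.handle("sess_a8f3b21c", "aws s3 rm s3://prod-config --recursive")
except PermissionError as err:
    print(err)
print(len(log), "audit entries")
```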
Put this into practice with Sentrely
Everything covered in this article is built into Sentrely's managed control plane. Get early access and have it running against your Claude agents in minutes.