AI Agent Security: How to Keep Your Data Safe When AI Can Take Action

If you’re evaluating an AI agent — one that can read your email, browse the web, manage files, and take real actions on your behalf — your first question should be: is this safe?

It’s the right question. And if the company building the agent dodges it or gives you a vague paragraph about “enterprise-grade security,” that tells you everything you need to know.

This post explains exactly how Agent-S handles security, what architectural decisions we made and why, where the risks actually are, and what we’re still working on.

Why Security Matters More for AI Agents Than Chatbots

A chatbot generates text. If it hallucinates, you get a wrong answer. Annoying, but the blast radius is small — you read the output, decide it’s wrong, and move on.

An AI agent takes actions. It sends emails. It updates your CRM. It logs into services on your behalf. It downloads files and processes real data. The blast radius of a mistake or a security failure isn’t a bad paragraph — it’s a sent email, a deleted file, a leaked credential.

This is the fundamental difference between a chatbot and an agent. Chatbots are read-only. Agents are read-write. And read-write systems demand a completely different security posture.

When we built Agent-S, we started from a simple premise: an AI agent should be treated with the same security rigor as a remote employee who has access to your systems. That means isolation, scoped permissions, credential management, audit logging, and the ability to revoke access instantly. Not as afterthoughts. As foundational architecture.

The Isolation Model: Your Agent’s Computer Is Yours Alone

Every Agent-S user gets a dedicated computing environment. Not a shared server with user-level permissions. Not a container on a multi-tenant cluster. A dedicated machine.

Separate compute, separate storage. Your agent’s Linux environment — its file system, running processes, network stack — is not shared with any other user’s agent. There is no scenario where another user’s agent can access your files or read your agent’s memory. The environments are isolated at the infrastructure level, not just the application level.

No cross-contamination of state. Your agent’s accumulated memory, learned preferences, downloaded files, browser cookies, and saved sessions — all of it lives exclusively in your dedicated environment. When your agent learns your email style over three weeks of corrections, that knowledge exists in exactly one place. It is not aggregated, not anonymized-and-pooled, not used to train anything.

This architecture is more expensive than shared environments. We chose it because the security properties of true isolation are worth the infrastructure cost — especially when the agent has access to sensitive data and credentials.

For a deeper look at the architecture, including why a full computing environment matters beyond just security, see our post on why your AI agent needs its own computer.

Credential Management: How Passwords and Tokens Are Handled

This is the section most people care about, and rightfully so. When you connect your email, your CRM, or any other service to your agent, what actually happens to your credentials?

OAuth-first, always. For services that support it — and most modern SaaS products do — Agent-S uses standard OAuth 2.0 flows. This means your agent receives a scoped access token, not your raw password. The token grants specific permissions (read email, manage calendar events, etc.) and can be revoked at any time without changing your password or affecting your own access.

Secure credential storage. When credentials are stored — whether OAuth tokens or, for services that require it, direct login credentials — they are encrypted at rest. Credentials are not stored in plaintext files, not written to agent logs, and not included in the context sent to the underlying language model. The AI model itself never sees your raw passwords. It interacts with authenticated sessions, not raw credential material.

Scoped access by design. When you connect a service, you choose what level of access to grant. Read-only email access? Read and send? The agent operates within exactly the scope you define. If you connect your Gmail with read-only access, the agent cannot send emails through that connection — even if you ask it to. The OAuth scope enforces the boundary, not just the agent’s instructions.
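
As a rough illustration of how a scope boundary works, here is a minimal Python sketch. It is not Agent-S’s actual code: the `Connection` class, the `REQUIRED_SCOPE` mapping, and the `can_perform` function are hypothetical names, and the one-scope-per-action mapping is a simplification. In a real OAuth deployment the provider rejects out-of-scope requests on its side; this local check only mirrors that behavior. The two scope URLs are Google’s published Gmail scopes for read-only access and for sending.

```python
from dataclasses import dataclass

# Google's published Gmail OAuth scopes for read-only access and for sending.
GMAIL_READONLY = "https://www.googleapis.com/auth/gmail.readonly"
GMAIL_SEND = "https://www.googleapis.com/auth/gmail.send"

# Hypothetical mapping from agent actions to the scope each one needs.
REQUIRED_SCOPE = {
    "read_email": GMAIL_READONLY,
    "send_email": GMAIL_SEND,
}


@dataclass(frozen=True)
class Connection:
    """A connected service and the scopes the user granted (hypothetical)."""
    service: str
    granted_scopes: frozenset


def can_perform(conn: Connection, action: str) -> bool:
    """Return True only if the action's required scope was granted.

    Unknown actions are denied, so a missing mapping fails closed.
    """
    required = REQUIRED_SCOPE.get(action)
    return required is not None and required in conn.granted_scopes


read_only = Connection("gmail", frozenset({GMAIL_READONLY}))
print(can_perform(read_only, "read_email"))  # True
print(can_perform(read_only, "send_email"))  # False
```

The point of the sketch is that the answer depends on what was granted, not on what the agent was asked to do.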

You control the connections. Every connected service is visible in your Agent-S dashboard. You can see what’s connected, what permissions are granted, and when each connection was last used. Revoking access takes one click and is immediate.

No credential sharing across the system. Your credentials exist in your isolated environment. They are never accessible to Agent-S staff, never shared with other users, and never used for any purpose other than executing tasks you’ve requested.

The Permission and Autonomy Model: You Decide What the Agent Can Do

Security isn’t just about keeping outsiders out. It’s about controlling what happens on the inside. An AI agent that can do anything without asking is a liability, regardless of how well the perimeter is secured.

Agent-S uses a tiered autonomy model where you control the boundary between “do this yourself” and “ask me first.”

Confirmation gates for sensitive actions. By default, high-impact actions require your explicit approval before execution. Sending an email externally, modifying data in connected services, deleting files, taking actions that affect other people — these are flagged for confirmation. The agent drafts the action, shows you exactly what it plans to do, and waits for your approval.

Graduated trust. Most users start with tight controls: the agent can research, draft, and organize autonomously, but needs approval before any external action. Over time, as you see the agent handle tasks correctly, you expand the boundaries. After two weeks of approving every email draft, you might allow the agent to send routine confirmations on its own while still flagging first-contact emails.

Per-action control, not all-or-nothing. Autonomy isn’t a single toggle. You can grant full autonomy for calendar management while requiring confirmation for every email send. The controls are granular enough to match your actual comfort level with each type of action.

Default conservative. When the agent is unsure whether it has permission to proceed, the default is to ask. Asking when it didn’t need to is annoying but harmless. Acting when it should have asked can cause real problems. We’d rather the agent be slightly cautious than slightly reckless.

This mirrors how you’d manage any new hire. Day one, you check everything. Month three, you trust them with routine work. The difference is that the agent’s permission boundaries are explicit and enforceable, not just implied through social norms.
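
The per-action controls and the ask-by-default rule described in this section can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the Agent-S implementation: the `Autonomy` enum, the action names such as `email.send`, and the `needs_confirmation` function are all hypothetical.

```python
from enum import Enum


class Autonomy(Enum):
    AUTONOMOUS = "autonomous"  # agent may act without asking
    CONFIRM = "confirm"        # agent must draft the action and wait for approval


# Hypothetical per-action settings a user might choose.
user_policy = {
    "calendar.update": Autonomy.AUTONOMOUS,
    "email.send": Autonomy.CONFIRM,
    "file.delete": Autonomy.CONFIRM,
}


def needs_confirmation(policy: dict, action: str) -> bool:
    """Ask unless the user explicitly marked the action as autonomous.

    Actions missing from the policy fall back to CONFIRM, which is the
    "default conservative" behavior.
    """
    return policy.get(action, Autonomy.CONFIRM) is not Autonomy.AUTONOMOUS


print(needs_confirmation(user_policy, "calendar.update"))  # False
print(needs_confirmation(user_policy, "email.send"))       # True
print(needs_confirmation(user_policy, "crm.update"))       # True (not listed, so ask)
```

Expanding trust over time then amounts to changing individual entries from `CONFIRM` to `AUTONOMOUS`, one action type at a time.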

Data Handling: What’s Stored, What’s Ephemeral, Who Can Access It

When your agent processes your data — reads an email, downloads a report, compiles research — where does that data go?

Persistent data lives in your isolated environment. Files your agent creates, research it compiles, memory it builds from your interactions — all of this is stored on your dedicated machine. It persists between sessions so the agent can build on previous work, but it never leaves your environment unless you explicitly ask the agent to send it somewhere.

Language model interactions. When your agent reasons about a task, it sends relevant context to the underlying language model (currently Anthropic’s Claude). The context includes the information needed for the task but is designed to exclude raw credentials and unnecessary sensitive data. Under Anthropic’s commercial API terms, this data is not used for model training.
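
One simple way to picture a safeguard of this kind is a scrubbing step that runs on the context before it leaves the environment. The Python sketch below is a hypothetical illustration, not a description of how Agent-S builds its model requests: `redact_secrets` is an invented name, and exact-match replacement only catches secret values the system already knows about.

```python
def redact_secrets(context: str, secrets: list[str]) -> str:
    """Replace every known secret value in the context with a placeholder."""
    # Longest first, so a secret that contains another one is removed whole.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:  # skip empty strings, which would match everywhere
            context = context.replace(secret, "[REDACTED]")
    return context


context = "Log in with password hunter2 and then fetch the report."
print(redact_secrets(context, ["hunter2"]))
# Log in with password [REDACTED] and then fetch the report.
```

A filter like this is a backstop at best. The stronger property is the one described above: the model works with authenticated sessions, so credential material has no reason to enter the context in the first place.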

Conversation history and memory. Your conversations with your agent are stored as part of its persistent state — this is what allows the agent to remember your preferences and learn from corrections. This data lives in your isolated environment and is not accessible to other users.

What we don’t do with your data. We don’t sell it. We don’t use it to train models. We don’t aggregate it across users. We don’t share it with third parties except as necessary to provide the service (i.e., sending task context to the language model API). We don’t mine it for content analytics.

Data deletion. If you delete your account, your entire computing environment — files, memory, credentials, conversation history — is destroyed. Not archived, not retained for 90 days. Destroyed.

What Happens When Things Go Wrong

No security post should pretend things never go wrong. They do. The question is what happens next.

Confirmation gates catch most mistakes before they happen. The most common “mistake” scenario — the agent misinterprets your instructions — is caught at the confirmation step. You see the proposed action, say “no, that’s not what I meant,” and correct course. The wrong action never executes.

Audit trail for everything. Every action your agent takes is logged with a timestamp, the context that prompted it, and the result. If something goes wrong, you can trace the full chain of events — not a vague activity feed, but a detailed record.
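
To show what a record with those three parts might look like, here is a minimal Python sketch. The field names, the `AuditRecord` class, and the JSON-lines format are assumptions made for illustration, not the actual Agent-S log schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str  # ISO 8601, UTC
    action: str     # what the agent did
    context: str    # what prompted the action
    result: str     # outcome of the action


def make_record(action: str, context: str, result: str) -> AuditRecord:
    """Build a record stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return AuditRecord(timestamp=now, action=action, context=context, result=result)


def to_log_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record(
    action="email.send",
    context="User approved reply to scheduling request",
    result="sent",
)
print(to_log_line(record))
```

Keeping each entry as one self-contained line makes it straightforward to filter by action or replay events in order when tracing what happened.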

Reversible by default. The agent’s architecture favors reversibility. Email drafts are saved before sending. Files are moved before deleting. When an action is genuinely irreversible, it receives a higher confirmation threshold.
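
The “move before deleting” idea can be sketched as a soft delete. This Python example is a hypothetical illustration: the `soft_delete` and `restore` functions and the `.trash` directory are assumptions, not Agent-S internals.

```python
import shutil
import tempfile
from pathlib import Path


def soft_delete(path: Path, trash_dir: Path) -> Path:
    """Move a file into trash_dir instead of deleting it; return its new path."""
    trash_dir.mkdir(parents=True, exist_ok=True)
    destination = trash_dir / path.name
    # Avoid overwriting an earlier trashed file with the same name.
    counter = 1
    while destination.exists():
        destination = trash_dir / f"{path.stem}.{counter}{path.suffix}"
        counter += 1
    shutil.move(str(path), str(destination))
    return destination


def restore(trashed: Path, original: Path) -> None:
    """Undo a soft delete by moving the file back to its original location."""
    shutil.move(str(trashed), str(original))


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.txt"
    report.write_text("quarterly numbers")
    trashed = soft_delete(report, Path(tmp) / ".trash")
    print(report.exists())     # False
    restore(trashed, report)
    print(report.read_text())  # quarterly numbers
```

Because the file still exists after the “delete,” a mistaken removal can be undone with a single move.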

Kill switch. You can pause your agent immediately, halting all autonomous activity — scheduled tasks, in-progress workflows, everything — while keeping the environment intact for review.

Scoped blast radius. An agent with read-only email access can’t send emails. An agent without CRM credentials can’t modify CRM data. The principle of least privilege limits what can go wrong, even when something does.

How This Compares to the Alternatives

Every approach to delegating work involves security tradeoffs. Here’s how the AI agent model stacks up against the realistic alternatives.

Compared to a human virtual assistant: A human VA with access to your email can read anything, forward messages to anyone, screenshot sensitive data, and copy information to personal devices. You trust them based on reputation and legal agreements, not architectural enforcement. An AI agent’s access is architecturally scoped: it cannot access services you haven’t connected, cannot exceed the OAuth permissions you’ve granted, and every action is logged. It also doesn’t get phished, doesn’t reuse your passwords, and doesn’t quit and take knowledge of your systems with it.

Compared to API-based automation (Zapier, Make, etc.): Traditional automation tools manage all users’ credentials in shared infrastructure. A vulnerability could potentially expose credentials across accounts. They also require numerous point-to-point integrations, each with its own credential scope, each an attack surface. Agent-S’s isolated computing model means your credentials exist in your environment only, and a single agent interface replaces the credential sprawl of dozens of integrations.

Compared to doing everything yourself: The most “secure” option is never delegating anything. But you become the single point of failure for every task. The relevant question isn’t whether delegation introduces risk (it does, always) but whether the risk is managed well enough that the tradeoff is worth it.

What We’re Still Working On

We could write a paragraph here about how Agent-S has solved all security challenges. We’re not going to, because it would be dishonest.

Areas we’re actively improving:

More granular permission controls. Our current autonomy model covers the most important boundaries, but there are edge cases where users want more specific controls — like allowing the agent to email people within their organization but not external contacts, or permitting file creation but not file deletion in specific directories. We’re building toward more granular, rule-based permission policies.

Better visibility into model context. We want to give users clearer visibility into exactly what data is included in each language model request. Currently, users can review actions and outputs, but the intermediate reasoning context could be more transparent.

Formal third-party security audits. We have internal security practices and testing, but we haven’t yet completed a formal third-party penetration test. This is on our roadmap because internal review, no matter how rigorous, is not a substitute for external validation.

Compliance certifications. For enterprise users who need SOC 2, GDPR compliance documentation, or other formal certifications, we’re working toward these but don’t have them yet. We’ll be transparent about our timeline rather than letting you discover the gap during procurement.

Whether a company is honest about what’s incomplete is a good test of whether it takes security seriously. Anyone who tells you their security is perfect is either lying or hasn’t looked hard enough.

Frequently Asked Questions

Can AI agents see my passwords?

No. When you connect a service through OAuth (the standard method), your agent receives a scoped access token — not your password. For services where direct credentials are needed, those credentials are encrypted at rest and never included in the context sent to the AI language model. The model interacts with authenticated sessions, never with raw credential material. Your passwords are not visible in logs, conversation history, or any Agent-S interface.

What happens if an AI agent makes a mistake?

The most common scenario is that the agent proposes an incorrect action — a wrong email draft, an incorrect calendar change. Because sensitive actions require your approval by default, you catch this at the confirmation step and correct it before anything happens. For the rare case where an error slips through, every action is logged in a detailed audit trail. You can also pause the agent instantly, halting all activity while you review. And because access is scoped through permissions you control, the potential impact of any single error is bounded.

Is my data shared with other users?

No. Each Agent-S user gets a fully isolated computing environment. Your files, credentials, conversation history, and agent memory are not shared with, accessible to, or visible to any other user. This is infrastructure-level isolation, not just application-level permissions on a shared system. Your data is not used to train AI models, not aggregated for analytics, and not sold to third parties.

Can I revoke agent access to a service?

Yes, immediately. Every connected service is visible in your Agent-S dashboard. You can disconnect any service with one click, and the agent loses access within seconds. You can also pause your entire agent, stopping all autonomous activity instantly while keeping your environment intact for review.

What data does the AI model actually see?

When your agent reasons about a task, it sends relevant context to the underlying language model (Anthropic’s Claude). This includes information needed for the current task — the email it’s replying to, the document it’s summarizing, your instructions. It is designed to exclude raw credentials and unnecessary sensitive data. Under Anthropic’s commercial API terms, this data is not used for model training.

The Bottom Line

AI agent security isn’t a feature you bolt on after building the product. It’s an architectural decision that shapes everything — how environments are provisioned, how credentials are stored, how permissions are enforced, how data flows through the system.

We built Agent-S with isolation, scoped access, and user control as foundational constraints, not optional settings. The result is a system where giving your agent access to your tools carries manageable, well-defined risk — not open-ended exposure.

If you have specific security questions we haven’t addressed here, reach out. We’d rather answer hard questions directly than hope nobody asks them.