What I've Been Building: An AI Operations Layer
By Domenic DiNatale
The previous two pieces in this series argued that AI introduces new surfaces and speeds to an existing architecture problem, and that AI applications are being built with the same implicit trust assumptions that have produced security failures for decades. This piece is different — it's about practice rather than analysis.
For the past several weeks, I've been running an AI system with real access to real infrastructure. It reads my company's Slack channels, reviews pull requests, triages email, tracks team commitments, monitors LinkedIn for prospecting, maintains context across sessions, and checks in proactively when something needs attention. It operates continuously — not as a tool I invoke, but as a layer running alongside operations.
What follows is a transparent look at how that system is designed, what architectural decisions went into it, where I got things wrong, and what it actually means to give an AI system genuine operational access. Not a product pitch. An architectural walkthrough.
What the System Does
The system — which I've named Archie, running on OpenClaw — has access to several distinct operational domains.
Slack. Two development channels where my engineering team coordinates work on our products (Screening Plus and URoute) are monitored continuously. The system tracks commitments made in conversation ("I'll have that done by EOD," "fixing it now"), identifies questions that haven't received answers, and flags when follow-up is needed. It posts in those channels occasionally, when it has something specific to contribute — but reading is the primary mode, and restraint about posting is a design requirement I'll come back to.
Email. A dedicated account handles inbound triage. Messages are read, summarized, and surfaced to me in Slack. For messages from certain senders it can act on directly; for anything requiring outbound communication to external parties, it flags for my review rather than acting unilaterally. It also monitors a Gmail account for me in read-only mode — catching things that land there without giving the system write access to a decade of personal correspondence.
Pull requests. When a PR is opened on our repositories and lacks a human reviewer after a defined window, it posts a reminder in the appropriate Slack channel. When a PR goes unreviewed for too long, it conducts a code review itself — not as a replacement for human review, but as a first pass that surfaces obvious issues. These reviews are posted under a bot account (intellitech-archie[bot]), not my personal GitHub account. More on why that distinction matters below.
LinkedIn prospecting. Every morning at 8 AM (weekdays), it curates a list of qualified prospects based on a defined ICP — founders, CEOs, VPs of IT in manufacturing, logistics, healthcare, and tech in our target geography — with a 90-day deduplication window so the same person never shows up twice in a quarter. At 2 AM, it processes newly accepted LinkedIn connections from the day before, generates personalized outreach drafts, and drops them in a Slack channel for my review before I send anything.
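The deduplication window is simple to express in code. Here's a minimal sketch of that check, assuming a hypothetical prospect schema (a dict with a `profile_url` key) and an in-memory `seen` map; the actual implementation and field names may differ:

```python
from datetime import datetime, timedelta

DEDUP_WINDOW = timedelta(days=90)  # same person at most once per quarter

def filter_new_prospects(candidates, seen, now):
    """Drop prospects surfaced within the dedup window; record the rest.

    candidates: list of dicts with a "profile_url" key (hypothetical schema).
    seen: dict mapping profile_url -> datetime last surfaced.
    """
    fresh = []
    for p in candidates:
        last = seen.get(p["profile_url"])
        if last is not None and now - last < DEDUP_WINDOW:
            continue  # surfaced within the last 90 days; skip
        fresh.append(p)
        seen[p["profile_url"]] = now
    return fresh
```

The `seen` map would be loaded from and written back to persistent state between runs, which is what makes the guarantee hold across mornings.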
Context persistence. The system maintains its state across sessions through a structured file system — daily notes, long-term memory, project-specific state files — that persists between conversations. When it checks on a PR it reviewed last week, it knows it already reviewed it. When it nudged a dev yesterday, it won't nudge them again today.
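The "won't nudge them again today" behavior comes from checking persistent state before acting. A minimal sketch of that pattern, using a JSON log and hypothetical action keys (the real system uses the file structure described above, not necessarily this format):

```python
import json
from pathlib import Path

def already_done(state_path, key):
    """True if this action key is already in the persistent log."""
    path = Path(state_path)
    if not path.exists():
        return False
    return key in json.loads(path.read_text())

def record_done(state_path, key, note):
    """Append an action to the log so tomorrow's run sees it."""
    path = Path(state_path)
    log = json.loads(path.read_text()) if path.exists() else {}
    log[key] = note
    path.write_text(json.dumps(log, indent=2))
```

The key structure is the design decision: something like `"nudge:alice:2025-06-01"` encodes who, what, and when, so the same action on a different day is a different key.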
None of this is magic. Each of these capabilities required specific integration work, explicit access grants, and deliberate decisions about what the system could and couldn't do in each domain.
The Access Decisions
This is where architecture actually lives.
Email is scoped to a dedicated account. Giving an AI system access to my primary inbox would have meant giving it access to years of personal and business correspondence, sensitive negotiations, financial communications, client relationships. The blast radius of a configuration error, a hallucination, or a security failure would have been enormous. A dedicated account with a defined purpose has a defined blast radius. The Gmail access is read-only — it can see, summarize, and alert, but cannot send, delete, or modify anything.
GitHub access is capability-scoped. The system can review PRs and post comments. It cannot merge. It cannot push to protected branches. It cannot create or delete repositories. Getting there required resisting the temptation to grant broad access "for convenience" — which is exactly the architectural habit this series has been critiquing. The PR review capability was specifically granted; the adjacent capabilities were explicitly withheld.
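The posture here is fail-closed: a capability exists only if it was explicitly granted. As an illustrative sketch (not OpenClaw's actual authorization code; the action names are hypothetical):

```python
# Capabilities explicitly granted to the PR-review integration.
# Everything else (merge, push, repo admin) is deliberately absent,
# so new actions are refused by default rather than allowed by default.
GRANTED = {"read_pr", "review_pr", "comment_pr"}

def authorize(action):
    """Fail closed: an action is allowed only if explicitly granted."""
    if action not in GRANTED:
        raise PermissionError(f"capability not granted: {action}")
    return action
```

The point isn't the five lines; it's that the default answer is "no," and every "yes" is a visible, reviewable entry in the allowlist.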
Reviews post under a bot identity. When Archie reviews a PR, it posts as intellitech-archie[bot], not as me. This is partly aesthetic — it's clearer in the PR history what came from automated analysis versus human judgment — but it's also architecturally important. If the bot posts a technically incorrect review, it's clearly a bot review and should be treated as such. Posting under my personal account would mix automated output with attributed human judgment in a way that misleads both the team and anyone reviewing the history later.

LinkedIn outreach requires human approval before send. The system generates personalized draft messages and stages them for review. I approve, edit, or discard each one before anything goes out. Automating the full loop — scrape, draft, send — would be faster, but it's also exactly the kind of unsanctioned outbound communication that erodes trust when it goes wrong. The latency is intentional.
Slack is read-mostly. The system sees everything in the monitored channels and uses that context to inform its decisions. But posting is constrained. Default behavior is silence, not engagement — a constraint I had to deliberately build in, not one that came naturally.
What I Got Wrong
The first version posted too often.
When you build a system that's capable of responding to everything it sees, the default behavior — before you've tuned it — is to respond to everything it sees. New message in a channel? Maybe there's something useful to say. PR updated? Post a status note. Question from a team member? Jump in.
The result was a system that was present in every conversation, responding with technically correct but contextually unnecessary contributions. The team noticed. More importantly, it was wrong — not about specific facts, but about its role. An AI that responds to every message in a group chat isn't participating; it's dominating. It degrades the quality of team communication even when individual responses are accurate.
The fix was explicit behavioral rules: respond only when directly addressed, when you can add something the conversation is missing, or when there's a specific trigger (a commitment nearing its deadline, a PR going unreviewed past the window). Otherwise, stay quiet. The default is silence, not engagement.
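Those rules amount to a small gate that returns not just a decision but a reason, which keeps the behavior auditable. A sketch under assumed inputs (the trigger names are hypothetical):

```python
def posting_decision(directly_addressed, adds_missing_info, triggers):
    """Return (should_post, reason); the default is silence.

    triggers: list of hypothetical event names, e.g.
    "commitment_near_deadline" or "pr_unreviewed_past_window".
    """
    if directly_addressed:
        return True, "directly addressed"
    if triggers:
        return True, "trigger: " + triggers[0]
    if adds_missing_info:
        return True, "adds something the conversation is missing"
    return False, "default: stay quiet"
```

Note the structure: there is no branch for "a message arrived." Activity alone is never a reason to speak.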
This required accepting something counterintuitive: a more capable, more active system wasn't better. A more constrained, more deliberate system was better.
The PR reviewer nudge logic needed two iterations to get right.
The initial implementation treated any open PR without a reviewer as needing a nudge. That produced false positives: PRs where a human had already reviewed but new commits made the review stale, PRs that had only been open for twenty minutes, PRs where the author was already in a conversation about the work.
The second version distinguished two states: no reviewer assigned yet (nudge after 1 hour during business hours) versus review present but new commits since the last review (nudge after 3 hours). Each required different handling, and getting there required building it, observing where it fired incorrectly, refining the conditions, and testing again.
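The two-state logic is easy to state precisely. A sketch of the classifier, assuming a hypothetical PR record with `opened_at`, `reviewer_assigned`, `last_review_at`, `last_commit_at`, and a precomputed `business_hours` flag (the real conditions may carry more nuance):

```python
from datetime import datetime, timedelta

NO_REVIEWER_WINDOW = timedelta(hours=1)   # nudge after 1h with no reviewer
STALE_REVIEW_WINDOW = timedelta(hours=3)  # nudge after 3h of unreviewed new commits

def nudge_state(pr, now):
    """Classify a PR into one of the two nudge-worthy states, or None."""
    if not pr["reviewer_assigned"]:
        # State 1: nobody assigned yet, and it's been open long enough.
        if pr["business_hours"] and now - pr["opened_at"] >= NO_REVIEWER_WINDOW:
            return "no_reviewer"
        return None
    if pr["last_review_at"] and pr["last_commit_at"] > pr["last_review_at"]:
        # State 2: a review exists but new commits have made it stale.
        if now - pr["last_commit_at"] >= STALE_REVIEW_WINDOW:
            return "stale_review"
    return None
```

The twenty-minute-old PR from the first version's false positives now falls through to `None`, as does the PR whose existing review is still current.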
This is slower than it sounds. An AI system operating over real communications doesn't have a test environment. The nudges either go to real people or they don't. Getting it wrong means interrupting real work with noise. You learn to be conservative before you've seen enough edge cases to be confident.
The memory system went through a failed migration.
Early on, I tried to migrate the system's context persistence to a database-backed memory layer for better semantic search. The implementation broke — dependency failures in the plugin ecosystem that I couldn't resolve quickly. I reverted to file-based memory: markdown files organized by date, a long-term memory document updated during significant sessions, a structured index for project state.
It was the right call. The database approach added complexity without solving a real problem at current scale. File-based memory is simpler, more portable, human-readable, and easy to audit. When something goes wrong, I can open the file and see exactly what the system "remembered." That observability matters.
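To make the file-based approach concrete, here's a sketch of the daily-note half of it, with a hypothetical directory layout (`memory/daily/YYYY-MM-DD.md`); the actual paths and structure differ, but the property it illustrates (state you can open in any editor) is the same:

```python
from pathlib import Path
from datetime import date

MEMORY_ROOT = Path("memory")  # hypothetical layout, not the actual paths

def todays_note(root=MEMORY_ROOT, today=None):
    """Return the path of today's daily note, creating it if absent."""
    today = today or date.today()
    path = root / "daily" / f"{today.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text(f"# Daily note: {today.isoformat()}\n")
    return path
```

There's nothing to administer, no schema migration to break, and `grep` is the query engine. At current scale, that trade is a clear win.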
Why "AI Assistant" Undersells the Problem
The term most people use for these systems — AI assistant — implies something essentially like a capable search engine or a smart autocomplete. Something you invoke, receive output from, and evaluate. Passive infrastructure for active human use.
What I've been building is structurally different. The system monitors, tracks, decides when to act, and acts — not in response to explicit user requests, but in response to environmental conditions it evaluates continuously. It has standing access to production communications and infrastructure. It maintains persistent state that influences future decisions. It acts under identities attributed to a person or organization.
This is closer to an autonomous process running on your infrastructure than it is to a search engine. The security implications are proportionally different.
The questions that matter aren't "is the AI useful?" They're: what can the AI do in each integrated domain? What happens if its judgment is wrong? What happens if it's manipulated — if a team member figures out that phrasing something a certain way in a message changes its behavior? What's the audit trail? Who can review what it did and why?
These are infrastructure design questions. They should be answered before deployment, not discovered through incidents after it.
What the Architecture Gets Right
The system has been running for several weeks without a significant incident. The elements that have held up well are the ones designed with constraint in mind.
The dedicated email account limits blast radius. If the system does something wrong with email, the damage is limited to what that account can do — and Gmail access is read-only, so the damage there is even more bounded.
The read-mostly Slack integration limits blast radius. The system sees context but can only act through posting messages — it can't delete messages, can't modify channels, can't impersonate anyone.
The PR bot account makes automated action clearly legible. Team members know which reviews are from the bot and can weight them accordingly.
The nudge deduplication prevents compounding noise. By logging what it's already done in persistent state files, the system avoids the stuck-loop problem — repeatedly firing the same notification because it doesn't know it already fired.
The human confirmation requirement for outbound LinkedIn and external email prevents the worst-case scenario for unsanctioned communication.
The file-based memory system is auditable. I can read exactly what the system knows, what it tracked, and what decisions it made, in plain text files I can open in any editor.
None of these constraints make the system less useful. They make it appropriately useful — capable in the domains where I've authorized it to act, constrained in the adjacent domains I haven't, and attributable enough that I can review what it did and why.
What This Looks Like in Practice
I'll be sharing a companion video to this post in the next week or so that shows the system in operation — the actual interface, real PR reviews, real commitment tracking, real email triage, real LinkedIn outreach drafts. The goal is to show what this looks like at actual operational fidelity, not a scripted demo.
But the architecture described here is the part that matters most and the part that's least visible in a demo. The video shows what the system does. This post is about the decisions behind what the system can and can't do — and why those decisions are the real product of building AI into operations.
The Principle That Holds
This series has argued, in various forms, that security outcomes are determined by architectural decisions more than by user behavior, tool selection, or awareness. The AI layer doesn't change that principle. It makes it more important.
An AI system with broad access and limited constraints will produce broad consequences when something goes wrong — whether that's a misconfiguration, a manipulation, or a failure mode in the model. An AI system with scoped access, constrained capability, clear attribution, and human confirmation for high-stakes actions will produce bounded consequences.
The difference isn't the sophistication of the AI. It's the architecture behind it.
That's not a new lesson. It's the same one this series has been building toward. What AI adds is a category of actor that can move fast, operate continuously, and act at machine speed across your communications, your code, and your business relationships simultaneously. The architectural response to that actor is the same as the architectural response to any other actor operating under conditions of uncertainty: design for failure, assume compromise is possible, and build the system to survive the things you can't prevent.
The work is the same. The urgency is higher.