The CRM/ERP Lyra Stack: How Our AI-Native Company Looks in May 2026

Most "show me an agent doing real work" threads end with a LangChain demo. The honest demand underneath is for production receipts — agents running continuously against real money and real consequences, not a one-shot notebook run. This post is that, from my side of the desk.

I should be upfront before going further. I set up and run all the systems described below. This is not an objective product review. It is a practitioner inventory written by the practitioner who shipped it. I am writing it because the most common pushback to enterprise-AI pilots — "where is anyone actually running this in production?" — has a clean answer in a solo-practice context, and I think more of those receipts in public would shift the conversation.

The Journey, Briefly

I started where most people start: an agent mesh of off-the-shelf agents. Four of them, each with a different personality and toolkit, sharing a file-based task queue at ~/.agent-mesh/. Claude Code (Anthropic) did the heavy coding. Hermes (Kimi K2.5 on NVIDIA NIM) did research and sales drafting. OpenClaw (GLM 4.7) was the orchestrator with cron skills. Goose (Qwen3 Coder, on the local Strix Halo) was the Telegram-reachable backup. The idea was to plaster skills-as-Markdown-files on each agent and let them hand each other tasks.

It worked, but it was the wrong shape for what I actually needed. Plastering more skills onto a generic agent runs into the same wall every time — the agent has no real model of my contacts, my pipeline, my meetings, my finance state. It always read like a junior trying to fake familiarity. So in mid-April I started building Lyra: a context-built CRM/ERP where the agents are tied directly to the data they reason over, not loaded as a chat plugin. Lyra is now the centre. The mesh has shrunk to a residual layer feeding leads and research into the centre.

The Architecture (Today, May 2026)

Architecture diagram: CRM/ERP Lyra architecture, May 2026. The operator interacts via Pixel 9, Desktop, and Telegram with three primary surfaces: the Lyra Companion app, @LyraSG_bot, and Claude Code. All three converge on CRM/ERP Lyra (the centre), which fans out to local AI on Strix Halo, cloud LLM for burst-out, and the public @AltronisLyra_bot. Altronis News and the residual mesh (OpenClaw, Hermes, Goose) feed into Lyra; the off-the-shelf mesh has shrunk to a feeder layer.


                              ┌───────────────────────────┐
                              │      OPERATOR (me)        │
                              │  Pixel 9 · Desktop · TG   │
                              └─┬─────────┬──────────┬────┘
                                │         │          │
                  mobile UX     │  voice/ │          │  pair-program /
                                │  text/  │          │  architect /
                                │  photo  │          │  build Lyra
                                ▼         ▼          ▼
        ┌─────────────────────┐  ┌──────────────────┐  ┌───────────────────────┐
        │  LYRA COMPANION     │  │  @LyraSG_bot     │  │  CLAUDE CODE          │
        │  (Pixel 9 app)      │  │  (internal DM)   │  │  Anthropic · primary  │
        │  Kotlin/Compose     │  │  speak-to-CRM    │  │  engineering partner  │
        │  - 5-min ContextTick│  │  - voice/text/img│  │  - over Telegram      │
        │  - HealthConnect    │  │  - approve/edit/ │  │    (most days, on the │
        │  - notifs listener  │  │    skip buttons  │  │    phone)             │
        │  - mic opt-in       │  │    consumer      │  │  - terminal CLI when  │
        │  - GPS, BT, DND     │  │  - mutation gate │  │    at desk            │
        │  - Chief/Sales/     │  │                  │  │  - architecture,      │
        │    Delivery/Finance │  │                  │  │    Lyra build-out,    │
        │    chats            │  │                  │  │    schema design,     │
        │  - Daily wrap, muses│  │                  │  │    review-before-ship │
        └──────────┬──────────┘  └────────┬─────────┘  └──────────┬────────────┘
                   │                      │                       │
                   │ context_ticks,       │ read + mutate         │ ships code
                   │ photos, ack          │ requests              │ into Lyra
                   ▼                      ▼                       ▼
        ╔════════════════════════════════════════════════════════════════════╗
        ║              CRM/ERP LYRA  (the centre — our own build)            ║
        ║  ────────────────────────────────────────────────────────────────  ║
        ║  Backend  (FastAPI, port 8003, ~31k LOC)        Frontend (Next.js) ║
        ║   ├─ Agent Registry (Chief, Sales, Delivery,        port 5173      ║
        ║   │   Finance, Governance)                                         ║
        ║   ├─ Tool Catalog: get_contact, push_draft,                        ║
        ║   │   browse_url, web_search, request_phone_                       ║
        ║   │   action, search_knowledge, ask_sales/                         ║
        ║   │   delivery/finance ...                                         ║
        ║   ├─ Mutation gate (every change needs approve)                    ║
        ║   ├─ skill-server side-car (port 18793)                            ║
        ║   └─ ContextTick schema + tick-flow watchdog                       ║
        ║                                                                    ║
        ║  Data: Firestore (lyra-980e5) — context_ticks, contacts, meetings, ║
        ║         outreach, proposals, invoices, news_items, case_studies,   ║
        ║         pending_mobile_actions, pending_mutations                  ║
        ╚════════════════╦══════════════════════════════════╦════════════════╝
                         │                                  │
              inference  │                                  │  public-facing
                         ▼                                  ▼
        ┌──────────────────────────────┐         ┌─────────────────────────┐
        │  LOCAL AI (Strix Halo)       │         │  @AltronisLyra_bot      │
        │  llama-server  :8001         │         │  posts to @sgaibiz grp  │
        │    Gemma 4 26B  Q8 (128k)    │         │  + altronis.sg readers  │
        │  llama-vlm     :8080         │         │  + future X drafts      │
        │    Qwen3-VL-32B Q4 (8k)      │         │  (separate token, no    │
        │  ComfyUI       :7860         │         │   DM auth, broadcast    │
        │    Qwen-Image + Lightning    │         │   only)                 │
        │    LoRA 4-step → 36s 1024px  │         └─────────────────────────┘
        │  SearXNG       :8888         │
        │  bge-m3-server, Surya OCR    │
        └──────────────┬───────────────┘
                       │
            burst-out  │
                       ▼
        ┌──────────────────────────────┐
        │  CLOUD LLM (when needed)     │
        │  Anthropic — Claude Opus 4.7 │
        │  z.ai — GLM 4.7/5.1          │
        │  NVIDIA NIM — Kimi K2.5      │
        │  OpenAI — gpt-5.1            │
        └──────────────────────────────┘

                  ┌─── feeds  into  CRM Lyra  ───┐
                  │                              │
                  ▼                              ▼
       ┌───────────────────────┐      ┌──────────────────────────┐
       │  ALTRONIS NEWS INGEST │      │  RESIDUAL MESH           │
       │  (hourly cron)        │      │  ┌────────────────────┐  │
       │  → news_items         │      │  │ OpenClaw (10 crons)│  │
       │  → x-drafter pipeline │      │  │  - SME AI Prospect │  │
       │  → digest channel     │      │  │  - Enterprise Scout│  │
       └───────────────────────┘      │  │  - Dev Radar       │  │
                                      │  │  - Goal Planner    │  │
                                      │  │  → pushes leads +  │  │
                                      │  │    intel into Lyra │  │
                                      │  ├────────────────────┤  │
                                      │  │ Hermes (low-prof)  │  │
                                      │  │  - lead scouting   │  │
                                      │  │  - last activity   │  │
                                      │  │    early April     │  │
                                      │  ├────────────────────┤  │
                                      │  │ Goose (@Esuna_bot) │  │
                                      │  │  - tg-reachable    │  │
                                      │  │    backup coder    │  │
                                      │  │    when Claude is  │  │
                                      │  │    rate-limited    │  │
                                      │  └────────────────────┘  │
                                      └──────────────────────────┘

Three things to read off the diagram. One, Lyra is the centre: every interaction surface (companion app, internal Telegram bot, Claude Code as engineering partner) reads and writes through the same backend, against the same Firestore database. Two, Claude Code sits on the same row as my own interfaces because that is honestly where it lives in the workflow. I work with Claude over Telegram most days (on the phone, between meetings, on the move), with the terminal CLI used when I am at my desk; it is the engineering partner that designs, builds, and reviews Lyra alongside me, not a passive coder. Three, the residual mesh on the side (OpenClaw, Hermes, Goose) is the feeder layer: OpenClaw runs ten focused cron skills that drop scouted leads and research into Lyra, Hermes is dormant but not removed, and Goose is the Telegram-reachable backup when Claude is rate-limited. Local AI on the Strix Halo is the hot path; cloud LLMs are the burst-out path for frontier-tier tasks.

CRM/ERP Lyra: The Chief and the Sub-Agents

Inside the Lyra backend there is an agent registry — five distinct agent specs, each with its own persona, its own toolset, and its own mutation permissions.

The Chief is the default orchestrator. Full toolbox. Handles cross-domain questions and routes to vertical agents when the question fits cleanly into one bucket. It is the agent the @LyraSG_bot speaks to by default when I drop a voice note or text in. Its persona is tight: "use the right tool for the question, do not narrate the plumbing, when a question spans domains answer the synthesis instead of handing off."

The Sales agent owns the pipeline. Its tools are scoped: get_contact_by_name, list_upcoming_meetings, list_recent_interactions, get_interaction_body, web_search, browse_url, create_contact, update_contact_stage, set_contact_followup, push_draft (writes a sales-reply draft straight into Outlook Drafts), attach_files_to_draft, request_phone_action (queues an on-device action on the Pixel companion app), request_handoff. It is the agent that handles "what's the next move on Company X", "draft a follow-up for that meeting last week", "log a new lead from the conference".

The Delivery agent owns active engagements. Its tools cover project summaries, upcoming meetings, contact lookups, web search and browse. It is read-only on most of the system — it does not mutate the pipeline because Delivery is about reporting on what is happening, not changing it. It is the agent that answers "what's on fire today" and "what is the deliverable status on the CF DocFlow engagement".

The Finance agent owns money in and money out. Invoices, payments, monthly revenue, overdue balances, GST awareness (Altronis is below the SGD 1M GST threshold today so invoices carry no tax line by default; the agent knows to switch when that changes).
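The GST switch described above can be sketched as a single function. This is a minimal illustration, not the Finance agent's actual code: the function name, the `gst_registered` flag and the returned keys are assumptions. The 9% figure is Singapore's standard GST rate in effect since 1 January 2024.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_RATE = Decimal("0.09")  # Singapore standard GST rate since 1 Jan 2024
CENTS = Decimal("0.01")


def invoice_totals(subtotal: Decimal, gst_registered: bool) -> dict:
    """Return subtotal, optional GST line and total for one invoice.

    `gst_registered` is a hypothetical flag; flipping it is the "switch"
    the Finance agent would make once the business registers for GST.
    """
    subtotal = subtotal.quantize(CENTS, rounding=ROUND_HALF_UP)
    if not gst_registered:
        # Not registered: the invoice carries no tax line at all.
        return {"subtotal": subtotal, "gst": None, "total": subtotal}
    gst = (subtotal * GST_RATE).quantize(CENTS, rounding=ROUND_HALF_UP)
    return {"subtotal": subtotal, "gst": gst, "total": subtotal + gst}
```

Using `Decimal` rather than floats keeps the cent rounding deterministic, which matters once a tax line appears on the invoice.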

The Governance agent is read-only and exists to audit the other four — it can read every collection in Firestore but cannot mutate anything. Its job is meta-reasoning: "are the Sales drafts last week consistent with the Chief's instructions?", "is the Finance agent's GST handling still correct given the threshold?". It is the agent that catches the other agents being wrong.
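One way to picture the registry is as a set of immutable specs, each carrying a persona, a toolset and the subset of tools that mutate state. The sketch below is an assumption about shape, not Lyra's real code: `AgentSpec`, `REGISTRY` and the `read_collection` tool name are hypothetical, while the Sales tool names are taken from the list above.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical registry entry: persona, scoped tools, mutation rights."""
    name: str
    persona: str
    tools: FrozenSet[str]
    mutating_tools: FrozenSet[str] = frozenset()

    def __post_init__(self):
        # An agent cannot be allowed to mutate through a tool it cannot call.
        extra = self.mutating_tools - self.tools
        if extra:
            raise ValueError(f"{self.name}: not in toolset: {sorted(extra)}")

    def can_call(self, tool: str) -> bool:
        return tool in self.tools

    def needs_approval(self, tool: str) -> bool:
        return tool in self.mutating_tools


REGISTRY = {
    spec.name: spec
    for spec in (
        AgentSpec(
            name="sales",
            persona="Owns the pipeline.",
            tools=frozenset({"get_contact_by_name", "web_search",
                             "push_draft", "update_contact_stage"}),
            mutating_tools=frozenset({"push_draft", "update_contact_stage"}),
        ),
        AgentSpec(
            name="governance",
            persona="Audits the other agents; never mutates.",
            tools=frozenset({"read_collection"}),  # hypothetical tool name
        ),
    )
}
```

Keeping the mutation set inside the spec means a read-only agent such as Governance is read-only by construction, not by prompt instruction.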

Each agent follows the same access pattern. It talks to Claude / Gemma / GLM / Kimi over the OpenAI-compatible chat completions interface and receives a tool-call response; the backend executes the tool and threads the result back into the conversation. Mutations (anything that changes state) hit a confirmation gate before they execute: the user has to tap approve on a Telegram inline keyboard. Read-only tools fire immediately.
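The gate can be sketched as a small dispatcher sitting between the model's tool call and the tool itself. This is a minimal sketch under assumptions: `handle_tool_call`, `approve`, the `pending` dict and the toy tool implementations are hypothetical, and an in-memory dict stands in for the pending_mutations collection. The tool-call and tool-result dictionaries follow the OpenAI chat completions format.

```python
import json

# Hypothetical split of the catalog; the tool names come from the post.
MUTATING_TOOLS = {"push_draft", "update_contact_stage"}


def handle_tool_call(call: dict, tools: dict, pending: dict) -> dict:
    """Run a read-only tool now, or park a mutation until it is approved."""
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])  # arrives as a JSON string
    if name in MUTATING_TOOLS:
        pending[call["id"]] = (name, args)  # nothing executes yet
        result = {"status": "pending_approval"}
    else:
        result = tools[name](**args)
    # Tool result message threaded back into the conversation.
    return {"role": "tool", "tool_call_id": call["id"],
            "content": json.dumps(result)}


def approve(call_id: str, tools: dict, pending: dict):
    """Called on the approve tap: execute the parked mutation exactly once."""
    name, args = pending.pop(call_id)
    return tools[name](**args)


# Toy in-memory stand-ins for the CRM and its tools.
stages = {"Acme": "lead"}
tools = {
    "get_contact_by_name": lambda name: {"name": name, "stage": stages[name]},
    "update_contact_stage":
        lambda name, stage: stages.update({name: stage}) or {"ok": True},
}
pending = {}

read_msg = handle_tool_call(
    {"id": "call_1", "type": "function",
     "function": {"name": "get_contact_by_name",
                  "arguments": '{"name": "Acme"}'}},
    tools, pending)
write_msg = handle_tool_call(
    {"id": "call_2", "type": "function",
     "function": {"name": "update_contact_stage",
                  "arguments": '{"name": "Acme", "stage": "proposal"}'}},
    tools, pending)
```

After these two calls the read has already returned the contact, while the stage change sits in `pending` and the CRM state is untouched until `approve("call_2", tools, pending)` runs.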

The Two Telegram Bots

I run two distinct Telegram bots and the distinction matters more than it sounds.

@LyraSG_bot is the internal bot. Only my user ID is authorised. It is a speak-to-CRM interface — I drop a voice note, an image, a text message, and the bot routes it through the Chief agent in the Lyra backend. It handles the approval gates: every mutation (push draft, update stage, request phone action, post a tweet, like a tweet) renders as an inline keyboard with Approve / Skip / Edit buttons, and the action only fires on my tap. It also handles voice transcription (Whisper Small, on-device), image captioning (Qwen3-VL-32B on the local llama-vlm), and a force-reply edit flow for amending drafts in place.
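For readers who have not built one, an approval prompt of this kind is just a `sendMessage` payload with an inline keyboard attached. The sketch below only builds the payload and parses the button press; it makes no network call. The function names and the `action:mutation_id` encoding are assumptions, while `reply_markup`, `inline_keyboard`, `text` and `callback_data` are the Telegram Bot API field names, and `callback_data` is limited to 64 bytes.

```python
BUTTONS = (("Approve", "approve"), ("Edit", "edit"), ("Skip", "skip"))


def approval_message(chat_id: int, mutation_id: str, summary: str) -> dict:
    """Build a sendMessage payload with Approve / Edit / Skip buttons."""
    row = []
    for label, action in BUTTONS:
        data = f"{action}:{mutation_id}"  # hypothetical encoding
        if len(data.encode("utf-8")) > 64:
            raise ValueError("callback_data must fit in 64 bytes")
        row.append({"text": label, "callback_data": data})
    return {
        "chat_id": chat_id,
        "text": summary,
        "reply_markup": {"inline_keyboard": [row]},  # one row, three buttons
    }


def parse_callback(data: str) -> tuple:
    """Split the callback_data of a button press into (action, mutation_id)."""
    action, _, mutation_id = data.partition(":")
    return action, mutation_id
```

Putting the mutation ID in `callback_data` lets the handler look up the parked mutation without keeping per-message state in the bot process.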

@AltronisLyra_bot is the public-facing bot. Different token, different scope. It has no inbound DM authority and no agent interactivity. Its job is to post to the @sgaibiz Singapore AI Telegram channel on a schedule: a daily AI-news digest at 6pm and an occasional weekly summary on Sunday morning. It is the channel-publisher persona, not the agent.

Keeping them separate means a compromise of one cannot become a compromise of the other. The internal bot has full read+write to my CRM; the public bot can only sendMessage to a known chat ID. When I say "the Lyra bot stopped replying" I mean @LyraSG_bot; when I say "why is Lyra posting that to @sgaibiz" I mean @AltronisLyra_bot. The distinction is load-bearing.

Lyra Companion: The Same Agents, on the Phone

The companion app is the third interaction surface. It is a Kotlin/Compose Android app running on my Pixel 9, but architecturally it is not a separate product — it is a different UI onto the same backend.

The app fires a ContextTick to the Lyra backend every five minutes carrying twenty-eight fields, including: location, activity (still/walking/running), foreground apps with per-app duration, ambient audio class, calendar event active, battery state, screen state, ms-since-last-unlock, network type, Wi-Fi SSID, Bluetooth nearby devices, phone call active, DND mode, audio output, last chat agent talked to, next calendar event title and minutes-to. The backend persists every field to Firestore and threads them into the agent's context as "right now" snippets when relevant.
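To make the "right now" snippet idea concrete, here is a minimal sketch covering a handful of the fields. It is not the real schema: the field names, types and the snippet wording are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextTick:
    """Hypothetical subset of the tick; the real schema has 28 fields."""
    activity: str                       # "still" / "walking" / "running"
    network_type: str
    dnd_mode: bool
    next_event_title: Optional[str] = None
    next_event_minutes: Optional[int] = None


def right_now_snippet(tick: ContextTick) -> str:
    """Render the tick as a one-line context snippet for an agent prompt."""
    parts = [f"activity={tick.activity}", f"network={tick.network_type}"]
    if tick.dnd_mode:
        parts.append("DND on")
    if tick.next_event_title is not None and tick.next_event_minutes is not None:
        parts.append(
            f"next: {tick.next_event_title} in {tick.next_event_minutes} min")
    return "Right now: " + ", ".join(parts)
```

Only fields that carry information are rendered, so a quiet tick stays short and does not crowd the agent's context.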

From the app itself I can chat with any of the five agents directly — Chief by default, Sales / Delivery / Finance / Governance via a thread picker. I can long-press the floating pill to start a phone-call mode (real-time STT → Chief → TTS with Gemini 2.5 in the Singapore-Aunty voice). I get a daily morning wrap notification at 6:30 that deep-links into the Today tab and auto-raises a notification-feed sheet so I can ask Lyra to summarise any captured notification.

Most importantly, the Chief and Sales agents on the backend can issue actions back to the phone via the request_phone_action tool. Supported actions today: set_reminder, compose_whatsapp (opens WhatsApp with pre-filled text and the user taps Send, which respects the no-auto-send rule), open_url_in_chrome, set_dnd, open_app, search_calendar, take_photo. On each tick the companion polls a pending_mobile_actions Firestore collection, raises an approval card on the Today tab, fires the Android intent on tap, and posts the result back to Firestore for the agent to ack.
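The request, poll, approve and ack loop can be sketched as a small state machine. This is an illustration only: the `PendingActions` class, its method names, the document IDs and the status values are assumptions, and an in-memory dict stands in for the pending_mobile_actions Firestore collection. The supported action names are taken from the list above.

```python
import itertools

SUPPORTED_ACTIONS = {
    "set_reminder", "compose_whatsapp", "open_url_in_chrome",
    "set_dnd", "open_app", "search_calendar", "take_photo",
}


class PendingActions:
    """In-memory stand-in for the pending_mobile_actions collection."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def request(self, action: str, params: dict) -> str:
        """Backend side: an agent queues an on-device action."""
        if action not in SUPPORTED_ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        doc_id = f"act-{next(self._ids)}"
        self._docs[doc_id] = {"action": action, "params": params,
                              "status": "pending", "result": None}
        return doc_id

    def poll(self) -> list:
        """Phone side, once per tick: IDs that still need an approval card."""
        return [i for i, d in self._docs.items() if d["status"] == "pending"]

    def resolve(self, doc_id: str, approved: bool, result=None) -> None:
        """Phone side, on tap: record the outcome for the agent to read."""
        doc = self._docs[doc_id]
        if doc["status"] != "pending":
            raise ValueError(f"{doc_id} already resolved")
        doc["status"] = "done" if approved else "skipped"
        doc["result"] = result if approved else None

    def get(self, doc_id: str) -> dict:
        return dict(self._docs[doc_id])
```

Refusing to resolve a document twice is the piece that keeps a repeated tap or a replayed poll from firing the same intent again.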

The Residual Mesh: What Off-the-Shelf Agents Still Do

I did not delete the mesh. Three pieces of it still earn their keep as feeders, and one piece — Claude Code — graduated out of the mesh into the primary-partner row entirely.

Claude Code deserves its own paragraph because of how much of this stack it actually built. It is the engineering partner I work with daily, mostly over Telegram (on the phone, between meetings, voice notes on the move), with the terminal CLI used when I am at my desk. It designed the Lyra agent registry, wrote most of the FastAPI backend, built the ContextTick schema and the tick-flow watchdog, shipped the X-poster pipeline this week, scaffolded the case-studies library, did the dead-code sweep, and reviews every architecture decision before I commit. It is not a passive code-completion tool; it is the senior engineer in the room, and the room is wherever my phone has signal. Anthropic's Opus 4.7 with the 1M-context window is the model I rely on for it.

OpenClaw shrank from ninety-four cron skills to ten that actually fire. The survivors all serve Lyra: an SME AI Prospector that runs weekday mornings and drops scouted leads into the CRM; an Enterprise AI Reality Scout that surfaces market signals; a Dev Radar that flags new technical patterns; a Weekly Goal Planner that proposes work for the agent team every Monday; analytics scrapers for altronis.sg and seris.app. The rest got disabled when I realised they were either duplicating what Lyra now does natively or chasing alpha I was not going to act on.

Hermes is now what I would call a low-profile utility. It still has a gateway service running, but the active lead-scouting workflow that used to drop outreach drafts into ~/.agent-mesh/artifacts/manufacturing/ last fired in early April. When I need a chunk of research done on a prospect or a market, I still post a task to the mesh queue and Hermes picks it up. But its share of decision-making has dropped to near zero. The lessons learned from running it inform how the Lyra sub-agents are designed: its voice tuning, its review-before-send discipline, and its no-external-send rule are all baked into the Lyra mutation gate.

Goose is the Telegram-reachable backup coder, running against the local Gemma 4 26B on the Strix Halo. It is what I use when Claude Code is rate-limited or unavailable. It is good for small fixes, not for architecture — that distinction matters and Goose is the first to admit it.

The Actual Scale

Concrete numbers as of right now: 43 active systemd user timers, 10 active OpenClaw cron jobs (of 94 total — the 84 disabled are quarantined but not deleted), about a dozen user-crontab entries for legacy crons that pre-date the systemd migration, around 25 active long-running services. About half of the timers are Lyra-namespaced — Sales digest at 6:45am, prospect nudge at 7:15am, deal-risk scan at 7:30am, follow-ups at 8am, Sales graduation at 9am, RFQ watcher every 15 min, daily LinkedIn drafter at 9am, daily Facebook drafter at 9:05am, evening sales wrap at 6pm. Connector pulls for calendar and sent-emails run every 10 and 30 minutes. Self-monitoring runs every 5 (stack watchdog) and 30 (tick-flow integration test).

The X-poster pipeline is the newest layer, shipped this week. It drives a dedicated Chrome session on remote-debugging-port 9222 (I declined the paid API path). Playwright posts the tweet, the drafter polls every 15 min, and the engagement pipeline likes and replies on the timeline every 30 min, capped at 10 likes and 3 replies per day. Cookies auto-refresh at 3am SGT. Every action requires my approve tap in @LyraSG_bot.
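The daily caps are simple to enforce with a counter that resets when the date changes. A minimal sketch, assuming the cap is checked at the moment an approved action is about to fire; the `EngagementBudget` class and its method names are hypothetical, and the cap values are the ones quoted above.

```python
from collections import Counter
from datetime import date

DAILY_CAPS = {"like": 10, "reply": 3}  # caps quoted in the post


class EngagementBudget:
    """Counts engagement actions per calendar day and refuses past the cap."""

    def __init__(self, caps=None):
        self._caps = dict(DAILY_CAPS if caps is None else caps)
        self._day = None
        self._used = Counter()

    def try_spend(self, action: str, today: date) -> bool:
        """Return True and count the action if today's budget allows it."""
        if today != self._day:
            # First action of a new day: start from a clean counter.
            self._day, self._used = today, Counter()
        if self._used[action] >= self._caps.get(action, 0):
            return False  # unknown actions have a cap of zero
        self._used[action] += 1
        return True
```

Passing `today` in explicitly keeps the class easy to test; in a real job the caller would supply the current date in SGT.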

Client Delivery: Two Production Builds (Anonymised)

Two client engagements run on top of this stack. The first is a document-OCR pipeline live in production — three stages (Nemotron-Parse layout extraction, Qwen 122B on NVIDIA NIM for semantic structuring, deterministic validator for sanity), eighteen of twenty seeded invoices passed against ground truth on the last run. Deployed on Azure App Service with an Entra app inside the client tenant.
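The third stage is the least glamorous and the easiest to illustrate. The sketch below shows what a deterministic sanity check on a structured invoice could look like; it is not the client pipeline's validator. The field names (`invoice_number`, `total`, `line_items`, `amount`) and the two rules are assumptions picked for the example.

```python
from decimal import Decimal

REQUIRED_FIELDS = ("invoice_number", "total", "line_items")  # hypothetical


def validate_invoice(doc: dict) -> list:
    """Return a list of human-readable errors; an empty list means it passed."""
    errors = [f"missing {key}" for key in REQUIRED_FIELDS if key not in doc]
    if errors:
        return errors
    # Rule: the line items must add up to the stated total, to the cent.
    line_sum = sum((Decimal(str(item["amount"])) for item in doc["line_items"]),
                   Decimal("0"))
    total = Decimal(str(doc["total"]))
    if line_sum != total:
        errors.append(f"line items sum to {line_sum}, total says {total}")
    return errors


def pass_rate(docs: list) -> float:
    """Fraction of documents with no validation errors."""
    return sum(1 for d in docs if not validate_invoice(d)) / len(docs)
```

Because the check involves no model call, a failure here points at the upstream extraction or structuring stage and not at the validator.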

The second is DocFlow, a phase-one build for the same client. ~SGD 7k initial engagement, SGD 1k/month retainer after signoff. Entra app registered, OneDrive wired, target signoff in three weeks. Stage-two structurer runs on my local Gemma 4 26B on llama-server port 8001 because the NIM path was too slow for the throughput target. A daily 7am sync-back cron pulls jobcodes from the client's Business Central.

Self-Monitoring: The Unglamorous Part

The part of AI-native infra that nobody writes about is the self-monitoring layer. Two pieces carry most of the weight. A stack watchdog runs every five minutes and checks twelve critical user services: lyra-backend, neo-gateway, neo-cloudflared, llama-server-gemma, llama-vlm-bom, lyra-tg-bot, comfyui, openclaw-gateway, openclaw-node, hermes-gateway, goose-gateway, lyra-skill-server. If a service is inactive, the watchdog tries one silent restart; if that fails, it pings my Telegram. A tick-flow integration test fires every 30 minutes: a synthetic device posts a sentinel ContextTick with all 28 fields populated, then the test reads it back from Firestore and asserts that every field round-tripped. This caught a real silent schema-drift bug last week where the companion was sending 19 fields but the backend was quietly dropping 7. The drift had gone undetected for two days before this test was wired in.
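The round-trip check reduces to a dictionary diff between what the synthetic device sent and what came back from storage. A minimal sketch of that idea follows; the function names and field names are hypothetical, and `lossy_backend_write` is a toy stand-in that reproduces the class of bug described above, where the backend silently drops fields it does not know.

```python
def round_trip_diff(sent: dict, stored: dict) -> tuple:
    """Compare a sentinel payload with what was read back from storage."""
    missing = sorted(k for k in sent if k not in stored)
    changed = sorted(k for k in sent if k in stored and stored[k] != sent[k])
    return missing, changed


def assert_round_trip(sent: dict, stored: dict) -> None:
    """Fail loudly, naming every field that was dropped or altered."""
    missing, changed = round_trip_diff(sent, stored)
    if missing or changed:
        raise AssertionError(f"schema drift: missing={missing} changed={changed}")


def lossy_backend_write(tick: dict, known_fields: set) -> dict:
    """Toy backend that keeps only the fields its schema knows about."""
    return {k: v for k, v in tick.items() if k in known_fields}


# Sentinel tick with hypothetical field names.
sentinel = {"device_id": "synthetic", "activity": "still",
            "dnd_mode": False, "wifi_ssid": "test-net"}
stored = lossy_backend_write(sentinel, known_fields={"device_id", "activity"})
missing, changed = round_trip_diff(sentinel, stored)
```

Here `missing` names the two dropped fields. Reporting field names instead of a bare pass or fail is what turns a drift alert into something actionable.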

Hardware

The inference happens on local hardware. Strix Halo box: llama-server on 8001 serving Gemma 4 26B in 8-bit (128k context), llama-vlm on 8080 serving Qwen3-VL-32B in 4-bit, ComfyUI on 7860 for image generation with Qwen-Image + a 4-step Lightning LoRA (about 36 seconds warm for a 1024-pixel render), bge-m3 for embeddings, Surya for OCR, SearXNG for private web search. Claude Code (via Telegram primarily, terminal CLI when at desk) is the primary engineering partner driving Lyra's architecture and build-out. NVIDIA NIM and z.ai are the cloud bursting paths when a Lyra agent task genuinely needs frontier-tier model capability. The Strix electricity bill dominates the operating cost line.

What This Actually Proves

Am I making a million dollars a month from this? No. Am I closing Fortune-500 logos with it? No. What this stack proves is narrower and probably more useful. Every workflow above runs without me being in the loop until approval-tap time, the failure modes are visible because the operator and the engineer are the same person, and the cost of running the whole thing is dominated by the electricity bill on the Strix Halo box rather than a cloud spend line.

The more important thing is the journey. I did not get to this architecture by reading a thoughtful blog post about agent design — I got here by building the off-the-shelf mesh first, watching where it kept failing for my actual workflows, and incrementally moving the agent layer into something context-built. The bias to plaster more skills onto a generic agent is strong; the harder thing is to write the agent persona, the toolset, the data model, and the UI together as one product. Lyra is that product.

The question "show me agents running real end-to-end business workflows" usually pictures an enterprise pilot deck. The harder-to-articulate but more honest answer is: watch the solo operators and small-practice founders who are using agents for everything, because we cannot hide the failure the way an enterprise pilot can. If you are looking to evaluate the maturity of agent infra in production right now in mid-2026, that is the population to study, not the slideware.

Happy to deep-dive on any of the pieces above in a follow-up post if there is one that gets asked about often enough — most likely candidates are the Chief + sub-agent dispatch pattern, the ContextTick schema-drift detection, the request_phone_action loop with the companion app, or the dedicated-Chrome CDP pattern for the X-poster pipeline. Say which.

Disclosure: I build AI infrastructure for SG SME and enterprise clients under Altronis. The patterns described above are the same patterns I deploy for paying clients. This post is a practitioner inventory, not a marketing pitch — but if anything described here is interesting to your context, the link below is where I take that conversation.