Backend infrastructure for a warehouse automation platform coordinating 100+ robots across 5 facilities
Series A, Hybrid (Nashville, TN)
Acme Robotics builds fleet coordination software for warehouse automation. Our platform manages task dispatch, obstacle detection, battery management, and real-time telemetry for autonomous mobile robots (AMRs) deployed at logistics facilities across the Southeast. Customers include 3PLs, regional grocery distributors, and a national auto parts chain.
The platform processes 40,000+ robot events per hour across a microservices architecture built on Python, ROS2, and Kafka. We're in the middle of a REST-to-event-driven migration that will carry us through the next 3x scale jump. We need a Senior Platform Engineer to own the hardest backend problems: event streaming reliability, multi-warehouse coordination, and the data pipelines that feed our operational dashboards.
This is a scope-expansion hire. We raised our Series A in January. The platform team is growing from 4 to 8 engineers over 12 months. You'll join as one of the first senior hires post-raise, with real ownership over platform decisions and direct input into the architecture roadmap.
You report to the VP of Engineering (Maria Chen). The platform team owns the scheduler, telemetry pipeline, and all fleet coordination APIs. You'll work across all three areas with meaningful ownership in at least one.
| Area | What You Own |
|---|---|
| Fleet scheduler | Priority-weighted task dispatch, route conflict resolution, drain-mode deploys |
| Event streaming | Kafka topic design, consumer group health, offset management, MQTT bridge reliability |
| Telemetry pipeline | Robot sensor ingestion, battery and obstacle event processing, time-series storage |
| API layer | REST and WebSocket endpoints consumed by the operator dashboard and mobile app |
| Observability | Latency SLOs, consumer lag alerts, scheduler throughput dashboards (Grafana + Prometheus) |
| On-call | Rotating on-call for fleet incidents. We average <2 pages/week, most resolved in <15 min. |
A robot that misses a dispatch event in a warehouse costs money immediately — pallets don't move, trucks miss their windows. We treat reliability with the same rigor as feature development. Every new subsystem ships with an SLO, a runbook, and a rollback path. We don't cut corners on observability because we've felt what happens when a consumer group silently falls behind at 2am.
Platform is a 4-person team right now. There's no committee that approves your architectural decisions. You'll own your subsystem end-to-end: design, implementation, deploy, and on-call. Senior means you're the last line of defense on your area — the team trusts your judgment.
We write ADRs for architectural decisions because six months later no one remembers why we chose the MQTT bridge over direct Kafka on robot firmware. Runbooks exist for every recurring incident type. If a problem recurs twice without a runbook, you write one. This isn't overhead — it's how we stay fast as we grow.
Post-mortems are blameless and mandatory for any P1. The question is never "who broke it" — it's "what let this happen and what prevents recurrence." We've published 11 post-mortems in the past 18 months. Most of them are boring because the fix was adding a metric we should have had already.
ROS2 experience is helpful but not required. The platform-side interfaces are well-abstracted. If you've built reliable event-driven systems at scale, you'll ramp on the robotics context quickly.
We're 18 months post-product, 6 months post-Series A. We have revenue, paying customers, and real operational load. We also have scaling work ahead of us that isn't optional: the Meridian warehouse expansion in Q3 is the hard deadline. The person we hire into this role will feel that pressure. Here's what the role looks like if things go well and if things get hard:
| | Things Go Well | Things Get Hard |
|---|---|---|
| Scale | Kafka migration complete before Meridian, smooth go-live | Migration slips, Meridian delayed, customer escalation |
| Team | Two open reqs fill fast, you onboard strong teammates | Hiring takes 4+ months, you carry more load longer |
| Your role | Clear ownership of event streaming, room to grow into lead | More incident response, less greenfield work |
| Comp | Raises tied to Series B milestone (12-18 months out) | Equity value depends on outcomes we're all working toward |
Warehouse automation is a durable market. The cost pressure on labor has not changed. We're not building toward an exit in 18 months — we're building a company. If you want to own a platform that matters to real operations and has room to grow with it, this is that role.
| Period | Focus | Success Looks Like |
|---|---|---|
| 30 days | Learn the scheduler, telemetry pipeline, and current Kafka migration state. Shadow on-call. Read all 14 platform ADRs. | You can explain our event ordering guarantees and where they're weak without looking anything up. |
| 60 days | Own one subsystem fully. Lead a sprint on the MQTT-Kafka bridge or the consumer lag alerting — your choice based on where you see the biggest gap. | You've made one meaningful architectural improvement and written the ADR for it. |
| 90 days | Drive the Meridian pre-launch readiness review. Identify any platform gaps between current state and the load profile of 60 new robots. | The Meridian readiness doc is written by you and signed off by the team. No surprises at go-live. |
| Category | Details |
|---|---|
| Must Have | 4+ years backend engineering. Hands-on experience with event-driven systems (Kafka, Kinesis, or equivalent). Python at production scale. PostgreSQL query optimization and schema design. You've been on-call and handled P1 incidents without hand-holding. |
| Strong Plus | Kafka operations experience (topic replication, consumer group management, offset reset). TimescaleDB or similar time-series DB. Kubernetes at operational depth (not just deployments). Any IoT or device-telemetry background. |
| Nice to Have | ROS2 or robotics middleware exposure. MQTT broker operation. Redis Streams. Terraform. We'll ramp you on anything not listed here. |
| Critical | You've inherited a system you didn't design, improved it without rewriting everything, and left it more maintainable than you found it. We're not looking for someone who wants to start over. |
We raised a Series A and have budget to hire properly. This is a market-rate role.
| Component | Details |
|---|---|
| Base salary | $155,000 – $185,000 depending on experience |
| Equity | 0.1% – 0.25% option grant. 4-year vest, 1-year cliff. Current preferred price is $2.40/share. |
| Benefits | Health/dental/vision (premiums covered 100% for employees, 80% for dependents). 401(k) with 4% match. $2,000 annual learning budget. |
| Location | Hybrid — Nashville, TN office 2-3 days/week. No fully remote option for this role. |
The equity is real but not a lottery ticket. We're building toward profitability, not a SPAC. If we execute on the next 24 months of the roadmap, the equity is meaningful. We won't pretend otherwise in either direction.
Interested? Send us a note about a hard infrastructure problem you've solved. We read every application and respond within 5 business days.
Email: engineering@acmerobotics.example | Phone: 615-555-0192