Acme Robotics

Senior Platform Engineer

Backend infrastructure for a warehouse automation platform coordinating 100+ robots across 5 facilities

Series A · Hybrid · Nashville, TN

The Opportunity

Acme Robotics builds fleet coordination software for warehouse automation. Our platform manages task dispatch, obstacle detection, battery management, and real-time telemetry for autonomous mobile robots (AMRs) deployed at logistics facilities across the Southeast. Customers include 3PLs, regional grocery distributors, and a national auto parts chain.

The platform processes 40,000+ robot events per hour across a microservices architecture built on Python, ROS2, and Kafka. We're in the middle of a REST-to-event-driven migration that will carry us through the next 3x scale jump. We need a Senior Platform Engineer to own the hardest backend problems: event streaming reliability, multi-warehouse coordination, and the data pipelines that feed our operational dashboards.

This is a scope-expansion hire. We raised our Series A in January. The platform team is growing from 4 to 8 engineers over 12 months. You'll join as one of the first senior hires post-raise, with real ownership over platform decisions and direct input into the architecture roadmap.

The Role

You report to the VP of Engineering (Maria Chen). The platform team owns the scheduler, telemetry pipeline, and all fleet coordination APIs. You'll work across all three areas with meaningful ownership in at least one.

| Area | What You Own |
|---|---|
| Fleet scheduler | Priority-weighted task dispatch, route conflict resolution, drain-mode deploys |
| Event streaming | Kafka topic design, consumer group health, offset management, MQTT bridge reliability |
| Telemetry pipeline | Robot sensor ingestion, battery and obstacle event processing, time-series storage |
| API layer | REST and WebSocket endpoints consumed by the operator dashboard and mobile app |
| Observability | Latency SLOs, consumer lag alerts, scheduler throughput dashboards (Grafana + Prometheus) |
| On-call | Rotating on-call for fleet incidents. We average <2 pages/week, most resolved in <15 min. |

What This Is NOT

How We Work

Reliability as a Feature

A robot that misses a dispatch event in a warehouse costs money immediately — pallets don't move, trucks miss their windows. We treat reliability with the same rigor as feature development. Every new subsystem ships with an SLO, a runbook, and a rollback path. We don't cut corners on observability because we've felt what happens when a consumer group silently falls behind at 2am.

Small Team, Real Ownership

Platform is a 4-person team right now. There's no committee that approves your architectural decisions. You'll own your subsystem end-to-end: design, implementation, deploy, and on-call. Senior means you're the last line of defense for your area, and the team trusts your judgment.

Documentation That Earns Its Keep

We write ADRs for architectural decisions because six months later no one remembers why we chose the MQTT bridge over direct Kafka on robot firmware. Runbooks exist for every recurring incident type. If the same problem hits twice and there's no runbook, you write one. This isn't overhead; it's how we stay fast as we grow.

Incident Culture

Post-mortems are blameless and mandatory for any P1. The question is never "who broke it" — it's "what let this happen and what prevents recurrence." We've published 11 post-mortems in the past 18 months. Most of them are boring because the fix was adding a metric we should have had already.

Tech Stack

Core Backend
Python 3.12, FastAPI, Celery, Pydantic
Robotics Middleware
ROS2 Humble, MQTT (Eclipse Mosquitto), DDS
Data Layer
PostgreSQL 16, Redis 7, TimescaleDB (telemetry)
Event Streaming
Apache Kafka (Strimzi on k8s), MQTT-Kafka bridge
Infrastructure
Kubernetes (EKS), Terraform, GitHub Actions
Observability
Prometheus, Grafana, PagerDuty, Sentry

ROS2 experience is helpful but not required. The platform-side interfaces are well-abstracted. If you've built reliable event-driven systems at scale, you'll ramp on the robotics context quickly.

Where You Fit

Platform Team (Current)

VP Engineering (Maria Chen)
Platform Team Lead (Jordan Park)
Senior Backend Engineer (Alex Rivera)
Senior Platform Engineer (You)
Backend Engineer (1 open req)

Platform Team (6-Month Target)

VP Engineering (Maria Chen)
Platform Team Lead (Jordan Park)
Senior Platform Engineer (You), Event Streaming lead
Senior Backend Engineer (Alex Rivera), Scheduler lead
Backend Engineer x2
Data Engineer (new hire)

The Honest Version

We're 18 months post-product, 6 months post-Series A. We have revenue, paying customers, and real operational load. We also have scaling work ahead of us that isn't optional — the Meridian warehouse expansion in Q3 is the hard deadline. The person we hire into this role will feel that pressure. Here's what each path looks like:

| | Things Go Well | Things Get Hard |
|---|---|---|
| Scale | Kafka migration complete before Meridian, smooth go-live | Migration slips, Meridian delayed, customer escalation |
| Team | Two open reqs fill fast, you onboard strong teammates | Hiring takes 4+ months, you carry more load longer |
| Your role | Clear ownership of event streaming, room to grow into lead | More incident response, less greenfield work |
| Comp | Raises tied to Series B milestone (12-18 months out) | Equity value depends on outcomes we're all working toward |

Warehouse automation is a durable market. The cost pressure on labor has not changed. We're not building toward an exit in 18 months; we're building a company. If you want to own a platform that matters to real operations, and to grow with it, this is that role.

30 / 60 / 90 Day Plan

| Period | Focus | Success Looks Like |
|---|---|---|
| 30 days | Learn the scheduler, telemetry pipeline, and current Kafka migration state. Shadow on-call. Read all 14 platform ADRs. | You can explain our event ordering guarantees and where they're weak without looking anything up. |
| 60 days | Own one subsystem fully. Lead a sprint on the MQTT-Kafka bridge or the consumer lag alerting; your choice, based on where you see the biggest gap. | You've made one meaningful architectural improvement and written the ADR for it. |
| 90 days | Drive the Meridian pre-launch readiness review. Identify any platform gaps between current state and the load profile of 60 new robots. | The Meridian readiness doc is written by you and signed off by the team. No surprises at go-live. |

What We're Looking For

| Category | Details |
|---|---|
| Must Have | 4+ years backend engineering. Hands-on experience with event-driven systems (Kafka, Kinesis, or equivalent). Python at production scale. PostgreSQL query optimization and schema design. You've been on-call and handled P1 incidents without hand-holding. |
| Strong Plus | Kafka operations experience (topic replication, consumer group management, offset reset). TimescaleDB or a similar time-series DB. Kubernetes at operational depth (not just deployments). Any IoT or device-telemetry background. |
| Nice to Have | ROS2 or robotics middleware exposure. MQTT broker operation. Redis Streams. Terraform. We'll ramp you up on any of these you haven't used. |
| Critical | You've inherited a system you didn't design, improved it without rewriting everything, and left it more maintainable than you found it. We're not looking for someone who wants to start over. |

Compensation

We raised a Series A and have budget to hire properly. This is a market-rate role.

| Component | Details |
|---|---|
| Base salary | $155,000 – $185,000 depending on experience |
| Equity | 0.1% – 0.25% option grant. 4-year vest, 1-year cliff. Current preferred price is $2.40/share. |
| Benefits | Health/dental/vision (100% employee, 80% dependent). 401(k) with 4% match. $2,000 annual learning budget. |
| Location | Hybrid: Nashville, TN office 2-3 days/week. No fully remote option for this role. |

The equity is real but not a lottery ticket. We're building toward profitability, not a SPAC. If we execute on the next 24 months of the roadmap, the equity is meaningful. We won't pretend otherwise in either direction.

Next Steps

Interested? Send us a note about a hard infrastructure problem you've solved. We read every application and respond within 5 business days.

engineering@acmerobotics.example
615-555-0192