Senior Platform Engineer

The Opportunity

Acme Robotics builds fleet coordination software for warehouse automation. Our platform manages task dispatch, obstacle detection, battery management, and real-time telemetry for autonomous mobile robots (AMRs) deployed at logistics facilities across the Southeast. Customers include 3PLs, regional grocery distributors, and a national auto parts chain.

The platform processes 40,000+ robot events per hour across a microservices architecture built on Python, ROS2, and Kafka. We're in the middle of a REST-to-event-driven migration that will carry us through the next 3x scale jump. We need a Senior Platform Engineer to own the hardest backend problems: event streaming reliability, multi-warehouse coordination, and the data pipelines that feed our operational dashboards.
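For a rough sense of scale, the figures above convert to average per-second rates as sketched below. This is back-of-the-envelope arithmetic only: it assumes events are spread evenly across the hour, which real warehouse traffic is not, and the function name is illustrative rather than anything in our codebase.

```python
def events_per_second(events_per_hour: float) -> float:
    """Convert an hourly event count to an average per-second rate."""
    return events_per_hour / 3600


# 40,000 is the stated floor ("40,000+"), so both rates are lower bounds.
current = events_per_second(40_000)
after_3x = events_per_second(40_000 * 3)

print(f"current:  {current:.1f} events/s")
print(f"after 3x: {after_3x:.1f} events/s")
```

Peak rates during shift changes or wave releases will sit well above these averages, which is why the role is framed around reliability rather than raw throughput.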

This is a scope-expansion hire. We raised our Series A in January. The platform team is growing from 4 to 8 engineers over 12 months. You'll join as one of the first senior hires post-raise, with real ownership over platform decisions and direct input into the architecture roadmap.

The Role

You report to the VP of Engineering (Maria Chen). The platform team owns the scheduler, telemetry pipeline, and all fleet coordination APIs. You'll work across all three areas with meaningful ownership in at least one.

Area	What You Own
Fleet scheduler	Priority-weighted task dispatch, route conflict resolution, drain-mode deploys
Event streaming	Kafka topic design, consumer group health, offset management, MQTT bridge reliability
Telemetry pipeline	Robot sensor ingestion, battery and obstacle event processing, time-series storage
API layer	REST and WebSocket endpoints consumed by the operator dashboard and mobile app
Observability	Latency SLOs, consumer lag alerts, scheduler throughput dashboards (Grafana + Prometheus)
On-call	Rotating on-call for fleet incidents. We average fewer than 2 pages per week, and most are resolved in under 15 minutes.
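To make the consumer lag alerts in the table above concrete, here is a minimal sketch of the underlying calculation: lag per partition is the log end offset minus the consumer group's committed offset. The function names, sample offsets, and threshold are all hypothetical. In practice the offsets would come from a Kafka client or a metrics exporter, not hand-built dictionaries.

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Return lag per partition: log end offset minus committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts as unread from offset 0.
        done = committed.get(partition, 0)
        lag[partition] = max(end - done, 0)
    return lag


def should_alert(lag: Dict[int, int], threshold: int) -> bool:
    """Alert when total lag across partitions exceeds the threshold."""
    return sum(lag.values()) > threshold


# Hypothetical snapshot: partition 2 has never been committed by this group.
end_offsets = {0: 1500, 1: 1200, 2: 900}
committed = {0: 1500, 1: 1150}

lag = partition_lag(end_offsets, committed)
print(lag)
print("alert:", should_alert(lag, threshold=500))
```

Summing across partitions is one design choice among several. Alerting on the worst single partition catches a stuck consumer sooner, at the cost of more noise.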

What This Is NOT

Not a robotics firmware role. Our robots run ROS2 but the firmware team owns that layer. You own the cloud-side platform, not the embedded stack.
Not a data science role. We do analytics but you're building the data infrastructure, not running models.
Not a pure SRE role. You'll write and own significant feature code, not just pipelines and infra.

How We Work

Reliability as a Feature

A robot that misses a dispatch event in a warehouse costs money immediately — pallets don't move, trucks miss their windows. We treat reliability with the same rigor as feature development. Every new subsystem ships with an SLO, a runbook, and a rollback path. We don't cut corners on observability because we've felt what happens when a consumer group silently falls behind at 2am.

Small Team, Real Ownership

Platform is a 4-person team right now. There's no committee that approves your architectural decisions. You'll own your subsystem end-to-end: design, implementation, deployment, and on-call. Senior means you're the last line of defense for your area, and the team trusts your judgment.

Documentation That Earns Its Keep

We write ADRs for architectural decisions because six months later no one remembers why we chose the MQTT bridge over direct Kafka on robot firmware. Runbooks exist for every recurring incident type. If the same problem occurs twice and has no runbook, you write one. This isn't overhead; it's how we stay fast as we grow.

Incident Culture

Post-mortems are blameless and mandatory for any P1. The question is never "who broke it" — it's "what let this happen and what prevents recurrence." We've published 11 post-mortems in the past 18 months. Most of them are boring because the fix was adding a metric we should have had already.

Tech Stack

Core Backend

Python 3.12, FastAPI, Celery, Pydantic

Robotics Middleware

ROS2 Humble, MQTT (Eclipse Mosquitto), DDS

Data Layer

PostgreSQL 16, Redis 7, TimescaleDB (telemetry)

Event Streaming

Apache Kafka (Strimzi on k8s), MQTT-Kafka bridge

Infrastructure

Kubernetes (EKS), Terraform, GitHub Actions

Observability

Prometheus, Grafana, PagerDuty, Sentry

ROS2 experience is helpful but not required. The platform-side interfaces are well-abstracted. If you've built reliable event-driven systems at scale, you'll ramp on the robotics context quickly.

Where You Fit

Platform Team (Current)

VP Engineering (Maria Chen)
Platform Team Lead (Jordan Park)
Senior Backend Engineer (Alex Rivera)
Senior Platform Engineer (You)
Backend Engineer (1 open req)

Platform Team (6-Month Target)

VP Engineering (Maria Chen)
Platform Team Lead (Jordan Park)
Senior Platform Engineer (You), Event Streaming lead
Senior Backend Engineer (Alex Rivera), Scheduler lead
Backend Engineer x2
Data Engineer (new hire)

The Honest Version

We're 18 months post-product, 6 months post-Series A. We have revenue, paying customers, and real operational load. We also have scaling work ahead of us that isn't optional — the Meridian warehouse expansion in Q3 is the hard deadline. The person we hire into this role will feel that pressure. Here's what each path looks like:

Factor	Things Go Well	Things Get Hard
Scale	Kafka migration complete before Meridian, smooth go-live	Migration slips, Meridian delayed, customer escalation
Team	Two open reqs fill fast, you onboard strong teammates	Hiring takes 4+ months, you carry more load longer
Your role	Clear ownership of event streaming, room to grow into lead	More incident response, less greenfield work
Comp	Raises tied to Series B milestone (12-18 months out)	Equity value depends on outcomes we're all working toward

Warehouse automation is a durable market. The cost pressure on labor has not changed. We're not building toward an exit in 18 months — we're building a company. If you want to own a platform that matters to real operations and has room to grow with it, this is that role.

30 / 60 / 90 Day Plan

Period	Focus	Success Looks Like
30 days	Learn the scheduler, telemetry pipeline, and current Kafka migration state. Shadow on-call. Read all 14 platform ADRs.	You can explain our event ordering guarantees and where they're weak without looking anything up.
60 days	Own one subsystem fully. Lead a sprint on the MQTT-Kafka bridge or the consumer lag alerting — your choice based on where you see the biggest gap.	You've made one meaningful architectural improvement and written the ADR for it.
90 days	Drive the Meridian pre-launch readiness review. Identify any platform gaps between current state and the load profile of 60 new robots.	The Meridian readiness doc is written by you and signed off by the team. No surprises at go-live.

What We're Looking For

Category	Details
Must Have	4+ years backend engineering. Hands-on experience with event-driven systems (Kafka, Kinesis, or equivalent). Python at production scale. PostgreSQL query optimization and schema design. You've been on-call and handled P1 incidents without hand-holding.
Strong Plus	Kafka operations experience (topic replication, consumer group management, offset reset). TimescaleDB or similar time-series DB. Kubernetes at operational depth (not just deployments). Any IoT or device-telemetry background.
Nice to Have	ROS2 or robotics middleware exposure. MQTT broker operation. Redis Streams. Terraform. We'll ramp you on anything not listed here.
Critical	You've inherited a system you didn't design, improved it without rewriting everything, and left it more maintainable than you found it. We're not looking for someone who wants to start over.

Compensation

We raised a Series A and have budget to hire properly. This is a market-rate role.

Component	Details
Base salary	$155,000 – $185,000 depending on experience
Equity	0.1% – 0.25% option grant. 4-year vest, 1-year cliff. Current preferred price is $2.40/share.
Benefits	Health, dental, and vision (100% for employees, 80% for dependents). 401(k) with 4% match. $2,000 annual learning budget.
Location	Hybrid — Nashville, TN office 2-3 days/week. No fully remote option for this role.

The equity is real but not a lottery ticket. We're building toward profitability, not a SPAC. If we execute on the next 24 months of the roadmap, the equity is meaningful. We won't pretend otherwise in either direction.
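As a rough illustration of the vesting terms above, the sketch below computes the fraction of an option grant vested after a given number of months. It assumes 25% vests at the 1-year cliff and the rest vests monthly in equal steps. The posting states only "4-year vest, 1-year cliff", so the monthly schedule and the function name are assumptions.

```python
def vested_fraction(months_employed: int, vest_months: int = 48, cliff_months: int = 12) -> float:
    """Fraction of a grant vested, assuming linear monthly vesting after the cliff."""
    if months_employed < cliff_months:
        # Nothing vests before the cliff.
        return 0.0
    # At the cliff the first year vests at once, then vesting accrues monthly up to 100%.
    return min(months_employed, vest_months) / vest_months


for months in (11, 12, 30, 60):
    print(f"{months:>2} months: {vested_fraction(months):.1%} vested")
```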

Next Steps

Interested? Send us a note about a hard infrastructure problem you've solved. We read every application and respond within 5 business days.

engineering@acmerobotics.example
615-555-0192