ROUTD: Scaling Last-Mile Logistics
How we rebuilt a multi-tenant SaaS logistics platform for last-mile delivery from the ground up in 3 months, cutting cloud costs by 77%, achieving 99.8% uptime, and scaling to $20M+ in annual orders. Dedicated engineering, cloud architecture, and real-time tracking across mobile and web.
The Problem
ROUTD had a real business. $20M+ in annual orders. Customers like Compass Group and Ole & Steen depending on them for last-mile delivery across the UK. But after 1.5 years of iterative development, the codebase was a liability.
The system worked, barely. Real-time tracking was flaky. Route optimization was slow. Multi-tenancy was bolted on. Cloud costs were climbing. The team knew they needed a full rebuild, but they couldn't afford to stop shipping.
They needed a team that could rebuild the platform without losing velocity. Not consultants who would write a report. Builders who would own the code.
The Architecture
We embedded 8 engineers and rebuilt the entire platform in 3 months. Not a rewrite-in-place. A full ground-up redesign with production traffic running in parallel.
Stack Choices
Backend: .NET Web API + Azure Functions
We chose .NET because the team already had C# experience, and we needed something that could handle synchronous request/response at scale. The Web API layer handles all CRUD operations, auth, and tenant isolation.
Azure Functions handle the expensive operations: route optimization, ETA calculation with third-party APIs, and notification delivery. These workloads are bursty. We don't need a route optimizer running 24/7. Serverless made sense here.
Frontend: React Dashboard + React Native Mobile
The admin dashboard is React with TypeScript. Real-time driver tracking uses WebSockets for live position updates. The mobile app is React Native, a single codebase for iOS and Android. That mattered because the team was small and couldn't afford separate native engineers.
Database: Azure SQL with Multi-Tenancy
We implemented multi-tenancy at the database level with global query filters. Every table has a TenantId column. Every query automatically includes WHERE TenantId = @CurrentTenant via EF Core's global filters. This prevents data leakage between customers without maintaining separate databases.
Row-level security in SQL Server adds a second layer of defense. Even if the application layer fails, the database won't return cross-tenant data.
Timeline & Milestones
Here's what 3 months of dedicated engineering actually looks like when you're rebuilding a production logistics platform with live traffic.
The key to a 3-month timeline was running old and new systems in parallel. Customers never experienced downtime. We migrated tenants one at a time, validated data integrity, then cut over. The last tenant switched on a Tuesday afternoon with zero incidents.
Key Engineering Decisions
1. Multi-Tenant Architecture Without the Pain
Most SaaS platforms face the multi-tenancy question: separate databases per customer, or shared database with tenant isolation?
Separate databases are safer but operationally expensive. Shared databases are efficient but risky. One bad query can leak data across customers.
We chose shared database with two layers of isolation: application-level global filters and database-level row-level security.
In EF Core, we applied a global filter to every entity:
```csharp
modelBuilder.Entity<Order>().HasQueryFilter(o => o.TenantId == _currentTenant.Id);
```
This means every query automatically includes the tenant filter. Developers can't forget to add it. They can't accidentally bypass it.
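The global filter only works if every request carries a resolved tenant. The write-up doesn't show how that resolution happens; a minimal TypeScript sketch of the same fail-closed idea, with a hypothetical `x-tenant-id` header standing in for whatever the auth layer actually provides:

```typescript
// Resolve the current tenant from a request header and fail closed:
// a request with no tenant context never reaches a tenant-scoped query.
// The header name and error message are illustrative, not ROUTD's code.
type RequestHeaders = Record<string, string | undefined>;

function resolveTenantId(headers: RequestHeaders): string {
  const tenantId = headers["x-tenant-id"];
  if (!tenantId || tenantId.trim() === "") {
    throw new Error("Missing tenant context: refusing to run an unscoped query");
  }
  return tenantId.trim();
}

// Every data-access call then takes the tenant explicitly, mirroring
// EF Core's automatic WHERE TenantId = @CurrentTenant filter.
function tenantScope(tenantId: string): { TenantId: string } {
  return { TenantId: tenantId };
}
```

Failing closed matters more than the mechanism: a request that cannot prove its tenant should get an error, never an unfiltered query.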
At the database level, we added SQL Server row-level security policies:
```sql
-- fn_TenantAccessPredicate returns a row only when its TenantId matches
-- the caller's tenant context (typically carried via SESSION_CONTEXT).
CREATE SECURITY POLICY TenantSecurityPolicy
ADD FILTER PREDICATE dbo.fn_TenantAccessPredicate(TenantId) ON dbo.Orders
WITH (STATE = ON);
```
Even if someone writes raw SQL, the database won't return cross-tenant data.
This approach gave us the operational simplicity of a shared database with the security guarantees of tenant isolation.
2. Cost Optimization at Scale
The previous infrastructure was running ~$3K/month in cloud costs. We rebuilt it for under $700/month while handling 30,000+ orders and 60,000+ API calls monthly.
How?
Serverless where it matters. Route optimization doesn't need to run 24/7. It runs when a new delivery batch comes in, maybe 50 times a day. That's perfect for Azure Functions. We pay for execution time, not idle server hours.
Right-sized databases. We moved from a General Purpose tier to a Basic tier for non-production environments. Production stayed on Standard tier but with auto-pause enabled during low-traffic hours.
CDN for static assets. Images, maps, driver profile photos. All served from Azure CDN with aggressive caching. Reduced bandwidth costs by 60%.
Third-party API batching. Instead of calling the ETA API for every address update, we batch requests every 2 minutes. Reduced API costs from $800/month to $200/month.
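The batching pattern is straightforward to sketch. A minimal TypeScript version, with `sendBatch` standing in for the third-party ETA API (the real implementation isn't shown in this write-up); in production the flush would run on a 2-minute timer:

```typescript
// Batch outbound ETA lookups: queue address updates and flush them in
// one API call per window instead of one call per update. Only the
// latest address per order is kept, since older updates are stale.
type EtaRequest = { orderId: string; address: string };

class EtaBatcher {
  private queue: EtaRequest[] = [];

  constructor(private sendBatch: (batch: EtaRequest[]) => void) {}

  add(req: EtaRequest): void {
    // A newer update for the same order supersedes the queued one.
    this.queue = this.queue.filter((q) => q.orderId !== req.orderId);
    this.queue.push(req);
  }

  // Called on a timer (every 2 minutes in production).
  flush(): number {
    if (this.queue.length === 0) return 0;
    const batch = this.queue;
    this.queue = [];
    this.sendBatch(batch); // one API call for the whole window
    return batch.length;
  }
}
```

Deduplicating per order inside the window is where most of the savings come from: a driver circling for parking generates many position changes but only one billable lookup.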
This isn't about being cheap. It's about being intentional. When you're processing $20M in orders, cloud costs should be a rounding error, not a line item. This is the kind of cloud architecture work that compounds. Every dollar saved on infrastructure is a dollar that goes toward growth.
3. Real-Time Tracking Without the Database Bottleneck
The original system wrote every GPS update directly to the database. When you have 30 drivers on the road, each sending location updates every 10 seconds, that's 180 writes per minute just for location tracking.
We decoupled real-time display from durable storage.
Driver apps send GPS updates to a SignalR hub via WebSockets. The hub validates the tenant, broadcasts the update to all connected dashboards, and writes the latest position to Redis.
Every 5 minutes, a background job reads from Redis and batch-inserts location history into SQL. We reduced database writes by 95% while maintaining sub-second latency for live tracking.
Admin dashboards connect to the SignalR hub and receive location updates as they happen. No polling. No stale data. Just real-time movement on a map.
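The write path above can be sketched with in-memory stand-ins for Redis and SQL. This is a minimal TypeScript sketch of the pattern, not the production code:

```typescript
// Decouple live tracking from durable storage: every GPS update
// overwrites the driver's latest position in a hot store (Redis in
// production), while a periodic job drains history to SQL in one batch.
// The Map and array here are stand-ins for Redis and SQL.
type Position = { driverId: string; lat: number; lng: number; at: number };

class TrackingStore {
  private latest = new Map<string, Position>(); // hot store: one key per driver
  private pending: Position[] = [];             // history awaiting batch insert

  report(pos: Position): void {
    this.latest.set(pos.driverId, pos); // dashboards read this, sub-second
    this.pending.push(pos);             // durable write deferred to the batch job
  }

  latestFor(driverId: string): Position | undefined {
    return this.latest.get(driverId);
  }

  // Background job, every 5 minutes in production: one batch insert
  // instead of one row per GPS ping.
  flushHistory(batchInsert: (rows: Position[]) => void): number {
    const rows = this.pending;
    this.pending = [];
    if (rows.length > 0) batchInsert(rows);
    return rows.length;
  }
}
```

The trade-off is explicit: up to 5 minutes of location history can be lost on a crash, which is acceptable for an audit trail but would not be for the live view, and the live view never waits on the database.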
4. Route Optimization Without Vendor Lock-In
The previous system used a third-party route optimization API that cost $0.10 per route calculation. With 30K orders per month, that's $3K just for routing.
We built a custom route optimizer that treats route planning as a Traveling Salesman Problem (TSP) and solves it with a greedy nearest-neighbor heuristic. Not optimal, but it cut route-planning time by 80% compared to manual assignment.
For complex routes (15+ stops), we fall back to the third-party API. For simple routes (3-8 stops), we use our own algorithm. This hybrid approach reduced third-party API costs by 70% while maintaining quality.
The optimizer runs as an Azure Function triggered when a new delivery batch is created. It takes a list of addresses, calculates optimal visit order, estimates ETAs based on traffic data, and returns a route manifest.
Average execution time: 800ms cold start, 200ms warm start. Cost: ~$15/month.
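The nearest-neighbor heuristic itself fits in a few lines. A TypeScript sketch using straight-line (haversine) distance; the production optimizer also folds in traffic-based ETAs, which this sketch omits:

```typescript
// Greedy nearest-neighbor route ordering: start at the depot, always
// drive to the closest unvisited stop. Not optimal, but fast and good
// enough for short routes; longer routes fall back to the paid API.
type Stop = { id: string; lat: number; lng: number };

function haversineKm(a: Stop, b: Stop): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(h)); // Earth radius ~6371 km
}

function nearestNeighborRoute(depot: Stop, stops: Stop[]): Stop[] {
  const remaining = [...stops];
  const route: Stop[] = [];
  let current = depot;
  while (remaining.length > 0) {
    let bestIdx = 0;
    for (let i = 1; i < remaining.length; i++) {
      if (haversineKm(current, remaining[i]) < haversineKm(current, remaining[bestIdx])) {
        bestIdx = i;
      }
    }
    current = remaining.splice(bestIdx, 1)[0];
    route.push(current);
  }
  return route;
}
```

For n stops this is O(n²) distance checks, which is why it stays comfortably inside an Azure Function's execution window for the 3-8 stop routes it handles.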
The Results
We shipped the rebuilt platform in 3 months with a team of 8 engineers. The new system has been running in production for over a year with:
- $20M+ in orders processed annually
- 30,000+ orders delivered per month
- 60,000+ API calls per month
- Cloud costs under $700/month (down from $3K/month)
- Zero cross-tenant data leakage incidents
- 99.8% uptime (excluding planned maintenance)
ROUTD is now used by billion-dollar companies including Compass Group and Ole & Steen for their last-mile delivery operations across the UK.
Before & After: By the Numbers
The best way to understand the impact of a platform rebuild is to compare the numbers side by side. Here's what changed when we moved from the legacy system to the new architecture:

| Metric | Legacy system | New architecture |
| --- | --- | --- |
| Cloud costs | ~$3K/month | Under $700/month |
| ETA API costs | $800/month | $200/month |
| Location writes to SQL | Every GPS ping (180/min) | Batched every 5 minutes (95% fewer writes) |
| Route planning | Manual assignment | Automated (80% less planning time) |
Beyond the Code
We didn't just write software. We embedded with the ROUTD team and worked on:
Pricing strategy. We modeled customer pricing based on order volume, data growth, third-party API costs, and infrastructure spend. This helped ROUTD move from flat-rate pricing to usage-based pricing, improving margins by 30%.
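Those cost drivers translate naturally into a per-tenant price formula. A minimal TypeScript sketch with invented rates; none of these numbers are ROUTD's actual pricing, they only illustrate the shape of a usage-based model:

```typescript
// Usage-based price sketch: modeled cost (order volume, third-party API
// calls, an infrastructure share) plus a margin. Every rate below is
// made up for illustration.
type Usage = { ordersPerMonth: number; apiCallsPerMonth: number };

function monthlyPriceGbp(u: Usage): number {
  const perOrder = 0.12;    // hypothetical per-order rate
  const perApiCall = 0.004; // hypothetical pass-through for ETA lookups
  const infraShare = 50;    // hypothetical flat infrastructure share
  const margin = 1.3;       // 30% margin on top of modeled cost
  const cost =
    u.ordersPerMonth * perOrder + u.apiCallsPerMonth * perApiCall + infraShare;
  return Math.round(cost * margin * 100) / 100;
}
```

The point of the model is that price tracks cost: a tenant who doubles their order volume also doubles the API and infrastructure load they generate, so margins hold instead of eroding the way they do under flat-rate pricing.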
Monitoring and chaos engineering. We set up Grafana dashboards with SLA-driven alerts. We ran chaos experiments: killing database connections, simulating API timeouts, injecting latency into WebSocket connections. All to validate failure modes before customers hit them.
Business guidance. We consulted on HCI principles for driver UX, advised on GDPR compliance for location data retention, and analyzed user behavior patterns to improve onboarding flows.
Sales enablement. We helped the sales team demo the platform by building a sandbox environment with realistic test data. We also created technical sales collateral explaining the multi-tenant architecture to enterprise buyers.
Build vs. buy decisions. We identified areas where renting third-party services (SMS delivery, push notifications, map rendering) saved tens of thousands in unnecessary development costs.
This is what separates dedicated engineering from traditional consulting. A consulting firm writes a 60-page strategy deck, presents it to your leadership team, and walks away. You're left with a PDF and no code. We don't operate that way. Our engineers sit in your Slack, push to your repo, and own the outcomes alongside your team. When something breaks at 2am, we're in the incident channel, not writing a post-mortem from the outside. The goal isn't to advise. It's to ship. And when the engagement ends, you keep everything: the code, the architecture, the CI/CD pipelines, and a team that now knows how to maintain it.
What We'd Do Differently
If we rebuilt this today, we'd consider:
Event sourcing for order state. Right now, order status is a single column that gets updated in place. If we tracked order state as an append-only event log, we'd get built-in audit trails and better debugging when customers ask "why was this order marked as delivered at 3pm when the driver says they delivered it at 2pm?"
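A minimal sketch of that event-log approach in TypeScript (types and field names are illustrative, not a proposed schema):

```typescript
// Event-sourced order state: instead of updating a status column in
// place, append immutable events and derive the current status by
// taking the latest one. The log itself answers "who marked this
// delivered, and when?"
type OrderEvent = {
  status: "created" | "picked_up" | "delivered";
  at: string; // ISO timestamp
  by: string; // actor: system, driver id, support agent
};

class OrderLog {
  private events: OrderEvent[] = [];

  append(e: OrderEvent): void {
    this.events.push(e); // append-only: nothing is ever overwritten
  }

  currentStatus(): OrderEvent["status"] | undefined {
    return this.events[this.events.length - 1]?.status; // latest event wins
  }

  // Built-in audit trail: every transition with actor and timestamp.
  history(): readonly OrderEvent[] {
    return this.events;
  }
}
```

With this shape, the "3pm vs 2pm" dispute stops being a debugging session and becomes a single read of `history()`.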
GraphQL instead of REST. The dashboard makes 8-10 REST calls on page load to fetch orders, drivers, routes, and analytics. A single GraphQL query could fetch everything in one round-trip.
Edge functions for geofencing. We run geofence calculations (is the driver inside the delivery zone?) in Azure Functions. Moving this to Cloudflare Workers or Deno Deploy would reduce latency for real-time alerts.
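What makes geofencing a good edge candidate is that the check is pure arithmetic with no I/O. A TypeScript sketch for a circular zone, using an equirectangular distance approximation that is accurate to well under 1% at delivery-zone scales (the production implementation isn't shown in this write-up):

```typescript
// Circular geofence: is the driver within radiusM metres of the
// delivery point? No I/O, so it runs identically in Azure Functions
// or an edge runtime like Cloudflare Workers.
type Point = { lat: number; lng: number };

function distanceMeters(a: Point, b: Point): number {
  // Equirectangular approximation: treat the local area as flat,
  // scaling longitude by cos(latitude). Fine for a few hundred metres.
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const x = toRad(b.lng - a.lng) * Math.cos(toRad((a.lat + b.lat) / 2));
  const y = toRad(b.lat - a.lat);
  return Math.sqrt(x * x + y * y) * 6371000; // Earth radius in metres
}

function insideZone(driver: Point, zoneCenter: Point, radiusM: number): boolean {
  return distanceMeters(driver, zoneCenter) <= radiusM;
}
```

Because the function is stateless, moving it to the edge is purely a deployment change: the alert fires from the point of presence nearest the driver instead of a regional data center.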
But these are optimizations, not blockers. The platform works. It scales. It makes money.
The Takeaway
ROUTD had a working product, paying customers, and a real problem. The codebase couldn't scale. The team couldn't ship fast enough.
We didn't spend 6 months writing a strategy deck. We embedded 8 engineers, rebuilt the platform in 3 months, and shipped it to production. That's the difference between dedicated engineering and advice. We write the code, own the architecture, and stay until it's running.
If your platform is hitting a ceiling and you need engineers who build, not consultants who recommend, join us on Discord and tell us what you're working on. Or check out our cloud architecture and dedicated engineering services to see how we work.