Nathan Stanley

Software Engineer with 10 years of experience

Australian, currently in Seattle, WA · Green Card

I build distributed LLM training systems and ML infrastructure at Amazon. I turn ambiguous problems into concrete tools and frameworks that engineers and scientists love to use. Currently working at the scale of 10k H200 GPUs on AWS. Always looking for my next challenge.

Amazon

Post-Training on Customer Data

Amazon SFAI · 2025-2026 · YouTube

“Make Rufus learn from how customers actually use it.”

I took a broad directive to improve Rufus using real customer interactions and turned it into three parallel workstreams: building a secure training environment on AWS, developing training applications using Verl and Megatron-LM for 500B+ parameter models, and curating datasets from real customer traffic. I lead all three efforts and coordinate with science teams to pull it together. Today, a team of scientists uses this framework, infrastructure, and data pipeline to iterate on training recipes for different Rufus applications.

LLM Training Infrastructure

Amazon SFAI · 2024-2025 · YouTube

“Our training runs keep failing. Fix it.”

Starting from a vague mandate to "improve training efficiency," I systematically debugged reliability issues across the entire stack for large-scale pretraining runs on 5,000+ GPUs. I built observability tooling, identified failure modes at the container, scheduler, and hardware levels, and implemented fixes that improved availability from 85% to 99%. These infrastructure patterns were adopted broadly across Amazon through contributions to AWS Batch, EC2, and an internal GPU pooling platform.

Rufus Studio

Amazon SFAI · 2023-2024

“Prompt changes take too long. Do something about it.”

Tasked with improving prompt engineering velocity, I designed and built an internal platform that mirrors production with the ability to edit prompts deep in the inference stack. The tool provides a full-featured IDE experience for prompt template editing with live evaluation. I led a team of 6 engineers to build it, reached 300+ weekly active users in the first month, and reduced prompt deployment time from 17 days to 3 days. Rufus Studio has since become the canonical platform for all Rufus internal tooling.

Distribution Center Technology

Amazon DCTech · 2021-2022

“We're launching in 4 months and nothing can handle the load.”

As scalability lead for Amazon Grocery distribution centers, I discovered that critical APIs couldn't meet TPS targets due to architectural bottlenecks in downstream services. I led a cross-team war room, redesigned loading patterns using parallel fanout and caching, and achieved 20x throughput and 8x latency improvements across 15 APIs. The parallel loading library I built was adopted by 15+ teams across Amazon. I also designed a CQRS-based location recommendation system to handle long-term scale.

Non-Prime Customer Experience

Amazon Retail · 2019-2020

“How do we convert more shoppers into Prime members?”

Led large-scale A/B experimentation on Amazon product pages to optimize the shopping experience for non-Prime customers. Designed and analyzed experiments across millions of daily sessions, identifying high-impact changes to pricing display, shipping messaging, and conversion funnels. The changes I drove generated over $50M in annualized profit through improved conversion rates and Prime subscription growth.

a/b testing button spacing
is not for me

MiClub

MiMembership

2017-2019 · Website

“Golf clubs need modern membership software.”

Over two years, I worked with two other developers to build the premier golf membership management system in Australia. We solved complex challenges including smart entity-based search, design system consistency, ORM performance tuning, MySQL InnoDB optimizations, offline mode support, and integrations with Golf Australia's handicapping systems. The platform now serves 500+ clubs.

Pace of Play

2019 · Website

“Clubs can't figure out who's causing slow play.”

Hearing that clubs consistently struggled to maintain player speed and identify slowdowns, I invented a novel solution. We tracked players via the scoring app GPS and built a custom algorithm to identify slow players. The frontend resembled a video player over Google Maps where administrators could scrub through any day's timeline and observe player movement. GPS smoothing algorithms provided high-quality signal, and summary reports helped clubs address problematic patterns.

MiScore Scoring App

2018-2019 · Website · YouTube

“Digital scoring apps are about to be allowed in competition.”

When Golf Australia announced they would allow digital scoring in official competitions, I convinced the company to build our own mobile app. The app reached the top 10 in the Australian App Store sports category and exploded in popularity during COVID-19 in 2020, when golf became one of the few permitted outdoor activities.

Golf Platform Performance

2015-2019 · Website

“The legacy system is slow. Make it fast.”

As platform lead, I profiled and optimized MiClub's 2M+ line legacy golf management codebase across the full stack: database engine configuration, query profiling, index optimization, webapp threading, and page load performance. I also built a performance tracker for MiScore to monitor the quality and speed of our OCR scorecard scanning system, reducing scan times from 5 seconds to under 1 second.

Side Projects

Crewly

Website

A web app for planning daily boat trips. Track crew members, manage guest lists, and check weather conditions all in one place. Built to simplify the logistics of coordinating group outings on the water.

Ripper

Stealth AI project for real estate agents

Coming soon.

let's see what that $200 claude
subscription can really do

Experience

Senior Software Engineer · Amazon · 2019 - Present
Software Engineer · MiClub Golf · 2015 - 2019

Education

M.Eng. Software & Machine Learning
University of Western Australia · Distinction (First Class Honours)

B.Eng. Software Engineering & Computer Science
University of Western Australia · Distinction (First Class Honours)

High School (Western Australia) · ATAR 99.90 (top 0.1% across math and science)

When I'm Not Coding

🏋️⛵🏃🏂🎾

Contact