10 weeks. Real data. No slides. No theory.
You will run NCCL benchmarks on GCP GPU instances (scripts, profiling data, and access provided), recover lost MFU (Model FLOPs Utilization), debug AllReduce latency, and build a public benchmark portfolio on GitHub.
Most GPU engineers are optimizing the wrong thing. They tune hyperparameters. They upgrade hardware. They never look at what's actually killing cluster efficiency.
10 weeks. Real GCP GPU data. The systems skills that separate GPU engineers from GPU infrastructure engineers.