The 80/20 Rule of ML Costs
Training gets all the attention, but it typically accounts for only about 20% of total cost. The other 80% is spread across model serving (30%), data pipelines (20%), monitoring and evaluation (15%), and retraining (15%).
Optimization levers: serve models on CPU where latency budgets allow (often ~10x cheaper than GPU serving), batch predictions for non-real-time use cases, apply model compression such as quantization or distillation (typically a 2-5x inference cost reduction), and trigger retraining on detected drift rather than on a fixed schedule.
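A drift-based retraining trigger can be sketched with a Population Stability Index (PSI) check on a feature's distribution. This is a minimal illustration, not the author's method; the function names, the 10-bin histogram, and the common 0.2 alert threshold are assumptions for the sketch.

```python
import math

def psi(reference, current, bins=10):
    """Population Stability Index between a reference sample (e.g. the
    training distribution of one feature) and a current production sample.
    Higher PSI means the current distribution has drifted further."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bucket_fractions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        total = len(xs)
        # floor each bucket at a tiny value so log() never sees zero
        return [max(c / total, 1e-4) for c in counts]

    ref = bucket_fractions(reference)
    cur = bucket_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def should_retrain(reference, current, threshold=0.2):
    """Fire the retraining trigger only when drift exceeds the threshold
    (0.2 is a commonly cited PSI alert level, assumed here)."""
    return psi(reference, current) > threshold
```

In practice this check would run per feature on a monitoring cadence, and retraining fires only when drift is detected, instead of on a calendar schedule.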