tech-at-instacart

Instacart Engineering

Distributed Machine Learning at Instacart

Han Li
Published in tech-at-instacart
9 min read · Mar 17, 2023

System Architecture

Fig 1: Distributed ML Application in Development & Production on Ray Cluster
Fig 1a: Connect development environment with Ray Cluster
Fig 1b: Automated containerized application running on Ray Cluster
Fig 1c: Isolated Python environments between different Ray Clusters

Case Study: Parallel Fulfillment ML Jobs

Fig 2: Some common scenarios of Fulfillment ML in Instacart

Previous Solutions & Limitations

Fig 3: Our previous system of parallel zone-level model training
Fig 4: Examples of underutilized service containers (left) and idle task queues (right)
Fig 5: Examples of a very busy task queue hosting too many tasks

Improvements by New System

Fig 6: Architecture Overview of distributed Fulfillment ML workflows hosted on Ray Cluster
Fig 7: Workspace isolation between different models
Fig 8: Before (left) and after (right) CPU utilization of the same model training the same zones
Fig 9: Code example to convert an existing Forecast class object into a Ray Actor object
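
The code shown in Fig 9 is not reproduced here. As a rough illustration of the idea only, the sketch below wraps an unchanged class with `ray.remote` so that each zone gets its own actor. It is not the article's actual code: the `Forecast` fields, its `fit` and `predict` methods, and both helper functions are hypothetical stand-ins, and the "model" is just a mean over toy observations.

```python
from dataclasses import dataclass, field


@dataclass
class Forecast:
    """Hypothetical stand-in for an existing zone-level forecast class."""
    zone_id: str
    history: list = field(default_factory=list)

    def fit(self, observations):
        # Toy "training": store the observations for this zone.
        self.history = list(observations)
        return len(self.history)

    def predict(self):
        # Toy "model": forecast the mean of the stored observations.
        return sum(self.history) / len(self.history)


def forecast_sequentially(data_by_zone):
    """Baseline: fit and predict one zone after another in this process."""
    results = {}
    for zone_id, observations in data_by_zone.items():
        model = Forecast(zone_id)
        model.fit(observations)
        results[zone_id] = model.predict()
    return results


def forecast_with_ray_actors(data_by_zone):
    """Same work, with one Ray actor per zone so zones run in parallel."""
    import ray  # imported here so the baseline above runs without Ray installed

    ray.init(ignore_reinit_error=True)  # start a local Ray instance or reuse one
    ForecastActor = ray.remote(Forecast)  # wrap the unchanged class as an actor class

    actors = {zone_id: ForecastActor.remote(zone_id) for zone_id in data_by_zone}
    # Calls submitted to one actor from this driver run in submission order,
    # so each zone's fit completes before its predict starts.
    fit_refs = [actors[z].fit.remote(obs) for z, obs in data_by_zone.items()]
    ray.get(fit_refs)  # block until every zone has finished fitting
    predictions = ray.get([actors[z].predict.remote() for z in data_by_zone])
    return dict(zip(data_by_zone, predictions))
```

Calling `forecast_sequentially({"zone_a": [10, 12, 14], "zone_b": [3, 5]})` returns `{"zone_a": 12.0, "zone_b": 4.0}`. With Ray installed, `forecast_with_ray_actors` takes the same input and does the same per-zone work on separate actors. The point of the sketch is that the class body stays as it was; only construction and method calls change to their `.remote(...)` forms.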

Learnings & Future Work
