What: Model layers are split across GPU’s Each GPU processes a portion of the model Requires more complex sync code Useful for incredibly large models that don’t fit on a single GPU