When checkpointing the state dict to remote storage, we currently move entire tensors from GPU to CPU before uploading the bytes, which is memory intensive because each tensor is fully materialized in host memory. Instead of moving entire ...
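
A minimal sketch of one way to bound host memory during the upload, assuming the alternative is to copy fixed-size chunks to the CPU rather than whole tensors. `iter_tensor_chunks` and `chunk_bytes` are hypothetical names used for illustration only; they are not part of this patch or of PyTorch.

```python
from typing import Iterator

import torch


def iter_tensor_chunks(
    tensor: torch.Tensor, chunk_bytes: int = 64 * 1024 * 1024
) -> Iterator[bytes]:
    """Yield the raw bytes of `tensor` in pieces of at most `chunk_bytes`.

    Hypothetical helper: only one chunk at a time is copied to host memory.
    """
    # Flatten and reinterpret the storage as bytes. Note that contiguous()
    # may allocate a copy on the source device for non-contiguous tensors.
    flat = tensor.detach().contiguous().view(-1).view(torch.uint8)
    for start in range(0, flat.numel(), chunk_bytes):
        # Slice on the source device, then move just this slice to the CPU.
        piece = flat[start:start + chunk_bytes].cpu()
        yield piece.numpy().tobytes()
```

Each yielded chunk could be handed to whatever upload call the checkpointing code already uses, so peak host memory stays near `chunk_bytes` instead of the full tensor size.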
Thanks for pointing out the docs reference; I updated the patch to reword that section. The sentence right before the one you draw attention to reads to me as another argument to change ...