Reduce GPU memory usage in the runtime.

#14

After adding 'with torch.no_grad():', memory usage drops from 11.77 GB to 3.19 GB with batch=4, token=1024.
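
For reference, a minimal sketch of the pattern described above, assuming a standard transformers encoder; the model name, max length, and pooling are illustrative and not taken from this PR. The savings come from autograd no longer keeping intermediate activations for a backward pass.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "BAAI/bge-large-en"  # illustrative; substitute the model actually used here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).cuda().eval()

texts = ["an example sentence"] * 4  # batch=4
inputs = tokenizer(
    texts, padding=True, truncation=True, max_length=1024, return_tensors="pt"
).to("cuda")

# No gradient graph is built, so intermediate activations are freed right away,
# which is where the peak-memory reduction comes from.
with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state[:, 0]  # CLS pooling (assumed)
```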

Thanks, it helped a lot! And it works for the rerank model as well.
