Reduce GPU memory usage in the runtime.
#14, opened by xiping
After wrapping the inference call in 'with torch.no_grad():', GPU memory usage drops from 11.77G to 3.19G when batch=4, token=1024.
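For anyone who wants to see the effect in isolation, here is a minimal sketch. The tiny `nn.Sequential` model and the input shape are hypothetical stand-ins, not the model from this repo, and the example runs on CPU, so it only shows that no autograd graph is recorded rather than reproducing the memory numbers above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model; any nn.Module behaves the same way.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
model.eval()  # switches off dropout / batch-norm updates; does not disable autograd

batch = torch.randn(4, 16)  # batch=4 with a toy feature size

# Default call: autograd records the graph and keeps intermediate activations alive.
out_with_grad = model(batch)

# Inference call: no graph is recorded, so activations can be freed immediately.
with torch.no_grad():
    out_no_grad = model(batch)

print(out_with_grad.requires_grad)  # True
print(out_no_grad.requires_grad)    # False
```

The outputs are numerically the same in both cases; only the bookkeeping for backpropagation is skipped. On recent PyTorch versions, `torch.inference_mode()` may be used the same way, though I have not measured whether it changes the numbers reported here.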
Thanks, it helped a lot! It works for the rerank model as well.