Reduce GPU memory usage in the runtime.

#14

After adding 'with torch.no_grad():', memory usage drops from 11.77 GB to 3.19 GB with batch=4, token=1024.
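
For reference, a minimal sketch of the pattern described above, assuming a standard transformers encoder; the model name, max length, and pooling are illustrative and not taken from this PR. The savings come from autograd no longer keeping intermediate activations for a backward pass.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "BAAI/bge-large-en"  # illustrative; substitute the model actually used here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).cuda().eval()

texts = ["an example sentence"] * 4  # batch=4
inputs = tokenizer(
    texts, padding=True, truncation=True, max_length=1024, return_tensors="pt"
).to("cuda")

# No gradient graph is built, so intermediate activations are freed right away,
# which is where the peak-memory reduction comes from.
with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state[:, 0]  # CLS pooling (assumed)
```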

Thanks, it helped a lot! And it works for the rerank model as well.
