
one of the variables needed for gradient computation has been modified by an inplace operation


A note on a bug I ran into during multi-GPU training with PyTorch.
The error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 30; expected version 29 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

The error only shows up with multiple GPUs; on a single GPU everything runs fine.
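For context, the error itself is not specific to multi-GPU training: autograd saves certain tensors during the forward pass and records their version counters, and any in-place modification of a saved tensor before backward() fails this check. A minimal, purely illustrative single-GPU repro of the same message:

import torch

a = torch.randn(3, requires_grad=True)
b = torch.sigmoid(a)  # sigmoid saves its output b for the backward pass
b.add_(1)             # in-place edit bumps b's version counter (0 -> 1)
b.sum().backward()    # RuntimeError: one of the variables needed for
                      # gradient computation has been modified ...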

Following the usual advice found online, I wrapped the failing code in with torch.autograd.set_detect_anomaly(True):. With anomaly detection enabled, PyTorch prints the forward-pass stack trace of the operation whose gradient could not be computed; in my case it pointed at BatchNorm.
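A minimal sketch of how to use it; the tiny model and random inputs are placeholders for the real training step, and only the with block matters:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8))
inputs = torch.randn(4, 8)

with torch.autograd.set_detect_anomaly(True):
    loss = model(inputs).sum()
    loss.backward()  # if this fails, anomaly mode also prints the forward
                     # stack trace of the op that produced the bad tensor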

Searching turned up a solution: https://discuss.pytorch.org/t/ddp-sync-batch-norm-gradient-computation-modified/82847/5

The fix is to pass broadcast_buffers=False when constructing DistributedDataParallel. By default DDP broadcasts module buffers (such as BatchNorm's running statistics) from rank 0 at the start of every forward pass; that broadcast is an in-place write to the buffers, which appears to be what bumps the version counter the autograd check complains about.
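A minimal sketch of where the flag goes, assuming a launch via torchrun (which sets the LOCAL_RANK environment variable) and an NCCL backend; the toy model is a stand-in for the real network:

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

# toy stand-in for the real network
model = nn.Sequential(nn.Linear(512, 512), nn.BatchNorm1d(512)).cuda(local_rank)
model = DDP(
    model,
    device_ids=[local_rank],
    broadcast_buffers=False,  # don't re-broadcast buffers (e.g. BatchNorm
                              # running stats) from rank 0 on each forward
)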
