For example, I have two CUDA versions installed: CUDA 9.0 and CUDA 10.2.
Their installation directories are:
#cuda9
/data/home/cuiaihao/cuda9
#cuda10.2
/data/home/cuiaihao/cuda10.2
When I use CUDA 9, the CUDA 9 lines in my environment variables are active and the CUDA 10.2 lines are commented out:
export PATH=/data/home/cuiaihao/cuda9/bin:$PATH
export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda9/lib64:$LD_LIBRARY_PATH
#export PATH=/data/home/cuiaihao/cuda10.2/bin${PATH:+:${PATH}}
#export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
When I use CUDA 10.2, it is the other way around:
#export PATH=/data/home/cuiaihao/cuda9/bin:$PATH
#export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda9/lib64:$LD_LIBRARY_PATH
export PATH=/data/home/cuiaihao/cuda10.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/data/home/cuiaihao/cuda10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
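Instead of commenting and uncommenting lines by hand, the same switch can be wrapped in a small shell function. This is only a minimal sketch and not part of my original setup: the function name use_cuda is hypothetical, and exporting CUDA_HOME is an extra assumption on my side (PyTorch's extension builder reads that variable).

```shell
# Hypothetical helper: put one CUDA installation first on PATH and LD_LIBRARY_PATH.
use_cuda() {
    local cuda_root=$1
    # Refuse to switch if the directory does not exist.
    if [ ! -d "$cuda_root" ]; then
        echo "use_cuda: no such directory: $cuda_root" >&2
        return 1
    fi
    # ${VAR:+:$VAR} appends the old value only when it is non-empty,
    # which avoids a trailing ':' (an empty entry means the current directory).
    export PATH="$cuda_root/bin${PATH:+:$PATH}"
    export LD_LIBRARY_PATH="$cuda_root/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    # Assumption: also export CUDA_HOME so build tools see the same toolkit.
    export CUDA_HOME="$cuda_root"
}
```

If the function is defined in ~/.bashrc, running use_cuda /data/home/cuiaihao/cuda10.2 in a session selects that toolkit. Each call prepends again, so the most recent call wins, but PATH grows with every call.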
The environment variables can be edited with vim:
vim ~/.bashrc
After editing, run the following so that the changes take effect:
source ~/.bashrc
After the change, you can check which CUDA version is currently in use with nvcc --version:
nvcc --version
For example, after I switched to CUDA 10.2, nvcc --version printed:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
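When only the release number matters, for instance in a script, it can be pulled out of that output with sed. A minimal sketch, assuming the output keeps the "release X.Y" wording shown above; the function name nvcc_release is hypothetical.

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the release number (major.minor).
nvcc_release() {
    # -n with the p flag prints only the line where the substitution matched.
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Demonstration on the last line of the output shown above.
echo 'Cuda compilation tools, release 10.2, V10.2.89' | nvcc_release   # prints 10.2
```

With a toolkit on PATH, the real call would be nvcc --version | nvcc_release.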
Standard installation
For performance and full functionality, installing Apex with its CUDA and C++ extensions is recommended:
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Apex also supports a Python-only build (required with PyTorch 0.4) via:
$ pip install -v --no-cache-dir ./
The problem I ran into
If installing Apex fails with an error such as:
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 10.2.
After running pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ I get this error:
torch.__version__ = 1.6.0
/tmp/pip-req-build-l3l15eo8/setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
from /data/home/cuiaihao/cuda9/bin
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-l3l15eo8/setup.py", line 171, in <module>
    check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
  File "/tmp/pip-req-build-l3l15eo8/setup.py", line 106, in check_cuda_torch_binary_vs_bare_metal
    "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798. "
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 10.2.
In some cases, a minor-version mismatch will not cause later errors: https://github.com/NVIDIA/apex/pull/323#discussion_r287021798. You can try commenting out this check (at your own risk).
Running setup.py install for apex ... error
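Yet at that point the CUDA I was using was 10.2.
First, determine whether the CUDA version currently in use on Linux matches the cudatoolkit version of your PyTorch installation.
1. If they really are different:
First: change CUDA by downloading a suitable CUDA version.
Second: reinstall PyTorch together with a cudatoolkit version that matches your CUDA.
2. If the CUDA in use and cudatoolkit match, but the nvcc reported during installation differs from what nvcc --version prints on your command line:
You can manually set CUDA_HOME to the directory of the CUDA installation you are currently using: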
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ CUDA_HOME=/data/home/cuiaihao/cuda10.2 pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
With that, the installation succeeds:
running install_egg_info
running egg_info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
reading manifest file 'apex.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'apex.egg-info/SOURCES.txt'
Copying apex.egg-info to /data/home/cuiaihao/.conda/envs/cascade-stereo/lib/python3.6/site-packages/apex-0.1-py3.6.egg-info
running install_scripts
writing list of installed files to '/tmp/pip-record-vp8iy4e8/install-record.txt'
Running setup.py install for apex ... done
Successfully installed apex-0.1
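The check from the troubleshooting steps (does the toolkit release match the CUDA version PyTorch was built with?) can be sketched as a small comparison before starting the build. This is an illustration under assumptions, not what Apex itself runs: the function name same_cuda_release is hypothetical, and it compares only major.minor.

```shell
# Hypothetical helper: succeed when two CUDA version strings share the same major.minor.
same_cuda_release() {
    local a b
    # Keep only "major.minor" so that 10.2.89 and 10.2 compare equal.
    a=$(printf '%s' "$1" | cut -d. -f1,2)
    b=$(printf '%s' "$2" | cut -d. -f1,2)
    # An empty version never counts as a match.
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example wiring (commented out because it needs nvcc and PyTorch installed):
# toolkit=$("$CUDA_HOME/bin/nvcc" --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
# torch_cuda=$(python -c 'import torch; print(torch.version.cuda)')
# same_cuda_release "$toolkit" "$torch_cuda" || echo "mismatch: $toolkit vs $torch_cuda"
```

In my failing run the two values would have been 9.0 and 10.2, so the comparison fails. After pointing CUDA_HOME at the 10.2 directory, both sides are 10.2.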
Nice, awesome!