To configure a computer for deep learning or deep reinforcement learning, we need to install CUDA, cuDNN, PyTorch, and related packages.
Problems can come up while installing this software, so I recorded my process of configuring the DL environment. My
computer is a DELL PRECISION TOWER 7810 workstation running Ubuntu 16.04, with a Quadro M5000 GPU.
Everything you need to install conda is here.
This tutorial, written in Chinese, is also provided for your reference.
To speed up `conda install`, change the download source (channel) that conda uses.
Use `conda config --show` or `conda config --show channels` to check the current channels.
Use `conda config --remove channels <channel name>` to remove a channel.
To add the Tsinghua source channels, run the following commands:

    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
    conda config --set show_channel_urls yes

Alternatively, you can edit the channel list directly in `.condarc`. This file usually lives in `$HOME`.
You can locate it with `sudo find / -name '.condarc'`.
After that, `conda config --set show_channel_urls yes` is needed to show the download URL for every installation.
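
For readers who prefer editing `.condarc` by hand, the following is a minimal sketch of what the file could contain after the commands above. The helper `render_condarc` is hypothetical (it is not part of conda), and the channel order assumes that `conda config --add` puts the most recently added channel first and that `defaults` is still present.

```python
def render_condarc(channels, show_channel_urls=True):
    """Return .condarc text listing `channels` in priority order (highest first)."""
    lines = ["channels:"]
    lines += [f"  - {url}" for url in channels]
    flag = "true" if show_channel_urls else "false"
    lines.append(f"show_channel_urls: {flag}")
    return "\n".join(lines) + "\n"


# Base URL of the Tsinghua mirror used in the commands above.
TUNA = "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs"

condarc_text = render_condarc([f"{TUNA}/main/", f"{TUNA}/free/", "defaults"])
print(condarc_text)
```

The printed text can be compared with the output of `conda config --show channels` to confirm that both routes give the same channel list.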
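Here is a good tutorial for this work.
List the existing environments with `conda env list` or `conda info --envs`.
Create an environment with `conda create -n <env_name> python=3.7`.
Remove an environment with `conda remove -n <env_name> --all`.
Activate an environment with `conda activate <env_name>`.
Leave the current environment with `conda deactivate`.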
The first thing to do is make sure the versions of all of these packages match each other.
The first step is to check which CUDA version is supported by your `pytorch`
and `tensorflow` versions.
The second step is to verify that the NVIDIA driver version supports that CUDA version. See [**CUDA and
nvidia-driver match**](https://docs.nvidia.com/cuda/...
Then make sure the cuDNN version matches the CUDA version. This can be seen in
cudnn.
The versions on my machine are as follows:
software | version |
---|---|
torch | 1.4 |
CUDA | 10.1 |
nvidia driver | 418 |
cudnn | 7.6 |
tensorflow-gpu | 2.1 |
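
The table above can also be turned into a small check. This is only a sketch: the `installed` values below are hypothetical examples (not read from a real machine), and the prefix comparison in `matches` is an assumption about how strictly you want to compare versions.

```python
# Expected versions copied from the table above.
EXPECTED = {
    "torch": "1.4",
    "CUDA": "10.1",
    "nvidia driver": "418",
    "cudnn": "7.6",
    "tensorflow-gpu": "2.1",
}


def matches(expected, installed):
    """True if `installed` equals `expected` or only adds extra components."""
    exp_parts = expected.split(".")
    got_parts = installed.split(".")
    return got_parts[:len(exp_parts)] == exp_parts


def find_mismatches(installed):
    """Return {name: (expected, installed)} for every entry that does not match."""
    return {
        name: (exp, installed.get(name))
        for name, exp in EXPECTED.items()
        if installed.get(name) is None or not matches(exp, installed[name])
    }


# Hypothetical values, as you might collect them by hand on your machine.
installed = {
    "torch": "1.4.0",
    "CUDA": "10.1.105",
    "nvidia driver": "418.39",
    "cudnn": "7.6.5",
    "tensorflow-gpu": "2.0.0",
}
print(find_mismatches(installed))  # {'tensorflow-gpu': ('2.1', '2.0.0')}
```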
Once the versions are settled, you can start installing the packages.
The installation command depends on which virtual environment you are using. Refer to pytorch
for the exact command.
I recommend installing `tensorflow` 2.1. The differences between version `2.0`
and version `2.1` are large, and you
should generally use the newest stable version of a tool.

    pip install tensorflow==2.1
    pip install tensorflow-gpu==2.1

When you `import tensorflow`, you may see the following warning:

    2020-01-20 11:46:50.881093: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/protobuf/lib:/usr/local/lib
    2020-01-20 11:46:50.881169: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/protobuf/lib:/usr/local/lib
    2020-01-20 11:46:50.881178: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

This is only a warning about the TensorRT libraries and will not affect normal usage.
Note, however, that `tf` does not warn you about a version mismatch, whereas `torch` does.
This is why matching the versions is so important.
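
Because `tf` stays silent about a mismatch, it can help to ask both frameworks directly whether they see a GPU. The sketch below assumes TensorFlow 2.1 or newer (where `tf.config.list_physical_devices` is available); the function name `gpu_report` is my own, and a framework that is not installed is simply reported as `None`.

```python
def gpu_report():
    """Return a dict describing whether each framework can see a GPU."""
    report = {}
    try:
        import torch
        report["torch"] = {
            "version": torch.__version__,
            "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
            "gpu_available": torch.cuda.is_available(),
        }
    except ImportError:
        report["torch"] = None  # torch is not installed in this environment

    try:
        import tensorflow as tf
        gpus = tf.config.list_physical_devices("GPU")
        report["tensorflow"] = {"version": tf.__version__, "gpu_count": len(gpus)}
    except ImportError:
        report["tensorflow"] = None  # tensorflow is not installed here
    return report


print(gpu_report())
```

If `gpu_available` is `False` or `gpu_count` is `0` on a machine that has a GPU, a version mismatch among the driver, CUDA, and cuDNN is a likely cause.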
You can use the following command to list the drivers that match your machine:

    ubuntu-drivers devices

Then install the NVIDIA driver with the following commands:

    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt-get update
    sudo apt install nvidia-418

Some useful commands:

    lspci | grep VGA
    lspci | grep -i nvidia

You can follow the tutorial on the CUDA homepage,
but it only offers the latest version of CUDA.
For earlier versions, you need to visit the history release page.
Version `10.1` can be downloaded here.
Then run the following commands:

    sudo dpkg -i cuda-repo-ubuntu1604-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
    sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
    sudo apt-get update
    sudo apt-get install cuda

Then you need to add `CUDA` to your environment variables.
In `~/.bashrc`, add the following lines:

    export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export CUDA_HOME=/usr/local/cuda

Then run `source ~/.bashrc` to apply the changes.
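
To confirm that the three variables took effect, a small check like the following can be run in a new shell. It is a sketch using only the Python standard library; the function name `check_cuda_env` and the exact wording of the messages are my own choices.

```python
import os
import shutil


def check_cuda_env(env=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []

    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    elif not os.path.isdir(cuda_home):
        problems.append(f"CUDA_HOME points to a missing directory: {cuda_home}")

    # nvcc lives in /usr/local/cuda/bin, which the PATH line above adds.
    if shutil.which("nvcc", path=env.get("PATH", "")) is None:
        problems.append("nvcc was not found on PATH")

    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if not any(d.rstrip("/").endswith("cuda/lib64") for d in lib_dirs):
        problems.append("no cuda/lib64 entry in LD_LIBRARY_PATH")
    return problems


for problem in check_cuda_env():
    print("WARNING:", problem)
```

No output means that `CUDA_HOME`, `PATH`, and `LD_LIBRARY_PATH` all look as expected.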
Installing CUDA is normally straightforward. However, I made a mistake along the way:
I tried to install `10.2` first and stopped before the last step, but `dpkg` had already
recorded the package in its database. To install `10.1` after that, you need to remove the
leftover package first:

    sudo dpkg -r cuda-repo-<version>
    sudo dpkg -P cuda-repo-<version>

You can monitor the NVIDIA driver and GPU with `nvidia-smi` or `watch -n 10 nvidia-smi`.
If you get the error

    Failed to initialize NVML: Driver/library version mismatch

the loaded NVIDIA kernel module does not match the installed driver version. In this case,
rebooting the machine is a good choice.
After that, `nvidia-smi` shows the GPU status table (the version in my screenshot is wrong because I cannot access my workstation right now).
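
The same check can be scripted. The sketch below runs `nvidia-smi` and classifies its output; the helper names and the three status strings (`"mismatch"`, `"ok"`, `"unknown"`) are my own, and the `"ok"` case relies on the assumption that a healthy `nvidia-smi` prints a `Driver Version` field in its header.

```python
import subprocess

MISMATCH = "Driver/library version mismatch"


def classify_nvidia_smi(output):
    """Map nvidia-smi output text to 'mismatch', 'ok', or 'unknown'."""
    if MISMATCH in output:
        return "mismatch"  # kernel module and driver disagree: reboot
    if "Driver Version" in output:
        return "ok"
    return "unknown"


def run_nvidia_smi():
    """Run nvidia-smi and return stdout and stderr combined ('' if it is missing)."""
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return ""
    return result.stdout


print(classify_nvidia_smi(run_nvidia_smi()))
```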
Some useful commands:

    cat /usr/local/cuda/version.txt

Installing cuDNN is very easy. I recommend installing it from the `tar` archive
rather than the `deb` package.
First, download it from cudnn.
Then extract the archive (it unpacks into a `cuda/` directory) and run the following commands:

    sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
    sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
    sudo chmod a+r /usr/local/cuda/include/cudnn.h
    sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Some useful commands:

    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

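
The `grep` above prints the three `#define` lines that hold the cuDNN version. As a sketch, the same information can be extracted in Python; `parse_cudnn_version` is a hypothetical helper, and the `sample` text is an illustrative excerpt rather than the full header.

```python
import re


def parse_cudnn_version(header_text):
    """Extract 'major.minor.patch' from the #define lines of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None  # the header does not define this macro
        parts.append(match.group(1))
    return ".".join(parts)


# Illustrative excerpt in the style of cudnn.h.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
print(parse_cudnn_version(sample))  # 7.6.5
```

On a real machine you would pass the contents of `/usr/local/cuda/include/cudnn.h` instead of `sample`.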
You also need `tensorboardX`, `scikit-image`, `seaborn`, `matplotlib`, and so on.
Some of them may already have been installed during the installation of Torch or TensorFlow;
otherwise you need to `conda install` them manually.
First, match the versions; see tfp.
For `tf 2.1`, the required `tfp` version is `0.9`.

    pip install tensorflow-probability==0.9

Install `ffmpeg` with `sudo apt-get install ffmpeg`, then add
`export PATH=/usr/local/ffmpeg/bin:$PATH` to `~/.bashrc`.
Clone the source code and follow the tutorial.
Running `pip install -e .` inside the cloned repository installs baselines.
Note that OpenAI gym is installed along with it, so you do not need to install it again;
doing so may cause a version mismatch.
However, you can still follow gym to install it separately.
They are also easy to install if you are lucky.
You can get a 30-day trial license for mujoco for one machine.
One e-mail address can register three machines. Starting with the trial is sensible, because on some machines mujoco-py cannot be installed at all.
Register your computer and get the license. To obtain your computer id, download the getid file and then run:

    chmod +x getid
    ./getid

Download the product first. To decide which mujoco version to use, check the versions that mujoco-py
supports.
Then:

    $ mkdir ~/.mujoco
    $ cp mujoco200_linux.zip ~/.mujoco
    $ cd ~/.mujoco
    $ unzip mujoco200_linux.zip
    $ cp -r mujoco200_linux mujoco200

The last line is needed because `mujoco_py` expects the directory name without the `_linux` suffix.
Copy the license:

    $ cp mjkey.txt ~/.mujoco
    $ cp mjkey.txt ~/.mujoco/mujoco200/bin

Set the environment variables: edit `~/.bashrc`, add the following lines, and then run `source ~/.bashrc`.

    export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export MUJOCO_KEY_PATH=~/.mujoco

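
Before moving on, it can be worth checking that the files ended up where the steps above put them. This is a minimal sketch; `check_mujoco_layout` is a hypothetical helper, and the list of expected paths is just what the commands in this section create (plus the `model` directory that ships with the archive).

```python
from pathlib import Path


def check_mujoco_layout(root):
    """Return the expected paths that are missing under the MuJoCo root directory."""
    root = Path(root)
    expected = [
        root / "mjkey.txt",                        # license copied to ~/.mujoco
        root / "mujoco200" / "bin" / "mjkey.txt",  # license copied next to the binaries
        root / "mujoco200" / "bin",
        root / "mujoco200" / "model",
    ]
    return [str(p) for p in expected if not p.exists()]


missing = check_mujoco_layout(Path.home() / ".mujoco")
print("missing:", missing or "nothing")
```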
Testing:

    $ cd ~/.mujoco/mujoco200_linux/bin
    $ ./simulate ../model/humanoid.xml

A simulation window showing the humanoid model should open.
On some remote machines you will not see it because of hardware limitations, while on others you will.
Download the source code with `git clone https://github.com/openai/mujoco-py.git`.
Install patchelf; this is for the `lG`.

    $ sudo curl -o /usr/local/bin/patchelf https://s3-us-west-2.amazonaws.com/openai-sci-artifacts/manual-builds/patchelf_0.9_amd64.elf
    $ sudo chmod +x /usr/local/bin/patchelf

Install gcc dependencies:

    sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3

Some other dependencies:

    $ cd ~/mujoco-py
    $ cp requirements.txt requirements.dev.txt ./mujoco_py
    $ cd mujoco_py
    $ pip install -r requirements.txt
    $ pip install -r requirements.dev.txt

Installation:

    $ cd ~/mujoco-py/vendor
    $ ./Xdummy-entrypoint
    $ cd ..
    $ python setup.py install

Testing: run `import mujoco_py`. The first time, it will compile some files. If you hit a gcc error, refer to the
troubleshooting section in mujoco-py. If that does not help, you may need to
switch to another computer.
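
Beyond the bare import, a slightly stronger test is to load a model and step the simulation once. The sketch below assumes the `humanoid.xml` model sits under `~/.mujoco/mujoco200/model/` as set up earlier; the wrapper function is my own and returns `None` when `mujoco_py` is not installed.

```python
import os


def mujoco_py_smoke_test(xml_path="~/.mujoco/mujoco200/model/humanoid.xml"):
    """Load a model, step it once, and return the number of position coordinates.

    Returns None when mujoco_py is not installed in this environment.
    """
    try:
        import mujoco_py  # the first import triggers the compilation step
    except ImportError:
        return None
    model = mujoco_py.load_model_from_path(os.path.expanduser(xml_path))
    sim = mujoco_py.MjSim(model)
    sim.step()  # advance the simulation by one time step
    return len(sim.data.qpos)


print(mujoco_py_smoke_test())
```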
`dm_control` is another control environment, which does not depend on `mujoco_py`.
The `mujoco` directory that `dm_control` expects is `~/.mujoco/mujoco200_linux/`,
so you need to keep another copy of the `mujoco` directory under that name:

    $ cd ~/.mujoco
    $ cp -r mujoco200 mujoco200_linux

Then you can install `dm_control`:

    $ pip install dm_control

One thing to note is that the rendering backend used is `OpenGL EGL`.
First, `pip install pyopengl`. Then `export PYOPENGL_PLATFORM=egl`.
After this, you can use `dm_control`.
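
A quick way to confirm that `dm_control` works is to load one of its bundled tasks and reset it. This is a sketch under assumptions: the `cartpole` / `swingup` task is just an example choice, the wrapper function is my own, and the environment variable is set from Python only as a fallback in case it was not exported in the shell.

```python
import os

# Fallback for the export described above; it only takes effect if set before the import.
os.environ.setdefault("PYOPENGL_PLATFORM", "egl")


def dm_control_smoke_test():
    """Reset one suite task and return its observation names (None if not installed)."""
    try:
        from dm_control import suite
    except ImportError:
        return None
    env = suite.load(domain_name="cartpole", task_name="swingup")
    time_step = env.reset()
    return sorted(time_step.observation.keys())


print(dm_control_smoke_test())
```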