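CDH, short for Cloudera's Distribution Including Apache Hadoop, is one of the many Hadoop distributions. It is maintained by Cloudera, is built on stable releases of Apache Hadoop, and provides the Hadoop core (scalable storage and distributed computing). Most importantly, it provides a web-based user interface.
Advantages of CDH: clear version numbering, fast updates, support for Kerberos authentication, and several installation methods (such as Yum and rpm).
CDH consists of the Cloudera Manager management platform and the CDH parcel (a parcel bundles the installation packages of the individual components).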
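Cloudera recently announced on its official website:
Starting January 31, 2021, all Cloudera software requires a valid subscription and can only be accessed from behind a paywall.
In other words, no CDH release can be downloaded for free any more. Many older articles and books that cover the CDH big data platform cite download URLs on the Cloudera site, for example: https://archive.cloudera.com/cm6/6.2.0/redhat7/yum/RPMS/x86_64/
These URLs now ask for a username and password. Logging in with an ordinary Cloudera account fails with the following message: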
403 Forbidden (varnish) the provided credentials were incorrect
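This means the account has no permission to download CDH. There are many discussions about this in the Cloudera community forums.
There are two workarounds:
1. Install open-source Hadoop, Spark, Hive, and so on separately;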
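2. Reuse CDH packages that you or someone else downloaded earlier. According to the official site, releases from CDH 6.3.3 onward require a subscription, while earlier releases such as the 6.3.2 used here can still be used.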
Link: https://pan.baidu.com/s/1ION1DoWnqpfVO_sBx0GpeA Extraction code: rqyi
Download the CDH packages from the shared Baidu Netdisk link to your local workstation. The result looks like this:
~/workspace$ tree CDH
CDH
├── CDH
│   ├── CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
│   ├── CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha1
│   ├── CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha256
│   └── manifest.json
└── ClouderaManager
    ├── cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
    ├── cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
    ├── cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
    ├── cloudera-manager-server-db-2-6.3.1-1466458.el7.x86_64.rpm
    ├── enterprise-debuginfo-6.3.1-1466458.el7.x86_64.rpm
    └── oracle-j2sdk1.8-1.8.0+update181-1.x86_64.rpm
1. Create the instances
2. Once the instances are created, they appear in the instance list
root@cdh001:~# vim /etc/hosts
Add the following entries to the file:
172.18.48.175 cdh001.tigoyi.com cdh001
172.18.48.176 cdh002.tigoyi.com cdh002
172.18.48.177 cdh003.tigoyi.com cdh003
Confirm that the edit is correct:
[root@cdh001 ~]# cat /etc/hosts
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
172.18.48.175 cdh001.tigoyi.com cdh001
172.18.48.176 cdh002.tigoyi.com cdh002
172.18.48.177 cdh003.tigoyi.com cdh003
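If you would rather not edit the file by hand, the mapping step can be scripted. The following is only a minimal sketch: add_host_entry and the HOSTS_FILE variable are hypothetical names I made up for illustration, and the helper simply skips a host whose short name already appears in the file.

```shell
#!/usr/bin/env bash
# add_host_entry IP FQDN SHORTNAME
# Hypothetical helper: append one mapping line unless SHORTNAME is already present.
# HOSTS_FILE is an assumed override for testing; it defaults to /etc/hosts.
add_host_entry() {
    local ip=$1 fqdn=$2 short=$3
    local file=${HOSTS_FILE:-/etc/hosts}
    # -w matches whole words, so cdh001 does not match cdh0011
    if grep -qw -- "$short" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$file"
}

# Example (run as root on cdh001):
#   add_host_entry 172.18.48.175 cdh001.tigoyi.com cdh001
#   add_host_entry 172.18.48.176 cdh002.tigoyi.com cdh002
#   add_host_entry 172.18.48.177 cdh003.tigoyi.com cdh003
```

Running it twice leaves the file unchanged, which makes it safe to re-run while setting up the nodes.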
1. Run the following command to create the public and private keys.
[root@cdh001 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:tZBH8ykK49WNoil/KjsDbfwu+5siAmEJ8MprfpHMrY0 root@cdh001
The key's randomart image is:
+---[RSA 2048]----+
|o o |
|.. + = . |
|. o o = * + |
|o+ . * * o |
|oooo+ + S . |
|. o=++ |
|.o o=.. . |
|+ .E*..+ |
| o..o@*. |
+----[SHA256]-----+
Press Enter three times (accepting the default path and an empty passphrase) to generate two files: id_rsa (the private key) and id_rsa.pub (the public key).
2. Copy the public key to each machine you want to log in to without a password
Copy the public key to the local machine:
[root@cdh001 ~]# ssh-copy-id cdh001
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'cdh001 (172.18.48.175)' can't be established.
ECDSA key fingerprint is SHA256:ZjNmP0BKmg4ugXev5ZrlWjTjypVf+Fp2mexLGiDogjc.
ECDSA key fingerprint is MD5:3b:5b:ce:dc:24:3d:83:e3:9b:97:80:4d:b0:4b:ef:25.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@cdh001's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'cdh001'"
and check to make sure that only the key(s) you wanted were added.
Copy the public key to cdh002:
[root@cdh001 ~]# ssh-copy-id cdh002
The current machine, cdh001, can now log in to cdh002 without a password.
Similarly, copy the public key to cdh003:
[root@cdh001 ~]# ssh-copy-id cdh003
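With more than a handful of nodes, repeating ssh-copy-id by hand gets tedious. Here is a small sketch of the same step as a loop. The function name copy_key_to_hosts and the SSH_COPY_ID override are my own hypothetical additions (the override exists only so the loop can be dry-run with echo); each host will still prompt for its password once.

```shell
#!/usr/bin/env bash
# copy_key_to_hosts HOST...
# Hypothetical helper: run ssh-copy-id against every host given on the command line.
# SSH_COPY_ID is an assumed override hook, e.g. SSH_COPY_ID=echo for a dry run.
copy_key_to_hosts() {
    local copy_cmd=${SSH_COPY_ID:-ssh-copy-id}
    local host
    for host in "$@"; do
        # Stop at the first host that fails so the problem is easy to spot
        "$copy_cmd" "$host" || return 1
    done
}

# Example:
#   copy_key_to_hosts cdh001 cdh002 cdh003
```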
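1. Requirement: copy a file to the same directory on every node in a loop
2. Requirement analysis:
(a) Intended usage of the script:
xsync <path of the file to sync>
(b) Note: /usr/local/bin is on the default PATH, so the root user can run a script placed there from anywhere on the system.
3. Script implementation
(a) Create the file xsync in the /usr/local/bin directory:
root@cdh001:~# cd /usr/local/bin
root@cdh001:/usr/local/bin# vim xsync
Put the following code in the file: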
#!/bin/bash
# 1. Get the number of arguments; exit if none were given
pcount=$#
if ((pcount == 0)); then
    echo "no args"
    exit 1
fi

# 2. Get the file name
p1=$1
fname=$(basename "$p1")
echo "fname=$fname"

# 3. Resolve the parent directory to an absolute path
pdir=$(cd -P "$(dirname "$p1")" && pwd)
echo "pdir=$pdir"

# 4. Get the current user name
user=$(whoami)

# 5. Loop over the other nodes (cdh002 and cdh003)
for ((num = 2; num < 4; num++)); do
    host=$(printf "%03d" "$num")
    echo "-------------------cdh$host--------------"
    rsync -rvl "$pdir/$fname" "$user@cdh$host:$pdir"
done
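The host names in the loop come from zero-padding the node number to three digits. To make that naming step easier to see (and to reuse when the cluster grows), here is a small sketch that only prints the names; cluster_hosts is a hypothetical helper name, not part of the script above.

```shell
#!/usr/bin/env bash
# cluster_hosts FIRST LAST
# Hypothetical helper: print cdhNNN for node numbers FIRST..LAST,
# zero-padded to three digits as in the xsync loop.
cluster_hosts() {
    local first=$1 last=$2 num
    for ((num = first; num <= last; num++)); do
        printf 'cdh%03d\n' "$num"
    done
}

# Example: list the nodes that xsync copies to
#   cluster_hosts 2 3
```

A loop such as `for h in $(cluster_hosts 2 3); do ...; done` could then replace the hard-coded bounds if more nodes are added later.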
(b) Make the xsync script executable
root@cdh001:/usr/local/bin# chmod 755 xsync
(c) Invocation:
xsync <file path>
(d) Distribute the xsync script itself to the other nodes
[root@cdh001 ~]# xsync /usr/local/bin/xsync
fname=xsync
pdir=/usr/local/bin
-------------------cdh002--------------
sending incremental file list
xsync
sent 597 bytes received 35 bytes 1,264.00 bytes/sec
total size is 508 speedup is 0.80
-------------------cdh003--------------
sending incremental file list
xsync
sent 597 bytes received 35 bytes 1,264.00 bytes/sec
total size is 508 speedup is 0.80
(e) Distribute the hosts mapping file
[root@cdh001 ~]# xsync /etc/hosts
fname=hosts
pdir=/etc
-------------------cdh002--------------
sending incremental file list
hosts
sent 379 bytes received 41 bytes 840.00 bytes/sec
total size is 291 speedup is 0.69
-------------------cdh003--------------
sending incremental file list
hosts
sent 379 bytes received 41 bytes 840.00 bytes/sec
total size is 291 speedup is 0.69
~/workspace$ rsync -rvl CDH/ root@39.108.122.82:/root/CDH
root@39.108.122.82's password:
sending incremental file list
created directory /root/CDH
./
CDH/
CDH/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
CDH/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha1
CDH/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha256
CDH/manifest.json
ClouderaManager/
ClouderaManager/cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
ClouderaManager/cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
ClouderaManager/cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
ClouderaManager/cloudera-manager-server-db-2-6.3.1-1466458.el7.x86_64.rpm
ClouderaManager/enterprise-debuginfo-6.3.1-1466458.el7.x86_64.rpm
ClouderaManager/oracle-j2sdk1.8-1.8.0+update181-1.x86_64.rpm
sent 3,496,611,440 bytes received 257 bytes 1,221,097.15 bytes/sec
total size is 3,495,756,962 speedup is 1.00
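After a multi-gigabyte upload it is worth checking that the parcel arrived intact. The sketch below compares the file's SHA-256 against the accompanying .sha256 file. Note the assumption: I am assuming the .sha256 file begins with the hex digest (I have not verified its exact layout), and verify_sha256 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE
# Hypothetical helper: succeed only if sha256sum of FILE equals the first
# whitespace-separated field of FILE.sha256.
# Assumption: the .sha256 file starts with the hex digest.
verify_sha256() {
    local file=$1 expected actual
    expected=$(awk '{print $1; exit}' "$file.sha256") || return 2
    actual=$(sha256sum "$file" | awk '{print $1}') || return 2
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example (on cdh001 after the upload):
#   verify_sha256 /root/CDH/CDH/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel \
#       && echo "parcel OK" || echo "parcel checksum mismatch"
```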
Download and install
[root@cdh001 ~]# wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
[root@cdh001 ~]# yum -y install mysql57-community-release-el7-10.noarch.rpm
[root@cdh001 ~]# yum -y install mysql-community-server
Start the service
[root@cdh001 ~]# systemctl start mysqld
Check the status
[root@cdh001 ~]# systemctl status mysqld
Look up the temporary MySQL password
[root@cdh001 ~]# grep "password" /var/log/mysqld.log
2021-04-15T07:31:51.264538Z 1 [Note] A temporary password is generated for root@localhost: LeuhBAe2ai,u
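If you script the installation, the temporary password can be pulled out of the log instead of copied by eye. This is a minimal sketch that takes the last field of the most recent matching line; temp_mysql_password is a hypothetical helper name, and it assumes the generated password contains no spaces.

```shell
#!/usr/bin/env bash
# temp_mysql_password LOGFILE
# Hypothetical helper: print the temporary root password from the most recent
# "temporary password" line in LOGFILE.
# Assumption: the password is the last whitespace-separated field on that line.
temp_mysql_password() {
    grep 'temporary password' "$1" | tail -n 1 | awk '{print $NF}'
}

# Example:
#   temp_mysql_password /var/log/mysqld.log
```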
[root@cdh001 ~]# mysql -u root -p
mysql> set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global validate_password_length=4;
Query OK, 0 rows affected (0.00 sec)
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY '123465';
Query OK, 0 rows affected (0.01 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
[root@cdh001 ~]# systemctl enable mysqld
[root@cdh001 ~]# systemctl restart mysqld
A JDBC driver of version 5.1.26 or later is required. Here we download mysql-connector-java-5.1.47.tar.gz directly:
[root@cdh001 ~]# wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.47.tar.gz
[root@cdh001 ~]# tar -xzf mysql-connector-java-5.1.47.tar.gz
[root@cdh001 ~]# ll mysql-connector-java-5.1.47
total 2452
-rw-r--r-- 1 root root 91845 Aug 7 2018 build.xml
-rw-r--r-- 1 root root 248527 Aug 7 2018 CHANGES
-rw-r--r-- 1 root root 18122 Aug 7 2018 COPYING
-rw-r--r-- 1 root root 1007505 Aug 7 2018 mysql-connector-java-5.1.47-bin.jar
-rw-r--r-- 1 root root 1007502 Aug 7 2018 mysql-connector-java-5.1.47.jar
-rw-r--r-- 1 root root 61407 Aug 7 2018 README
-rw-r--r-- 1 root root 63658 Aug 7 2018 README.txt
drwxr-xr-x 8 root root 4096 Aug 7 2018 src
[root@cdh001 ~]# mkdir -p /usr/share/java
[root@cdh001 ~]# mv mysql-connector-java-5.1.47/mysql-connector-java-5.1.47-bin.jar /usr/share/java/mysql-connector-java.jar
Create the databases and database users listed in the table below for the services you plan to install. The databases must use utf8 encoding, and you should record each user name and its password as you create them:
Service | Database | User | Password |
---|---|---|---|
Cloudera Manager Server | scm | scm | 123465 |
Activity Monitor | amon | amon | 123465 |
Reports Manager | rman | rman | 123465 |
Hue | hue | hue | 123465 |
Hive Metastore Server | metastore | hive | 123465 |
Sentry Server | sentry | sentry | 123465 |
Cloudera Navigator Audit Server | nav | nav | 123465 |
Cloudera Navigator Metadata Server | navms | navms | 123465 |
Oozie | oozie | oozie | 123465 |
Create the SQL script file
[root@cdh001 ~]# vim cdhinit.sql
Add the following content:
# scm
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY '123465';
# amon
CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY '123465';
# rman
CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY '123465';
# hue
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY '123465';
# hive
CREATE DATABASE metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY '123465';
# sentry
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY '123465';
# nav
CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY '123465';
# navms
CREATE DATABASE navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY '123465';
# oozie
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY '123465';
# flush
FLUSH PRIVILEGES;
SHOW DATABASES;
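Since every service follows the same CREATE DATABASE / GRANT pattern, the script can also be generated from the database:user pairs in the table. The following is just a sketch: gen_cdh_sql is a hypothetical helper name, and it takes the password as an argument so that you can substitute something stronger than the sample password used in this walkthrough.

```shell
#!/usr/bin/env bash
# gen_cdh_sql PASSWORD
# Hypothetical helper: print CREATE DATABASE and GRANT statements for each
# database:user pair from the table, followed by FLUSH PRIVILEGES.
gen_cdh_sql() {
    local password=$1 pair db user
    for pair in scm:scm amon:amon rman:rman hue:hue metastore:hive \
                sentry:sentry nav:nav navms:navms oozie:oozie; do
        db=${pair%%:*}      # text before the colon
        user=${pair##*:}    # text after the colon
        printf 'CREATE DATABASE %s DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;\n' "$db"
        printf "GRANT ALL ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n" "$db" "$user" "$password"
    done
    printf 'FLUSH PRIVILEGES;\n'
}

# Example:
#   gen_cdh_sql '123465' > /root/cdhinit.sql
```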
Adjust the MySQL password validation policy
[root@cdh001 ~]# mysql -u root -p
mysql> set global validate_password_policy=LOW;
Query OK, 0 rows affected (0.00 sec)
mysql> set global validate_password_length=6;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'validate_password%';
+--------------------------------------+-------+
| Variable_name                        | Value |
+--------------------------------------+-------+
| validate_password_check_user_name    | OFF   |
| validate_password_dictionary_file    |       |
| validate_password_length             | 6     |
| validate_password_mixed_case_count   | 1     |
| validate_password_number_count       | 1     |
| validate_password_policy             | LOW   |
| validate_password_special_char_count | 1     |
+--------------------------------------+-------+
7 rows in set (0.00 sec)
Run the script
mysql> source /root/cdhinit.sql
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
+--------------------+
| Database           |
+--------------------+
| information_schema |
| amon               |
| hue                |
| metastore          |
| mysql              |
| nav                |
| navms              |
| oozie              |
| performance_schema |
| rman               |
| scm                |
| sentry             |
| sys                |
+--------------------+
13 rows in set (0.00 sec)
First install httpd and createrepo:
[root@cdh001 ~]# yum -y install httpd createrepo
Start the httpd service and enable it at boot:
[root@cdh001 ~]# systemctl start httpd
[root@cdh001 ~]# systemctl enable httpd
Created symlink from /etc/systemd/system/multi-user.target.wants/httpd.service to /usr/lib/systemd/system/httpd.service.
Move the uploaded CDH packages into httpd's html directory:
[root@cdh001 ~]# mv CDH/ /var/www/html/
Generate the RPM metadata:
[root@cdh001 ~]# cd /var/www/html/CDH/ClouderaManager/
[root@cdh001 ClouderaManager]# createrepo .
Spawning worker 0 with 3 pkgs
Spawning worker 1 with 3 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete
[root@cdh001 ClouderaManager]# ll
total 1380428
-rw-r--r-- 1 root root 10483568 Apr 15 12:42 cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
-rw-r--r-- 1 root root 1203832464 Apr 15 12:58 cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
-rw-r--r-- 1 root root 11488 Apr 15 12:58 cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
-rw-r--r-- 1 root root 10996 Apr 15 12:58 cloudera-manager-server-db-2-6.3.1-1466458.el7.x86_64.rpm
-rw-r--r-- 1 root root 14209868 Apr 15 12:58 enterprise-debuginfo-6.3.1-1466458.el7.x86_64.rpm
-rw-r--r-- 1 root root 184988341 Apr 15 13:01 oracle-j2sdk1.8-1.8.0+update181-1.x86_64.rpm
drwxr-xr-x 2 root root 4096 Apr 15 14:09 repodata
Next, create the repo file for Cloudera Manager:
[root@cdh001 ClouderaManager]# vim /etc/yum.repos.d/cloudera-manager.repo
Add the following content:
[cloudera-manager]
name=Cloudera Manager 6
baseurl=http://cdh001/CDH/ClouderaManager/
gpgcheck=0
enabled=1
Distribute it to the other nodes:
[root@cdh001 ~]# xsync /etc/yum.repos.d/cloudera-manager.repo
Edit the /etc/httpd/conf/httpd.conf configuration file
On line 284 (the exact line number may differ on your system), change AddType application/x-gzip .gz .tgz to AddType application/x-gzip .gz .tgz .parcel
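Because the line number can vary between httpd versions, matching on the directive itself is more robust than jumping to line 284. Here is a sketch of the same edit done with sed; add_parcel_type is a hypothetical helper name, and it relies on GNU sed (-i and -E), which is what CentOS 7 ships.

```shell
#!/usr/bin/env bash
# add_parcel_type CONF
# Hypothetical helper: append ".parcel" to the "AddType application/x-gzip" line
# in CONF, unless that line already lists .parcel. Commented-out lines are ignored.
add_parcel_type() {
    local conf=$1
    if grep -Eq '^[[:space:]]*AddType[[:space:]]+application/x-gzip.*\.parcel' "$conf"; then
        return 0    # already configured; nothing to do
    fi
    sed -i -E 's|^([[:space:]]*AddType[[:space:]]+application/x-gzip[[:space:]].*)$|\1 .parcel|' "$conf"
}

# Example (back up the file first):
#   cp /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.bak
#   add_parcel_type /etc/httpd/conf/httpd.conf
```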
Restart the httpd service
[root@cdh001 ~]# systemctl restart httpd
[root@cdh001 ~]# yum install oracle-j2sdk1.8
Set the environment variables
[root@cdh001 ~]# vim /etc/profile
Add the following content:
#java
export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera/
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib
Apply the environment variables
[root@cdh001 ~]# source /etc/profile
[root@cdh001 ~]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
Distribute the configuration file
[root@cdh001 ~]# xsync /etc/profile
[root@cdh001 ~]# ssh cdh002
Welcome to Alibaba Cloud Elastic Compute Service !
[root@cdh002 ~]# yum install oracle-j2sdk1.8
[root@cdh002 ~]# source /etc/profile
[root@cdh002 ~]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
[root@cdh001 ~]# yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server
[root@cdh002 ~]# yum install cloudera-manager-daemons cloudera-manager-agent
[root@cdh003 ~]# yum install cloudera-manager-daemons cloudera-manager-agent
[root@cdh001 ~]# sed -i "s/server_host=localhost/server_host=cdh001/g" /etc/cloudera-scm-agent/config.ini
[root@cdh001 ~]# xsync /etc/cloudera-scm-agent/config.ini
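The sed command above only works while the value is still localhost. If you ever need to re-point the agents, a variant that matches the whole server_host line is safer. The following is a sketch under that assumption; set_server_host is a hypothetical helper name, and it also verifies that the setting actually ended up in the file.

```shell
#!/usr/bin/env bash
# set_server_host FILE HOST
# Hypothetical helper: rewrite the server_host= line in FILE to point at HOST,
# whatever its previous value, then confirm that the line is present.
set_server_host() {
    local file=$1 host=$2
    sed -i "s/^server_host=.*/server_host=$host/" "$file"
    # Fail if the file had no server_host= line to replace
    grep -q "^server_host=$host\$" "$file"
}

# Example:
#   set_server_host /etc/cloudera-scm-agent/config.ini cdh001 \
#       && xsync /etc/cloudera-scm-agent/config.ini
```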
[root@cdh001 ~]# systemctl start cloudera-scm-agent
[root@cdh001 ~]# ssh cdh002
Last login: Thu Apr 15 14:46:05 2021 from 172.18.48.175
Welcome to Alibaba Cloud Elastic Compute Service !
[root@cdh002 ~]# systemctl start cloudera-scm-agent
[root@cdh002 ~]# logout
Connection to cdh002 closed.
[root@cdh001 ~]# ssh cdh003
Last login: Thu Apr 15 14:49:01 2021 from 172.18.48.175
Welcome to Alibaba Cloud Elastic Compute Service !
[root@cdh003 ~]# systemctl start cloudera-scm-agent
[root@cdh003 ~]# logout
Connection to cdh003 closed.
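Since passwordless SSH is already set up, the agents on the other nodes can also be started from cdh001 without logging in to each one. A minimal sketch follows; start_agents is a hypothetical helper name, and the SSH variable is an assumed override that exists only so the loop can be dry-run with echo.

```shell
#!/usr/bin/env bash
# start_agents HOST...
# Hypothetical helper: start cloudera-scm-agent on each HOST over SSH.
# SSH is an assumed override hook, e.g. SSH=echo prints the commands instead.
start_agents() {
    local ssh_cmd=${SSH:-ssh} host
    for host in "$@"; do
        if ! "$ssh_cmd" "$host" systemctl start cloudera-scm-agent; then
            echo "failed to start agent on $host" >&2
            return 1
        fi
    done
}

# Example:
#   start_agents cdh002 cdh003
```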
[root@cdh001 ~]# /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm 123465
All done, your SCM database is configured correctly!
[root@cdh001 ~]# systemctl start cloudera-scm-server
[root@cdh001 ~]# systemctl status cloudera-scm-server
● cloudera-scm-server.service - Cloudera CM Server Service
   Loaded: loaded (/usr/lib/systemd/system/cloudera-scm-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-04-15 16:09:21 CST; 33s ago
  Process: 12900 ExecStartPre=/opt/cloudera/cm/bin/cm-server-pre (code=exited, status=0/SUCCESS)
 Main PID: 12903 (java)
   CGroup: /system.slice/cloudera-scm-server.service
           └─12903 /usr/java/jdk1.8.0_181-cloudera/bin/java -cp .:/usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-conne...
Watch the startup log
[root@cdh001 ~]# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
When the following message appears, startup is complete
2021-04-15 16:11:08,771 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
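Rather than watching tail -f until that line shows up, a script can poll the log for it. This is a sketch only: wait_for_log is a hypothetical helper name, and the timeout is counted in one-second polling steps, so it is approximate.

```shell
#!/usr/bin/env bash
# wait_for_log FILE PATTERN TIMEOUT_SECONDS
# Hypothetical helper: poll FILE once per second until PATTERN appears.
# Returns 0 when the pattern is found, 1 when the timeout is reached.
wait_for_log() {
    local file=$1 pattern=$2 timeout=$3 waited=0
    until grep -q -- "$pattern" "$file" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example: give the server up to five minutes to come up
#   wait_for_log /var/log/cloudera-scm-server/cloudera-scm-server.log \
#       'Started Jetty server' 300 && echo "Cloudera Manager is up"
```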
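Open a browser and go to http://<server_host>:7180. The default username and password are both admin:
First comes the Cloudera Manager welcome page. Click the [Continue] button at the bottom right to go to the next step:
Check the box to accept the terms, then click [Continue]:
Choose the free edition here:
After the edition is chosen, a second welcome page appears; this one is the welcome page for installing the cluster:
This step selects the hosts on which the CDH cluster will be installed:
Click More Options and add the remote Parcel repository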
http://39.108.122.82/CDH/CDH/
Fix:
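On Alibaba Cloud servers the URL should instead be http://cdh001/CDH/CDH/, because traffic through the public IP is bandwidth-limited
After returning, the CDH version is loaded. Confirm that it is correct and click [Continue]:
Wait for the inspection to finish:
Here I choose Data Engineering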
Process, develop, and serve predictive models.
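Services: HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, and Spark
CDH proposes a role assignment automatically. If it looks unreasonable, you can adjust it manually, taking care to keep the roles balanced across hosts: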