## Official Documentation

- https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/ClusterSetup.html
- https://hadoop.apache.org/docs/r3.1.3/hadoop-project-dist/hadoop-common/ClusterSetup.html
## Node Cloning and Passwordless Login

Clone the preconfigured machine into three nodes named node1, node2, and node3, and assign each an IP address appropriate for the VM network settings, for example:

|       | IP              |
| ----- | --------------- |
| node1 | 192.168.179.101 |
| node2 | 192.168.179.102 |
| node3 | 192.168.179.103 |
### Changing the Hostname

Omitted.

### Changing the IP Address

Omitted.
### Configuring hosts

Add the following to /etc/hosts on all three nodes:

```
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.179.101 node1
192.168.179.102 node2
192.168.179.103 node3
```

Use ping to verify that the hostname mapping works:

```bash
[root@node1 hadoop]# ping node2
PING node2 (192.168.179.102) 56(84) bytes of data.
64 bytes from node2 (192.168.179.102): icmp_seq=1 ttl=64 time=0.634 ms
```
After making these changes, log out of the root account and switch back to the zhangsan user.
### Passwordless Login
Note: only passwordless login from node1 and node2 to the other hosts is configured here. Per the cluster plan below, node1 hosts the NameNode and node2 hosts the ResourceManager, and both daemons need passwordless access to every other node.
(1) Generate a key pair on node1:

```bash
[zhangsan@node1 .ssh]$ ssh-keygen -t rsa
```

Press Enter three times to accept the defaults; this creates two files, id_rsa (private key) and id_rsa.pub (public key).
(2) Copy node1's public key to every machine it should reach without a password:

```bash
[zhangsan@node1 .ssh]$ ssh-copy-id node1
[zhangsan@node1 .ssh]$ ssh-copy-id node2
[zhangsan@node1 .ssh]$ ssh-copy-id node3
```
(3) Generate a key pair on node2:

```bash
[zhangsan@node2 .ssh]$ ssh-keygen -t rsa
```

Press Enter three times to accept the defaults; as before, this creates id_rsa (private key) and id_rsa.pub (public key).
(4) Copy node2's public key to every machine it should reach without a password:

```bash
[zhangsan@node2 .ssh]$ ssh-copy-id node1
[zhangsan@node2 .ssh]$ ssh-copy-id node2
[zhangsan@node2 .ssh]$ ssh-copy-id node3
```
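To confirm the keys took effect, a quick check (a minimal sketch; the loop just visits the three hostnames from the table above) is to ssh to each node and run hostname; none of the commands should prompt for a password:

```bash
# Run on node1 (and again on node2); each line should print the
# remote hostname without any password prompt.
for host in node1 node2 node3; do
    ssh "$host" hostname
done
```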
[How SSH passwordless login works]: ../Linux/Linux_SSH_Passwordless_login.md "How it works"
## Cluster Plan

|       | HDFS                                 | YARN                         |
| ----- | ------------------------------------ | ---------------------------- |
| node1 | NameNode, DataNode, JobHistoryServer | NodeManager                  |
| node2 | DataNode                             | NodeManager, ResourceManager |
| node3 | SecondaryNameNode, DataNode          | NodeManager                  |
## Uploading and Extracting the Hadoop Tarball

This guide extracts the tarball into /opt/bigdata/hadoop:

```bash
(base) [zhangsan@node1 hadoop]$ pwd
/opt/bigdata/hadoop
```
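A minimal sketch of the extraction step, assuming the downloaded archive is named hadoop-3.1.3.tar.gz and sits in the user's home directory (adjust the path to wherever you uploaded it):

```bash
# Unpack the Hadoop distribution into /opt/bigdata/hadoop
(base) [zhangsan@node1 hadoop]$ tar -zxvf ~/hadoop-3.1.3.tar.gz -C /opt/bigdata/hadoop/
```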
### Creating a Symlink

Pointing a `default` symlink at the versioned directory means a future upgrade only requires repointing the link, not editing every path that references it:

```bash
(base) [zhangsan@node1 hadoop]$ ln -s hadoop-3.1.3/ default
(base) [zhangsan@node1 hadoop]$ ll
total 12
lrwxrwxrwx.  1 zhangsan zhangsan   12 Feb 28 12:53 default -> hadoop-3.1.3
drwxr-xr-x. 11 zhangsan zhangsan 4096 Feb 28 15:57 hadoop-3.1.3
```
### Configuring Environment Variables

```bash
(base) [zhangsan@node1 ~]$ vim ~/.bash_profile

# append to ~/.bash_profile:
export HADOOP_HOME=/opt/bigdata/hadoop/default
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

$HADOOP_HOME/sbin is added to the PATH as well, since the start-all.sh script used later lives there. Source the file to make the variables take effect:

```bash
(base) [zhangsan@node1 ~]$ source ~/.bash_profile
```
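A quick way to confirm the variables took effect is to ask the hadoop binary for its version; with the 3.1.3 install above, the first line of output should read "Hadoop 3.1.3":

```bash
(base) [zhangsan@node1 ~]$ hadoop version
```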
## Configuration Files

All of the snippets below go inside the <configuration> element of the corresponding file under $HADOOP_HOME/etc/hadoop/.

### core-site.xml

```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:8020</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/bigdata/hadoop/default/tmp</value>
</property>
<property>
    <name>hadoop.proxyuser.zhangsan.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.zhangsan.groups</name>
    <value>*</value>
</property>
```
### hdfs-site.xml

```xml
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node3:9868</value>
</property>
```
### mapred-site.xml

```xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
</property>
```
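On Hadoop 3.x, MapReduce jobs submitted to YARN frequently fail with "Could not find or load main class ...MRAppMaster" unless the MapReduce classpath is spelled out; the official single-node setup guide addresses this with the property below. Treat this as an optional addition, not part of the original configuration:

```xml
<!-- Optional on Hadoop 3.x: lets YARN containers locate the MR framework jars -->
<property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
```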
### yarn-site.xml

```xml
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node2</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log.server.url</name>
    <value>http://node1:19888/jobhistory/logs</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
```
### slaves / workers

The list of worker hosts lives in etc/hadoop/slaves in Hadoop 2.x and was renamed to etc/hadoop/workers in Hadoop 3.x. List one hostname per line, as shown below.
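Since every node in the cluster plan runs a DataNode and a NodeManager, the workers file for this 3.1.3 install would contain all three hosts:

```
node1
node2
node3
```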
Use scp to copy the modified configuration files to the other two machines (paths go through the `default` symlink created earlier):

```bash
[zhangsan@node1 ~]$ scp -r /opt/bigdata/hadoop/default/etc/hadoop/* zhangsan@node2:/opt/bigdata/hadoop/default/etc/hadoop/
[zhangsan@node1 ~]$ scp -r /opt/bigdata/hadoop/default/etc/hadoop/* zhangsan@node3:/opt/bigdata/hadoop/default/etc/hadoop/
```
## Formatting

The three machines were cloned from the template node (node0) and therefore carry its stale data. Before formatting, delete everything under $HADOOP_HOME/tmp and $HADOOP_HOME/logs on all three machines.
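A minimal cleanup sketch, assuming the hadoop.tmp.dir configured in core-site.xml above; run it on every node:

```bash
# Remove stale HDFS data and logs inherited from the clone source
rm -rf $HADOOP_HOME/tmp/* $HADOOP_HOME/logs/*
```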
```bash
# Format the NameNode on node1
[zhangsan@node1 ~]$ hdfs namenode -format
```

If the format succeeds, the output includes a line ending in "has been successfully formatted."
## Startup

```bash
# On node1, run start-all.sh to bring up both HDFS and YARN
[zhangsan@node1 ~]$ start-all.sh

# Start the JobHistoryServer (web port: 19888)
[zhangsan@node1 ~]$ mr-jobhistory-daemon.sh start historyserver
```

Note that mr-jobhistory-daemon.sh is deprecated in Hadoop 3.x in favor of `mapred --daemon start historyserver`, which is what the hdp.sh script at the end of this guide uses.
## Process Status per Node

### node1

```bash
[zhangsan@node1 ~]$ jps
14258 NodeManager
14579 Jps
13783 DataNode
13644 NameNode
```

### node2

```bash
[zhangsan@node2 ~]$ jps
4113 ResourceManager
8211 NodeManager
8382 Jps
8095 DataNode
```

### node3

```bash
[zhangsan@node3 ~]$ jps
3955 SecondaryNameNode
7928 DataNode
8044 NodeManager
8220 Jps
```
## Web UI

HDFS NameNode:

- http://node1:50070 (Hadoop 2.x)
- http://node1:9870 (Hadoop 3.x)

YARN ResourceManager:

- http://node2:8088

## Testing

### Preparing Input Files

```bash
[zhangsan@node1 ~]$ hdfs dfs -mkdir /input

[zhangsan@node1 ~]$ hdfs dfs -ls -R /
drwxr-xr-x   - zhangsan supergroup          0 2022-02-15 00:12 /input

[zhangsan@node1 ~]$ hdfs dfs -put /home/zhangsan/bigdata.txt /input
```

### WordCount

```bash
[zhangsan@node1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input/bigdata.txt /out/02161
```
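To inspect the result (a minimal check; part-r-00000 is the standard name of a single reducer's output file):

```bash
[zhangsan@node1 ~]$ hdfs dfs -ls /out/02161
[zhangsan@node1 ~]$ hdfs dfs -cat /out/02161/part-r-00000
```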

## Cluster Management Scripts

hdp.sh starts and stops the whole cluster from one node. Note that start-yarn.sh is run on node2 over ssh, because that is where the ResourceManager is configured to live.

/home/zhangsan/bin/hdp.sh

```bash
#!/bin/bash

if [ $# -lt 1 ]; then
    echo "No Args Input..."
    exit
fi

case $1 in
"start")
    echo " =================== Starting the Hadoop cluster ==================="
    echo " --------------- starting hdfs ---------------"
    ssh node1 "/opt/bigdata/hadoop/default/sbin/start-dfs.sh"
    echo " --------------- starting yarn ---------------"
    ssh node2 "/opt/bigdata/hadoop/default/sbin/start-yarn.sh"
    echo " --------------- starting historyserver ---------------"
    ssh node1 "/opt/bigdata/hadoop/default/bin/mapred --daemon start historyserver"
    ;;
"stop")
    echo " =================== Stopping the Hadoop cluster ==================="
    echo " --------------- stopping historyserver ---------------"
    ssh node1 "/opt/bigdata/hadoop/default/bin/mapred --daemon stop historyserver"
    echo " --------------- stopping yarn ---------------"
    ssh node2 "/opt/bigdata/hadoop/default/sbin/stop-yarn.sh"
    echo " --------------- stopping hdfs ---------------"
    ssh node1 "/opt/bigdata/hadoop/default/sbin/stop-dfs.sh"
    ;;
*)
    echo "Input Args Error..."
    ;;
esac
```
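A companion process-check script is handy alongside hdp.sh. Here is a minimal sketch of one (a hypothetical helper, saved for example as /home/zhangsan/bin/jpsall.sh) that runs jps on every node over the passwordless ssh configured earlier:

```bash
#!/bin/bash
# jpsall.sh -- print the Java processes running on each cluster node.
# Assumes jps is on the PATH of a non-interactive shell on every node.
for host in node1 node2 node3; do
    echo " =============== $host ==============="
    ssh "$host" jps
done
```

Make both scripts executable before use: `chmod +x /home/zhangsan/bin/hdp.sh /home/zhangsan/bin/jpsall.sh`, then start the cluster with `hdp.sh start` and check it with `jpsall.sh`.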