Hadoop HA

NameNode HA

Edit the configuration files

core-site.xml

```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfscluster</value>
</property>

<!-- Directory where each JournalNode stores its edits -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/bigdata/hadoop/default/tmp/jn</value>
</property>

<!-- ZooKeeper quorum used by the failover controllers -->
<property>
    <name>ha.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181</value>
</property>
```
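Before moving on, it helps to confirm that every member of the `ha.zookeeper.quorum` list is reachable. A minimal sketch (the hostnames are this guide's; the actual ZooKeeper `ruok` probe is left commented out because it needs netcat and a running quorum):

```shell
# Split the quorum string into host:port members and probe each one.
QUORUM="node1:2181,node2:2181,node3:2181"
for member in ${QUORUM//,/ }; do
  host=${member%%:*}   # text before the first ':'
  port=${member##*:}   # text after the last ':'
  echo "would check $host:$port"
  # A healthy ZooKeeper server answers "imok" to the ruok command:
  # echo ruok | nc -w 2 "$host" "$port"
done
```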
hdfs-site.xml

```xml
<configuration>

    <!-- Logical name of the nameservice -->
    <property>
        <name>dfs.nameservices</name>
        <value>hdfscluster</value>
    </property>
    <!-- The NameNodes that belong to the cluster -->
    <property>
        <name>dfs.ha.namenodes.hdfscluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.hdfscluster.nn1</name>
        <value>node1:9000</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.hdfscluster.nn2</name>
        <value>node3:9000</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.hdfscluster.nn1</name>
        <value>node1:50070</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.hdfscluster.nn2</name>
        <value>node3:50070</value>
    </property>
    <!-- Where the NameNode edit log is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node1:8485;node2:8485;node3:8485/hdfscluster</value>
    </property>
    <!-- Fencing: ensures only one NameNode serves clients at any moment -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- sshfence needs passwordless SSH with this private key -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/zhangsan/.ssh/id_rsa</value>
    </property>
    <!-- Disable permission checking (optional) -->
    <!--
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    -->
    <!-- Proxy provider the HDFS client uses to locate the active
         NameNode and fail over automatically -->
    <property>
        <name>dfs.client.failover.proxy.provider.hdfscluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Enable automatic failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>
```
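After copying both files to every node, you can ask Hadoop which values it actually picked up. A sanity-check sketch (the `getconf` calls are commented out because they need a node with this configuration installed; the expected values are the ones set above):

```shell
# Derive the nameservice ID from fs.defaultFS, then (on a configured
# node) confirm Hadoop reports the same HA settings.
FS_DEFAULT="hdfs://hdfscluster"
NS=${FS_DEFAULT#hdfs://}
echo "nameservice: $NS"
# hdfs getconf -confKey dfs.nameservices                   # expect: hdfscluster
# hdfs getconf -confKey dfs.ha.namenodes."$NS"             # expect: nn1,nn2
# hdfs getconf -confKey dfs.ha.automatic-failover.enabled  # expect: true
```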

Clean up stale data from earlier runs

Omitted.

Install psmisc

psmisc provides the fuser command that the sshfence mechanism relies on, so install it on both NameNode hosts.

```shell
[root@node3 ~]# yum install psmisc
```
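sshfence invokes fuser on the remote host to kill a stale NameNode, and fencing fails quietly if the binary is absent. A quick check to run on both NameNode hosts (node1 and node3):

```shell
# Record whether fuser (from psmisc) is on PATH on this host.
if command -v fuser >/dev/null 2>&1; then
  FUSER_STATUS=present
else
  FUSER_STATUS=missing
fi
echo "fuser: $FUSER_STATUS"
```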

Start the JournalNodes

Note: this step uses hadoop-daemons.sh (note the plural "s"), which starts the daemon on every node listed in the slaves file, so the command only needs to be run on one node.

```shell
[zhangsan@node1 default]$ hadoop-daemons.sh start journalnode
```

Check that they started

node1
```shell
[zhangsan@node1 default]$ jps
44583 Jps
44025 JournalNode
87531 QuorumPeerMain
```
node2
```shell
[zhangsan@node2 default]$ jps
100709 JournalNode
101577 Jps
31691 QuorumPeerMain
```
node3
```shell
[zhangsan@node3 default]$ jps
98851 Jps
97967 JournalNode
29023 QuorumPeerMain
```
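Rather than eyeballing each listing, a tiny helper can assert that a given daemon appears in jps output (the function is ours, not a Hadoop tool; the sample input is node2's output from above):

```shell
# has_daemon JPS_OUTPUT NAME -> success if NAME appears as a process name.
has_daemon() {
  echo "$1" | awk '{print $2}' | grep -qx "$2"
}

jps_node2="100709 JournalNode
101577 Jps
31691 QuorumPeerMain"

if has_daemon "$jps_node2" JournalNode; then
  echo "node2: JournalNode is running"
fi
```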

Format and start NN1

Note: this step uses hadoop-daemon.sh (singular), which acts on the local node only.

```shell
[zhangsan@node1 default]$ hdfs namenode -format
[zhangsan@node1 default]$ hadoop-daemon.sh start namenode
```

Check that it started

```shell
[zhangsan@node1 default]$ jps
44583 Jps
44025 JournalNode
87531 QuorumPeerMain
44463 NameNode
```

Bootstrap and start NN2

Note: node3 serves as the second NameNode (nn2), so the bootstrap command is run on node3 to copy nn1's metadata over to nn2.

```shell
[zhangsan@node3 default]$ hdfs namenode -bootstrapStandby
[zhangsan@node3 default]$ hadoop-daemon.sh start namenode
```

Check that it started

```shell
[zhangsan@node3 default]$ jps
98771 NameNode
98851 Jps
97967 JournalNode
29023 QuorumPeerMain
```

Start the DataNodes

```shell
[zhangsan@node1 default]$ hadoop-daemons.sh start datanode
```

Once this succeeds, jps shows a DataNode process on all three nodes.
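To confirm all three DataNodes without logging into each machine, a loop over the hosts works (hostnames are this guide's; the ssh call is commented out since it assumes the passwordless SSH already configured for the cluster):

```shell
# Check every worker node for a DataNode process.
for n in node1 node2 node3; do
  echo "--- $n ---"
  # ssh "$n" 'jps' | grep DataNode
done
```

Once HDFS is fully up, `hdfs dfsadmin -report` also lists the live DataNodes in one place.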

Initialize ZooKeeper

Format the NameNode HA state znode in ZooKeeper, then start HDFS:

```shell
[zhangsan@node1 default]$ hdfs zkfc -formatZK
[zhangsan@node1 logs]$ start-dfs.sh
```
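Once the ZKFCs are running, exactly one NameNode should be active and the other standby. A small helper to check the pair of reported states (the helper is ours; the commented haadmin calls show where its inputs would come from):

```shell
# one_active STATE1 STATE2 -> success iff exactly one of the two is "active".
one_active() {
  { [ "$1" = active ] && [ "$2" = standby ]; } ||
  { [ "$1" = standby ] && [ "$2" = active ]; }
}

# s1=$(hdfs haadmin -getServiceState nn1)
# s2=$(hdfs haadmin -getServiceState nn2)
# one_active "$s1" "$s2" && echo "HA healthy" || echo "HA broken"
```

You can also confirm that formatZK created the znode: in `zkCli.sh`, `ls /hadoop-ha` should list `hdfscluster`.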

Stop HDFS

```shell
[zhangsan@node1 hadoop]$ stop-dfs.sh
Stopping namenodes on [node1 node3]
node1: stopping namenode
node3: stopping namenode
node1: stopping datanode
node3: stopping datanode
node2: stopping datanode
Stopping journal nodes [node1 node2 node3]
node3: stopping journalnode
node1: stopping journalnode
node2: stopping journalnode
Stopping ZK Failover Controllers on NN hosts [node1 node3]
node1: stopping zkfc
node3: stopping zkfc
```

Start HDFS

```shell
[zhangsan@node1 default]$ start-dfs.sh
Starting namenodes on [node1 node3]
node1: starting namenode, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-namenode-node1.out
node3: starting namenode, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-namenode-node3.out
node1: starting datanode, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-datanode-node1.out
node3: starting datanode, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-datanode-node3.out
node2: starting datanode, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-datanode-node2.out
Starting journal nodes [node1 node2 node3]
node3: starting journalnode, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-journalnode-node3.out
node2: starting journalnode, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-journalnode-node2.out
node1: starting journalnode, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-journalnode-node1.out
Starting ZK Failover Controllers on NN hosts [node1 node3]
node1: starting zkfc, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-zkfc-node1.out
node3: starting zkfc, logging to /opt/bigdata/hadoop/hadoop-2.7.3/logs/hadoop-zhangsan-zkfc-node3.out
```

Starting and stopping afterwards

From now on, start-dfs.sh and stop-dfs.sh manage the whole HA cluster (NameNodes, DataNodes, JournalNodes, and ZKFCs) in one command:

```shell
[zhangsan@node1 default]$ start-dfs.sh
[zhangsan@node1 default]$ stop-dfs.sh
```