Hadoop Q&A

Container is running beyond virtual memory limits

Q

Application application_1645498549388_0001 failed 2 times due to AM Container for appattempt_1645498549388_0001_000002 `exited with exitCode: -103`
For more detailed output, check application tracking page: http://node1:8088/cluster/app/application_1645498549388_0001 Then, click on links to logs of each attempt.
`Diagnostics: Container [pid=2316,containerID=container_1645498549388_0001_02_000001] is running beyond virtual memory limits. Current usage: 100.0 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.`
Dump of the process-tree for container_1645498549388_0001_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 2316 2314 2316 2316 (bash) 1 1 116006912 300 /bin/bash -c /usr/java/default/bin/java -server -Xmx512m -Djava.io.tmpdir=/opt/bigdata/hadoop/default/tmp/nm-local-dir/usercache/hadoop/appcache/application_1645498549388_0001/container_1645498549388_0001_02_000001/tmp -Dspark.yarn.app.container.log.dir=/opt/bigdata/hadoop/hadoop-2.7.3/logs/userlogs/application_1645498549388_0001/container_1645498549388_0001_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'node1:44636' --properties-file /opt/bigdata/hadoop/default/tmp/nm-local-dir/usercache/hadoop/appcache/application_1645498549388_0001/container_1645498549388_0001_02_000001/__spark_conf__/__spark_conf__.properties --dist-cache-conf /opt/bigdata/hadoop/default/tmp/nm-local-dir/usercache/hadoop/appcache/application_1645498549388_0001/container_1645498549388_0001_02_000001/__spark_conf__/__spark_dist_cache__.properties 1> /opt/bigdata/hadoop/hadoop-2.7.3/logs/userlogs/application_1645498549388_0001/container_1645498549388_0001_02_000001/stdout 2> /opt/bigdata/hadoop/hadoop-2.7.3/logs/userlogs/application_1645498549388_0001/container_1645498549388_0001_02_000001/stderr
|- 2323 2316 2316 2316 (java) 242 111 2250498048 25294 /usr/java/default/bin/java -server -Xmx512m -Djava.io.tmpdir=/opt/bigdata/hadoop/default/tmp/nm-local-dir/usercache/hadoop/appcache/application_1645498549388_0001/container_1645498549388_0001_02_000001/tmp -Dspark.yarn.app.container.log.dir=/opt/bigdata/hadoop/hadoop-2.7.3/logs/userlogs/application_1645498549388_0001/container_1645498549388_0001_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg node1:44636 --properties-file /opt/bigdata/hadoop/default/tmp/nm-local-dir/usercache/hadoop/appcache/application_1645498549388_0001/container_1645498549388_0001_02_000001/__spark_conf__/__spark_conf__.properties --dist-cache-conf /opt/bigdata/hadoop/default/tmp/nm-local-dir/usercache/hadoop/appcache/application_1645498549388_0001/container_1645498549388_0001_02_000001/__spark_conf__/__spark_dist_cache__.properties
Container killed on request. Exit code is 143
`Container exited with a non-zero exit code 143`
Failing this attempt. Failing the application.

A

yarn-site.xml

Append the following:

<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>
Property descriptions
| property | default | description |
| --- | --- | --- |
| yarn.nodemanager.pmem-check-enabled | true | Whether physical memory limits will be enforced for containers. |
| yarn.nodemanager.vmem-check-enabled | true | Whether virtual memory limits will be enforced for containers. |
| yarn.nodemanager.vmem-pmem-ratio | 2.1 | Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio. |
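The numbers in the error line up with these defaults: the AM container was allocated 1 GB of physical memory, so with the default ratio of 2.1 its virtual memory cap is 1 GB × 2.1 = 2.1 GB, and the launcher JVM's 2.2 GB of virtual memory tripped the check. A quick sanity check of that arithmetic:

```shell
# Virtual memory cap = container physical allocation * yarn.nodemanager.vmem-pmem-ratio
PMEM_MB=1024   # 1 GB container allocation, per the error message
RATIO=2.1      # default vmem-pmem ratio
awk -v p="$PMEM_MB" -v r="$RATIO" 'BEGIN { printf "vmem cap: %.1f MB\n", p * r }'
# → vmem cap: 2150.4 MB
```

With the ratio raised to 4 the cap becomes 4096 MB, comfortably above the 2.2 GB actually used; disabling the vmem check sidesteps the limit entirely.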

Incompatible clusterIDs

Q

The DataNode fails to start, and the logs under $HADOOP_HOME/logs/ contain the following error:

java.io.IOException: Incompatible clusterIDs in /opt/bigdata/hadoop/hadoop-2.7.3/tmp/dfs/data: namenode clusterID = CID-ab1341d3-2985-4733-9938-7d207ab1f9f3; datanode clusterID = CID-7ba8e9f1-4917-433e-86be-db2044809204

This happens because the NameNode was formatted more than once. Every time the NameNode is formatted it generates a new clusterID, while the DataNode still holds the clusterID recorded before the re-format:

(python37) [zhangsan@node0 ~]$ cat /opt/bigdata/hadoop/default/tmp/dfs/name/current/VERSION | grep clusterID 
clusterID=CID-ab1341d3-2985-4733-9938-7d207ab1f9f3

(python37) [zhangsan@node0 ~]$ cat /opt/bigdata/hadoop/default/tmp/dfs/data/current/VERSION | grep clusterID
clusterID=CID-7ba8e9f1-4917-433e-86be-db2044809204

(python37) [zhangsan@node0 ~]$ cat /opt/bigdata/hadoop/default/tmp/dfs/namesecondary/current/VERSION | grep clusterID
clusterID=CID-7ba8e9f1-4917-433e-86be-db2044809204

A

You can see that the clusterIDs do not match. Make them consistent (copy the NameNode's clusterID into the DataNode's and SecondaryNameNode's VERSION files), and remember to restart HDFS after the change.
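The edit itself is a one-line substitution in a key=value file. A minimal sketch, rehearsed here on a scratch copy so nothing real is touched; on the cluster you would stop HDFS first and point the path at /opt/bigdata/hadoop/default/tmp/dfs/data/current/VERSION:

```shell
# NameNode clusterID, taken from dfs/name/current/VERSION above
NN_CID="CID-ab1341d3-2985-4733-9938-7d207ab1f9f3"

# Scratch stand-in for the DataNode's VERSION file
DATA_VERSION=$(mktemp)
echo "clusterID=CID-7ba8e9f1-4917-433e-86be-db2044809204" > "$DATA_VERSION"

# Overwrite whatever clusterID is recorded with the NameNode's
sed -i "s/^clusterID=.*/clusterID=${NN_CID}/" "$DATA_VERSION"
grep clusterID "$DATA_VERSION"
# → clusterID=CID-ab1341d3-2985-4733-9938-7d207ab1f9f3
rm -f "$DATA_VERSION"
```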

ClusterID

A ClusterID identifier is used to identify all the nodes in the cluster. When a Namenode is formatted, this identifier is either provided or auto generated. This ID should be used for formatting the other Namenodes into the cluster.

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
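An alternative to hand-editing VERSION files is to avoid the mismatch up front: per the Federation docs above, `hdfs namenode -format` accepts an explicit clusterID, so a re-format can reuse the existing ID. Shown here only as a printed command, since formatting erases NameNode metadata:

```shell
# Reuse the existing clusterID when re-formatting (destructive: wipes NameNode metadata,
# so only run the printed command deliberately, on the cluster)
CID="CID-ab1341d3-2985-4733-9938-7d207ab1f9f3"
echo "hdfs namenode -format -clusterId ${CID}"
# → hdfs namenode -format -clusterId CID-ab1341d3-2985-4733-9938-7d207ab1f9f3
```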


Q

Permission denied: user=dr.who, access=READ_EXECUTE, inode="/user":zhangsan:supergroup:drwx-wx-wx

A

core-site.xml

Append the following (`dr.who` is the default static user the HDFS web UI acts as; point it at a user that owns the files, then restart HDFS):

<property>
    <name>hadoop.http.staticuser.user</name>
    <value>zhangsan</value>
</property>
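After restarting HDFS, the fix can also be sanity-checked from the shell: WebHDFS accepts the acting user explicitly via the `user.name` query parameter. The URL below assumes the Hadoop 2.x default NameNode web port 50070 on node1 (9870 on Hadoop 3.x):

```shell
# Build the WebHDFS URL that lists /user as zhangsan; run the curl on the cluster.
NAMENODE_HTTP="node1:50070"
URL="http://${NAMENODE_HTTP}/webhdfs/v1/user?op=LISTSTATUS&user.name=zhangsan"
echo "$URL"
# → http://node1:50070/webhdfs/v1/user?op=LISTSTATUS&user.name=zhangsan
# curl "$URL"   # expect a JSON FileStatuses listing instead of a 403 AccessControlException
```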