4.0 HDFS Cluster

1. Preparation

An HDFS cluster is built on top of a Hadoop cluster. Since HDFS provides Hadoop's core daemons (the NameNode and DataNodes), configuring an HDFS cluster is representative of configuring a Hadoop cluster in general.

Docker makes it considerably easier and faster to build such a cluster environment.

1.1 Prepare /etc/hosts

#The /etc/hosts files on all three machines must be prepared in advance. These are already k8s nodes here, so the entries were already in place.
[root@k8s-01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.30.50   k8s-01
192.168.30.51   k8s-02
192.168.30.52   k8s-03
192.168.30.58   k8s-vip
192.168.30.50 goodrain.me
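
A quick way to confirm the entries resolve the same way on all three machines (a minimal check using getent, which reads /etc/hosts):

# run on each node; every name should map to the IP listed above
for h in k8s-01 k8s-02 k8s-03; do getent hosts $h; done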

1.2 Distribute the image

# On k8s-01
docker save hadoop_proto:with-hdfs -o /root/hadoop_proto_with_hdfs.tar
scp /root/hadoop_proto_with_hdfs.tar root@192.168.30.51:/root/
scp /root/hadoop_proto_with_hdfs.tar root@192.168.30.52:/root/

# On k8s-02 and k8s-03, run:
docker load -i /root/hadoop_proto_with_hdfs.tar
docker images | grep hadoop_proto
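
Optionally, confirm all three nodes ended up with the same image (the image ID is preserved by save/load, so it should match the one on k8s-01):

# the ID column should be identical on k8s-01/02/03
docker images --format '{{.Repository}}:{{.Tag}} {{.ID}}' | grep hadoop_proto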

1.3 Prepare persistent directories for the DataNodes on k8s-02/03

(Use host directories for persistence from the start, so data/metadata survive the container being removed.)
# k8s-02
mkdir -p /data/hdfs/datanode
# k8s-03
mkdir -p /data/hdfs/datanode
#If the user running inside the container is hadoop, give it ownership of the directory:
#chown -R 1000:1000 /data/hdfs/datanode  # 1000 is the typical hadoop user UID; adjust if yours differs
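
If you are unsure which UID the hadoop user actually has inside the image, you can check before deciding on the chown (a quick probe, assuming the image contains a hadoop user, as the prompts later suggest):

# prints the hadoop user's UID/GID as defined in the image
docker run --rm hadoop_proto:with-hdfs id hadoop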

1.4 Start only a DataNode on k8s-02/03 (do not start another NameNode)

Option A: host network + DataNode-specific config + start only the DN
# k8s-02 (repeat on k8s-03 with the container name hdfs-dn-03 and that host's IP, 192.168.30.52)
docker run -dit \
  --name hdfs-dn-02 \
  --hostname hdfs-dn-02 \
  --network host \
  -v /data/hdfs/datanode:/usr/local/hadoop/dfs/data \
  hadoop_proto:with-hdfs \
  bash

# Enter the container; steps 1)-4) below are run inside it
docker exec -it hdfs-dn-02 bash
set -e
# 1) Write a DataNode-specific hdfs-site.xml (advertising the host's IP to the outside)
cat >/usr/local/hadoop/etc/hadoop/hdfs-site.xml << "EOF"
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property><name>dfs.datanode.data.dir</name><value>file:///usr/local/hadoop/dfs/data</value></property>

  <!-- Cluster replication factor: keep 3, or use 2/1 temporarily during the transition -->
  <property><name>dfs.replication</name><value>3</value></property>

  <!-- Address the DataNode advertises externally = the host machine's IP -->
  <property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>
  <property><name>dfs.datanode.hostname</name><value>192.168.30.51</value></property>

  <!-- Listen addresses -->
  <property><name>dfs.datanode.address</name><value>0.0.0.0:9866</value></property>
  <property><name>dfs.datanode.http.address</name><value>0.0.0.0:9864</value></property>
  <property><name>dfs.datanode.ipc.address</name><value>0.0.0.0:9867</value></property>
</configuration>
EOF

# 2) core-site.xml must keep pointing at the NameNode (192.168.30.50:9000)
#    If the image already has this value nothing changes; write it again to be safe:
cat >/usr/local/hadoop/etc/hadoop/core-site.xml << "EOF"
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://192.168.30.50:9000</value></property>
  <property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>
</configuration>
EOF

# 3) Start only the DataNode
/usr/local/hadoop/bin/hdfs --daemon start datanode
# 4) Watch the DataNode log
tail -F /usr/local/hadoop/logs/*datanode*.log
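
Before moving on, it is worth a quick sanity check inside the container that the DataNode process really came up (jps ships with the JDK):

# still inside the DN container: a DataNode process should be listed
jps | grep DataNode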

1.5 Update the configuration on k8s-01

[hadoop@80b2f403b24e hadoop]$ cat  ./core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.30.50:9000</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>
[hadoop@80b2f403b24e hadoop]$ cat  ./hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Local NameNode / DataNode directories -->
  <property><name>dfs.namenode.name.dir</name><value>file:///usr/local/hadoop/dfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:///usr/local/hadoop/dfs/data</value></property>

  <property><name>dfs.replication</name><value>3</value></property>

  <!-- NN listen and advertised addresses -->
  <property><name>dfs.namenode.rpc-address</name><value>192.168.30.50:9000</value></property>
  <property><name>dfs.namenode.rpc-bind-host</name><value>0.0.0.0</value></property>
  <property><name>dfs.namenode.http-address</name><value>192.168.30.50:9870</value></property>
  <property><name>dfs.namenode.http-bind-host</name><value>0.0.0.0</value></property>

  <!-- Clients/DNs use hostname (or a specified IP) mode -->
  <property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>

  <!-- Key setting: relax the NN's IP/hostname consistency check when DataNodes register -->
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>

  <!-- If this machine also runs a DN, keep the following -->
  <property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>
  <property><name>dfs.datanode.hostname</name><value>192.168.30.50</value></property>
  <property><name>dfs.datanode.address</name><value>0.0.0.0:9866</value></property>
  <property><name>dfs.datanode.http.address</name><value>0.0.0.0:9864</value></property>
  <property><name>dfs.datanode.ipc.address</name><value>0.0.0.0:9867</value></property>
</configuration>
#If you are continuing from my earlier single-node setup, you can apply the change with the commands below, or simply edit the file to match the contents shown above.
# Go to the configuration directory
cd /usr/local/hadoop/etc/hadoop

# Back up the current file
sudo cp -a hdfs-site.xml hdfs-site.xml.bak.$(date +%s)

# Overwrite with sudo tee (note: this overwrites rather than appends)
sudo tee hdfs-site.xml >/dev/null <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Local NameNode / DataNode directories -->
  <property><name>dfs.namenode.name.dir</name><value>file:///usr/local/hadoop/dfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:///usr/local/hadoop/dfs/data</value></property>

  <property><name>dfs.replication</name><value>3</value></property>

  <!-- NN listen and advertised addresses -->
  <property><name>dfs.namenode.rpc-address</name><value>192.168.30.50:9000</value></property>
  <property><name>dfs.namenode.rpc-bind-host</name><value>0.0.0.0</value></property>
  <property><name>dfs.namenode.http-address</name><value>192.168.30.50:9870</value></property>
  <property><name>dfs.namenode.http-bind-host</name><value>0.0.0.0</value></property>

  <!-- Clients/DNs use hostname (or a specified IP) mode -->
  <property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>

  <!-- Key setting: relax the NN's IP/hostname consistency check when DataNodes register -->
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>

  <!-- If this machine also runs a DN, keep the following -->
  <property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>
  <property><name>dfs.datanode.hostname</name><value>192.168.30.50</value></property>
  <property><name>dfs.datanode.address</name><value>0.0.0.0:9866</value></property>
  <property><name>dfs.datanode.http.address</name><value>0.0.0.0:9864</value></property>
  <property><name>dfs.datanode.ipc.address</name><value>0.0.0.0:9867</value></property>
</configuration>
EOF
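
A quick check that the overwrite landed as expected (the timestamped backup from the previous step is there if you need to roll back):

# the key cluster-facing properties should all show up
grep -E 'rpc-address|ip-hostname-check|dfs.replication' hdfs-site.xml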

1.6 Restart

# Restart the NameNode (and the local DataNode, so the new configuration takes effect)
/usr/local/hadoop/bin/hdfs --daemon stop namenode
sleep 2
/usr/local/hadoop/bin/hdfs --daemon start namenode

/usr/local/hadoop/bin/hdfs --daemon stop datanode
sleep 2
/usr/local/hadoop/bin/hdfs --daemon start datanode
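
After the restart, the NameNode should again be listening on 9000 (RPC) and 9870 (web UI); a quick check, same idea as the DataNode port check in 2.1 below:

# NameNode RPC and HTTP ports should be back up
ss -lntp | egrep ':(9000|9870)\b'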

2. Verification
2.1 On k8s-02/03, confirm the DN is listening on the correct ports (9864/9866/9867) and advertising the host IP

# Is the DN listening on 9864/9866/9867?
[root@hdfs-dn-02 tmp]# ss -lntp | egrep ':(9864|9866|9867)\b' || netstat -lntp | egrep ':(9864|9866|9867)\b'
LISTEN 0      4096         0.0.0.0:9864       0.0.0.0:*    users:(("java",pid=60,fd=389))
LISTEN 0      256          0.0.0.0:9867       0.0.0.0:*    users:(("java",pid=60,fd=390))
LISTEN 0      256          0.0.0.0:9866       0.0.0.0:*    users:(("java",pid=60,fd=333))
# Then, from the NN container, actively test that each DN's 9867 (IPC) and 9866 (data transfer) ports are reachable
# Run the following two checks from the NN container, against .51 and .52 respectively:
[root@hdfs-dn-02 tmp]# bash -lc 'echo > /dev/tcp/192.168.30.51/9867 && echo OK-51-9867 || echo FAIL'
OK-51-9867
[root@hdfs-dn-02 tmp]# bash -lc 'echo > /dev/tcp/192.168.30.51/9866 && echo OK-51-9866 || echo FAIL'
OK-51-9866
[root@hdfs-dn-02 tmp]# 
[root@hdfs-dn-02 tmp]# bash -lc 'echo > /dev/tcp/192.168.30.52/9867 && echo OK-52-9867 || echo FAIL'
OK-52-9867
[root@hdfs-dn-02 tmp]# 
[root@hdfs-dn-02 tmp]# bash -lc 'echo > /dev/tcp/192.168.30.52/9866 && echo OK-52-9866 || echo FAIL'
OK-52-9866

#If these ports are not listening, the DN's hdfs-site.xml did not take effect or the container is not on the host network.
#If a check FAILs, it is most likely firewalld/iptables; open 9864/9866/9867 (DN) and 9000/9870 (NN).
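
If a check fails because of firewalld, opening the ports looks roughly like this (a sketch assuming firewalld's default zone; adapt to whatever firewall you actually run):

# on the DN hosts (k8s-02/03)
firewall-cmd --permanent --add-port=9864/tcp --add-port=9866/tcp --add-port=9867/tcp
# on the NN host (k8s-01)
firewall-cmd --permanent --add-port=9000/tcp --add-port=9870/tcp
# apply
firewall-cmd --reload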

2.2 Run on the k8s-01 node

[hadoop@80b2f403b24e hadoop]$ hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 145030643712 (135.07 GB)
Present Capacity: 92691496960 (86.33 GB)
DFS Remaining: 92691451904 (86.33 GB)
DFS Used: 45056 (44 KB)
DFS Used%: 0.00%
Replicated Blocks:
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0
Erasure Coded Block Groups: 
    Low redundancy block groups: 0
    Block groups with corrupt internal blocks: 0
    Missing block groups: 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 172.17.0.1:9866 (_gateway)
Hostname: 192.168.30.50
Decommission Status : Normal
Configured Capacity: 48343547904 (45.02 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 25802944512 (24.03 GB)
DFS Remaining: 22540574720 (20.99 GB)
DFS Used%: 0.00%
DFS Remaining%: 46.63%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Sun Sep 21 14:40:41 UTC 2025
Last Block Report: Sun Sep 21 14:40:29 UTC 2025
Num of Blocks: 2


Name: 192.168.30.51:9866 (192.168.30.51)
Hostname: 192.168.30.51
Decommission Status : Normal
Configured Capacity: 48343547904 (45.02 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 12121690112 (11.29 GB)
DFS Remaining: 36221849600 (33.73 GB)
DFS Used%: 0.00%
DFS Remaining%: 74.93%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Sun Sep 21 14:40:43 UTC 2025
Last Block Report: Sun Sep 21 14:40:19 UTC 2025
Num of Blocks: 0


Name: 192.168.30.52:9866 (192.168.30.52)
Hostname: 192.168.30.52
Decommission Status : Normal
Configured Capacity: 48343547904 (45.02 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 14414512128 (13.42 GB)
DFS Remaining: 33929027584 (31.60 GB)
DFS Used%: 0.00%
DFS Remaining%: 70.18%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Sun Sep 21 14:40:43 UTC 2025
Last Block Report: Sun Sep 21 14:40:19 UTC 2025
Num of Blocks: 0


[hadoop@80b2f403b24e hadoop]$ hdfs dfsadmin -printTopology
Rack: /default-rack
   192.168.30.52:9866 (192.168.30.52) In Service
   172.17.0.1:9866 (_gateway) In Service
   192.168.30.51:9866 (192.168.30.51) In Service
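
Note the "Safe mode is ON" at the top of the report: this is normal right after a NameNode restart and clears on its own once enough block reports arrive. Writes (such as the append test below) will fail while it is on, so check, and if necessary leave, safe mode first:

hdfs dfsadmin -safemode get     # should eventually report OFF
hdfs dfsadmin -safemode leave   # force it off if it does not clear by itself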

2.3 Test

#Append content from k8s-01
echo xing | hdfs dfs -appendToFile - /tmp/hello.txt
#Read it back on k8s-01/02/03
hdfs dfs -cat /tmp/hello.txt


[hadoop@80b2f403b24e hadoop]$ echo xing | hdfs dfs -appendToFile - /tmp/hello.txt
[hadoop@80b2f403b24e hadoop]$ hdfs dfs -cat /tmp/hello.txt

hello
xing

[root@hdfs-dn-02 tmp]# hdfs dfs -cat /tmp/hello.txt
hello
xing


[root@hdfs-dn-03 tmp]# hdfs dfs -cat /tmp/hello.txt
hello
xing
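
To confirm the file's block actually has replicas on the new DataNodes, fsck can list block locations:

# shows each block of the file and the DataNodes holding its replicas
hdfs fsck /tmp/hello.txt -files -blocks -locations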

2.4 Auto-start on node reboot

# Enable Docker at boot (if not already enabled)
sudo systemctl enable --now docker

# Add a restart policy to the containers: unless stopped manually, they come back up with the host
sudo docker update --restart unless-stopped hdfs_single2   # NN (+DN) container name on k8s-01
sudo docker update --restart unless-stopped hdfs-dn-02     # k8s-02 DN
sudo docker update --restart unless-stopped hdfs-dn-03     # k8s-03 DN

# Verify
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' hdfs_single2
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' hdfs-dn-02
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' hdfs-dn-03
# All three should print: unless-stopped
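
Note that the restart policy only brings the containers themselves back; with the DN containers started from a plain bash shell as in 1.4, the DataNode daemon still has to be started again inside them. Until this moves to docker-compose (see 2.5), a simple way to handle that after a reboot is:

# run on k8s-02/03 once the container is back; adjust the container name per node
docker exec hdfs-dn-02 /usr/local/hadoop/bin/hdfs --daemon start datanode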

2.5 Optimization

To do: mount the HDFS data on the k8s-01 node to a host directory as well, to avoid data loss, and move container management to docker-compose.