How to Achieve High Availability with Hadoop on Debian
Prerequisites for Debian Hadoop High Availability (HA)
Before configuring HA, ensure you have:
- At least 3 Debian nodes (for NameNode, DataNode, JournalNode, ResourceManager roles) with static IPs and proper hostname/DNS resolution.
- Java (OpenJDK 8 or 11) installed on all nodes (`sudo apt install openjdk-11-jdk`).
- Hadoop (version 3.x recommended) downloaded and extracted on all nodes.
- Passwordless SSH configured between all nodes (using `ssh-keygen` and `ssh-copy-id`) for seamless communication; see the sketch after this list.
- A ZooKeeper cluster (3 or 5 nodes) set up for coordination (critical for automatic failover).
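The following is a minimal sketch of the passwordless-SSH setup, assuming the cluster runs under a dedicated `hadoop` user and the hostnames used later in this guide (namenode1, namenode2, journalnode1-3, zoo1-3); adjust the host list to your own topology:

```bash
# Run as the hadoop user on every node (at minimum on both NameNodes,
# which must be able to SSH into each other for sshfence to work).
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Push the public key to every node in the cluster, including this one.
for host in namenode1 namenode2 journalnode1 journalnode2 journalnode3 zoo1 zoo2 zoo3; do
    ssh-copy-id "$host"
done

# Verify: each hop should log in without a password prompt.
ssh namenode2 hostname
```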
1. Configure ZooKeeper Cluster (Coordination Service)
ZooKeeper is essential for monitoring NameNode/ResourceManager health and triggering automatic failover.
- Install ZooKeeper: On each ZooKeeper node, run `sudo apt install zookeeper zookeeperd`.
- Configure ZooKeeper: Edit `/etc/zookeeper/conf/zoo.cfg` on all nodes to include the cluster members:

  ```
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=zoo1:2888:3888   # Replace with your node hostnames
  server.2=zoo2:2888:3888
  server.3=zoo3:2888:3888
  ```

- Create a `myid` file in `/var/lib/zookeeper` on each node with a unique ID (e.g., `1` for zoo1, `2` for zoo2).
- Start ZooKeeper: Run `sudo systemctl start zookeeper` on all nodes and verify with `sudo systemctl status zookeeper`; a quick health-check sketch follows this list.
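A quick way to confirm the ensemble is healthy, sketched under the assumption that the stock Debian `zookeeperd` package is used (it installs the helper scripts under `/usr/share/zookeeper/bin`):

```bash
# Each node should report "Mode: leader" or "Mode: follower".
/usr/share/zookeeper/bin/zkServer.sh status

# Alternatively, send the four-letter "ruok" command to the client port;
# a healthy server answers "imok" (requires netcat: sudo apt install netcat-openbsd).
echo ruok | nc zoo1 2181
```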
2. Configure HDFS High Availability (NameNode HA)
HDFS HA eliminates the single point of failure (SPOF) of the NameNode using Active/Standby nodes and JournalNodes for metadata synchronization.
- Modify `core-site.xml`: Define the HDFS namespace and the ZooKeeper quorum (used by the ZKFC):

  ```xml
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>  <!-- Logical cluster name -->
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zoo1:2181,zoo2:2181,zoo3:2181</value>  <!-- ZooKeeper ensemble -->
  </property>
  ```
- Modify `hdfs-site.xml`: Configure NameNode roles, shared edit-log storage (JournalNodes), and failover settings:

  ```xml
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>  <!-- Must match fs.defaultFS -->
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>  <!-- Active and Standby NameNode IDs -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1:8020</value>  <!-- RPC address for nn1 -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2:8020</value>  <!-- RPC address for nn2 -->
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value>  <!-- JournalNode quorum -->
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>  <!-- Enable automatic failover -->
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>  <!-- Client-side proxy for failover -->
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>  <!-- Prevent split-brain (kill the old Active's process) -->
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>  <!-- SSH key used for fencing -->
  </property>
  ```
- Start JournalNodes: On each JournalNode host, run `hdfs --daemon start journalnode` (on Hadoop 2.x, the older `hadoop-daemon.sh start journalnode` form). Verify with `jps` (look for `JournalNode` processes).
- Format and Start NameNodes (the full first-time bring-up order, including bootstrapping the Standby, is sketched after this list):
  - On the Active NameNode (e.g., namenode1), format the NameNode: `hdfs namenode -format`
  - Start HDFS on both nodes: `start-dfs.sh`
  - Check NameNode status with `hdfs haadmin -getAllServiceState` (should show one Active, one Standby).
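The exact first-time bring-up order matters for HA. The sketch below assumes the hostnames and `nn1`/`nn2` IDs used above and a Hadoop 3.x installation on the PATH; the `-bootstrapStandby` and `zkfc -formatZK` steps are required even though they are easy to overlook:

```bash
# 1. On every JournalNode host: start the JournalNode daemon.
hdfs --daemon start journalnode

# 2. On namenode1 only: format the namespace and start the first NameNode.
hdfs namenode -format
hdfs --daemon start namenode

# 3. On namenode2 only: copy the freshly formatted metadata from namenode1,
#    then start the second NameNode.
hdfs namenode -bootstrapStandby
hdfs --daemon start namenode

# 4. On either NameNode: create the failover znode in ZooKeeper (one-time step).
hdfs zkfc -formatZK

# 5. Start the ZKFC daemons on both NameNodes (start-dfs.sh also does this
#    once HA and automatic failover are configured).
hdfs --daemon start zkfc

# 6. Confirm roles: one NameNode should be active, the other standby.
hdfs haadmin -getAllServiceState
```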
3. Configure YARN High Availability (ResourceManager HA)
YARN HA ensures the ResourceManager (which schedules jobs) remains available even if one instance fails.
- Modify `yarn-site.xml`: Configure ResourceManager roles and ZooKeeper for state storage:

  ```xml
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>  <!-- Unique cluster ID -->
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>  <!-- Active and Standby ResourceManager IDs -->
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zoo1:2181,zoo2:2181,zoo3:2181</value>  <!-- ZooKeeper ensemble -->
  </property>
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>  <!-- Set to rm1 on the Active RM host, rm2 on the Standby -->
  </property>
  ```
- Start YARN: On the Active ResourceManager (e.g., resourcemanager1), run `start-yarn.sh`. The Standby ResourceManager (e.g., resourcemanager2) automatically recovers its state from ZooKeeper; a verification sketch follows this list.
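To confirm which ResourceManager is Active, `yarn rmadmin` can query each RM by its ID. A minimal sketch, assuming the `rm1`/`rm2` IDs above; note that most setups also need `yarn.resourcemanager.hostname.rm1` and `yarn.resourcemanager.hostname.rm2` in `yarn-site.xml` so the IDs resolve to the two RM hosts:

```bash
# One should report "active", the other "standby".
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# The Standby RM's web UI redirects to the Active one, so hitting either
# host on port 8088 should end up on the Active ResourceManager.
curl -sI http://resourcemanager1:8088/cluster | head -n 1
```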
4. Validate High Availability
- Check NameNode Status: Run `hdfs haadmin -getAllServiceState` to confirm one Active and one Standby NameNode.
- Test Failover:
  - Simulate an Active NameNode failure (e.g., `kill -9` the NameNode process on the active node).
  - Wait 30–60 seconds (ZooKeeper election time) and run `hdfs haadmin -getAllServiceState` again; the Standby should become Active.
- Check ResourceManager Status: Run `yarn node -list` to verify the Active ResourceManager is handling requests.
- Submit a Test Job: Run a simple MapReduce job (e.g., `hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100`) to ensure the cluster keeps working through a failover; a scripted failover drill is sketched below.
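A hedged failover drill, assuming the `nn1`/`nn2` service IDs above; it simply automates the kill-and-observe steps from the list:

```bash
#!/usr/bin/env bash
# Run on the host of the currently Active NameNode (assumed here to be namenode1/nn1).

echo "State before failover:"
hdfs haadmin -getAllServiceState

# Kill the Active NameNode process abruptly to simulate a crash.
kill -9 "$(jps | awk '$2 == "NameNode" {print $1}')"

# Give the Standby's ZKFC time to win the ZooKeeper election and fence the old Active.
sleep 60

echo "State after failover (can be run from any node):"
hdfs haadmin -getAllServiceState   # nn2 should now report "active"
```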
Key Notes for Production
- Use an odd number of JournalNodes (3 or 5) so a quorum can always be reached and split-brain scenarios are avoided.
- Secure ZooKeeper with authentication (e.g., SASL) in multi-tenant environments.
- Monitor cluster health with tools like Prometheus + Grafana or Ambari to detect issues early.
- Regularly back up NameNode metadata (the fsimage lives on the NameNodes; the shared edit log is stored on the JournalNodes) to prevent data loss; a backup sketch follows.
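A minimal backup sketch using `hdfs dfsadmin -fetchImage`, which downloads the most recent fsimage from the Active NameNode; the `/backup/namenode` path is an assumption, and `saveNamespace` briefly puts the cluster in safe mode (blocking writes), so schedule this during a quiet window:

```bash
#!/usr/bin/env bash
BACKUP_DIR=/backup/namenode/$(date +%F)
mkdir -p "$BACKUP_DIR"

# Force a fresh checkpoint (saveNamespace requires safe mode).
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave

# Download the latest fsimage from the Active NameNode to local backup storage.
hdfs dfsadmin -fetchImage "$BACKUP_DIR"
```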