How to Configure Hadoop High Availability on Debian
Prerequisites for Debian Hadoop High Availability (HA)
Before configuring HA, ensure you have:
- At least 3 Debian nodes (for NameNode, DataNode, JournalNode, ResourceManager roles) with static IPs and proper hostname/DNS resolution.
- Java (OpenJDK 8 or 11) installed on all nodes (sudo apt install openjdk-11-jdk).
- Hadoop (version 3.x recommended) downloaded and extracted on all nodes.
- Passwordless SSH configured between all nodes (using ssh-keygen and ssh-copy-id) for seamless communication; a setup sketch follows this list.
- A ZooKeeper cluster (3 or 5 nodes) set up for coordination (critical for automatic failover).
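As a quick illustration of the passwordless-SSH prerequisite, here is a minimal sketch; the hostnames are placeholders, and the commands should be run as the user that will start the Hadoop daemons:

```bash
# Generate a key pair once per node (skip if one already exists).
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Push the public key to every node in the cluster (placeholder hostnames).
for host in namenode1 namenode2 journalnode1 journalnode2 journalnode3; do
  ssh-copy-id "$host"
done

# Verify that login no longer prompts for a password.
ssh namenode2 hostname
```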
1. Configure ZooKeeper Cluster (Coordination Service)
ZooKeeper is essential for monitoring NameNode/ResourceManager health and triggering automatic failover.
- Install ZooKeeper: On each ZooKeeper node, run:

```
sudo apt install zookeeper zookeeperd
```

- Configure ZooKeeper: Edit /etc/zookeeper/conf/zoo.cfg on all nodes to include the cluster members:

```
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zoo1:2888:3888   # Replace with your node hostnames
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
```

  Create a myid file in /var/lib/zookeeper on each node with a unique ID (e.g., 1 for zoo1, 2 for zoo2).
- Start ZooKeeper: Run sudo systemctl start zookeeper on all nodes and verify the status with sudo systemctl status zookeeper.
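A hedged sketch of the myid step and a quick ensemble health check, assuming the zoo1–zoo3 hostnames above (newer ZooKeeper releases may require the srvr four-letter command to be whitelisted via 4lw.commands.whitelist):

```bash
# On zoo1 (use 2 on zoo2 and 3 on zoo3, matching the server.N entries in zoo.cfg).
echo 1 | sudo tee /var/lib/zookeeper/myid
sudo systemctl restart zookeeper

# Quick health check: "Mode" should report leader on one node and follower on the others.
echo srvr | nc localhost 2181 | grep Mode
```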
2. Configure HDFS High Availability (NameNode HA)
HDFS HA eliminates the single point of failure (SPOF) of the NameNode using Active/Standby nodes and JournalNodes for metadata synchronization.
- Modify core-site.xml: Define the HDFS namespace and the ZooKeeper quorum (used by ZKFC):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value> <!-- Logical cluster name -->
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zoo1:2181,zoo2:2181,zoo3:2181</value> <!-- ZooKeeper ensemble -->
</property>
```

- Modify hdfs-site.xml: Configure the NameNode roles, shared edits storage (JournalNodes), and failover settings:

```xml
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value> <!-- Must match fs.defaultFS -->
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value> <!-- Active and Standby NameNode IDs -->
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1:8020</value> <!-- RPC address for nn1 -->
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2:8020</value> <!-- RPC address for nn2 -->
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value> <!-- JournalNode quorum -->
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value> <!-- Enable automatic failover -->
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> <!-- Client-side proxy for failover -->
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value> <!-- Prevent split-brain (e.g., kill the old Active process) -->
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value> <!-- SSH key used for fencing -->
</property>
```
- Start JournalNodes: On each JournalNode host, run:

```
hadoop-daemon.sh start journalnode
```

  (On Hadoop 3.x, hdfs --daemon start journalnode is the equivalent, non-deprecated form.) Verify with jps (look for a JournalNode process).
- Format and Start NameNodes:
  - On the Active NameNode (e.g., namenode1), format the NameNode: hdfs namenode -format
  - Start the NameNodes on both nodes: start-dfs.sh
  - Check the NameNode states with hdfs haadmin -getServiceState nn1 and hdfs haadmin -getServiceState nn2 (one should report active, the other standby). A fuller bring-up sequence, including standby bootstrapping and ZKFC initialization, is sketched after this list.
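The list above compresses the NameNode bring-up; the sketch below expands it into a minimal command sequence, assuming the nn1/nn2 IDs, the namenode1/namenode2 hosts, and automatic failover as configured earlier. Exact ordering and daemon commands can differ slightly between Hadoop releases.

```bash
# On namenode1 (intended initial Active): format the namespace once, then start the NameNode.
hdfs namenode -format
hdfs --daemon start namenode

# On namenode2: pull the freshly formatted metadata from namenode1, then start its NameNode.
hdfs namenode -bootstrapStandby
hdfs --daemon start namenode

# On one NameNode: create the ZooKeeper znode used by automatic failover,
# then start a ZKFC (failover controller) on both NameNode hosts.
hdfs zkfc -formatZK
hdfs --daemon start zkfc   # run on namenode1 and namenode2

# Confirm one active and one standby NameNode.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```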
3. Configure YARN High Availability (ResourceManager HA)
YARN HA ensures the ResourceManager (which schedules jobs) remains available even if one instance fails.
- Modify yarn-site.xml: Configure the ResourceManager roles and ZooKeeper for state storage:

```xml
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value> <!-- Unique cluster ID -->
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value> <!-- Active and Standby ResourceManager IDs -->
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zoo1:2181,zoo2:2181,zoo3:2181</value> <!-- ZooKeeper ensemble -->
</property>
<property>
  <name>yarn.resourcemanager.ha.id</name>
  <value>rm1</value> <!-- Set to rm1 on the Active RM, rm2 on the Standby -->
</property>
```

- Start YARN: On the Active ResourceManager (e.g., resourcemanager1), run start-yarn.sh. The Standby ResourceManager (e.g., resourcemanager2) will automatically recover state from ZooKeeper.
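Many deployments also map each rm-id to a host via yarn.resourcemanager.hostname.rm1 / yarn.resourcemanager.hostname.rm2 so clients can locate both ResourceManagers; check the defaults for your Hadoop version. To confirm which instance is active, a quick check assuming the rm1/rm2 IDs above:

```bash
# Query the HA state of each ResourceManager ID defined in yarn-site.xml.
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# List the NodeManagers registered with the currently active ResourceManager.
yarn node -list
```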
4. Validate High Availability
- Check NameNode Status: Run hdfs haadmin -getServiceState nn1 and hdfs haadmin -getServiceState nn2 to confirm one Active and one Standby NameNode.
- Test Failover (a scripted version of this drill follows this list):
  - Simulate an Active NameNode failure (e.g., kill -9 the NameNode process on the Active node).
  - Wait 30–60 seconds (ZooKeeper election time) and query the states again; the former Standby should now report active.
- Check ResourceManager Status: Run yarn node -list to verify the Active ResourceManager is handling requests.
- Submit a Test Job: Run a simple MapReduce job (e.g., hadoop jar hadoop-mapreduce-examples.jar pi 10 100) to confirm the cluster keeps working through a failover.
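The failover test can be scripted; the sketch below is a minimal drill, assuming it runs on the currently Active NameNode host and that nn1/nn2 are the configured IDs (querying a downed NameNode makes haadmin return an error, hence the || true):

```bash
#!/usr/bin/env bash
# Minimal NameNode failover drill; adjust IDs and timing for your cluster.
set -u

echo "States before failover:"
hdfs haadmin -getServiceState nn1 || true
hdfs haadmin -getServiceState nn2 || true

# Simulate a crash of the local (Active) NameNode process.
kill -9 "$(pgrep -f 'org.apache.hadoop.hdfs.server.namenode.NameNode' | head -n 1)"

# Give ZooKeeper/ZKFC time to elect a new Active NameNode.
sleep 60

echo "States after failover:"
hdfs haadmin -getServiceState nn1 || true
hdfs haadmin -getServiceState nn2 || true
```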
Key Notes for Production
- Use an odd number of JournalNodes (3 or 5) so a majority quorum is always possible and split-brain scenarios are avoided.
- Secure ZooKeeper with authentication (e.g., SASL) in multi-tenant environments.
- Monitor cluster health with tools like Prometheus + Grafana or Ambari to detect issues early.
- Regularly back up NameNode metadata (the fsimage on the NameNodes plus the edit log replicated across the JournalNodes) to prevent data loss; a backup sketch follows below.
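One lightweight approach to metadata backups is to periodically download the latest fsimage checkpoint from the NameNode; a hedged sketch with a placeholder backup path:

```bash
# Placeholder backup location; run from a host with client configs for the mycluster nameservice.
BACKUP_DIR=/backup/hdfs-fsimage/$(date +%F)
mkdir -p "$BACKUP_DIR"

# Download the most recent fsimage from the NameNode into the backup directory.
hdfs dfsadmin -fetchImage "$BACKUP_DIR"
```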