CentOS HDFS Deployment Guide
This guide provides a step-by-step approach to deploying HDFS (Hadoop Distributed File System) on CentOS, covering both standalone and cluster setups. Follow these steps to set up a robust distributed file system.
Prerequisites
Before starting, ensure the following requirements are met:
- Operating System: CentOS 7 or later.
- Java Environment: Hadoop requires Java 8 (OpenJDK or Oracle JDK).
- Network Configuration: All nodes (NameNode, DataNodes) must be able to communicate via hostname/IP. Update /etc/hosts on every node with the node details, for example:
192.168.1.10 namenode
192.168.1.11 datanode1
- Firewall: Open the required ports. With Hadoop 3.x the defaults are 9000 for NameNode RPC, 9870 for the NameNode Web UI, and 9866 for DataNode data transfer (Hadoop 2.x used 50070 and 50010). Use firewall-cmd to configure them:
sudo firewall-cmd --permanent --zone=public --add-port=9000/tcp
sudo firewall-cmd --permanent --zone=public --add-port=9870/tcp
sudo firewall-cmd --reload
- SSH: Configure passwordless SSH between the NameNode and DataNodes for seamless communication. Generate keys on the NameNode and copy them to the DataNodes (a quick verification sketch follows this list):
ssh-keygen -t rsa
ssh-copy-id datanode1
ssh-copy-id datanode2
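Once the keys are copied, you can confirm that passwordless SSH works from the NameNode (a minimal check, assuming the example hostnames above):
ssh datanode1 hostname
ssh datanode2 hostname
Each command should print the remote hostname without prompting for a password.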
Step 1: Install Java
Hadoop depends on Java. Install OpenJDK 8 using yum:
sudo yum install -y java-1.8.0-openjdk-devel
Verify installation:
java -version
Ensure the output shows Java 1.8.0.
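If you need to confirm where the JDK is installed before setting JAVA_HOME in Step 3, one way is to resolve the java binary (a hedged sketch; the exact path varies by CentOS release and update level):
readlink -f $(which java)
The directory above the trailing bin/java (or jre/bin/java) portion of the output is the value to use for JAVA_HOME.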
Step 2: Download and Extract Hadoop
Download the latest stable Hadoop release from the Apache website. For example, to download Hadoop 3.3.4:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
Extract the tarball to /usr/local and rename the directory for simplicity:
sudo tar -xzvf hadoop-3.3.4.tar.gz -C /usr/local/
sudo mv /usr/local/hadoop-3.3.4 /usr/local/hadoop
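Optionally, verify the integrity of the downloaded tarball against the SHA-512 checksum that Apache publishes alongside each release (a hedged sketch; if the release has moved to the Apache archive, adjust the URL accordingly):
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz.sha512
cat hadoop-3.3.4.tar.gz.sha512
sha512sum hadoop-3.3.4.tar.gz
The two checksum values should match.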
Step 3: Configure Hadoop Environment Variables
Set up environment variables to make Hadoop commands accessible globally. Create a new file /etc/profile.d/hadoop.sh:
sudo nano /etc/profile.d/hadoop.sh
Add the following lines (adjust the paths if Java or Hadoop is installed elsewhere):
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Make the file executable and apply changes:
sudo chmod +x /etc/profile.d/hadoop.sh
source /etc/profile.d/hadoop.sh
Verify Hadoop installation:
hadoop version
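With Hadoop 3.x, daemons launched over SSH by the start scripts do not always inherit JAVA_HOME from your login shell, so it is common to also set it in Hadoop's own environment file (a hedged sketch; use the JAVA_HOME value that matches your system):
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee -a /usr/local/hadoop/etc/hadoop/hadoop-env.sh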
Step 4: Configure HDFS Core Files
Edit the Hadoop configuration files in $HADOOP_HOME/etc/hadoop to define HDFS behavior.
4.1 core-site.xml
This file configures the default file system and NameNode address. Replace namenode with your NameNode’s hostname:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
4.2 hdfs-site.xml
This file sets HDFS-specific parameters like replication factor and data directories. Create directories for NameNode and DataNode data:
sudo mkdir -p /usr/local/hadoop/data/namenode
sudo mkdir -p /usr/local/hadoop/data/datanode
sudo chown -R $(whoami):$(whoami) /usr/local/hadoop/data
Add the following configuration to hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- Adjust based on your cluster size (e.g., 1 for standalone) -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/data/datanode</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <!-- Disable permissions for testing (enable in production) -->
  </property>
</configuration>
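For a multi-node cluster, Hadoop 3.x also reads the list of DataNode hosts from $HADOOP_HOME/etc/hadoop/workers (the file was called slaves in Hadoop 2.x), so that start-dfs.sh knows where to launch the DataNode daemons. A minimal sketch using the example hostnames from the prerequisites:
datanode1
datanode2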
Optional: mapred-site.xml and yarn-site.xml
If using YARN for resource management, configure these files:
- mapred-site.xml (create if it doesn’t exist):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
- yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode</value>
    <!-- Replace with your ResourceManager hostname -->
  </property>
</configuration>
Step 5: Format the NameNode
The NameNode must be formatted before first use to initialize its storage. Run this command on the NameNode:
hdfs namenode -format
Follow the prompts to complete formatting. This step creates the necessary directory structure and metadata files.
Step 6: Start HDFS
Start the HDFS services with the start-dfs.sh script (run it from the NameNode):
start-dfs.sh
Check the status of HDFS daemons with:
jps
You should see the NameNode, DataNode, and SecondaryNameNode processes running.
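If you configured the optional YARN files earlier, start the YARN daemons separately (a brief sketch, assuming the default scripts in $HADOOP_HOME/sbin):
start-yarn.sh
After this, jps should also list ResourceManager on the master and NodeManager on the worker nodes.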
Step 7: Verify HDFS
Confirm HDFS is operational by:
- Web UI: Open a browser and navigate to http://<namenode-ip>:9870 (replace <namenode-ip> with your NameNode’s IP; Hadoop 2.x used port 50070). You should see the HDFS dashboard with cluster information.
- Command Line: List the root directory to verify HDFS is accessible:
hdfs dfs -ls /
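As an additional check, you can run a quick write/read smoke test and print a cluster report (a minimal sketch; the directory and file names are arbitrary examples):
hdfs dfsadmin -report
hdfs dfs -mkdir -p /tmp/smoke-test
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /tmp/smoke-test/
hdfs dfs -cat /tmp/smoke-test/hello.txt
The report should list your live DataNodes, and the final command should print the file contents.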
Step 8: Stop HDFS (Optional)
To stop HDFS services, run:
stop-dfs.sh
Troubleshooting Tips
- “Permission Denied” Errors: Ensure the Hadoop directories have the correct ownership (e.g., chown -R hadoop:hadoop /usr/local/hadoop if you run Hadoop as the hadoop user).
- Port Conflicts: Verify that no other services are using the Hadoop ports (e.g., 9000, 9870) with netstat -tuln.
- Daemon Failures: Check the logs in $HADOOP_HOME/logs for errors; the NameNode and DataNode each write their own log files (e.g., hadoop-<user>-namenode-<hostname>.log and hadoop-<user>-datanode-<hostname>.log). See the example below.
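For example, a quick way to scan the NameNode log for recent problems (a hedged sketch; the exact file name depends on the user and hostname running the daemon):
grep -iE 'error|exception' $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -n 20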
By following these steps, you’ll have a fully functional HDFS deployment on CentOS, ready to store and manage large datasets in a distributed environment.