CentOS HDFS Deployment Guide
This guide provides a step-by-step approach to deploying HDFS (Hadoop Distributed File System) on CentOS, covering both standalone and cluster setups. Follow these steps to set up a robust distributed file system.
Prerequisites
Before starting, ensure the following requirements are met:
- Operating System: CentOS 7 or later.
- Java Environment: Hadoop requires Java 8 (OpenJDK or Oracle JDK).
- Network Configuration: All nodes (NameNode, DataNodes) must be able to communicate via hostname/IP. Update /etc/hosts on every node with the node details, for example:
192.168.1.10 namenode
192.168.1.11 datanode1
- Firewall: Open the required ports. With Hadoop 3.x the defaults are 9000 for NameNode RPC, 9870 for the NameNode Web UI, and 9866 for DataNode data transfer (Hadoop 2.x used 50070 and 50010). Use firewall-cmd to configure them:
sudo firewall-cmd --permanent --zone=public --add-port=9000/tcp
sudo firewall-cmd --permanent --zone=public --add-port=9870/tcp
sudo firewall-cmd --reload
- SSH: Configure passwordless SSH between the NameNode and DataNodes for seamless communication. Generate keys on the NameNode and copy them to the DataNodes (a quick verification sketch follows this list):
ssh-keygen -t rsa
ssh-copy-id datanode1
ssh-copy-id datanode2
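Once the keys are copied, you can confirm that passwordless SSH works from the NameNode (a minimal check, assuming the example hostnames above):
ssh datanode1 hostname
ssh datanode2 hostname
Each command should print the remote hostname without prompting for a password.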
Step 1: Install Java
Hadoop depends on Java. Install OpenJDK 8 using yum:
sudo yum install -y java-1.8.0-openjdk-devel
Verify installation:
java -version
Ensure the output shows Java 1.8.0.
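If you need to confirm where the JDK is installed before setting JAVA_HOME in Step 3, one way is to resolve the java binary (a hedged sketch; the exact path varies by CentOS release and update level):
readlink -f $(which java)
The directory above the trailing bin/java (or jre/bin/java) portion of the output is the value to use for JAVA_HOME.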
Step 2: Download and Extract Hadoop
Download the latest stable Hadoop release from the Apache website. For example, to download Hadoop 3.3.4:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
Extract the tarball to /usr/local and rename the directory for simplicity:
sudo tar -xzvf hadoop-3.3.4.tar.gz -C /usr/local/
sudo mv /usr/local/hadoop-3.3.4 /usr/local/hadoop
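Optionally, verify the integrity of the downloaded tarball against the SHA-512 checksum that Apache publishes alongside each release (a hedged sketch; if the release has moved to the Apache archive, adjust the URL accordingly):
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz.sha512
cat hadoop-3.3.4.tar.gz.sha512
sha512sum hadoop-3.3.4.tar.gz
The two checksum values should match.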
Step 3: Configure Hadoop Environment Variables
Set up environment variables to make Hadoop commands accessible globally. Create a new file /etc/profile.d/hadoop.sh:
sudo nano /etc/profile.d/hadoop.sh
Add the following lines (adjust the paths if Java or Hadoop is installed elsewhere):
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Make the file executable and apply changes:
sudo chmod +x /etc/profile.d/hadoop.sh
source /etc/profile.d/hadoop.sh
Verify Hadoop installation:
hadoop version
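With Hadoop 3.x, daemons launched over SSH by the start scripts do not always inherit JAVA_HOME from your login shell, so it is common to also set it in Hadoop's own environment file (a hedged sketch; use the JAVA_HOME value that matches your system):
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee -a /usr/local/hadoop/etc/hadoop/hadoop-env.sh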
Step 4: Configure HDFS Core Files
Edit the Hadoop configuration files in $HADOOP_HOME/etc/hadoop to define HDFS behavior.
4.1 core-site.xml
This file configures the default file system and NameNode address. Replace namenode with your NameNode’s hostname:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
4.2 hdfs-site.xml
This file sets HDFS-specific parameters like replication factor and data directories. Create directories for NameNode and DataNode data:
sudo mkdir -p /usr/local/hadoop/data/namenode
sudo mkdir -p /usr/local/hadoop/data/datanode
sudo chown -R $(whoami):$(whoami) /usr/local/hadoop/data
Add the following configuration to hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- Adjust based on your cluster size (e.g., 1 for standalone) -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/data/datanode</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <!-- Disable permissions for testing (enable in production) -->
  </property>
</configuration>
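For a multi-node cluster, Hadoop 3.x also reads the list of DataNode hosts from $HADOOP_HOME/etc/hadoop/workers (the file was called slaves in Hadoop 2.x), so that start-dfs.sh knows where to launch the DataNode daemons. A minimal sketch using the example hostnames from the prerequisites:
datanode1
datanode2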
Optional: mapred-site.xml and yarn-site.xml
If using YARN for resource management, configure these files:
- mapred-site.xml (create if it doesn’t exist):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
- yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode</value>
    <!-- Replace with your ResourceManager hostname -->
  </property>
</configuration>
Step 5: Format the NameNode
The NameNode must be formatted before first use to initialize its storage. Run this command on the NameNode:
hdfs namenode -format
Follow the prompts to complete formatting. This step creates the necessary directory structure and metadata files.
Step 6: Start HDFS
Start the HDFS services with the start-dfs.sh script (run it from the NameNode):
start-dfs.sh
Check the status of HDFS daemons with:
jps
You should see the NameNode, DataNode, and SecondaryNameNode processes running.
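If you configured the optional YARN files earlier, start the YARN daemons separately (a brief sketch, assuming the default scripts in $HADOOP_HOME/sbin):
start-yarn.sh
After this, jps should also list ResourceManager on the master and NodeManager on the worker nodes.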
Step 7: Verify HDFS
Confirm HDFS is operational by:
- Web UI: Open a browser and navigate to http://<namenode-ip>:9870 (replace <namenode-ip> with your NameNode’s IP; Hadoop 2.x used port 50070). You should see the HDFS dashboard with cluster information.
- Command Line: List the root directory to verify HDFS is accessible:
hdfs dfs -ls /
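As an additional check, you can run a quick write/read smoke test and print a cluster report (a minimal sketch; the directory and file names are arbitrary examples):
hdfs dfsadmin -report
hdfs dfs -mkdir -p /tmp/smoke-test
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /tmp/smoke-test/
hdfs dfs -cat /tmp/smoke-test/hello.txt
The report should list your live DataNodes, and the final command should print the file contents.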
Step 8: Stop HDFS (Optional)
To stop HDFS services, run:
stop-dfs.sh
Troubleshooting Tips
- “Permission Denied” Errors: Ensure the Hadoop directories have the correct ownership (e.g., chown -R hadoop:hadoop /usr/local/hadoop if you run Hadoop as the hadoop user).
- Port Conflicts: Verify that no other services are using the Hadoop ports (e.g., 9000, 9870) with netstat -tuln.
- Daemon Failures: Check the logs in $HADOOP_HOME/logs for errors; the NameNode and DataNode each write their own log files (e.g., hadoop-<user>-namenode-<hostname>.log and hadoop-<user>-datanode-<hostname>.log). See the example below.
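For example, a quick way to scan the NameNode log for recent problems (a hedged sketch; the exact file name depends on the user and hostname running the daemon):
grep -iE 'error|exception' $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -n 20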
By following these steps, you’ll have a fully functional HDFS deployment on CentOS, ready to store and manage large datasets in a distributed environment.