Spark Cluster Deployment

This article was last updated on 2024-06-06. It has not been updated for more than 60 days, so some of its content may be out of date.

Hadoop must already be deployed and running.

1. Extract the package

cd /opt/software
tar -xvf spark-3.1.1-bin-hadoop3.2.tgz
mv spark-3.1.1-bin-hadoop3.2 /opt/module/
cd /opt/module
mv spark-3.1.1-bin-hadoop3.2 spark-3.1.1

2. Configure the profile

vim /etc/profile

#Spark
export SPARK_HOME=/opt/module/spark-3.1.1
export PATH=$PATH:$SPARK_HOME/bin

source /etc/profile
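
Editing /etc/profile by hand works, but running the step twice leaves duplicate entries. A minimal sketch of an idempotent alternative follows; add_spark_profile is a hypothetical helper name, not something Spark provides, and it assumes the file is writable by the current user.

```shell
# Hypothetical helper (not part of Spark): append the Spark variables to a
# profile file only when SPARK_HOME is not already exported there.
add_spark_profile() {
    local profile="$1"      # e.g. /etc/profile
    local spark_home="$2"   # e.g. /opt/module/spark-3.1.1
    # Nothing to do if an earlier run already added the export line.
    if grep -q '^export SPARK_HOME=' "$profile" 2>/dev/null; then
        return 0
    fi
    {
        echo '#Spark'
        echo "export SPARK_HOME=$spark_home"
        # Single quotes keep $PATH and $SPARK_HOME unexpanded in the file.
        echo 'export PATH=$PATH:$SPARK_HOME/bin'
    } >> "$profile"
}
```

For example: add_spark_profile /etc/profile /opt/module/spark-3.1.1, followed by source /etc/profile.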

3. Configure workers

cp $SPARK_HOME/conf/workers.template $SPARK_HOME/conf/workers
vim $SPARK_HOME/conf/workers

master
slave1
slave2
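
The workers file is simply one host name per line. If the copied template still contains a localhost entry, it is easy to leave it in by accident. As a rough sketch, the file can instead be written in one go; write_workers is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: overwrite the target file with one host name per line.
write_workers() {
    local target="$1"
    shift
    # printf repeats the format for every remaining argument.
    printf '%s\n' "$@" > "$target"
}
```

For example: write_workers "$SPARK_HOME/conf/workers" master slave1 slave2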

4. Configure spark-env

cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
vim $SPARK_HOME/conf/spark-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_212
SPARK_MASTER_HOST=master
SPARK_MASTER_PORT=7077

5. Distribute

scp -r /opt/module/spark-3.1.1 slave2:/opt/module/
scp -r /opt/module/spark-3.1.1 slave1:/opt/module/
scp -r /etc/profile slave1:/etc/
scp -r /etc/profile slave2:/etc/
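
With more than a couple of nodes, the copy commands are easier to maintain as a loop. The following is a sketch only: distribute and the DRY_RUN switch are hypothetical names introduced here, and passwordless SSH to every host is assumed.

```shell
# Hypothetical helper: copy a file or directory to the same path on several hosts.
# With DRY_RUN=1 the scp commands are printed instead of executed.
distribute() {
    local src="$1" dest="$2"
    shift 2
    local host
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp -r $src $host:$dest"
        else
            scp -r "$src" "$host:$dest"
        fi
    done
}
```

For example: distribute /opt/module/spark-3.1.1 /opt/module/ slave1 slave2, then distribute /etc/profile /etc/ slave1 slave2. Setting DRY_RUN=1 first shows what would be copied.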

6. Start

$SPARK_HOME/sbin/start-all.sh

7. Configure the history server

cd $SPARK_HOME/conf/
cp spark-defaults.conf.template spark-defaults.conf

vim $SPARK_HOME/conf/spark-defaults.conf
spark.eventLog.enabled      true
spark.eventLog.dir          hdfs://master:8020/directory

# Add the history server settings to spark-env.sh
vim $SPARK_HOME/conf/spark-env.sh
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://master:8020/directory
-Dspark.history.retainedApplications=30"

# Distribute
scp $SPARK_HOME/conf/spark-env.sh slave2:$SPARK_HOME/conf/
scp $SPARK_HOME/conf/spark-env.sh slave1:$SPARK_HOME/conf/

scp $SPARK_HOME/conf/spark-defaults.conf slave2:$SPARK_HOME/conf/
scp $SPARK_HOME/conf/spark-defaults.conf slave1:$SPARK_HOME/conf/

Start the history server

# Create the event log directory for the history server on HDFS
hadoop fs -mkdir /directory

$SPARK_HOME/sbin/stop-all.sh && $SPARK_HOME/sbin/start-all.sh
# Start the history server
$SPARK_HOME/sbin/stop-history-server.sh && $SPARK_HOME/sbin/start-history-server.sh


# Run the SparkPi test job
$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 $SPARK_HOME/examples/jars/spark-examples_2.12-3.1.1.jar 10
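
SparkPi prints its result as a line starting with "Pi is roughly", which is easy to miss among the log output. A small sketch for pulling that value out follows; pi_estimate is a hypothetical helper name, and it assumes the job output is piped to it on stdin.

```shell
# Hypothetical helper: read job output on stdin and print the number from the
# "Pi is roughly ..." line. Returns non-zero when no such line is found.
pi_estimate() {
    awk '/^Pi is roughly / { print $4; found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example, append "| pi_estimate" to the spark-submit command above. A non-zero exit status then indicates that the job did not print a result.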

8. High availability

ZooKeeper is required.

https://d5v.cc/archives/1695829443388

8.1 Purpose of high availability

High availability provides failover: if the cluster's only active Master goes down, a standby Master takes over.

master	slave1	slave2
Master	Master	Master
Zookeeper	Zookeeper	Zookeeper
Worker	Worker	Worker

8.2 Change the Master web UI port from 8080 to 8989 and enable ZooKeeper recovery

vim $SPARK_HOME/conf/spark-env.sh

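# Comment out SPARK_MASTER_HOST and SPARK_MASTER_PORT
SPARK_MASTER_WEBUI_PORT=8989
# ZooKeeper hosts listed without a port use the client default, 2181
export SPARK_DAEMON_JAVA_OPTS="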
-Dspark.deploy.recoveryMode=ZOOKEEPER 
-Dspark.deploy.zookeeper.url=master,slave1,slave2
-Dspark.deploy.zookeeper.dir=/spark"

scp $SPARK_HOME/conf/spark-env.sh slave1:$SPARK_HOME/conf/
scp $SPARK_HOME/conf/spark-env.sh slave2:$SPARK_HOME/conf/

# Restart the Spark cluster
$SPARK_HOME/sbin/stop-all.sh
$SPARK_HOME/sbin/start-all.sh

8.3 Start the Master on slave1 separately as a standby

$SPARK_HOME/sbin/start-master.sh

8.4 Submit a test job

$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://master:7077,slave1:7077 \
$SPARK_HOME/examples/jars/spark-examples_2.12-3.1.1.jar \
10
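
In standalone HA mode, an application can be given a comma-separated list of Masters in a single spark:// URL, so that it can find whichever one is currently active. The sketch below builds such a URL from host names; ha_master_url is a hypothetical helper name, and it assumes every Master listens on the same port.

```shell
# Hypothetical helper: join host names into spark://host1:port,host2:port
ha_master_url() {
    local port="$1"
    shift
    local url="" host
    for host in "$@"; do
        # ${url:+$url,} inserts a comma only after the first host.
        url="${url:+$url,}$host:$port"
    done
    echo "spark://$url"
}
```

For example: spark-submit --master "$(ha_master_url 7077 master slave1)" followed by the remaining arguments.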