Installing Apache Spark standalone from the pre-built package

I received a lot of feedback on the post BigData - Installing Apache Spark on Ubuntu 14.04 asking why the installation was so hard and complicated. That post actually showed how to build and install Spark from source.

Spark also provides many pre-built packages bundled with Hadoop. Pre-built means Spark has already been compiled, so you only need to download it and use it. Here is how.

Note (2025): This tutorial uses outdated versions (Java 7, older Spark versions). For current installations, use Java 11+ and the latest Spark version from the official website. The general installation steps remain similar.

1. Install Java

If Java is not installed yet, install it as follows:

$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
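Before moving on, it is worth confirming which Java your shell will actually use, since the `oracle-java7-installer` PPA above may no longer be available and the Note recommends Java 11+. Below is a minimal Python sketch that does this check. It assumes only that `java -version` prints a line such as `java version "1.7.0_80"` or `openjdk version "11.0.2"` to stderr; the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Old releases use the "1.x" scheme ("1.7.0_80" means Java 7);
    Java 9+ report the major version first ("11.0.2", "17").
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy "1.x" numbering
    return first


def installed_java_major():
    """Return the major version of the `java` on PATH, or None if it is missing."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its banner to stderr, not stdout
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("java not found on PATH")
    else:
        print(f"Java {major} detected")
```

Running it as a script prints the detected major version, or a message if `java` is not on the `PATH`. Typing `java -version` in the terminal gives the same information by hand.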

2. Download the pre-built Apache Spark package

Choose a suitable version and download Apache Spark from https://spark.apache.org/downloads.html. Make sure to pick a package labeled "Pre-built for Apache Hadoop ....".
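The pre-built packages follow a consistent file-naming pattern, which helps when you want a specific (or older) release. The sketch below builds a download URL. It assumes the Apache archive layout `https://archive.apache.org/dist/spark/spark-<version>/spark-<version>-bin-<package>.tgz`; the helper names and the example values `3.5.1` / `hadoop3` are illustrative only, so check the downloads page for the version/package combinations that actually exist.

```python
import urllib.request


def spark_archive_url(spark_version: str, package: str) -> str:
    """Build the Apache archive URL for a pre-built Spark tarball.

    `package` is the suffix shown on the downloads page, e.g. "hadoop3"
    or "hadoop2.6" (assumption: the archive keeps this naming scheme).
    """
    filename = f"spark-{spark_version}-bin-{package}.tgz"
    return f"https://archive.apache.org/dist/spark/spark-{spark_version}/{filename}"


def download_tarball(url: str, dest: str) -> str:
    """Save the tarball to `dest`; not called below because the file is hundreds of MB."""
    path, _headers = urllib.request.urlretrieve(url, dest)
    return path


if __name__ == "__main__":
    # Example values only: pick the real release from the downloads page
    print(spark_archive_url("3.5.1", "hadoop3"))
```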

3. Usage

Extract the archive, open a terminal in the extracted spark directory, and you are ready to go.
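In the terminal, `tar -xzf` on the downloaded `.tgz` is enough to extract it. If you prefer to script this step, here is a Python sketch using the standard `tarfile` module. The helper name `extract_spark` is hypothetical; it assumes the archive contains a single top-level folder and checks that `bin/run-example` is present before you `cd` into it.

```python
import os
import tarfile


def extract_spark(tarball: str, dest: str = ".") -> str:
    """Extract a Spark .tgz into `dest` and return the top-level Spark directory."""
    with tarfile.open(tarball, "r:gz") as tar:
        # Assumption: one top-level folder, e.g. spark-<version>-bin-<package>/
        tops = {name.split("/")[0] for name in tar.getnames()}
        if len(tops) != 1:
            raise ValueError(f"expected one top-level folder, found {sorted(tops)}")
        top = tops.pop()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe paths
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    spark_home = os.path.join(dest, top)
    # Sanity check: the launcher used for the SparkPi test should exist
    if not os.path.isfile(os.path.join(spark_home, "bin", "run-example")):
        raise FileNotFoundError(f"{spark_home}/bin/run-example not found")
    return spark_home
```

The returned path is the directory to open the terminal in before running `./bin/run-example`.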

Test it by running the example that estimates the value of Pi:

$ ./bin/run-example SparkPi 10
...
Pi is roughly 3.139484
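SparkPi estimates Pi with a Monte Carlo method: it samples random points in the square [-1, 1] x [-1, 1], counts how many fall inside the unit circle, and multiplies that fraction by 4. The argument `10` sets the number of slices (partitions) the sampling is split into. The plain-Python sketch below mirrors that map/reduce structure without Spark; the 100,000 samples per slice is an assumption modeled on Spark's bundled example, and the function names are hypothetical.

```python
import random
from functools import reduce
from operator import add

SAMPLES_PER_SLICE = 100_000  # assumption: modeled on the bundled SparkPi example


def count_inside(num_samples: int, seed: int) -> int:
    """'map' step for one slice: count random points inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y <= 1:
            inside += 1
    return inside


def estimate_pi(slices: int = 2, seed: int = 0) -> float:
    """Split the sampling into `slices` chunks (Spark's partitions) and combine them."""
    n = SAMPLES_PER_SLICE * slices
    partial_counts = [count_inside(SAMPLES_PER_SLICE, seed + i) for i in range(slices)]
    total = reduce(add, partial_counts)  # 'reduce' step
    return 4.0 * total / n


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(10))
```

Because the points are random, Spark prints a slightly different value on each run, which is why the output above only says "roughly".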

If you use PySpark, you can open the PySpark shell and try the WordCount problem:

Open the PySpark shell with:

$ ./bin/pyspark

[Figure: WordCount result in the PySpark shell]
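In the PySpark shell, WordCount is typically written as a chain of RDD operations on `sc`, the SparkContext the shell creates for you: `flatMap` splits lines into words, `map` turns each word into a `(word, 1)` pair, and `reduceByKey` sums the counts per word. To show what each stage does without needing Spark, here is a plain-Python sketch that mimics those three operations; the helper names and sample text are hypothetical, and the PySpark form appears in the comment at the top.

```python
from itertools import chain
from operator import add

# PySpark shell equivalent (assumes you started ./bin/pyspark in the Spark folder,
# which contains README.md):
#   counts = (sc.textFile("README.md")
#               .flatMap(lambda line: line.split())
#               .map(lambda word: (word, 1))
#               .reduceByKey(add))
#   counts.take(10)


def flat_map(func, items):
    """Like RDD.flatMap: apply func to each item and flatten the results."""
    return list(chain.from_iterable(func(x) for x in items))


def reduce_by_key(func, pairs):
    """Like RDD.reduceByKey: merge all values that share a key using func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def word_count(lines):
    words = flat_map(lambda line: line.split(), lines)
    pairs = [(word, 1) for word in words]  # the 'map' step
    return reduce_by_key(add, pairs)


if __name__ == "__main__":
    sample = ["spark is fast", "spark is easy to install"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

Spark runs the same stages in parallel across partitions, which is why `reduceByKey` needs an associative function such as `add`.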

You have now installed Apache Spark in three very simple steps. For running applications with spark-submit, see the "Using spark-submit" section of my earlier post: BigData - Installing Apache Spark on Ubuntu 14.04

Good luck!

Resources

me@duyet.net


Related Posts

PySpark - Missing Python libraries on the Worker

Running Apache Spark with Jupyter Notebook

PySpark Getting Started

Installing Apache Spark on Ubuntu 14.04
