Tôi là Duyệt

Showing posts 11-19 of 19 from the apache-spark topic (Page 2 of 2). Check out all my favorite topics here.

Tue Sep 20 2016 · Data

Running Apache Spark with Jupyter Notebook

IPython Notebook is a convenient tool for Python. You can easily debug a PySpark program line by line in IPython Notebook, which saves a lot of time.

Thu Sep 08 2016 · Data

PySpark - Missing Python libraries on Workers

Apache Spark runs on a cluster. With Java this is simple, but with Python, every Python package the job uses must be installed on each worker node. Otherwise you will run into missing-library errors.
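For pure-Python dependencies, one common workaround is to ship the code to the workers as a zip archive instead of installing it on every node, for example with `spark-submit --py-files deps.zip` or `SparkContext.addPyFile`. The sketch below covers only the packaging step, using the standard library. The module name `mylib` and the archive name `deps.zip` are hypothetical, and packages with native extensions generally still need to be installed on each node.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_deps_zip(zip_path, modules):
    """Write {module_name: source_code} into a zip archive of .py files."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name, source in modules.items():
            zf.writestr(name + ".py", source)
    return zip_path


# Hypothetical helper module that the Spark job would import on the workers.
workdir = tempfile.mkdtemp()
deps_zip = build_deps_zip(
    os.path.join(workdir, "deps.zip"),
    {"mylib": "def normalize(word):\n    return word.strip().lower()\n"},
)

# Python can import modules directly from a zip archive on sys.path,
# so a quick local import confirms the archive is laid out correctly.
sys.path.insert(0, deps_zip)
mylib = importlib.import_module("mylib")
print(mylib.normalize("  Spark "))  # prints: spark
```

Checking the import locally before submitting the job is a cheap way to catch a badly structured archive.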

Thu Jun 23 2016 · Machine Learning

Running vnTokenizer on Apache Spark

vnTokenizer is a tool for word segmentation and part-of-speech tagging of Vietnamese text, written by Lê Hồng Phương. It is implemented in Java and can be used either as a command-line tool or programmatically.

Sat Dec 12 2015 · Data

Apache Spark on Docker

Docker and Spark are two technologies that are very hyped these days.

Wed Dec 02 2015 · Data Engineer

Big Data - Map-Reduce and the Word Count problem

Map-Reduce is a solution! It was invented by Google engineers to process extremely large volumes of data, beyond what a single computer can handle, even a very powerful one.
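As a rough illustration of the idea, here is a single-process Python sketch of Word Count split into map, shuffle, and reduce steps. In a real Map-Reduce system the map and reduce tasks run in parallel on many machines; the function names and the toy input are illustrative assumptions, not code from the post.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {
        word: reduce(lambda a, b: a + b, counts)
        for word, counts in groups.items()
    }


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


lines = ["big data big cluster", "data data"]  # toy input
counts = word_count(lines)
print(counts)
```

Because each line is mapped independently and each word is reduced independently, both phases can be spread across machines, which is what lets the approach scale past a single computer.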

Mon Oct 26 2015 · Talk

Seminar - Introduction to Apache Spark and PredictionIO

Today's seminar on Apache Spark, with an introduction to PredictionIO, at ISLab (University of Information Technology, KP6, Linh Trung ward) was a success.

Tue Jul 14 2015 · News

Big Data - Monitoring Spark with Graphite and Grafana

I read this post on [HammerLab](http://www.hammerlab.org/2015/02/27/monitoring-spark-with-graphite-and-grafana/); contact me if a Vietnamese version is necessary. In this post, they discuss using Graphite...

Sat Apr 18 2015 · Data

PySpark Getting Started

Hadoop is the standard tool for distributed computing across really large data sets and is the reason why you see "Big Data" on advertisements as you walk through the airport. It has become an operating system for Big Data, providing a rich ecosystem of tools and techniques that allow you to use a large cluster of relatively cheap commodity hardware to do computing at supercomputer scale. Two ideas from Google in 2003 and 2004 made Hadoop possible: a framework for distributed storage (The Google File System), which is implemented as HDFS in Hadoop, and a framework for distributed computing (MapReduce).

Fri Mar 27 2015 · BigData

Installing Apache Spark on Ubuntu 14.04

While researching Big Data for a few projects, I decided to choose Apache Spark over Hadoop. According to the Apache Spark homepage, it runs up to 100x faster than Hadoop MapReduce in memory and 10x faster on disk, and it is compatible with most distributed data stores (HDFS, HBase, Cassandra, ...). You can use Java, Scala, or Python to implement algorithms on Spark.

me@duyet.net
