2.3 Hadoop Modules and Ecosystem

2016-03-04 01:35:18

I. Hadoop Core Modules

Since Hadoop 2.0, the Hadoop core has consisted mainly of the following four modules:

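Hadoop Common: common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): Hadoop's distributed file system.
Hadoop YARN: manages cluster resources and the jobs running on the cluster.
Hadoop MapReduce: a YARN-based computation framework for processing large data sets in parallel.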
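To make the MapReduce model concrete, here is a minimal single-process word-count sketch in plain Java, assuming Java 9+ (for `List.of` and `Map.entry`). It does not use the Hadoop API: the class `WordCountSketch` and its methods are hypothetical names that only mimic the map, shuffle and reduce phases. In a real job the map and reduce logic would live in `Mapper` and `Reducer` subclasses, the input would typically come from HDFS, and YARN would schedule the tasks across the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the MapReduce programming model (map -> shuffle -> reduce).
// NOT the Hadoop API: WordCountSketch and its methods are hypothetical illustration names.
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all intermediate values by key.
    // TreeMap keeps keys sorted, mirroring the sorted keys a Hadoop reducer receives.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine all values for one key (here, sum the counts).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line)); // in Hadoop, map tasks run in parallel on many nodes
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {and=1, hadoop=1, hdfs=1, hello=2, yarn=1}
        System.out.println(wordCount(List.of("Hello Hadoop", "hello YARN and HDFS")));
    }
}
```

The separate shuffle step is the heart of the design: each reduce call sees only one key's values, so different keys can be reduced independently, which is what allows the framework to spread the work over many machines.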

II. The Hadoop Ecosystem

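The Hadoop ecosystem consists of technologies developed and derived on top of Hadoop. Put simply, they relate to Hadoop the way frameworks such as iBATIS and Hibernate relate to JDBC, and in practice they greatly simplify development.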

Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
Avro™: A data serialization system.
Cassandra™: A scalable multi-master database with no single points of failure.
Chukwa™: A data collection system for managing large distributed systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables.
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout™: A scalable machine learning and data mining library.
Pig™: A high-level data-flow language and execution framework for parallel computation.
Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
ZooKeeper™: A high-performance coordination service for distributed applications.
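Tez's key idea, compared with MapReduce's fixed map-then-reduce pipeline, is executing an arbitrary DAG of tasks. The toy sketch below (plain Java 9+, single-threaded) shows only the scheduling rule that makes a DAG executable: a task becomes runnable once all of its upstream tasks have finished, computed here with Kahn's topological sort. It is not the Tez API; `DagSketch` and task names such as `scanOrders` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of running a DAG of tasks in dependency order (Kahn's topological sort).
// NOT the Tez API: DagSketch and the task names are hypothetical.
public class DagSketch {

    // edges: task -> tasks that consume its output
    static List<String> executionOrder(Map<String, List<String>> edges) {
        // Count how many upstream tasks each task waits for.
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String next : e.getValue()) {
                inDegree.merge(next, 1, Integer::sum);
            }
        }
        // Tasks with no pending inputs are ready immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task); // a real engine would launch the task here; ready tasks could run in parallel
            for (String next : edges.getOrDefault(task, List.of())) {
                // One fewer pending input; when none remain, the downstream task becomes ready.
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("graph has a cycle, not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new LinkedHashMap<>();
        dag.put("scanOrders", List.of("join"));
        dag.put("scanUsers", List.of("join"));
        dag.put("join", List.of("aggregate"));
        dag.put("aggregate", List.of());
        // prints [scanOrders, scanUsers, join, aggregate]
        System.out.println(executionOrder(dag));
    }
}
```

Both scans are ready at the start (a real engine could run them concurrently), `join` waits for both, and `aggregate` runs last. A graph with a cycle is rejected because it has no valid execution order.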
