Munich Personal RePEc Archive

Big Data et Technologies de Stockage et de Traitement des Données Massives : Comprendre les bases de l’écosystème HADOOP (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA et SPARK)

Keita, Moussa (2021): Big Data et Technologies de Stockage et de Traitement des Données Massives : Comprendre les bases de l’écosystème HADOOP (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA et SPARK).

Abstract

Over the past decade, many technological solutions have been designed to meet the challenges of Big Data, chiefly the problem of storing and processing huge volumes of data generated at a continuous pace. Two major concepts lie at the heart of these solutions: storage in a distributed architecture and parallelized processing. HADOOP is one of the first frameworks to implement this approach. In this document, we provide a general overview of the HADOOP framework, its main functionalities, and some of the technological layers that form its ecosystem. First, we present the basic components of HADOOP: HDFS, MAPREDUCE and YARN. Second, we present some tools for exploiting data stored in a HADOOP environment: HIVE, a query engine; HBASE, a distributed database; KAFKA, a tool for ingesting and integrating data streams; and SPARK, a parallelized data processing engine.
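The two concepts named in the abstract, data split into blocks and processing applied to those blocks in parallel, can be illustrated with a minimal word-count sketch in the MapReduce style. This is a single-machine illustration in plain Python, not HADOOP's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample blocks are hypothetical, and a thread pool stands in for the cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_blocks):
    """Shuffle step: group all emitted values by key across blocks."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(blocks, workers=2):
    """Run the map step on each block in parallel, then shuffle and reduce."""
    # Assumption: threads play the role of worker nodes in this toy example.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, blocks))
    return reduce_phase(shuffle(mapped))


# Hypothetical input: each string stands for one block of a distributed file.
blocks = ["big data big", "data storage", "big storage"]
counts = word_count(blocks)
print(counts)
```

In a real cluster the blocks would live on different machines and the map step would run where each block is stored; the sketch only shows the shape of the computation.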

Item Type:	MPRA Paper
Original Title:	Big Data et Technologies de Stockage et de Traitement des Données Massives : Comprendre les bases de l’écosystème HADOOP (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA et SPARK)
English Title:	Big Data and Technologies of Storage and Processing of Massive Data: Understand the basics of the HADOOP ecosystem (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA and SPARK)
Language:	French
Keywords:	Big Data, Data Science, Hadoop, HDFS, MAPREDUCE, YARN, Spark, Kafka, HBase, Java, Python, Scala
Subjects:	C - Mathematical and Quantitative Methods > C8 - Data Collection and Data Estimation Methodology ; Computer Programs
Item ID:	110334
Depositing User:	Moussa Keita
Date Deposited:	24 Oct 2021 14:59
Last Modified:	24 Oct 2021 15:00
URI:	https://mpra.ub.uni-muenchen.de/id/eprint/110334

All papers reproduced by permission. Reproduction and distribution subject to the approval of the copyright owners.
