Keita, Moussa (2021): Big Data and Storage and Processing Technologies for Massive Data: Understanding the Basics of the HADOOP Ecosystem (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA and SPARK).
PDF: MPRA_paper_110334.pdf (3MB)
Abstract
Over the past decade, many technological solutions have been designed to meet the multiple challenges of Big Data, chiefly the problem of storing and processing huge volumes of data generated at a continuous pace. Two major concepts lie at the heart of these solutions: storage in a distributed architecture and parallelized processing. HADOOP is one of the first frameworks to implement this approach. In this document, we provide a general overview of the HADOOP framework, its main functionalities, and some of the technological layers that form its ecosystem. First, we present the core components of HADOOP: HDFS, MAPREDUCE and YARN. Second, we present some tools for exploiting data stored in a HADOOP environment: HIVE, a query engine; HBASE, a distributed database; KAFKA, a tool for ingesting and integrating data streams; and SPARK, a parallelized data processing engine.
Item Type: | MPRA Paper |
---|---|
Original Title: | Big Data et Technologies de Stockage et de Traitement des Données Massives : Comprendre les bases de l’écosystème HADOOP (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA et SPARK) |
English Title: | Big Data and Storage and Processing Technologies for Massive Data: Understanding the Basics of the HADOOP Ecosystem (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA and SPARK) |
Language: | French |
Keywords: | Big data, data science, Hadoop, HDFS, MapReduce, YARN, Spark, Kafka, HBase, Java, Python, Scala |
Subjects: | C - Mathematical and Quantitative Methods > C8 - Data Collection and Data Estimation Methodology ; Computer Programs |
Item ID: | 110334 |
Depositing User: | Moussa Keita |
Date Deposited: | 24 Oct 2021 14:59 |
Last Modified: | 24 Oct 2021 15:00 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/110334 |