Keita, Moussa (2017): Data Science sous Python: Algorithme, Statistique, DataViz, DataMining et Machine-Learning.
Abstract
Data Science is a technical discipline that combines statistical concepts with computer algorithms and computation to process and model large volumes of data derived from observed phenomena (economic, industrial, commercial, financial, managerial, social, etc.). In Business Intelligence, Data Science has become an indispensable decision-support tool for company managers, in that it allows them to exploit and add value to the company's internal and external information assets. In recent years, Python has rapidly become one of the programming languages most used by data scientists to exploit the growing potential of Big Data. The language's current popularity is largely explained by the many possibilities offered by its powerful libraries, including those for numerical analysis and scientific computing (numpy, scipy, pandas), data visualization (matplotlib), and machine learning (scikit-learn). Taking a pedagogical approach, this manuscript revisits the concepts essential for mastering Data Science with Python. The work is organized into seven chapters. The first chapter presents the basics of programming in Python. The second chapter covers strings and regular expressions; its aim is to familiarize the reader with processing and using string values, which are the variable values commonly found in unstructured databases. The third chapter presents methods for file management and text processing; it builds on the previous chapter by presenting the methods commonly used to process unstructured data, which generally take the form of text files. The fourth chapter presents methods for processing and organizing data originally stored as data tables.
The fifth chapter presents classical statistical analysis methods (descriptive analyses, statistical tests, linear and logistic regression, ...). The sixth chapter presents data visualization methods (histograms, bar graphs, pie plots, box plots, scatter plots, trend curves, 3D graphs, ...). Finally, the seventh chapter presents data mining and machine-learning methods, including dimensionality reduction (Principal Component Analysis, Factor Analysis, Multiple Correspondence Analysis) and classification methods (Hierarchical Classification, K-Means Clustering, Support Vector Machine, Random Forest).
Item Type: | MPRA Paper |
---|---|
Original Title: | Data Science sous Python: Algorithme, Statistique, DataViz, DataMining et Machine-Learning |
English Title: | Data Science with Python: Algorithm, Statistics, DataViz, DataMining and Machine-Learning |
Language: | French |
Keywords: | Programming, Python language, data science, data processing and analysis, data visualization. |
Subjects: | C - Mathematical and Quantitative Methods > C8 - Data Collection and Data Estimation Methodology ; Computer Programs |
Item ID: | 76653 |
Depositing User: | Moussa Keita |
Date Deposited: | 07 Feb 2017 14:56 |
Last Modified: | 26 Sep 2019 08:30 |
URI: | https://mpra.ub.uni-muenchen.de/id/eprint/76653 |