Spark for Data Science.
Annotation
Online Access: | Full text (MCPHS users only) |
---|---|
Main Author: | Srinivas Duvvuri; Bikramaditya Singhal |
Format: | Electronic eBook |
Language: | English |
Published: | Packt Publishing, 2016 |
Edition: | 1. |
Subjects: | Spark (Electronic resource : Apache Software Foundation); Data mining; Machine learning |
Local Note: | ProQuest Ebook Central |
MARC
LEADER | 00000cam a2200000ua 4500 | ||
---|---|---|---|
001 | in00000291137 | ||
006 | m o d | ||
007 | cr |n||||||||| | ||
008 | 161118s2016 xx o 000 0 eng d | ||
005 | 20240703151023.9 | ||
020 | |a 1785884778 |q (ebk) | ||
020 | |a 9781785884771 | ||
020 | |z 1785885650 | ||
029 | 1 | |a AU@ |b 000066233138 | |
029 | 1 | |a CHNEW |b 000949169 | |
029 | 1 | |a CHVBK |b 483153435 | |
035 | |a (OCoLC)963606569 | ||
035 | |a (OCoLC)ocn963606569 | ||
037 | |a 958872 |b MIL | ||
040 | |a IDEBK |b eng |e pn |c IDEBK |d OCLCQ |d IDEBK |d COO |d EBLCP |d MERUC |d REB |d CHVBK |d OCLCQ |d OCLCF |d OCLCO |d OCL |d OCLCQ |d OCLCO |d LVT |d UKAHL |d OCLCQ |d OCLCO |d OCLCQ |d OCLCO |d OCLCL | ||
050 | 4 | |a T55.4-60.8 | |
082 | 0 | 4 | |a 005.7 |2 23 |
100 | 1 | |a Srinivas Duvvuri; Bikramaditya Singhal. | |
245 | 1 | 0 | |a Spark for Data Science. |
250 | |a 1. | ||
260 | |b Packt Publishing, |c 2016. | ||
300 | |a 1 online resource (344) | ||
336 | |a text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
505 | 0 | |a Cover; Copyright; Credits; Foreword; About the Authors; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Big Data and Data Science -- An Introduction; Big data overview; Challenges with big data analytics; Computational challenges; Analytical challenges; Evolution of big data analytics; Spark for data analytics; The Spark stack; Spark core; Spark SQL; Spark streaming; MLlib; GraphX; SparkR; Summary; References; Chapter 2: The Spark Programming Model; The programming paradigm; Supported programming languages; Scala; Java; Python; R; Choosing the right language. | |
505 | 8 | |a The Spark engine; Driver program; The Spark shell; SparkContext; Worker nodes; Executors; Shared variables; Flow of execution; The RDD API; RDD basics; Persistence; RDD operations; Creating RDDs; Transformations on normal RDDs; The filter operation; The distinct operation; The intersection operation; The union operation; The map operation; The flatMap operation; The keys operation; The cartesian operation; Transformations on pair RDDs; The groupByKey operation; The join operation; The reduceByKey operation; The aggregate operation; Actions; The collect() function; The count() function. | |
505 | 8 | |a The take(n) function; The first() function; The takeSample() function; The countByKey() function; Summary; References; Chapter 3: Introduction to DataFrames; Why DataFrames?; Spark SQL; The Catalyst optimizer; The DataFrame API; DataFrame basics; RDDs versus DataFrames; Similarities; Differences; Creating DataFrames; Creating DataFrames from RDDs; Creating DataFrames from JSON; Creating DataFrames from databases using JDBC; Creating DataFrames from Apache Parquet; Creating DataFrames from other data sources; DataFrame operations; Under the hood; Summary; References. | |
505 | 8 | |a Chapter 4: Unified Data Access; Data abstractions in Apache Spark; Datasets; Working with Datasets; Creating Datasets from JSON; Datasets API's limitations; Spark SQL; SQL operations; Under the hood; Structured Streaming; The Spark streaming programming model; Under the hood; Comparison with other streaming engines; Continuous applications; Summary; References; Chapter 5: Data Analysis on Spark; Data analytics life cycle; Data acquisition; Data preparation; Data consolidation; Data cleansing; Missing value treatment; Outlier treatment; Duplicate values treatment; Data transformation. | |
505 | 8 | |a Basics of statistics; Sampling; Simple random sample; Systematic sampling; Stratified sampling; Data distributions; Frequency distributions; Probability distributions; Descriptive statistics; Measures of location; Mean; Median; Mode; Measures of spread; Range; Variance; Standard deviation; Summary statistics; Graphical techniques; Inferential statistics; Discrete probability distributions; Bernoulli distribution; Binomial distribution; Sample problem; Poisson distribution; Sample problem; Continuous probability distributions; Normal distribution; Standard normal distribution. | |
520 | 8 | |a Annotation |b Analyze your data and delve deep into the world of machine learning with the latest Spark version, 2.0. About This Book: perform data analysis and build predictive models on huge datasets that leverage Apache Spark; learn to integrate data science algorithms and techniques with the fast and scalable computing features of Spark to address big data challenges; work through practical examples on real-world problems with sample code snippets. Who This Book Is For: This book is for anyone who wants to leverage Apache Spark for data science and machine learning. If you are a technologist who wants to expand your knowledge to perform data science operations in Spark, a data scientist who wants to understand how algorithms are implemented in Spark, or a newbie with minimal development experience who wants to learn about Big Data analytics, this book is for you! What You Will Learn: consolidate, clean, and transform data acquired from various data sources; perform statistical analysis of data to find hidden insights; explore graphical techniques to see what your data looks like; use machine learning techniques to build predictive models; build scalable data products and solutions; start programming using the RDD, DataFrame, and Dataset APIs; become an expert by improving your data analytical skills. In Detail: This is the era of Big Data. The term 'Big Data' implies big innovation and enables a competitive advantage for businesses. Apache Spark was designed to perform Big Data analytics at scale, so it is equipped with the necessary algorithms and supports multiple programming languages. Whether you are a technologist, a data scientist, or a beginner to Big Data analytics, this book will provide you with all the skills necessary to perform statistical data analysis, data visualization, and predictive modeling, and to build scalable data products or solutions using Python, Scala, and R.
With ample case studies and real-world examples, Spark for Data Science will help you ensure the successful execution of your data science projects. Style and approach: This book takes a step-by-step approach to statistical analysis and machine learning and is written in a conversational, easy-to-follow style. Each topic is explained sequentially, with a focus on both the fundamentals and the advanced concepts of algorithms and techniques. Real-world examples with sample code snippets are also included. | |
588 | 0 | |a Print version record. | |
590 | |a ProQuest Ebook Central |b Ebook Central College Complete | ||
630 | 0 | 0 | |a Spark (Electronic resource : Apache Software Foundation) |
650 | 0 | |a Data mining. | |
650 | 0 | |a Machine learning. | |
758 | |i has work: |a Spark for data science (Text) |1 https://id.oclc.org/worldcat/entity/E39PCYMh3tY7xjjmrDHDjpTMk8 |4 https://id.oclc.org/worldcat/ontology/hasWork | ||
852 | |b E-Collections |h ProQuest | ||
856 | 4 | 0 | |u https://ebookcentral.proquest.com/lib/mcphs/detail.action?docID=4709436 |z Full text (MCPHS users only) |t 0 |
938 | |a Askews and Holts Library Services |b ASKH |n AH30656447 | ||
938 | |a ProQuest Ebook Central |b EBLB |n EBL4709436 | ||
938 | |a ProQuest MyiLibrary Digital eBook Collection |b IDEB |n cis34561501 | ||
947 | |a FLO |x pq-ebc-base | ||
999 | f | f | |s b4279438-981f-4c23-8c54-0320c7bd7908 |i 1cdbce1f-b9df-4529-b8f8-e227ce8df240 |t 0 |
952 | f | f | |a Massachusetts College of Pharmacy and Health Sciences |b Online |c Online |d E-Collections |t 0 |e ProQuest |h Other scheme |
856 | 4 | 0 | |t 0 |u https://ebookcentral.proquest.com/lib/mcphs/detail.action?docID=4709436 |y Full text (MCPHS users only) |