Spark for Data Science.

Bibliographic Details
Online Access:	Full text (MCPHS users only)
Main Author:	Srinivas Duvvuri; Bikramaditya Singhal
Format:	Electronic eBook
Language:	English
Published:	Packt Publishing, 2016
Edition:	1.
Subjects:	Spark (Electronic resource : Apache Software Foundation); Data mining; Machine learning.
Local Note:	ProQuest Ebook Central

MARC


LEADER	00000cam a2200000ua 4500
001	in00000291137
006	m o d
007	cr \|n\|\|\|\|\|\|\|\|\|
008	161118s2016 xx o 000 0 eng d
005	20240703151023.9
020			\|a 1785884778 \|q (ebk)
020			\|a 9781785884771
020			\|z 1785885650
029	1		\|a AU@ \|b 000066233138
029	1		\|a CHNEW \|b 000949169
029	1		\|a CHVBK \|b 483153435
035			\|a (OCoLC)963606569
035			\|a (OCoLC)ocn963606569
037			\|a 958872 \|b MIL
040			\|a IDEBK \|b eng \|e pn \|c IDEBK \|d OCLCQ \|d IDEBK \|d COO \|d EBLCP \|d MERUC \|d REB \|d CHVBK \|d OCLCQ \|d OCLCF \|d OCLCO \|d OCL \|d OCLCQ \|d OCLCO \|d LVT \|d UKAHL \|d OCLCQ \|d OCLCO \|d OCLCQ \|d OCLCO \|d OCLCL
050		4	\|a T55.4-60.8
082	0	4	\|a 005.7 \|2 23
100	1		\|a Srinivas Duvvuri; Bikramaditya Singhal.
245	1	0	\|a Spark for Data Science.
250			\|a 1.
260			\|b Packt Publishing, \|c 2016.
300			\|a 1 online resource (344)
336			\|a text \|b txt \|2 rdacontent
337			\|a computer \|b c \|2 rdamedia
338			\|a online resource \|b cr \|2 rdacarrier
505	0		\|a Cover; Copyright; Credits; Foreword; About the Authors; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Big Data and Data Science -- An Introduction; Big data overview; Challenges with big data analytics; Computational challenges; Analytical challenges; Evolution of big data analytics; Spark for data analytics; The Spark stack; Spark core; Spark SQL; Spark streaming; MLlib; GraphX; SparkR; Summary; References; Chapter 2: The Spark Programming Model; The programming paradigm; Supported programming languages; Scala; Java; Python; R; Choosing the right language.
505	8		\|a The Spark engine; Driver program; The Spark shell; SparkContext; Worker nodes; Executors; Shared variables; Flow of execution; The RDD API; RDD basics; Persistence; RDD operations; Creating RDDs; Transformations on normal RDDs; The filter operation; The distinct operation; The intersection operation; The union operation; The map operation; The flatMap operation; The keys operation; The cartesian operation; Transformations on pair RDDs; The groupByKey operation; The join operation; The reduceByKey operation; The aggregate operation; Actions; The collect() function; The count() function.
505	8		\|a The take(n) function; The first() function; The takeSample() function; The countByKey() function; Summary; References; Chapter 3: Introduction to DataFrames; Why DataFrames?; Spark SQL; The Catalyst optimizer; The DataFrame API; DataFrame basics; RDDs versus DataFrames; Similarities; Differences; Creating DataFrames; Creating DataFrames from RDDs; Creating DataFrames from JSON; Creating DataFrames from databases using JDBC; Creating DataFrames from Apache Parquet; Creating DataFrames from other data sources; DataFrame operations; Under the hood; Summary; References.
505	8		\|a Chapter 4: Unified Data Access; Data abstractions in Apache Spark; Datasets; Working with Datasets; Creating Datasets from JSON; Datasets API's limitations; Spark SQL; SQL operations; Under the hood; Structured Streaming; The Spark streaming programming model; Under the hood; Comparison with other streaming engines; Continuous applications; Summary; References; Chapter 5: Data Analysis on Spark; Data analytics life cycle; Data acquisition; Data preparation; Data consolidation; Data cleansing; Missing value treatment; Outlier treatment; Duplicate values treatment; Data transformation.
505	8		\|a Basics of statistics; Sampling; Simple random sample; Systematic sampling; Stratified sampling; Data distributions; Frequency distributions; Probability distributions; Descriptive statistics; Measures of location; Mean; Median; Mode; Measures of spread; Range; Variance; Standard deviation; Summary statistics; Graphical techniques; Inferential statistics; Discrete probability distributions; Bernoulli distribution; Binomial distribution; Sample problem; Poisson distribution; Sample problem; Continuous probability distributions; Normal distribution; Standard normal distribution.
520	8		\|a Annotation \|b Analyze your data and delve deep into the world of machine learning with the latest Spark version, 2.0. About This Book: Perform data analysis and build predictive models on huge datasets that leverage Apache Spark; Learn to integrate data science algorithms and techniques with the fast and scalable computing features of Spark to address big data challenges; Work through practical examples on real-world problems with sample code snippets. Who This Book Is For: This book is for anyone who wants to leverage Apache Spark for data science and machine learning. If you are a technologist who wants to expand your knowledge to perform data science operations in Spark, or a data scientist who wants to understand how algorithms are implemented in Spark, or a newbie with minimal development experience who wants to learn about Big Data Analytics, this book is for you! What You Will Learn: Consolidate, clean, and transform your data acquired from various data sources; Perform statistical analysis of data to find hidden insights; Explore graphical techniques to see what your data looks like; Use machine learning techniques to build predictive models; Build scalable data products and solutions; Start programming using the RDD, DataFrame and Dataset APIs; Become an expert by improving your data analytical skills. In Detail: This is the era of Big Data. The term 'Big Data' implies big innovation and enables a competitive advantage for businesses. Apache Spark was designed to perform Big Data analytics at scale, and so Spark is equipped with the necessary algorithms and supports multiple programming languages. Whether you are a technologist, a data scientist, or a beginner to Big Data analytics, this book will provide you with all the skills necessary to perform statistical data analysis, data visualization, predictive modeling, and build scalable data products or solutions using Python, Scala, and R.
With ample case studies and real-world examples, Spark for Data Science will help you ensure the successful execution of your data science projects. Style and approach: This book takes a step-by-step approach to statistical analysis and machine learning, and is explained in a conversational and easy-to-follow style. Each topic is explained sequentially with a focus on the fundamentals as well as the advanced concepts of algorithms and techniques. Real-world examples with sample code snippets are also included.
588	0		\|a Print version record.
590			\|a ProQuest Ebook Central \|b Ebook Central College Complete
630	0	0	\|a Spark (Electronic resource : Apache Software Foundation)
650		0	\|a Data mining.
650		0	\|a Machine learning.
758			\|i has work: \|a Spark for data science (Text) \|1 https://id.oclc.org/worldcat/entity/E39PCYMh3tY7xjjmrDHDjpTMk8 \|4 https://id.oclc.org/worldcat/ontology/hasWork
852			\|b E-Collections \|h ProQuest
856	4	0	\|u https://ebookcentral.proquest.com/lib/mcphs/detail.action?docID=4709436 \|z Full text (MCPHS users only) \|t 0
938			\|a Askews and Holts Library Services \|b ASKH \|n AH30656447
938			\|a ProQuest Ebook Central \|b EBLB \|n EBL4709436
938			\|a ProQuest MyiLibrary Digital eBook Collection \|b IDEB \|n cis34561501
947			\|a FLO \|x pq-ebc-base
999	f	f	\|s b4279438-981f-4c23-8c54-0320c7bd7908 \|i 1cdbce1f-b9df-4529-b8f8-e227ce8df240 \|t 0
952	f	f	\|a Massachusetts College of Pharmacy and Health Sciences \|b Online \|c Online \|d E-Collections \|t 0 \|e ProQuest \|h Other scheme
856	4	0	\|t 0 \|u https://ebookcentral.proquest.com/lib/mcphs/detail.action?docID=4709436 \|y Full text (MCPHS users only)
