|
|
|
|
LEADER |
00000cam a2200000uu 4500 |
001 |
in00000345527 |
006 |
m o d |
007 |
cr |n|---||||| |
008 |
170624s2016 enk o 000 0 eng d |
005 |
20240725195350.1 |
020 |
|
|
|a 9781785886423
|
020 |
|
|
|a 1785886428
|
029 |
1 |
|
|a CHNEW
|b 000961523
|
029 |
1 |
|
|a CHVBK
|b 491699336
|
029 |
1 |
|
|a AU@
|b 000070078298
|
035 |
|
|
|a (OCoLC)990674354
|
035 |
|
|
|a (OCoLC)ocn990674354
|
040 |
|
|
|a EBLCP
|b eng
|e pn
|c EBLCP
|d MERUC
|d CHVBK
|d OCLCQ
|d OCLCO
|d OCLCF
|d OCLCQ
|d LVT
|d OCLCQ
|d LOA
|d OCLCO
|d K6U
|d OCLCQ
|d OCLCO
|d OCLCL
|
050 |
|
4 |
|a QA76.9.D343
|b .D83 2017
|
082 |
0 |
4 |
|a 006.312
|2 23
|
100 |
1 |
|
|a Dua, Rajdeep.
|
245 |
1 |
0 |
|a Machine Learning with Spark - Second Edition.
|
250 |
|
|
|a 2nd ed.
|
260 |
|
|
|a Birmingham :
|b Packt Publishing,
|c 2016.
|
300 |
|
|
|a 1 online resource (523 pages)
|
336 |
|
|
|a text
|b txt
|2 rdacontent
|
337 |
|
|
|a computer
|b c
|2 rdamedia
|
338 |
|
|
|a online resource
|b cr
|2 rdacarrier
|
505 |
0 |
|
|a Cover -- Credits -- About the Authors -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Getting Up and Running with Spark -- Installing and setting up Spark locally -- Spark clusters -- The Spark programming model -- SparkContext and SparkConf -- SparkSession -- The Spark shell -- Resilient Distributed Datasets -- Creating RDDs -- Spark operations -- Caching RDDs -- Broadcast variables and accumulators -- SchemaRDD -- Spark data frame -- The first step to a Spark program in Scala -- The first step to a Spark program in Java -- The first step to a Spark program in Python -- The first step to a Spark program in R -- SparkR DataFrames -- Getting Spark running on Amazon EC2 -- Launching an EC2 Spark cluster -- Configuring and running Spark on Amazon Elastic Map Reduce -- UI in Spark -- Supported machine learning algorithms by Spark -- Benefits of using Spark ML as compared to existing libraries -- Spark Cluster on Google Compute Engine -- DataProc -- Hadoop and Spark Versions -- Creating a Cluster -- Submitting a Job -- Summary -- Chapter 2: Math for Machine Learning -- Linear algebra -- Setting up the Scala environment in Intellij -- Setting up the Scala environment on the Command Line -- Fields -- Real numbers -- Complex numbers -- Vectors -- Vector spaces -- Vector types -- Vectors in Breeze -- Vectors in Spark -- Vector operations -- Hyperplanes -- Vectors in machine learning -- Matrix -- Types of matrices -- Matrix in Spark -- Distributed matrix in Spark -- Matrix operations -- Determinant -- Eigenvalues and eigenvectors -- Singular value decomposition -- Matrices in machine learning -- Functions -- Function types -- Functional composition -- Hypothesis -- Gradient descent -- Prior, likelihood, and posterior -- Calculus -- Differential calculus -- Integral calculus.
|
505 |
8 |
|
|a Lagranges multipliers -- Plotting -- Summary -- Chapter 3: Designing a Machine Learning System -- What is Machine Learning? -- Introducing MovieStream -- Business use cases for a machine learning system -- Personalization -- Targeted marketing and customer segmentation -- Predictive modeling and analytics -- Types of machine learning models -- The components of a data-driven machine learning system -- Data ingestion and storage -- Data cleansing and transformation -- Model training and testing loop -- Model deployment and integration -- Model monitoring and feedback -- Batch versus real time -- Data Pipeline in Apache Spark -- An architecture for a machine learning system -- Spark MLlib -- Performance improvements in Spark ML over Spark MLlib -- Comparing algorithms supported by MLlib -- Classification -- Clustering -- Regression -- MLlib supported methods and developer APIs -- Spark Integration -- MLlib vision -- MLlib versions compared -- Spark 1.6 to 2.0 -- Summary -- Chapter 4: Obtaining, Processing, and Preparing Data with Spark -- Accessing publicly available datasets -- The MovieLens 100k dataset -- Exploring and visualizing your data -- Exploring the user dataset -- Count by occupation -- Movie dataset -- Exploring the rating dataset -- Rating count bar chart -- Distribution of number ratings -- Processing and transforming your data -- Filling in bad or missing data -- Extracting useful features from your data -- Numerical features -- Categorical features -- Derived features -- Transforming timestamps into categorical features -- Extract time of Day -- Extract time of day -- Text features -- Simple text feature extraction -- Sparse Vectors from Titles -- Normalizing features -- Using ML for feature normalization -- Using packages for feature extraction -- TFID -- IDF -- Word2Vector -- Skip-gram model -- Standard scalar -- Summary.
|
505 |
8 |
|
|a Chapter 5: Building a Recommendation Engine with Spark -- Types of recommendation models -- Content-based filtering -- Collaborative filtering -- Matrix factorization -- Explicit matrix factorization -- Implicit Matrix Factorization -- Basic model for Matrix Factorization -- Alternating least squares -- Extracting the right features from your data -- Extracting features from the MovieLens 100k dataset -- Training the recommendation model -- Training a model on the MovieLens 100k dataset -- Training a model using Implicit feedback data -- Using the recommendation model -- ALS Model recommendations -- User recommendations -- Generating movie recommendations from the MovieLens 100k dataset -- Inspecting the recommendations -- Item recommendations -- Generating similar movies for the MovieLens 100k dataset -- Inspecting the similar items -- Evaluating the performance of recommendation models -- ALS Model Evaluation -- Mean Squared Error -- Mean Average Precision at K -- Using MLlib's built-in evaluation functions -- RMSE and MSE -- MAP -- FP-Growth algorithm -- FP-Growth Basic Sample -- FP-Growth Applied to Movie Lens Data -- Summary -- Chapter 6: Building a Classification Model with Spark -- Types of classification models -- Linear models -- Logistic regression -- Multinomial logistic regression -- Visualizing the StumbleUpon dataset -- Extracting features from the Kaggle/StumbleUpon evergreen classification dataset -- StumbleUponExecutor -- Linear support vector machines -- The naive Bayes model -- Decision trees -- Ensembles of trees -- Random Forests -- Gradient-Boosted Trees -- Multilayer perceptron classifier -- Extracting the right features from your data -- Training classification models -- Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset -- Using classification models.
|
505 |
8 |
|
|a Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset -- Evaluating the performance of classification models -- Accuracy and prediction error -- Precision and recall -- ROC curve and AUC -- Improving model performance and tuning parameters -- Feature standardization -- Additional features -- Using the correct form of data -- Tuning model parameters -- Linear models -- Iterations -- Step size -- Regularization -- Decision trees -- Tuning tree depth and impurity -- The naive Bayes model -- Cross-validation -- Summary -- Chapter 7: Building a Regression Model with Spark -- Types of regression models -- Least squares regression -- Decision trees for regression -- Evaluating the performance of regression models -- Mean Squared Error and Root Mean Squared Error -- Mean Absolute Error -- Root Mean Squared Log Error -- The R-squared coefficient -- Extracting the right features from your data -- Extracting features from the bike sharing dataset -- Training and using regression models -- BikeSharingExecutor -- Training a regression model on the bike sharing dataset -- Linear regression -- Generalized linear regression -- Decision tree regression -- Ensembles of trees -- Random forest regression -- Gradient boosted tree regression -- Improving model performance and tuning parameters -- Transforming the target variable -- Impact of training on log-transformed targets -- Tuning model parameters -- Creating training and testing sets to evaluate parameters -- Splitting data for Decision tree -- The impact of parameter settings for linear models -- Iterations -- Step size -- L2 regularization -- L1 regularization -- Intercept -- The impact of parameter settings for the decision tree -- Tree depth -- Maximum bins -- The impact of parameter settings for the Gradient Boosted Trees -- Iterations -- MaxBins -- Summary.
|
505 |
8 |
|
|a Chapter 8: Building a Clustering Model with Spark -- Types of clustering models -- k-means clustering -- Initialization methods -- Mixture models -- Hierarchical clustering -- Extracting the right features from your data -- Extracting features from the MovieLens dataset -- K-means -- training a clustering model -- Training a clustering model on the MovieLens dataset -- K-means -- interpreting cluster predictions on the MovieLens dataset -- Interpreting the movie clusters -- Interpreting the movie clusters -- K-means -- evaluating the performance of clustering models -- Internal evaluation metrics -- External evaluation metrics -- Computing performance metrics on the MovieLens dataset -- Effect of iterations on WSSSE -- Bisecting KMeans -- Bisecting K-means -- training a clustering model -- WSSSE and iterations -- Gaussian Mixture Model -- Clustering using GMM -- Plotting the user and item data with GMM clustering -- GMM -- effect of iterations on cluster boundaries -- Summary -- Chapter 9: Dimensionality Reduction with Spark -- Types of dimensionality reduction -- Principal components analysis -- Singular value decomposition -- Relationship with matrix factorization -- Clustering as dimensionality reduction -- Extracting the right features from your data -- Extracting features from the LFW dataset -- Exploring the face data -- Visualizing the face data -- Extracting facial images as vectors -- Loading images -- Converting to grayscale and resizing the images -- Extracting feature vectors -- Normalization -- Training a dimensionality reduction model -- Running PCA on the LFW dataset -- Visualizing the Eigenfaces -- Interpreting the Eigenfaces -- Using a dimensionality reduction model -- Projecting data using PCA on the LFW dataset -- The relationship between PCA and SVD -- Evaluating dimensionality reduction models.
|
588 |
0 |
|
|a Print version record.
|
590 |
|
|
|a ProQuest Ebook Central
|b Ebook Central Academic Complete
|
630 |
0 |
0 |
|a Spark (Electronic resource : Apache Software Foundation)
|
650 |
|
0 |
|a Machine learning.
|
700 |
1 |
|
|a Ghotra, Manpreet Singh.
|
700 |
1 |
|
|a Pentreath, Nick.
|
758 |
|
|
|i has work:
|a Machine Learning with Spark - Second Edition (Text)
|1 https://id.oclc.org/worldcat/entity/E39PCYTyG3Gm8QRPqQCBFw8kCP
|4 https://id.oclc.org/worldcat/ontology/hasWork
|
776 |
0 |
8 |
|i Print version:
|a Dua, Rajdeep.
|t Machine Learning with Spark - Second Edition.
|d Birmingham : Packt Publishing, ©2016
|
852 |
|
|
|b E-Collections
|h ProQuest
|
856 |
4 |
0 |
|u https://ebookcentral.proquest.com/lib/mcphs/detail.action?docID=4853045
|z Full text (MCPHS users only)
|t 0
|
938 |
|
|
|a EBL - Ebook Library
|b EBLB
|n EBL4853045
|
947 |
|
|
|a FLO
|x pq-ebc-base
|
999 |
f |
f |
|s 6906d250-3fe6-4589-85b4-68d2d9e109da
|i 079c6bac-969e-4b59-a0f8-1b0d26b67ec0
|t 0
|
952 |
f |
f |
|a Massachusetts College of Pharmacy and Health Sciences
|b Online
|c Online
|d E-Collections
|t 0
|e ProQuest
|h Other scheme
|