Scala Machine Learning Projects : Build real-world machine learning and deep learning projects with Scala.
Scala is one of the most widely used programming languages for handling large amounts of data. With the rise of machine learning, data scientists and machine learning experts prefer Scala for building and scaling efficient machine learning applications. You will...
Online Access: Full text (MCPHS users only)
Main Author:
Format: Electronic eBook
Language: English
Published: Birmingham: Packt Publishing, 2018
Subjects:
Local Note: ProQuest Ebook Central
Table of Contents:
- Cover
- Copyright and Credits
- Packt Upsell
- Contributors
- Table of Contents
- Preface
- Chapter 1: Analyzing Insurance Severity Claims
- Machine learning and learning workflow
- Typical machine learning workflow
- Hyperparameter tuning and cross-validation
- Analyzing and predicting insurance severity claims
- Motivation
- Description of the dataset
- Exploratory analysis of the dataset
- Data preprocessing
- LR for predicting insurance severity claims
- Developing insurance severity claims predictive model using LR
- GBT regressor for predicting insurance severity claims
- Boosting the performance using random forest regressor
- Random Forest for classification and regression
- Comparative analysis and model deployment
- Spark-based model deployment for large-scale dataset
- Summary
- Chapter 2: Analyzing and Predicting Telecommunication Churn
- Why do we perform churn analysis, and how do we do it?
- Developing a churn analytics pipeline
- Description of the dataset
- Exploratory analysis and feature engineering
- LR for churn prediction
- SVM for churn prediction
- DTs for churn prediction
- Random Forest for churn prediction
- Selecting the best model for deployment
- Summary
- Chapter 3: High Frequency Bitcoin Price Prediction from Historical and Live Data
- Bitcoin, cryptocurrency, and online trading
- State-of-the-art automated trading of Bitcoin
- Training
- Prediction
- High-level data pipeline of the prototype
- Historical and live-price data collection
- Historical data collection
- Transformation of historical data into a time series
- Assumptions and design choices
- Data preprocessing
- Real-time data through the Cryptocompare API
- Model training for prediction
- Scala Play web service
- Concurrency through Akka actors
- Web service workflow
- JobModule
- Scheduler
- SchedulerActor
- PredictionActor and the prediction step
- TraderActor
- Predicting prices and evaluating the model
- Demo prediction using Scala Play framework
- Why RESTful architecture?
- Project structure
- Running the Scala Play web app
- Summary
- Chapter 4: Population-Scale Clustering and Ethnicity Prediction
- Population scale clustering and geographic ethnicity
- Machine learning for genetic variants
- 1000 Genomes Project dataset description
- Algorithms, tools, and techniques
- H2O and Sparkling Water
- ADAM for large-scale genomics data processing
- Unsupervised machine learning
- Population genomics and clustering
- How does K-means work?
- DNNs for geographic ethnicity prediction
- Configuring programming environment
- Data pre-processing and feature engineering
- Model training and hyperparameter tuning
- Spark-based K-means for population-scale clustering
- Determining the number of optimal clusters
- Using H2O for ethnicity prediction
- Using random forest for ethnicity prediction
- Summary
- Chapter 5: Topic Modeling: A Better Insight into Large-Scale Texts
- Topic modeling and text clustering
- How does LDA algorithm work?
- Topic modeling with Spark MLlib and Stanford NLP
- Implementation
- Step 1: Creating a Spark session
- Step 2: Creating vocabulary and tokens count to train the LDA after text pre-processing
- Step 3: Instantiate the LDA model before training
- Step 4: Set the NLP optimizer
- Step 5: Training the LDA model
- Step 6: Prepare the topics of interest
- Step 7: Topic modelling
- Step 8: Measuring the likelihood of two documents
- Other topic models versus the scalability of LDA
- Deploying the trained LDA model
- Summary
- Chapter 6: Developing Model-based Movie Recommendation Engines
- Recommendation system
- Collaborative filtering approaches
- Content-based filtering approaches
- Hybrid recommender systems
- Model-based collaborative filtering
- The utility matrix
- Spark-based movie recommendation systems
- Item-based collaborative filtering for movie similarity
- Step 1: Importing necessary libraries and creating a Spark session
- Step 2: Reading and parsing the dataset
- Step 3: Computing similarity
- Step 4: Testing the model
- Model-based recommendation with Spark
- Data exploration
- Movie recommendation using ALS
- Step 1: Import packages, load, parse, and explore the movie and rating dataset
- Step 2: Register both DataFrames as temp tables to make querying easier
- Step 3: Explore and query for related statistics
- Step 4: Prepare training and test rating data and check the counts
- Step 5: Prepare the data for building the recommendation model using ALS
- Step 6: Build an ALS user product matrix
- Step 7: Making predictions
- Step 8: Evaluating the model
- Selecting and deploying the best model
- Summary
- Chapter 7: Options Trading Using Q-learning and Scala Play Framework
- Reinforcement versus supervised and unsupervised learning
- Using RL
- Notation, policy, and utility in RL
- Policy
- Utility
- A simple Q-learning implementation
- Components of the Q-learning algorithm
- States and actions in QLearning
- The search space
- The policy and action-value
- QLearning model creation and training
- QLearning model validation
- Making predictions using the trained model
- Developing an options trading web app using Q-learning
- Problem description
- Implementing an options trading web application
- Creating an option property
- Creating an option model
- Putting it all together
- Evaluating the model
- Wrapping up the options trading app as a Scala web app
- The backend
- The frontend
- Running and deployment instructions
- Model deployment
- Summary
- Chapter 8: Client Subscription Assessment for Bank Telemarketing using Deep Neural Networks
- Client subscription assessment through telemarketing
- Dataset description
- Installing and getting started with Apache Zeppelin
- Building from the source
- Starting and stopping Apache Zeppelin
- Creating notebooks
- Exploratory analysis of the dataset
- Label distribution
- Job distribution
- Marital distribution
- Education distribution
- Default distribution
- Housing distribution
- Loan distribution
- Contact distribution
- Month distribution
- Day distribution
- Previous outcome distribution
- Age feature
- Duration distribution
- Campaign distribution
- Pdays distribution
- Previous distribution
- emp_var_rate distributions
- cons_price_idx features
- cons_conf_idx distribution
- Euribor3m distribution
- nr_employed distribution
- Statistics of numeric features
- Implementing a client subscription assessment model
- Hyperparameter tuning and feature selection
- Number of hidden layers
- Number of neurons per hidden layer
- Activation functions
- Weight and bias initialization
- Regularization
- Summary
- Chapter 9: Fraud Analytics Using Autoencoders and Anomaly Detection
- Outlier and anomaly detection
- Autoencoders and unsupervised learning
- Working principles of an autoencoder
- Efficient data representation with autoencoders
- Developing a fraud analytics model
- Description of the dataset and using linear models
- Problem description
- Preparing programming environment
- Step 1: Loading required packages and libraries
- Step 2: Creating a Spark session and importing implicits
- Step 3: Loading and parsing input data
- Step 4: Exploratory analysis of the input data
- Step 5: Preparing the H2O DataFrame
- Step 6: Unsupervised pre-training using autoencoder
- Step 7: Dimensionality reduction with hidden layers
- Step 8: Anomaly detection
- Step 9: Pre-trained supervised model
- Step 10: Model evaluation on the highly-imbalanced data
- Step 11: Stopping the Spark session and H2O context
- Auxiliary classes and methods
- Hyperparameter tuning and feature selection
- Summary
- Chapter 10: Human Activity Recognition using Recurrent Neural Networks
- Working with RNNs
- Contextual information and the architecture of RNNs
- RNN and the long-term dependency problem
- LSTM networks
- Human activity recognition using the LSTM model
- Dataset description
- Setting and configuring MXNet for Scala
- Implementing an LSTM model for HAR
- Step 1: Importing necessary libraries and packages
- Step 2: Creating MXNet context
- Step 3: Loading and parsing the training and test set
- Step 4: Exploratory analysis of the dataset
- Step 5: Defining internal RNN structure and LSTM hyperparameters
- Step 6: LSTM network construction
- Step 7: Setting up an optimizer
- Step 8: Training the LSTM network
- Step 9: Evaluating the model
- Tuning LSTM hyperparameters and GRU
- Summary
- Chapter 11: Image Classification using Convolutional Neural Networks
- Image classification and drawbacks of DNNs
- CNN architecture
- Convolutional operations
- Pooling layer and padding operations
- Subsampling operations
- Convolutional and subsampling operations in DL4j
- Configuring DL4j, ND4s, and ND4j
- Convolutional and subsampling operations in DL4j
- Large-scale image classification using CNN
- Problem description
- Description of the image dataset
- Workflow of the overall project
- Implementing CNNs for image classification
- Image processing
- Extracting image metadata
- Image feature extraction
- Preparing the ND4j dataset
- Training the CNNs and saving the trained models