Scala for Data Science.

Leverage the power of Scala with different tools to build scalable, robust data science applications. About This Book: A complete guide for scalable data science solutions, from data ingestion to data visualization. Deploy horizontally scalable data processing pipelines and take advantage of web framewo...

Bibliographic Details
Online Access:	Full text (MCPHS users only)
Main Author:	Bugnion, Pascal
Format:	Electronic eBook
Language:	English
Published:	Packt Publishing, 2016
Subjects:	Epitonium. Data mining. Data Mining
Local Note:	ProQuest Ebook Central

Table of Contents:

Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Table of Contents
Preface
Chapter 1: Scala and Data Science
Data science
Programming in data science
Why Scala?
Static typing and type inference
Scala encourages immutability
Scala and functional programs
Null pointer uncertainty
Easier parallelism
Interoperability with Java
When not to use Scala
Summary
References
Chapter 2: Manipulating Data with Breeze
Code examples
Installing Breeze
Getting help on Breeze
Basic Breeze data types
Vectors
Dense and sparse vectors and the vector trait
Matrices
Building vectors and matrices
Advanced indexing and slicing
Mutating vectors and matrices
Matrix multiplication, transposition, and the orientation of vectors
Data preprocessing and feature engineering
Breeze: function optimization
Numerical derivatives
Regularization
An example: logistic regression
Towards re-usable code
Alternatives to Breeze
Summary
References
Chapter 3: Plotting with breeze-viz
Diving into Breeze
Customizing plots
Customizing the line type
More advanced scatter plots
Multi-plot example: scatterplot matrix plots
Managing without documentation
Breeze-viz reference
Data visualization beyond breeze-viz
Summary
Chapter 4: Parallel Collections and Futures
Parallel collections
Limitations of parallel collections
Error handling
Setting the parallelism level
An example: cross-validation with parallel collections
Futures
Future composition: using a future's result
Blocking until completion
Controlling parallel execution with execution contexts
Futures example: stock price fetcher
Summary
References
Chapter 5: Scala and SQL through JDBC
Interacting with JDBC
First steps with JDBC
Connecting to a database server
Creating tables
Inserting data
Reading data
JDBC summary
Functional wrappers for JDBC
Safer JDBC connections with the loan pattern
Enriching JDBC statements with the "pimp my library" pattern
Wrapping result sets in a stream
Looser coupling with type classes
Type classes
Coding against type classes
When to use type classes
Benefits of type classes
Creating a data access layer
Summary
References
Chapter 6: Slick: A Functional Interface for SQL
FEC data
Importing Slick
Defining the schema
Connecting to the database
Creating tables
Inserting data
Querying data
Invokers
Operations on columns
Aggregations with "Group by"
Accessing database metadata
Slick versus JDBC
Summary
References
Chapter 7: Web APIs
A whirlwind tour of JSON
Querying web APIs
JSON in Scala: an exercise in pattern matching
JSON4S types
Extracting fields using XPath
Extraction using case classes
Concurrency and exception handling with futures
Authentication: adding HTTP headers
HTTP: a whirlwind overview
Adding headers to HTTP requests in Scala
Summary
References
Chapter 8: Scala and MongoDB
MongoDB
Connecting to MongoDB with Casbah
Connecting with authentication
Inserting documents
Extracting objects from the database
Complex queries
Casbah query DSL
Custom type serialization
Beyond Casbah
Summary
References
Chapter 9: Concurrency with Akka
GitHub follower graph
Actors as people
Hello world with Akka
Case classes as messages
Actor construction
Anatomy of an actor
Follower network crawler
Fetcher actors
Routing
Message passing between actors
Queue control and the pull pattern
Accessing the sender of a message
Stateful actors
Follower network crawler
Fault tolerance
Custom supervisor strategies
Life-cycle hooks
What we have not talked about
Summary
References
Chapter 10: Distributed Batch Processing with Spark
Installing Spark
Acquiring the example data
Resilient distributed datasets
RDDs are immutable
RDDs are lazy
RDDs know their lineage
RDDs are resilient
RDDs are distributed
Transformations and actions on RDDs
Persisting RDDs
Key-value RDDs
Double RDDs
Building and running stand-alone programs
Running Spark applications locally
Reducing logging output and Spark configuration
Running Spark applications on EC2
Spam filtering
Lifting the hood
Data shuffling and partitions
Summary
Reference
Chapter 11: Spark SQL and DataFrames
DataFrames: a whirlwind introduction
Aggregation operations
Joining DataFrames together
Custom functions on DataFrames
DataFrame immutability and persistence
SQL statements on DataFrames
Complex data types: arrays, maps, and structs
Structs
Arrays
Maps
Interacting with data sources
JSON files
Parquet files
Standalone programs
Summary
References
Chapter 12: Distributed Machine Learning with MLlib
Introducing MLlib
Spam classification
Pipeline components
Transformers
Estimators
Evaluation
Regularization in logistic regression
Cross-validation and model selection
Beyond logistic regression
Summary
References
Chapter 13: Web APIs with Play
Client-server applications
Introduction to web frameworks
Model-View-Controller architecture
Single page applications
Building an application
The Play framework
Dynamic routing
Actions
Composing the response
Understanding and parsing the request
Interacting with JSON
Querying external APIs and consuming JSON
Calling external web services
Parsing JSON
Asynchronous actions
Creating APIs with Play: a summary
REST APIs: best practice
Summary
References
Chapter 14: Visualization with D3 and the Play Framework
GitHub user data
Do I need a backend?
JavaScript dependencies through web-jars
Towards a web application: HTML templates
Modular JavaScript through RequireJS
Bootstrapping the applications
Client-side program architecture
Designing the model
The event bus
AJAX calls through jQuery
Response views
Drawing plots with NVD3
Summary
References
Appendix: Pattern Matching and Extractors
Pattern matching in for comprehensions
Pattern matching internals
Extracting sequences
Summary
Reference
Index

Similar Items