Scala for Data Science.
Leverage the power of Scala with different tools to build scalable, robust data science applications. About This Book: A complete guide for scalable data science solutions, from data ingestion to data visualization. Deploy horizontally scalable data processing pipelines and take advantage of web framewo...
Format: Electronic eBook
Language: English
Published: Packt Publishing, 2016
Local Note: ProQuest Ebook Central
Table of Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
- Chapter 1: Scala and Data Science
- Data science
- Programming in data science
- Why Scala?
- Static typing and type inference
- Scala encourages immutability
- Scala and functional programs
- Null pointer uncertainty
- Easier parallelism
- Interoperability with Java
- When not to use Scala
- Summary
- References
- Chapter 2: Manipulating Data with Breeze
- Code examples
- Installing Breeze
- Getting help on Breeze
- Basic Breeze data types
- Vectors
- Dense and sparse vectors and the vector trait
- Matrices
- Building vectors and matrices
- Advanced indexing and slicing
- Mutating vectors and matrices
- Matrix multiplication, transposition, and the orientation of vectors
- Data preprocessing and feature engineering
- Breeze: function optimization
- Numerical derivatives
- Regularization
- An example: logistic regression
- Towards re-usable code
- Alternatives to Breeze
- Summary
- References
- Chapter 3: Plotting with breeze-viz
- Diving into Breeze
- Customizing plots
- Customizing the line type
- More advanced scatter plots
- Multi-plot example: scatterplot matrix plots
- Managing without documentation
- Breeze-viz reference
- Data visualization beyond breeze-viz
- Summary
- Chapter 4: Parallel Collections and Futures
- Parallel collections
- Limitations of parallel collections
- Error handling
- Setting the parallelism level
- An example: cross-validation with parallel collections
- Futures
- Future composition: using a future's result
- Blocking until completion
- Controlling parallel execution with execution contexts
- Futures example: stock price fetcher
- Summary
- References
- Chapter 5: Scala and SQL through JDBC
- Interacting with JDBC
- First steps with JDBC
- Connecting to a database server
- Creating tables
- Inserting data
- Reading data
- JDBC summary
- Functional wrappers for JDBC
- Safer JDBC connections with the loan pattern
- Enriching JDBC statements with the "pimp my library" pattern
- Wrapping result sets in a stream
- Looser coupling with type classes
- Type classes
- Coding against type classes
- When to use type classes
- Benefits of type classes
- Creating a data access layer
- Summary
- References
- Chapter 6: Slick – A Functional Interface for SQL
- FEC data
- Importing Slick
- Defining the schema
- Connecting to the database
- Creating tables
- Inserting data
- Querying data
- Invokers
- Operations on columns
- Aggregations with "Group by"
- Accessing database metadata
- Slick versus JDBC
- Summary
- References
- Chapter 7: Web APIs
- A whirlwind tour of JSON
- Querying web APIs
- JSON in Scala: an exercise in pattern matching
- JSON4S types
- Extracting fields using XPath
- Extraction using case classes
- Concurrency and exception handling with futures
- Authentication: adding HTTP headers
- HTTP: a whirlwind overview
- Adding headers to HTTP requests in Scala
- Summary
- References
- Chapter 8: Scala and MongoDB
- MongoDB
- Connecting to MongoDB with Casbah
- Connecting with authentication
- Inserting documents
- Extracting objects from the database
- Complex queries
- Casbah query DSL
- Custom type serialization
- Beyond Casbah
- Summary
- References
- Chapter 9: Concurrency with Akka
- GitHub follower graph
- Actors as people
- Hello world with Akka
- Case classes as messages
- Actor construction
- Anatomy of an actor
- Follower network crawler
- Fetcher actors
- Routing
- Message passing between actors
- Queue control and the pull pattern
- Accessing the sender of a message
- Stateful actors
- Follower network crawler
- Fault tolerance
- Custom supervisor strategies
- Life-cycle hooks
- What we have not talked about
- Summary
- References
- Chapter 10: Distributed Batch Processing with Spark
- Installing Spark
- Acquiring the example data
- Resilient distributed datasets
- RDDs are immutable
- RDDs are lazy
- RDDs know their lineage
- RDDs are resilient
- RDDs are distributed
- Transformations and actions on RDDs
- Persisting RDDs
- Key-value RDDs
- Double RDDs
- Building and running stand-alone programs
- Running Spark applications locally
- Reducing logging output and Spark configuration
- Running Spark applications on EC2
- Spam filtering
- Lifting the hood
- Data shuffling and partitions
- Summary
- Reference
- Chapter 11: Spark SQL and DataFrames
- DataFrames: a whirlwind introduction
- Aggregation operations
- Joining DataFrames together
- Custom functions on DataFrames
- DataFrame immutability and persistence
- SQL statements on DataFrames
- Complex data types: arrays, maps, and structs
- Structs
- Arrays
- Maps
- Interacting with data sources
- JSON files
- Parquet files
- Standalone programs
- Summary
- References
- Chapter 12: Distributed Machine Learning with MLlib
- Introducing MLlib
- Spam classification
- Pipeline components
- Transformers
- Estimators
- Evaluation
- Regularization in logistic regression
- Cross-validation and model selection
- Beyond logistic regression
- Summary
- References
- Chapter 13: Web APIs with Play
- Client-server applications
- Introduction to web frameworks
- Model-View-Controller architecture
- Single page applications
- Building an application
- The Play framework
- Dynamic routing
- Actions
- Composing the response
- Understanding and parsing the request
- Interacting with JSON
- Querying external APIs and consuming JSON
- Calling external web services
- Parsing JSON
- Asynchronous actions
- Creating APIs with Play: a summary
- REST APIs: best practice
- Summary
- References
- Chapter 14: Visualization with D3 and the Play Framework
- GitHub user data
- Do I need a backend?
- JavaScript dependencies through web-jars
- Towards a web application: HTML templates
- Modular JavaScript through RequireJS
- Bootstrapping the applications
- Client-side program architecture
- Designing the model
- The event bus
- AJAX calls through JQuery
- Response views
- Drawing plots with NVD3
- Summary
- References
- Appendix: Pattern Matching and Extractors
- Pattern matching in for comprehensions
- Pattern matching internals
- Extracting sequences
- Summary
- Reference
- Index