Scala for Data Science.

Leverage the power of Scala with different tools to build scalable, robust data science applications. About This Book: A complete guide for scalable data science solutions, from data ingestion to data visualization. Deploy horizontally scalable data processing pipelines and take advantage of web framewo...

Bibliographic Details
Online Access: Full text (MCPHS users only)
Main Author: Bugnion, Pascal
Format: Electronic eBook
Language: English
Published: Packt Publishing, 2016
Local Note: ProQuest Ebook Central
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Scala and Data Science
  • Data science
  • Programming in data science
  • Why Scala?
  • Static typing and type inference
  • Scala encourages immutability
  • Scala and functional programs
  • Null pointer uncertainty
  • Easier parallelism
  • Interoperability with Java
  • When not to use Scala
  • Summary
  • References
  • Chapter 2: Manipulating Data with Breeze
  • Code examples
  • Installing Breeze
  • Getting help on Breeze
  • Basic Breeze data types
  • Vectors
  • Dense and sparse vectors and the vector trait
  • Matrices
  • Building vectors and matrices
  • Advanced indexing and slicing
  • Mutating vectors and matrices
  • Matrix multiplication, transposition, and the orientation of vectors
  • Data preprocessing and feature engineering
  • Breeze: function optimization
  • Numerical derivatives
  • Regularization
  • An example: logistic regression
  • Towards re-usable code
  • Alternatives to Breeze
  • Summary
  • References
  • Chapter 3: Plotting with breeze-viz
  • Diving into Breeze
  • Customizing plots
  • Customizing the line type
  • More advanced scatter plots
  • Multi-plot example: scatterplot matrix plots
  • Managing without documentation
  • Breeze-viz reference
  • Data visualization beyond breeze-viz
  • Summary
  • Chapter 4: Parallel Collections and Futures
  • Parallel collections
  • Limitations of parallel collections
  • Error handling
  • Setting the parallelism level
  • An example: cross-validation with parallel collections
  • Futures
  • Future composition: using a future's result
  • Blocking until completion
  • Controlling parallel execution with execution contexts
  • Futures example: stock price fetcher
  • Summary
  • References
  • Chapter 5: Scala and SQL through JDBC
  • Interacting with JDBC
  • First steps with JDBC
  • Connecting to a database server
  • Creating tables
  • Inserting data
  • Reading data
  • JDBC summary
  • Functional wrappers for JDBC
  • Safer JDBC connections with the loan pattern
  • Enriching JDBC statements with the "pimp my library" pattern
  • Wrapping result sets in a stream
  • Looser coupling with type classes
  • Type classes
  • Coding against type classes
  • When to use type classes
  • Benefits of type classes
  • Creating a data access layer
  • Summary
  • References
  • Chapter 6: Slick - A Functional Interface for SQL
  • FEC data
  • Importing Slick
  • Defining the schema
  • Connecting to the database
  • Creating tables
  • Inserting data
  • Querying data
  • Invokers
  • Operations on columns
  • Aggregations with "Group by"
  • Accessing database metadata
  • Slick versus JDBC
  • Summary
  • References
  • Chapter 7: Web APIs
  • A whirlwind tour of JSON
  • Querying web APIs
  • JSON in Scala: an exercise in pattern matching
  • JSON4S types
  • Extracting fields using XPath
  • Extraction using case classes
  • Concurrency and exception handling with futures
  • Authentication: adding HTTP headers
  • HTTP: a whirlwind overview
  • Adding headers to HTTP requests in Scala
  • Summary
  • References
  • Chapter 8: Scala and MongoDB
  • MongoDB
  • Connecting to MongoDB with Casbah
  • Connecting with authentication
  • Inserting documents
  • Extracting objects from the database
  • Complex queries
  • Casbah query DSL
  • Custom type serialization
  • Beyond Casbah
  • Summary
  • References
  • Chapter 9: Concurrency with Akka
  • GitHub follower graph
  • Actors as people
  • Hello world with Akka
  • Case classes as messages
  • Actor construction
  • Anatomy of an actor
  • Follower network crawler
  • Fetcher actors
  • Routing
  • Message passing between actors
  • Queue control and the pull pattern
  • Accessing the sender of a message
  • Stateful actors
  • Follower network crawler
  • Fault tolerance
  • Custom supervisor strategies
  • Life-cycle hooks
  • What we have not talked about
  • Summary
  • References
  • Chapter 10: Distributed Batch Processing with Spark
  • Installing Spark
  • Acquiring the example data
  • Resilient distributed datasets
  • RDDs are immutable
  • RDDs are lazy
  • RDDs know their lineage
  • RDDs are resilient
  • RDDs are distributed
  • Transformations and actions on RDDs
  • Persisting RDDs
  • Key-value RDDs
  • Double RDDs
  • Building and running stand-alone programs
  • Running Spark applications locally
  • Reducing logging output and Spark configuration
  • Running Spark applications on EC2
  • Spam filtering
  • Lifting the hood
  • Data shuffling and partitions
  • Summary
  • Reference
  • Chapter 11: Spark SQL and DataFrames
  • DataFrames: a whirlwind introduction
  • Aggregation operations
  • Joining DataFrames together
  • Custom functions on DataFrames
  • DataFrame immutability and persistence
  • SQL statements on DataFrames
  • Complex data types: arrays, maps, and structs
  • Structs
  • Arrays
  • Maps
  • Interacting with data sources
  • JSON files
  • Parquet files
  • Standalone programs
  • Summary
  • References
  • Chapter 12: Distributed Machine Learning with MLlib
  • Introducing MLlib
  • Spam classification
  • Pipeline components
  • Transformers
  • Estimators
  • Evaluation
  • Regularization in logistic regression
  • Cross-validation and model selection
  • Beyond logistic regression
  • Summary
  • References
  • Chapter 13: Web APIs with Play
  • Client-server applications
  • Introduction to web frameworks
  • Model-View-Controller architecture
  • Single page applications
  • Building an application
  • The Play framework
  • Dynamic routing
  • Actions
  • Composing the response
  • Understanding and parsing the request
  • Interacting with JSON
  • Querying external APIs and consuming JSON
  • Calling external web services
  • Parsing JSON
  • Asynchronous actions
  • Creating APIs with Play: a summary
  • REST APIs: best practice
  • Summary
  • References
  • Chapter 14: Visualization with D3 and the Play Framework
  • GitHub user data
  • Do I need a backend?
  • JavaScript dependencies through web-jars
  • Towards a web application: HTML templates
  • Modular JavaScript through RequireJS
  • Bootstrapping the applications
  • Client-side program architecture
  • Designing the model
  • The event bus
  • AJAX calls through JQuery
  • Response views
  • Drawing plots with NVD3
  • Summary
  • References
  • Appendix: Pattern Matching and Extractors
  • Pattern matching in for comprehensions
  • Pattern matching internals
  • Extracting sequences
  • Summary
  • Reference
  • Index.