Spark: The Definitive Guide

Spark: The Definitive Guide
Author :
Publisher : "O'Reilly Media, Inc."
Total Pages : 594
Release :
ISBN-10 : 9781491912294
ISBN-13 : 1491912294
Rating : 4/5 (94 Downloads)

Book Synopsis Spark: The Definitive Guide by : Bill Chambers

Download or read book Spark: The Definitive Guide written by Bill Chambers and published by "O'Reilly Media, Inc.". This book was released on 2018-02-08 with total page 594 pages. Available in PDF, EPUB and Kindle. Book excerpt: Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. Youâ??ll explore the basic operations and common functions of Sparkâ??s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Sparkâ??s scalable machine-learning library. Get a gentle overview of big data and Spark Learn about DataFrames, SQL, and Datasetsâ??Sparkâ??s core APIsâ??through worked examples Dive into Sparkâ??s low-level APIs, RDDs, and execution of SQL and DataFrames Understand how Spark runs on a cluster Debug, monitor, and tune Spark clusters and applications Learn the power of Structured Streaming, Sparkâ??s stream-processing engine Learn how you can apply MLlib to a variety of problems, including classification or recommendation

Learning Spark

Learning Spark
Author :
Publisher : "O'Reilly Media, Inc."
Total Pages : 289
Release :
ISBN-10 : 9781449359058
ISBN-13 : 1449359051
Rating : 4/5 (58 Downloads)

Book Synopsis Learning Spark by : Holden Karau

Download or read book Learning Spark written by Holden Karau and published by "O'Reilly Media, Inc.". This book was released on 2015-01-28 with total page 289 pages. Available in PDF, EPUB and Kindle. Book excerpt: Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates. Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning. Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm Learn how to deploy interactive, batch, and streaming applications Connect to data sources including HDFS, Hive, JSON, and S3 Master advanced topics like data partitioning and shared variables

Learning Spark

Learning Spark
Author :
Publisher : O'Reilly Media
Total Pages : 400
Release :
ISBN-10 : 9781492050018
ISBN-13 : 1492050016
Rating : 4/5 (18 Downloads)

Book Synopsis Learning Spark by : Jules S. Damji

Download or read book Learning Spark written by Jules S. Damji and published by O'Reilly Media. This book was released on 2020-07-16 with total page 400 pages. Available in PDF, EPUB and Kindle. Book excerpt: Data is bigger, arrives faster, and comes in a variety of formats—and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to: Learn Python, SQL, Scala, or Java high-level Structured APIs Understand Spark operations and SQL Engine Inspect, tune, and debug Spark operations with Spark configurations and Spark UI Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka Perform analytics on batch and streaming data using Structured Streaming Build reliable data pipelines with open source Delta Lake and Spark Develop machine learning pipelines with MLlib and productionize models using MLflow

Hadoop: The Definitive Guide

Hadoop: The Definitive Guide
Author :
Publisher : "O'Reilly Media, Inc."
Total Pages : 687
Release :
ISBN-10 : 9781449338770
ISBN-13 : 1449338771
Rating : 4/5 (70 Downloads)

Book Synopsis Hadoop: The Definitive Guide by : Tom White

Download or read book Hadoop: The Definitive Guide written by Tom White and published by "O'Reilly Media, Inc.". This book was released on 2012-05-10 with total page 687 pages. Available in PDF, EPUB and Kindle. Book excerpt: Ready to unlock the power of your data? With this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. You’ll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN). Store large datasets with the Hadoop Distributed File System (HDFS) Run distributed computations with MapReduce Use Hadoop’s data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster—or run Hadoop in the cloud Load data from relational databases into HDFS, using Sqoop Perform large-scale data processing with the Pig query language Analyze datasets with Hive, Hadoop’s data warehousing system Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems

Kafka: The Definitive Guide

Kafka: The Definitive Guide
Author :
Publisher : "O'Reilly Media, Inc."
Total Pages : 315
Release :
ISBN-10 : 9781491936115
ISBN-13 : 1491936118
Rating : 4/5 (15 Downloads)

Book Synopsis Kafka: The Definitive Guide by : Neha Narkhede

Download or read book Kafka: The Definitive Guide written by Neha Narkhede and published by "O'Reilly Media, Inc.". This book was released on 2017-08-31 with total page 315 pages. Available in PDF, EPUB and Kindle. Book excerpt: Every enterprise application creates data, whether it’s log messages, metrics, user activity, outgoing messages, or something else. And how to move all of this data becomes nearly as important as the data itself. If you’re an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds. Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer. Understand publish-subscribe messaging and how it fits in the big data ecosystem. Explore Kafka producers and consumers for writing and reading messages Understand Kafka patterns and use-case requirements to ensure reliable data delivery Get best practices for building data pipelines and applications with Kafka Manage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasks Learn the most critical metrics among Kafka’s operational measurements Explore how Kafka’s stream delivery capabilities make it a perfect source for stream processing systems

High Performance Spark

High Performance Spark
Author :
Publisher : "O'Reilly Media, Inc."
Total Pages : 356
Release :
ISBN-10 : 9781491943175
ISBN-13 : 1491943173
Rating : 4/5 (75 Downloads)

Book Synopsis High Performance Spark by : Holden Karau

Download or read book High Performance Spark written by Holden Karau and published by "O'Reilly Media, Inc.". This book was released on 2017-05-25 with total page 356 pages. Available in PDF, EPUB and Kindle. Book excerpt: Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources. Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing. With this book, you’ll explore: How Spark SQL’s new interfaces improve performance over SQL’s RDD data structure The choice between data joins in Core Spark and Spark SQL Techniques for getting the most out of standard RDD transformations How to work around performance issues in Spark’s key/value pair paradigm Writing high-performance Spark code without Scala or the JVM How to test for functionality and performance when applying suggested improvements Using Spark MLlib and Spark ML machine learning libraries Spark’s Streaming components and external community packages

Trino: The Definitive Guide

Trino: The Definitive Guide
Author :
Publisher : "O'Reilly Media, Inc."
Total Pages : 310
Release :
ISBN-10 : 9781098107680
ISBN-13 : 1098107683
Rating : 4/5 (80 Downloads)

Book Synopsis Trino: The Definitive Guide by : Matt Fuller

Download or read book Trino: The Definitive Guide written by Matt Fuller and published by "O'Reilly Media, Inc.". This book was released on 2021-04-14 with total page 310 pages. Available in PDF, EPUB and Kindle. Book excerpt: Perform fast interactive analytics against different data sources using the Trino high-performance distributed SQL query engine. With this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's Hive, Cassandra, a relational database, or a proprietary data store. Analysts, software engineers, and production engineers will learn how to manage, use, and even develop with Trino. Initially developed by Facebook, open source Trino is now used by Netflix, Airbnb, LinkedIn, Twitter, Uber, and many other companies. Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Trino query can combine data from multiple sources to allow for analytics across your entire organization. Get started: Explore Trino's use cases and learn about tools that will help you connect to Trino and query data Go deeper: Learn Trino's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more Put Trino in production: Secure Trino, monitor workloads, tune queries, and connect more applications; learn how other organizations apply Trino