Introduction to Apache Spark

$95.00

Description

Introduction to Apache Spark is designed to introduce you to one of the most important Big Data technologies on the market, Apache Spark. You will start by learning some of the basic concepts behind Spark, including Resilient Distributed Datasets (RDDs), which tie everything together. From there, you will learn how to work with datasets in Spark using a functional programming approach as well as SQL. Finally, you will learn how to use the IntelliJ IDEA IDE to write programs that work with data, learning a common technique for deploying code for Apache Spark jobs.
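
To make these ideas concrete, here is a minimal sketch, assuming Spark 2.x with Scala and a hypothetical input file data/lines.txt (the path and column name are illustrative, not taken from the course materials). It shows the same word count done once with the functional RDD approach and again with Spark SQL over a temporary view:

import org.apache.spark.sql.SparkSession

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountSketch")
      .master("local[*]")               // run locally while learning with spark-shell or an IDE
      .getOrCreate()

    // Functional (RDD) approach: transformations build a lineage;
    // nothing executes until an action such as take() is called.
    val counts = spark.sparkContext
      .textFile("data/lines.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.take(10).foreach(println)

    // SQL approach: expose the same words as a temporary view and query it.
    import spark.implicits._
    spark.sparkContext.textFile("data/lines.txt")
      .flatMap(_.split("\\s+"))
      .toDF("word")
      .createOrReplaceTempView("words")
    spark.sql("SELECT word, COUNT(*) AS cnt FROM words GROUP BY word ORDER BY cnt DESC").show(10)

    spark.stop()
  }
}

The transformations and the SQL query can also be run interactively in spark-shell, which provides a ready-made SparkSession named spark. A second sketch covering job deployment and parameter passing, the topics of the Administration modules, follows the course outline below.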

Instructor

MITCHELL PEARSON

BI Consultant and Trainer
As a Business Intelligence Consultant and Trainer for Pragmatic Works, Mitchell focuses on the full BI Stack (SSIS, SSAS, and SSRS). In addition to the BI Stack, he also has experience with Data Modeling, T-SQL, MDX, Power Pivot, and the Power BI Tools. Mitchell graduated from the University of North Florida in 2007 and is constantly expanding his knowledge of all things SQL Server.

 

What to Know Before the Class

The target audience of this course is an application or database developer interested in learning about Big Data technologies. No knowledge of Spark or Hadoop is assumed. Knowledge of development languages such as Java, C#, or Python is helpful but not required.


Duration: 9:16:20
Introduction to Apache Spark - What you need to get started
Module 00 - Introduction to Apache Spark
08:21
Module 01A - Getting Started with Apache Spark (Introduction)
38:50
Module 01B - Getting Started with Apache Spark (Installing Spark and IntelliJ IDEA)
49:44
Module 02A - Learning with Spark-Shell (Introduction)
36:57
Module 02B - Learning with Spark-Shell (Key Spark Functions)
58:53
Module 02C - Learning with Spark-Shell (Reviewing the Word Count App)
21:49
Module 02D - Learning with Spark-Shell (Custom Functions)
12:44
Module 03A - Spark SQL (Introduction)
30:21
Module 03B - Spark SQL (Functional Spark SQL)
51:26
Module 03C - Spark SQL (The Query Approach)
44:37
Module 03D - Spark SQL (The Combined Approach)
52:37
Module 03E - Spark SQL (User Defined Functions)
22:35
Module 04A - Administration (Deploying Spark Jobs)
26:50
Module 04B - Administration (IntelliJ IDEA)
22:08
Module 04C - Administration (Passing in Parameters)
25:50
Module 04D - Administration (Debugging with IntelliJ IDEA)
13:57
Module 04E - Administration (Expanding Projects)
38:41
Class Survey
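
The Administration modules above cover packaging a job, submitting it to a cluster, and passing in runtime parameters. Below is a minimal sketch of that workflow under the same assumptions as the earlier example (Spark 2.x, Scala); the class name, JAR path, and input argument are hypothetical, and spark-submit flags will vary with your cluster setup:

// Submitted roughly like this (paths and names are illustrative):
//   spark-submit --class SubmitSketch --master local[*] \
//     target/scala-2.11/spark-course-sketch.jar data/lines.txt
import org.apache.spark.sql.SparkSession

object SubmitSketch {
  def main(args: Array[String]): Unit = {
    // Read the input path from the command line instead of hard-coding it
    // (the "passing in parameters" pattern).
    require(args.length >= 1, "usage: SubmitSketch <inputPath>")
    val inputPath = args(0)

    // No .master() here: spark-submit supplies the master at deploy time.
    val spark = SparkSession.builder().appName("SubmitSketch").getOrCreate()

    val lineCount = spark.read.textFile(inputPath).count()
    println(s"$inputPath contains $lineCount lines")

    spark.stop()
  }
}

Keeping the master out of the code is a common choice because it lets the same JAR run unchanged on a laptop, a test cluster, or production.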

The Apache Spark project does not publish minimum hardware requirements for single-node machines such as VMs or laptops, but at least 8 GB of RAM is recommended. Spark runs on any edition of Windows, Linux, or macOS that supports Oracle Java 1.8.