Cloudera Data Engineering: Developing Applications with Apache Spark Course
- 32 Hours
- 4/8 Lectures
Cloudera Data Engineering: Developing Applications with Apache Spark Course Overview
Register for the Cloudera Data Engineering: Developing Applications with Apache Spark training course, offered by MeeshaSoftware and certified by Cloudera. In this course you will learn the key concepts and skills developers need to build high-performance, parallel applications with Apache Spark on the Cloudera Data Platform (CDP).
Through a combination of hands-on exercises and interactive lectures, you will learn how to create Spark applications that integrate with core CDP components, including Hive and Kafka, allowing you to take full advantage of the CDP ecosystem. You will learn how to work with "big data" stored in a distributed file system, query structured data with Spark SQL, and process streaming data in real time with Spark Streaming.
The training is geared toward software developers and data engineers.
Course Objectives:
After successfully completing this Cloudera Data Engineering: Developing Applications with Apache Spark course, you will be able to:
- Distribute, store, and process data in a CDP cluster
- Write, configure, and deploy Apache Spark applications
- Use Spark interpreters and Spark applications to explore, process, and analyze distributed data
- Query data using Spark SQL, DataFrames, and Hive tables
- Use Spark Streaming together with Kafka to process a stream of data
What you will learn:
Module 1: Introduction to Apache Zeppelin
- Why Use Notebooks?
- Zeppelin Notes
- Demo: Apache Spark in 5 Minutes
Module 2: Introduction to HDFS
- HDFS Overview
- HDFS Components and Interactions
- Other HDFS Interactions
- Ozone Overview
- Exercise: Working with HDFS
Module 3: Introduction to YARN
- YARN Overview
- YARN Components and Interactions
- Exercise: Working with YARN
Module 4: Distributed Processing History
- The Disk Years: 2000 to 2010
- The Memory Years: 2010 to 2020
- The GPU Years: 2020 Onward
Module 5: Working with RDDs
- Resilient Distributed Datasets (RDDs)
- Exercise: Working with RDDs
Module 6: Working with DataFrames
- Introduction to DataFrames
Module 7: Introduction to Hive
Module 8: Integrating Hive with Spark
- Hive and Spark Integration
- Exercise: Integrating Hive with Spark
Module 9: Data Visualization with Zeppelin
- Overview of Zeppelin's Data Visualization Capabilities
- Exercise: AdventureWorks Analytics and Collaboration with Zeppelin
Module 10: Distributed Processing Challenges
- Shuffle
- Skew
- Order
Module 11: Spark Distributed Processing
- Spark Distributed Processing
- Exercise: Exploring Query Execution Order
Module 12: Spark Distributed Persistence
- DataFrame and Dataset Persistence
- Persistence Storage Levels
- Exercise: Persisting DataFrames and Viewing Persisted RDDs
Module 13: Writing, Configuring, and Running Spark Applications
- Writing a Spark Application
- Building and Running an Application
- Application Deployment Mode
- The Spark Application Web UI
- Configuring Application Properties
- Exercise: Writing, Configuring, and Running a Spark Application
Module 14: Introduction to Structured Streaming
- Structured Streaming Overview
- Exercise: Processing Real-Time Data Streams
Module 15: Message Processing with Apache Kafka
- What Is Apache Kafka?
- Apache Kafka Overview
- Scaling Apache Kafka
- Apache Kafka Cluster Architecture
- Apache Kafka Command-Line Tools
Module 16: Structured Streaming with Apache Kafka
- Receiving Kafka Messages
- Sending Kafka Messages
- Exercise: Working with Kafka Streaming Messages
Module 17: Aggregating and Joining Streaming DataFrames
- Streaming Aggregation
- Joining Streaming DataFrames
- Exercise: Aggregating and Joining Streaming DataFrames
Module 18 (Appendix): Working with Datasets in Scala
- Using Datasets in Scala
- Exercise: Using Datasets in Scala