This hands-on training course delivers the key concepts and expertise developers need to use Apache Spark to develop high-performance parallel applications.
Participants will learn how to use Spark SQL to query structured data and Spark Streaming to perform real-time processing on streaming data from a variety of sources.
Developers will also practice writing applications that use core Spark to perform ETL processing and iterative algorithms.
The course covers how to work with “big data” stored in a distributed file system and how to execute Spark applications on a Hadoop cluster.
After taking this course, participants will be prepared to face real-world challenges and to build applications that enable faster, better decisions and interactive analysis across a wide variety of use cases, architectures, and industries.
Code | Title | Duration | Price (excl. tax) |
---|---|---|---|
HADP01 | Developer Training for Apache Spark™ and Hadoop | 4 days | Contact us |

Audience: Developers

Prerequisites: Apache Spark examples and hands-on exercises are presented in Scala and Python.

Methods: 50% practice, 50% theory

Course outline:
- Introduction: Introduction to Apache Hadoop and the Hadoop Ecosystem
- Apache Hadoop File Storage: Apache Hadoop Cluster Components
- Distributed Processing on an Apache Hadoop Cluster: YARN Architecture
- Apache Spark Basics: What is Apache Spark?
- Working with DataFrames and Schemas: Creating DataFrames from Data Sources
- Analyzing Data with DataFrame Queries: Querying DataFrames Using Column Expressions
- RDD Overview
- Transforming Data with RDDs: Writing and Passing Transformation Functions
- Aggregating Data with Pair RDDs: Key-Value Pair RDDs
- Querying Tables and Views with SQL: Querying Tables in Spark Using SQL
- Working with Datasets in Scala: Datasets and DataFrames
- Writing, Configuring, and Running Spark Applications: Writing a Spark Application
- Spark Distributed Processing: Review: Apache Spark on a Cluster
- Distributed Data Persistence: DataFrame and Dataset Persistence
- Common Patterns in Spark Data Processing: Common Apache Spark Use Cases
- Introduction to Structured Streaming: Apache Spark Streaming Overview
- Structured Streaming with Apache Kafka: Overview
- Aggregating and Joining Streaming DataFrames: Streaming Aggregation
- Message Processing with Apache Kafka: What Is Apache Kafka?

Keywords: Hadoop, Spark
Rue du Lac Windermere, Byzance Center, Bloc A - 1053 Les Berges du Lac - Tunisia
Tel: (+216) 31 400 501
Fax: (+216) 32 400 501
Mobile: (+216) 55 666 600
E-mail: contact@formafast.com