The Big Data Hadoop Analyst Online Training course teaches students to build robust data processing applications using Apache Hadoop. Students will learn debugging, Hadoop development, and implementation of workflows and common algorithms. Students will also learn how to leverage Hive, Sqoop, Oozie, Flume, Pig, YARN, and Hadoop Testing.
Preview
By the end of this training you will be provided with Hadoop certification and will learn:
The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis.
The fundamentals of data ETL (extract, transform, load), ingestion, and processing with Hadoop tools.
How Pig, Hive, and Impala improve productivity for typical analysis tasks.
Joining diverse datasets to gain valuable business insight.
Performing real-time, complex queries on datasets.
Course Contents
Day 1
Introduction
Hadoop Fundamentals
The Motivation for Hadoop
Hadoop Overview
Data Storage: HDFS
Distributed Data Processing: YARN, MapReduce, and Spark
Data Processing and Analysis: Pig, Hive, and Impala
Data Integration: Sqoop
Other Hadoop Data Tools
Exercise Scenarios Explanation
Day 2
Introduction to Pig
What Is Pig?
Pig’s Features
Pig Use Cases
Interacting with Pig
Basic Data Analysis with Pig
Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly-Used Functions
Processing Complex Data with Pig
Storage Formats
Complex/Nested Data Types
Grouping
Built-In Functions for Complex Data
Iterating Grouped Data
Day 3
Multi-Dataset Operations with Pig
Techniques for Combining Data Sets
Joining Data Sets in Pig
Set Operations
Splitting Data Sets
Pig Troubleshooting and Optimization
Troubleshooting Pig
Logging
Using Hadoop’s Web UI
Data Sampling and Debugging
Performance Overview
Understanding the Execution Plan
Tips for Improving the Performance of Your Pig Jobs
Day 4
Introduction to Hive and Impala
What Is Hive?
What Is Impala?
Schema and Data Storage
Comparing Hive to Traditional Databases
Hive Use Cases
Querying with Hive and Impala
Databases and Tables
Basic Hive and Impala Query Language Syntax
Data Types
Differences Between Hive and Impala Query
Using Hue to Execute Queries
Using the Impala Shell
Day 5
Data Management
Data Storage
Creating Databases and Tables
Loading Data
Altering Databases and Tables
Simplifying Queries with Views
Storing Query Results
Data Storage and Performance
Partitioning Tables
Choosing a File Format
Managing Metadata
Controlling Access to Data
Relational Data Analysis with Hive and Impala
Joining Datasets
Common Built-In Functions
Aggregation and Windowing
Day 6
Working with Impala
How Impala Executes Queries
Extending Impala with User-Defined Functions
Improving Impala Performance
Analyzing Text and Complex Data with Hive
Complex Values in Hive
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Conclusion
Hive Optimization
Understanding Query Performance
Controlling Job Execution Plan
Bucketing
Indexing Data
Extending Hive
SerDes
Data Transformation with Custom Scripts
User-Defined Functions
Parameterized Queries
Choosing the Best Tool for the Job
Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
Which to Choose?
Conclusion
Training Hours
Time: 12:00 NOON GMT | 7:00 AM EST | 4:00 AM PST | 6:00 AM CST | 5:00 AM MST | 5:30 PM IST | 1:00 PM GMT+1
Audience
1. This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Knowledge of SQL is assumed, as is basic Linux command-line familiarity.
2. Knowledge of at least one scripting language (e.g., Bash scripting, Perl, Python, Ruby) would be helpful but is not essential.
3. Prior knowledge of Apache Hadoop is not required.