Hortonworks Certified Associate
This course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Spark. Topics include: Hadoop, YARN, HDFS, using Spark for interactive data exploration, building and deploying Spark applications, optimizing applications, creating Spark pipelines with multiple libraries, working with different file types, building data frames, exploring the Spark SQL API, using Spark Streaming, and an introduction to Spark MLlib.
• Using Hadoop and MapReduce
• Using HBase
• Importing Data from MySQL to HBase
• Using Apache ZooKeeper
• Examining Configuration Files
• Using Backup and Snapshot
• HBase Shell Operations
• Creating Tables with Multiple Column Families
• Exploring HBase Schema
• Blocksize and Bloom filters
• Exporting Data
• Using a Java Data Access Object Application to Interact with HBase
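The MapReduce topic above can be illustrated without a cluster. Below is a pure-Python sketch of the MapReduce model — map, shuffle, reduce — applied to a word count; the function names and sample data are illustrative, not part of the Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle step: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["the"])  # → 2
```

On a real cluster, the map and reduce steps run as distributed tasks and the shuffle is handled by the framework; the data flow, however, is exactly this.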
Students must have a basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit www.cossindia.net for more information.
The exam has five main categories of tasks that involve:
• High Availability
View the complete list of objectives below, which includes links to the corresponding documentation and/or other resources.
Hortonworks Certified Developer:
• Recognize use cases for data science on Hadoop
• Describe the Hadoop and YARN architecture
• Describe supervised and unsupervised learning differences
• Use Mahout to run a machine learning algorithm on Hadoop
• Describe the data science life cycle
• Use Pig to transform and prepare data on Hadoop
• Write a Python script
• Describe options for running Python code on a Hadoop cluster
• Write a Pig User-Defined Function in Python
• Use Pig streaming on Hadoop with a Python script
• Use machine learning algorithms
• Describe use cases for Natural Language Processing (NLP)
• Use the Natural Language Toolkit (NLTK)
• Describe the components of a Spark application
• Write a Spark application in Python
• Run machine learning algorithms using Spark MLlib
• Take data science into production
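One common option for running Python on a Hadoop cluster, mentioned in the objectives above, is the streaming model: the script reads lines on stdin and writes tab-separated key/value lines on stdout. The sketch below follows that convention for a word count; the function names are illustrative, and the framework (Hadoop Streaming or Pig streaming) would normally supply the sorted input to the reducer.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated (word, 1) line per word, as streaming expects."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum counts per word; the framework delivers mapper output sorted by key."""
    parsed = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(v) for _, v in group)}"

if __name__ == "__main__":
    # Locally we sort in-process; on a cluster each phase is a separate
    # process connected through stdin/stdout by the framework.
    for out in reducer(sorted(mapper(sys.stdin))):
        print(out)
```

Because each phase is just a line-oriented filter, the same script can be tested on a laptop with `cat input | ./script.py` before it is submitted to the cluster.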
Introduction to Apache Hive and Apache Impala.
Installing and configuring Apache Pig, Apache Hive & Apache Impala.
Querying with Hive and Impala.
Introduction to Apache Pig.
Interacting with Pig.
Processing complex data with Pig.
Multi-Dataset operations with Pig.
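Pig's multi-dataset operations, such as JOIN, can be pictured as plain record manipulation. Below is a pure-Python sketch of an inner join in the spirit of Pig's `JOIN left BY key, right BY key` — the helper name and the sample relations are illustrative, not Pig syntax.

```python
from collections import defaultdict

def pig_style_join(left, right, left_key, right_key):
    """Inner-join two lists of dict records on a key, like Pig's JOIN."""
    index = defaultdict(list)
    for row in right:                      # build a lookup on the right side
        index[row[right_key]].append(row)
    for row in left:                       # stream the left side through it
        for match in index[row[left_key]]:
            yield {**row, **match}         # merged record, as in Pig's output

users = [{"id": 1, "name": "ana"}, {"id": 2, "name": "ben"}]
orders = [{"user_id": 1, "total": 30}, {"user_id": 1, "total": 12}]
rows = list(pig_style_join(users, orders, "id", "user_id"))
# ben has no orders, so only ana's two joined records are produced
```

In Pig the equivalent would be a one-line `JOIN` statement, with the hash-build and probe handled by MapReduce behind the scenes.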
Daemons and services.
HDFS and YARN (hands-on).
Hadoop configuration files.
Getting data into HDFS.
Importing and exporting data with Sqoop.
Capturing data with Apache Flume.
Message processing with Apache Kafka.
Hadoop clients, including Hue.
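Kafka's core idea, underlying the message-processing topic above, is that a topic is an append-only log and each consumer tracks its own read offset. The toy model below illustrates that consumption model only; it is not the Kafka client API, and all names are illustrative.

```python
from collections import defaultdict

class TopicLog:
    """Toy model of a Kafka topic partition: an append-only log with offsets."""
    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new record

    def read_from(self, offset):
        """Consumers poll from their last committed offset."""
        return self.messages[offset:]

broker = defaultdict(TopicLog)
broker["clicks"].append("page=/home")
broker["clicks"].append("page=/docs")

# A consumer that has already processed offset 0 resumes at offset 1.
print(broker["clicks"].read_from(1))  # → ['page=/docs']
```

Because the log is never mutated in place, many independent consumers can read the same topic at their own pace — the property that distinguishes Kafka from a traditional queue.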