Tel: +91 9645 91 2007, +91 9746 234 160
Linspire Solutions
Hadoop training

Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop allows you to store and process big data across clusters of computers using simple programming models, and it is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

What is Big Data?
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not merely data; it has become a complete subject, involving various tools, techniques, and frameworks.

What Comes Under Big Data?
Big data involves the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data.

  • Black Box Data : It is a component of helicopters, airplanes, jets, etc. It captures the voices of the flight crew, recordings of microphones and earphones, and the performance information of the aircraft.
  • Social Media Data : Social media sites such as Facebook and Twitter hold information and the views posted by millions of people across the globe.
  • Stock Exchange Data : The stock exchange data holds information about the 'buy' and 'sell' decisions made by customers on the shares of different companies.
  • Power Grid Data : The power grid data holds information about the power consumed by a particular node with respect to a base station.
  • Transport Data : Transport data includes the model, capacity, distance, and availability of a vehicle.
  • Search Engine Data : Search engines retrieve large amounts of data from different databases.

The Motivation for Hadoop

  • Problems with Traditional Large-Scale Systems
  • Introducing Hadoop
  • Hadoopable Problems

Hadoop: Basic Concepts and HDFS

  • The Hadoop Project and Hadoop Components
  • The Hadoop Distributed File System

Introduction to MapReduce

  • MapReduce Overview
  • Example: WordCount
  • Mappers
  • Reducers
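The mapper/reducer flow behind the WordCount example above can be sketched in plain Python. This is a local simulation for illustration only, not the Hadoop Java API; the function names are hypothetical:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum all the counts emitted for one word
    return (word, sum(counts))

def word_count(lines):
    # Shuffle/sort: group intermediate values by key, as Hadoop does
    # between the map and reduce phases
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(w, c) for w, c in grouped.items())

print(word_count(["big data big ideas", "big data"]))
# {'big': 3, 'data': 2, 'ideas': 1}
```

In a real Hadoop job, the mapper and reducer run on different machines and the framework performs the grouping step across the cluster.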

Hadoop Clusters and the Hadoop Ecosystem

  • Hadoop Cluster Overview
  • Hadoop Jobs and Tasks
  • Other Hadoop Ecosystem Components
  • Writing a MapReduce Program in Java
  • Basic MapReduce API Concepts
  • Writing MapReduce Drivers, Mappers, and Reducers in Java
  • Speeding Up Hadoop Development by Using Eclipse
  • Differences Between the Old and New MapReduce APIs
  • Writing a MapReduce Program Using Streaming
  • Writing Mappers and Reducers with the Streaming API
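With Hadoop Streaming, mappers and reducers are ordinary executables that read lines on stdin and write tab-separated key/value records on stdout, with Hadoop sorting the map output by key before the reduce phase. A minimal word-count pair might look like the following sketch (illustrative only; in a real job the two functions would be separate scripts passed via the Streaming mapper and reducer options):

```python
import sys
from itertools import groupby

def map_stream(lines):
    # Streaming mapper: one "word<TAB>1" record per word
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reduce_stream(sorted_records):
    # Streaming reducer: input arrives sorted by key, so consecutive
    # records with the same word can be summed with groupby
    pairs = (rec.split("\t") for rec in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(v) for _, v in group)}"

if __name__ == "__main__":
    # Emulate the Streaming pipeline locally: map, sort (the shuffle), reduce
    for record in reduce_stream(sorted(map_stream(sys.stdin))):
        print(record)
```

The local sort stands in for Hadoop's shuffle; the reducer logic relies only on the guarantee that its input is grouped by key.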

Partitioners and Reducers

  • How Partitioners and Reducers Work Together
  • Determining the Optimal Number of Reducers for a Job
  • Writing Custom Partitioners
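The partitioner decides which reducer receives each intermediate key; Hadoop's default behaviour hashes the key modulo the number of reducers, and a custom partitioner replaces that mapping. The idea can be sketched in Python (a hypothetical illustration, not Hadoop's Partitioner API):

```python
def hash_partition(key, num_reducers):
    # Default behaviour: the same key always lands on the same reducer,
    # so all values for that key meet in a single reduce call
    return hash(key) % num_reducers

def first_letter_partition(key, num_reducers):
    # A custom partitioner: route keys by their first letter instead,
    # e.g. to keep an alphabetical range together on one reducer
    return ord(key[0].lower()) % num_reducers

# Keys starting with the same letter share a reducer under the custom scheme
print([first_letter_partition(k, 4) for k in ["apple", "avocado", "banana"]])
# [1, 1, 2]
```

Whatever scheme is used, it must be deterministic: if the same key could be sent to two different reducers, its values would be counted twice.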

Practical Development Tips and Techniques

  • Strategies for Debugging MapReduce Code
  • Testing MapReduce Code Locally by Using LocalJobRunner
  • Writing and Viewing Log Files
  • Retrieving Job Information with Counters
  • Reusing Objects
  • Creating Map-Only MapReduce Jobs

An Introduction to Hive, Impala, and Pig

  • The Motivation for Hive, Impala, and Pig
  • Hive Overview
  • Impala Overview
  • Pig Overview
  • Choosing Between Hive, Impala, and Pig

1) Introduction to Apache Hadoop

  • i) The Case for Apache Hadoop
  • Why Hadoop is needed
  • What problems Hadoop solves
  • What comprises Hadoop and the Hadoop Ecosystem
  • ii) HDFS
  • What features HDFS provides
  • How HDFS reads and writes files
  • How the NameNode uses memory
  • How Hadoop provides file security
  • How to use the NameNode Web UI
  • How to use the Hadoop File Shell
  • iii) Getting Data Into HDFS
  • How to import data into HDFS with Flume
  • How to import data into HDFS with Sqoop
  • What REST interfaces Hadoop provides
  • Best practices for importing data
  • iv) MapReduce
  • What MapReduce is
  • What features MapReduce provides
  • What the basic concepts of MapReduce are
  • What the architecture of MapReduce is
  • What features MapReduce version 2 provides
  • How MapReduce handles failure
  • How to use the JobTracker Web UI

2) Planning, Installing, and Configuring a Hadoop Cluster

  • i) Planning Your Hadoop Cluster
  • What issues to consider when planning your Hadoop cluster
  • What types of hardware are typically used for Hadoop nodes
  • How to optimally configure your network topology
  • How to select the right operating system and Hadoop distribution
  • How to plan for cluster management
  • ii) Hadoop Installation and Initial Configuration
  • The different installation configurations available in Hadoop
  • How to install Hadoop
  • How to specify Hadoop configuration
  • How to configure HDFS
  • How to configure MapReduce
  • How to locate and configure Hadoop log files
  • iii) Installing and Configuring Hive, Impala, and Pig
  • Hive features and basic configuration
  • Impala features and basic configuration
  • Pig features and installation
  • iv) Hadoop Clients
  • What Hadoop clients are
  • How to install and configure Hadoop clients
  • How to install and configure Hue
  • How Hue authenticates and authorizes user access
  • v) Advanced Cluster Configuration
  • Advanced Configuration Parameters
  • Configuring Hadoop Ports
  • Explicitly Including and Excluding Hosts
  • Configuring HDFS for Rack Awareness
  • Configuring HDFS High Availability
  • vi) Hadoop Security
  • Why security is important for Hadoop
  • How Hadoop's security model evolved
  • What Kerberos is and how it relates to Hadoop
  • What to consider when securing Hadoop

3) Cluster Operations and Maintenance

  • i) Managing and Scheduling Jobs
  • How to view and stop jobs running on a cluster
  • The options available for scheduling Hadoop jobs
  • How to configure the Fair Scheduler
  • ii) Cluster Maintenance
  • How to check the status of HDFS
  • How to copy data between clusters
  • How to add and remove nodes
  • How to rebalance the cluster
  • How to upgrade your cluster
  • iii) Cluster Monitoring and Troubleshooting
  • What general system conditions to monitor
  • How to monitor a Hadoop cluster
  • Some techniques for troubleshooting problems on a Hadoop cluster
  • Some common misconfigurations, and their resolutions

4) Security and HDFS Federation

  • i) Kerberos Configuration
  • What are the phases required for a client to access a service
  • Kerberos Client Commands
  • Configuring HDFS Security
  • Configuring MapReduce Security
  • Troubleshooting Hadoop Security
  • ii) Configuring HDFS Federation
  • What is HDFS Federation
  • Benefits of HDFS Federation
  • How HDFS Federation works
  • Federation Configuration

A job is what everyone is in search of, and we help our trainees get one. To make this possible, we have an active placement cell with the aim of bringing the best opportunities to our trainees. The cell helps trainees improve their employability through systematic exposure to job interview techniques.
The members of the placement cell always keep their eyes and ears open for any genuine job openings available. Linspire Solutions has placed over 90% of its trainees in the last 5 years. Our placement process aims to match the requirements of recruiters with the aspirations of our trainees.

Placements from Linspire