Analysis of Task Scheduling in Hadoop MapReduce Framework
Keywords:
Hadoop, MapReduce Performance, HDFS, Scheduler, scheduling, FIFO, FAIR and CAPACITY
Abstract
Many open source platforms are available for the storage and computation of big data; Hadoop is one of them. Hadoop implements the MapReduce programming model, which is popular for processing big data in parallel across a distributed cluster and is very efficient for shorter jobs that need low response time. In our experiment we analyze the scheduling of each task in the MapReduce [11] framework using two applications, Word Count and Grep, under the FIFO (first in, first out) scheduler, the Fair scheduler and the Capacity scheduler. The jobs are submitted simultaneously so that task scheduling can be analyzed. We varied the workload as well as the number of Map Tasks on each slave to observe the effect on task scheduling. The experiment was carried out on text files of 1 GB, 2 GB and 5 GB, with the number of Map Tasks on each slave node set to 1, 2, 3, 4 or 5 before the jobs were executed. In every case, for each workload and each Map Task setting, we submit Grep first, followed by Word Count. We observed that under the FIFO scheduler jobs run according to the first in, first out policy: the job that is submitted first is executed first. Under the Fair scheduler and the Capacity scheduler all jobs are given an equal share of resources, which means that both jobs execute simultaneously. From the results we conclude that, among the three schedulers (FIFO, Fair and Capacity), the FIFO scheduler takes more turnaround time for larger data sizes, whereas it outperforms the others for shorter jobs. However, more and more big data applications are now developed with the MapReduce model, and they require low turnaround time for larger jobs as well as for shorter jobs.
As a result, it becomes necessary to verify the performance of MapReduce, especially for larger jobs, which are increasingly popular and have attracted growing attention from research, industry and academia.
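
The difference between the first in, first out policy and equal sharing described above can be illustrated with a small toy model. The sketch below is not the Hadoop scheduler code and does not reproduce our measurements; it assumes that every map task takes one time unit, that all jobs are submitted at time 0 in list order, and that the cluster has a fixed number of map slots. The function name `simulate` and the task and slot counts are hypothetical and chosen only for illustration.

```python
def simulate(task_counts, slots, policy):
    """Return the finish time of each job under a toy slot-allocation model.

    task_counts: number of unit-length map tasks per job, in submission order.
    slots:       total map slots available in each time step (assumed fixed).
    policy:      "fifo" (earlier jobs get slots first) or
                 "fair" (slots handed out round-robin across pending jobs).
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    remaining = list(task_counts)
    # Jobs with no tasks are treated as finished at time 0.
    finish = [0 if r == 0 else None for r in remaining]
    t = 0
    while any(r > 0 for r in remaining):
        t += 1
        free = slots
        alloc = [0] * len(remaining)
        if policy == "fifo":
            # Serve jobs strictly in submission order.
            for i, r in enumerate(remaining):
                alloc[i] = min(free, r)
                free -= alloc[i]
        elif policy == "fair":
            # Give one slot at a time to each job that still has pending tasks.
            progressed = True
            while free > 0 and progressed:
                progressed = False
                for i, r in enumerate(remaining):
                    if free > 0 and alloc[i] < r:
                        alloc[i] += 1
                        free -= 1
                        progressed = True
        else:
            raise ValueError("unknown policy: " + policy)
        for i in range(len(remaining)):
            remaining[i] -= alloc[i]
            if remaining[i] == 0 and finish[i] is None:
                finish[i] = t
    return finish


# Two equal-sized jobs (for example Grep, then Word Count) on 4 hypothetical slots.
print(simulate([8, 8], 4, "fifo"))  # [2, 4]
print(simulate([8, 8], 4, "fair"))  # [4, 4]
```

In this toy setting the first job finishes early under FIFO while the second job waits, whereas under equal sharing both jobs progress together and finish at the same time. The model ignores reduce tasks, data locality and task start-up cost, so it should not be read as a prediction of the turnaround times observed in the experiment.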