Volume : III, Issue : V, May - 2014

Extending Hadoop to Improve Support for Multiple-input Applications

Sarath C, Mrs Usha K

Abstract :

Hadoop is an open-source implementation of the MapReduce programming model that provides a cost-effective solution for many data-intensive applications. Hadoop stores data in a distributed fashion and exploits data locality by assigning tasks to the nodes where their data is stored. Many data-intensive applications, however, require two or more input data sets for each task. Such applications pose significant challenges for Hadoop because the inputs to one task may reside on multiple nodes, and Hadoop is unable to discover data locality in this scenario. This often leads to excessive data transfers and significant degradation in application performance. Bi-Hadoop was introduced as an efficient extension of Hadoop to better support binary-input applications. It integrates an easy-to-use user interface, a binary-input aware task scheduler, and a caching subsystem. Experiments show that Bi-Hadoop significantly improves the execution of binary-input applications by reducing data transfer overhead, and outperforms existing Hadoop by more than 3x. In this paper, we introduce a further enhancement of Bi-Hadoop that adds support for multiple-input applications, that is, applications whose inputs may reside on more than two nodes.

DOI : 10.36106/ijsr

Cite This Article:

Sarath C, Mrs Usha K, "Extending Hadoop to Improve Support for Multiple-input Applications", International Journal of Scientific Research, Vol. III, Issue V, May 2014


