Hadoop Combiners

Role of Hadoop Combiner in MapReduce API

In my previous blog post, I discussed Hadoop Counters. In this post, I would like to focus on the Hadoop Combiner, a highly useful function offered by Hadoop. As in my previous post, I will demonstrate the functionality of the Hadoop Combiner using an example, and I will use the same dataset (Customer Complaints) as before. I am sure this would

Apache Spark Job

Techniques to Tune Apache Spark Job

To write a Spark program, it is crucial to first understand Apache Spark’s underlying execution model. In this blog post, we will talk about some of the key aspects to consider when writing Spark code so that jobs execute efficiently. We will also discuss best practices and optimization tips for Apache Spark to achieve better performance and cleaner code, whilst

Next level of Digital Engineering

How to Bring your Business to its Next level of Digital Engineering

Introduction: Most digital transformations don’t yield the benefits that enterprises expect. 53% of enterprises fail to realize any business value whatsoever from their digital transformation efforts, says an Everest Group report. The number looks horrific. So, what’s the reason behind this failure? There could be many causes of underdelivering on the expected value of digital transformation; however, planning with purpose to ensure the right people,

Hadoop – Counters in MapReduce API with Example

Big data is gaining massive popularity in today’s information-driven era. It is considered one of the hottest IT buzzwords of 2015. It has the potential to solve key business problems by taming large volumes of data and creating meaningful insights. To maximize its potential, developers are relying on parallel processing architectures, such as Hadoop, to process large amounts of data. The