Hadoop Debugging with Counters

Aug

2018

Hadoop Debugging with Counters

By Eric Downing. Filed in BigData, Java |

I have been enjoying learning Hadoop and have had to debug issues within a current process job to enhance it for new data points. It is quite the challenge to debug a distributed system but I have found two ways to get some meaningful input from the system. One way is using Counters and the other is to use MultipleOutputs to capture output from errors(This will be in a later post). This article will show how counters can be used.

The Counter method allows you to specify a set of enum values to specify the counter name:

public enum Counters {
  CHOCOLATE,
  VANILLA,
  MINT_CHOCO_CHIP
}

This will be used as the identifier for the counter that is in use. The same code can be used within the Mapper or the reducer depending on where you are looking for the counts. The difference is where in the final output of the Hadoop process that the counts show up.

if (flavor.contains("CHOCOLATE")) {
    context.getCounter(Counters.CHOCOLATE).increment(1);
}
if (flavor.contains("VANILLA")) {
    context.getCounter(Counters.VANILLA).increment(1);
}
if (flavor.contains("MINT_CHOCO_CHIP")) {
    context.getCounter(Counters.MINT_CHOCO_CHIP).increment(1);
}

This way you will find a line for each counter that was incremented during the Hadoop process.

I have used it to highlight problems with in the code or just to make sure that a segment of code has been run.

Tags: BigData, hadoop

This entry was posted on Thursday, August 23rd, 2018 at 9:09 am and is filed under BigData, Java. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Hadoop Debugging with Counters

Leave a Reply