Hadoop Debugging with Counters

By Eric Downing. Filed in BigData, Java  |  
TOP del.icio.us digg

I have been enjoying learning Hadoop and have had to debug issues within a current process job to enhance it for new data points. It is quite the challenge to debug a distributed system but I have found two ways to get some meaningful input from the system. One way is using Counters and the other is to use MultipleOutputs to capture output from errors(This will be in a later post). This article will show how counters can be used.

The Counter method allows you to specify a set of enum values to specify the counter name:

public enum Counters {
  CHOCOLATE,
  VANILLA,
  MINT_CHOCO_CHIP
}

This will be used as the identifier for the counter that is in use. The same code can be used within the Mapper or the reducer depending on where you are looking for the counts. The difference is where in the final output of the Hadoop process that the counts show up.

if (flavor.contains("CHOCOLATE")) {
    context.getCounter(Counters.CHOCOLATE).increment(1);
}
if (flavor.contains("VANILLA")) {
    context.getCounter(Counters.VANILLA).increment(1);
}
if (flavor.contains("MINT_CHOCO_CHIP")) {
    context.getCounter(Counters.MINT_CHOCO_CHIP).increment(1);
}

This way you will find a line for each counter that was incremented during the Hadoop process.

I have used it to highlight problems with in the code or just to make sure that a segment of code has been run.

Tags: ,

Leave a Reply