Tag Archive


3D 3dprinting android ant BigData bitcoin Browsers C/C++ cryptocurrency CSS dd ddrescue dogecoin DOS editors find Games Git hadoop html html5 Java Linux litecoin node perl Postgres Programming Python scripting Shell SQL Swing TOTK Utilities utilization vi Video Web Web Design Wifi Windows Wordpress XML Zelda

Hadoop: Checking for hung jobs

I was working on my EMR cluster and needed to test some issues with steps of our process but I don’t have access to the web tools to monitor processes. When I ran one of my hadoop processes it just sat at the beginning of the process. It turned out I had some hung processes but needed to use the Hadoop yarn command-line tool to find out what was going on.

Using the following command shows the queue for processes queued to run:

> yarn application -list

If you do find an issue with a hung process, use the following to end the process:

>yarn application -kill <application-id>

Hadoop Debugging with Counters

I have been enjoying learning Hadoop and have had to debug issues within a current process job to enhance it for new data points. It is quite the challenge to debug a distributed system but I have found two ways to get some meaningful input from the system. One way is using Counters and the other is to use MultipleOutputs to capture output from errors(This will be in a later post). This article will show how counters can be used.

The Counter method allows you to specify a set of enum values to specify the counter name:

public enum Counters {
  CHOCOLATE,
  VANILLA,
  MINT_CHOCO_CHIP
}

This will be used as the identifier for the counter that is in use. The same code can be used within the Mapper or the reducer depending on where you are looking for the counts. The difference is where in the final output of the Hadoop process that the counts show up.

if (flavor.contains("CHOCOLATE")) {
    context.getCounter(Counters.CHOCOLATE).increment(1);
}
if (flavor.contains("VANILLA")) {
    context.getCounter(Counters.VANILLA).increment(1);
}
if (flavor.contains("MINT_CHOCO_CHIP")) {
    context.getCounter(Counters.MINT_CHOCO_CHIP).increment(1);
}

This way you will find a line for each counter that was incremented during the Hadoop process.

I have used it to highlight problems with in the code or just to make sure that a segment of code has been run.