2010-12-14

Configuring Nagios

Ok, the first pain point was getting into Nagios' core CGI web interface. This is easy enough once you look around: just execute the following, and then restart 'httpd':
htpasswd -bc /etc/nagios/htpasswd.users nagios YOURPASSWORD

Of course, you can add as many users as you want. I was also thinking of making this completely open, but that isn't very secure.
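If you do want more users later, the same command works without the -c flag (which would recreate the file); the username here is just an example:

htpasswd -b /etc/nagios/htpasswd.users anotheruser ANOTHERPASSWORD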

Now you will see some errors when you run the verification script. For my part it was mostly typos, plus confusion over the templating of hosts and services; that took me a while to figure out.
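For the record, the verification run I am talking about is the standard config check; assuming the nagios binary is on your PATH and the config lives in the default RPM location, it is just:

nagios -v /etc/nagios/nagios.cfg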

Two folders needed to be created:
- /var/log/nagios/spool/checkresults
- /var/log/nagios/rw
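Creating them is a couple of mkdir calls:

mkdir -p /var/log/nagios/spool/checkresults
mkdir -p /var/log/nagios/rw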

Create a file under /var/log/nagios/rw called 'nagios.cmd'. Chmod it to 755.
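Taken literally, that is:

touch /var/log/nagios/rw/nagios.cmd
chmod 755 /var/log/nagios/rw/nagios.cmd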

Those two new subdirectories *MUST* be chown'd to nagios:nagios and all subitems:
chown -hR nagios:nagios /var/log/nagios/*

Now, I am still getting errors on the web UI (the one that comes with Nagios-Core).

Installing Latest Nagios onto Ganglia GMETAD Machine

Ok, so once again I compiled my own RPMs from the latest tarball on the nagios website, for both the core and plugin pieces. I needed gd-devel and libjpeg-devel as dependencies in order to create the RPMs, but that was pretty easy.
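Roughly, the build looked something like the following; this assumes the tarballs ship their own spec files so rpmbuild -ta works, and the version numbers are just placeholders for whatever is current:

yum install gd-devel libjpeg-devel rpm-build -y
rpmbuild -ta ./nagios-3.2.3.tar.gz            # version is a placeholder
rpmbuild -ta ./nagios-plugins-1.4.15.tar.gz   # version is a placeholder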

Installing Nagios was a slightly more challenging feat: I had to install gd as a dependency, which was fine, but I also needed "perl(Net::SNMP)" for Nagios to install properly from the RPM.

First, I tried installing it from CPAN as my initial Perl instincts kicked in, only to be met with total and utter failure.

At last, I came across the correct yum package to install:
yum install perl-Net-SNMP -y

That fixed it all up for me. On to configuring it to work with Ganglia for email alerts!

Ganglia Installation

Ok, I will save you a lot of trouble by telling you right now that, at the time of writing this blog, you should compile your own RPMs from the tarball available on SourceForge with the following statement:
rpmbuild -ta --target=x86_64,noarch ./ganglia-3.1.7.tar.gz

You will need some extra packages from EPEL on your machine in order to compile the RPMs, including expat-devel, apr-devel, libconfuse-devel, rpm-build, and some others depending on your base CentOS 5 install.
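Pulling in the ones named above is a single yum call (your install may need a few more):

yum install expat-devel apr-devel libconfuse-devel rpm-build -y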

Take these RPMs and distribute them to all of the nodes in your cluster, or puppet it up if you are savvy enough to do so. Be careful with the rpm provider, though, as it currently has a bug with already-installed RPMs.

For a node:
rpm -i libganglia-3.1.7-YOURARCH.rpm ganglia-gmond-3.1.7-YOURARCH.rpm ganglia-gmond-modules-python-3.1.7-YOURARCH.rpm


For the front-end:

rpm -i libganglia-3.1.7-YOURARCH.rpm ganglia-gmetad-3.1.7-YOURARCH.rpm ganglia-web-3.1.7-noarch.rpm


Configure your /etc/ganglia/gmetad.conf with a *single* new data_source line covering all of the nodes in your cluster, start your services, and you are off to the races.
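As a sketch, the data_source line is just a cluster name followed by one or more gmond hosts to poll (the host names below are made up), and then the daemons get started:

data_source "my cluster" node1.example.com:8649 node2.example.com:8649

service gmond start
service gmetad start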

Trust me when I say that installing from the current yum repositories (base and EPEL) is a complete and total waste of your time.

2010-12-01

Modifying hadoop configuration files on a Windows Machine == TROUBLE!!!

Ok, so I have all of our hadoop configuration files in subversion, and I have a cron job that runs every minute to sync the puppetmaster's copy up to what is in svn, and then touch the site.pp file so that all of the puppet agents eventually get the changes that have been committed to our revision control system. It works really well, and I like the fact that I can work on configuration files on my own machine without having to PuTTY into another machine and work with nano or vi (which are great programs, just not as convenient as notepad or EditPlus).
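Stripped down, the cron job amounts to something like this (the paths here are illustrative, not gospel):

# run every minute on the puppetmaster
svn update -q /etc/puppet/modules/hadoop
touch /etc/puppet/manifests/site.pp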

That said, one must be careful not to lose the "LINUX"-ness of the files if you decide to modify these guys on a Windows box using TortoiseSVN and a Windows text editor.

You will see an error like the following when you attempt to start your services:

[root@HANODE2 ~]# service hadoop-0.20-datanode restart
Stopping Hadoop datanode daemon (hadoop-datanode): /etc/hadoop-0.20/conf/hadoop-: command not found
: command not foundnf/hadoop-env.sh: line 7:
......
: command not foundnf/hadoop-env.sh: line 52:
no datanode to stop
[ OK ]
Starting Hadoop datanode daemon (hadoop-datanode): /etc/hadoop-0.20/conf/hadoop-: command not found
: command not foundnf/hadoop-env.sh: line 7:
......
: command not foundnf/hadoop-env.sh: line 10:
: command not foundnf/hadoop-env.sh: line 52:
/hadoop-hadoop-datanode-HANODE2.outlog/hadoop
: command not foundnf/hadoop-env.sh: line 2:
......
: command not foundnf/hadoop-env.sh: line 49:
: command not foundnf/hadoop-env.sh: line 52:
Exception in thread "main" java.lang.NoClassDefFoundError:
Caused by: java.lang.ClassNotFoundException:
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
. Program will exit.in class:
[ OK ]


The one-time fix is easy: just run the following on your hadoop configuration files:

dos2unix /etc/hadoop/conf.MYCONFIGURATION/*


This does not scale well with the number of machines your configuration will be deployed to in your cluster, though, so we have two options here:

1) Use a text editor that is Linux-file friendly
2) Add a step to the download on the puppetmaster that makes these files "linux"-y again before the agents get a chance to grab the latest changes.

I've chosen to simply add the following lines to my cron script so that on download of the latest from SVN, the files are forced into linux format for all to love:

dos2unix -q /etc/puppet/modules/hadoop/files/*
dos2unix -q /etc/puppet/modules/hadoop/files/conf.my_cluster/*
dos2unix -q /etc/puppet/modules/hadoop/manifests/*
dos2unix -q /etc/puppet/modules/hadoop/manifests/classes/*