Ok, so setting up a Hadoop cluster isn't super straightforward, but there are lots of folks out there working to bring it all together for us. If I miss a reference here or there, please email me and let me know so that I can credit the sources for the entries that I place in this blog.
Our setup, for the time being, is rather simple. We have 4 nodes in total: one is our NameNode/JobTracker (herein referred to as Node1), and the other three are DataNodes (referred to as Node2-4). This will grow over the next couple of months into at least a 10-node configuration, so having a proper Configuration Management system installed will be crucial to the survival of the opscon (me).
We also have a server that is our "Staging" instance (referred to as Node0, but perhaps another name would be more fitting). This server has Active Directory integration and will be where all users will go to load data, run queries, etc. For the time being this server will also be the host of the MySQL instance for Hive and Puppet, as well as the host for the Puppet Master daemon.
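To make the layout concrete, here is a rough sketch of the servers described so far, written as a small Python inventory. The role labels (`staging`, `mysql`, `puppet-master`, and so on) and the helper `nodes_with_role` are names I made up for illustration; they are not Hadoop or Puppet identifiers, and this is not how the cluster itself is configured.

```python
# Hypothetical inventory of the servers described above.
# Role labels are illustrative only, not real Hadoop or Puppet settings.
CLUSTER = {
    "Node0": ["staging", "mysql", "puppet-master"],
    "Node1": ["namenode", "jobtracker"],
    "Node2": ["datanode"],
    "Node3": ["datanode"],
    "Node4": ["datanode"],
}


def nodes_with_role(cluster, role):
    """Return the sorted names of the nodes that carry the given role."""
    return sorted(name for name, roles in cluster.items() if role in roles)


if __name__ == "__main__":
    # List the DataNodes and the node hosting the NameNode.
    print(nodes_with_role(CLUSTER, "datanode"))
    print(nodes_with_role(CLUSTER, "namenode"))
```

Something along these lines is handy mostly as a checklist: once the cluster grows toward 10 nodes, a single place that records which box does what is the kind of thing a Configuration Management system would take over.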
We are implementing Cloudera's distribution of Hadoop, including Hive, Pig, Sqoop, and a variety of other utilities.
My next entry will cover Active Directory installation on our Staging server. It will be fairly brief, and will detail how to ensure that Pig can be executed by a user with near-zero permissions.