2010-11-29

Running Sqoop under an Active Directory User Account

We installed Sqoop on a VM here for some of our end users to import data from our SQL Server instances throughout the enterprise. This is great news for us all, since no one has to get their hands dirty exporting data through SSMS anymore.

So on this VM we have CentOS 5.5, with Hive, Pig, and Sqoop installed, all properly configured to communicate with our cluster.

We installed Likewise Open to handle Active Directory logins. I don't want to be in charge of maintaining a list of users who can have access to the items in our Hadoop interface. That's just a pain. If we find that people are misbehaving, we can lock this down as much as we need to.
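
For anyone setting up the same thing from scratch, the domain join itself with Likewise Open is roughly the following (the domain name and admin account below are placeholders, not our actual values):

sudo domainjoin-cli join mydomain.example.com DomainAdmin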

We have a domain group called HadoopUsers. I have certain people assigned to this group, and I can verify in Linux that they are members of it by issuing:
id MYDOMAIN\\userX
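
The numeric IDs and exact group spelling depend on your Likewise configuration, but the output should list the group, along these lines (the values here are made up for illustration):

uid=1677721601(MYDOMAIN\userx) gid=1677721089(MYDOMAIN\domain^users) groups=1677721089(MYDOMAIN\domain^users),1677721729(MYDOMAIN\hadoopusers)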


All that remains is giving that group sufficient access to the local machine in order to run Sqoop.

Running Hive and Pig seems to work without any elevated privileges, but trying to import data with Sqoop without them produces something like the following exception:

10/11/26 08:55:24 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File /tmp/sqoop-MyDomain\userX/compile/9dc233654e097695be7aaf4dd4d5cd81/QueryResult.jar does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:372)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1270)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1246)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1218)
at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:722)
at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:808)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:793)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:793)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:495)
at com.cloudera.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:107)
at com.cloudera.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:166)
at com.cloudera.sqoop.manager.SqlManager.importQuery(SqlManager.java:418)
at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:352)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:170)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:196)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:205)

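For context, the trace above (note the QueryResult.jar and importQuery frames) came from a query-based Sqoop import along these general lines; the server, database, query, and target directory here are placeholders, not the actual job:

sqoop import \
  --connect 'jdbc:sqlserver://dbserver;database=MyDatabase' \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --username sqoop_user -P \
  --query 'SELECT * FROM Orders WHERE $CONDITIONS' \
  --split-by OrderId \
  --target-dir /user/userX/orders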


All I did to resolve this issue was add the MyDomain\\HadoopUsers group to the sudoers file and give them root access. This is a tad hazardous, I know, but it is simple and it works. Again, we can narrow this down further if we find that users are not behaving themselves.

Here is the line I added to the bottom of '/etc/sudoers':
%MyDomain\\HadoopUsers ALL=(ALL) ALL
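
Two side notes on that. It is safer to make the edit through visudo, which validates the syntax before saving, rather than editing the file directly:

sudo visudo

And if we do end up tightening this, a rule restricted to specific commands instead of full root would look something like the following (the sqoop path is illustrative and depends on where it is installed):

%MyDomain\\HadoopUsers ALL=(ALL) NOPASSWD: /usr/bin/sqoop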
