Installing Hadoop and Pig on a Windows machine

Thursday, October 23, 2014 Veronica Comper


My first piece of advice about installing Pig and Hadoop on a Windows machine is to do it through a virtual machine. Pig and Hadoop are simpler and faster to install on a Linux operating system, and a virtual machine makes it easy to run one.

For my virtual machine I used VirtualBox 4.3.10 with Ubuntu 12.04 as the guest operating system. I chose Ubuntu 12.04 even though Ubuntu 14.04 was available, because the newer release gave me problems with the graphical interface in the virtual machine, even after I installed all the Guest Additions. Both VirtualBox and Ubuntu were very easy to install: just follow the installation wizard, and this link (https://help.ubuntu.com/community/VirtualBox ) will help if you run into any problems.

After setting up the virtual machine with my guest operating system, I installed Hadoop and Java by following this guide, which worked like a charm: http://askubuntu.com/questions/144433/how-to-install-hadoop.

After the Hadoop and Java installation, I used this link to install Pig: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.1/CDH4-Installation-Guide/cdh4ig_topic_16_2.html.

At this point everything should be working, and you can run Pig commands on your local machine to debug your code.
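Before writing any scripts, it can help to confirm that the installation actually worked. Below is a minimal sketch in Python, assuming the `java`, `hadoop` and `pig` commands ended up on your PATH after following the guides above; the helper names (`find_tools`, `local_pig_command`, `run_local`) are my own illustration, not part of Hadoop or Pig. It uses Pig's `-x local` option, which runs a script against the local file system instead of a Hadoop cluster, which is handy for debugging.

```python
import shutil
import subprocess

# Command-line tools a working setup is expected to provide.
# Assumption: the Java, Hadoop and Pig binaries were added to PATH during install.
REQUIRED_TOOLS = ("java", "hadoop", "pig")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def local_pig_command(script_path):
    """Build the command that runs a Pig script in local mode (no cluster needed)."""
    return ["pig", "-x", "local", script_path]


def run_local(script_path):
    """Run a Pig script in local mode and return Pig's exit code."""
    missing = [name for name, path in find_tools().items() if path is None]
    if missing:
        raise RuntimeError("not found on PATH: " + ", ".join(missing))
    return subprocess.run(local_pig_command(script_path)).returncode


if __name__ == "__main__":
    # Print where each tool was found, or MISSING if it is not on PATH.
    for name, path in find_tools().items():
        print(f"{name:7s} {path or 'MISSING'}")
```

For example, `run_local("wordcount.pig")` (a hypothetical script name) would run that script in local mode and return Pig's exit code, or raise an error listing any tool it could not find.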

To write my Pig scripts I use IntelliJ IDEA (http://www.jetbrains.com/idea/ ), because it runs on Ubuntu and has a Pig language plugin. The plugin is fairly limited for now, as it only provides syntax highlighting, but it is still useful, and hopefully it will be expanded to include other features.