I am new to Hadoop and have run into problems trying to run it on my Windows 7 machine. In particular, I am interested in running Hadoop 2.1.0, as its release notes mention that running on Windows is supported. I know that I can try to run 1.x versions on Windows with Cygwin, or even use a prepared VM from, for example, Cloudera, but for various reasons these options are less convenient for me.
Having examined a tarball from http://apache-mirror.rbc.ru/pub/apache/hadoop/common/hadoop-2.1.0-beta/ I found that there really are some *.cmd scripts that can be run without Cygwin. Everything worked fine when I formatted the HDFS partition, but when I tried to run the hdfs namenode daemon I faced two errors. The first, non-fatal, was that winutils.exe could not be found (it really wasn't present in the downloaded tarball). I found the sources of this component in the Apache Hadoop source tree and compiled it with the Microsoft SDK and MSBuild. Thanks to the detailed error message, it was clear where to put the executable to satisfy Hadoop. But the second error, which is fatal, doesn't contain enough information for me to solve:
13/09/05 10:20:09 FATAL namenode.NameNode: Exception in namenode join
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423)
at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:952)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:451)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:282)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:200)
...
13/09/05 10:20:09 INFO util.ExitUtil: Exiting with status 1
It looks like something else needs to be compiled. I'm going to try to build Hadoop from the source with Maven, but isn't there a simpler way? Isn't there some option-I-know-not-of that can disable native code and make that tarball usable on Windows?
Thank you.
UPDATED. Yes, indeed. The "homebrew" package contained some extra files, most importantly winutils.exe and hadoop.dll. With these files, the name node and data node started successfully. I think the question can be closed. I didn't delete it in case someone faces the same difficulty.
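For anyone puzzled by the stack trace above: an UnsatisfiedLinkError on a method marked "Native Method" means the JVM found the Java declaration but no loaded native library implements it, which is why adding hadoop.dll made the error go away. Below is a minimal sketch that reproduces the same kind of failure outside Hadoop. The class NativeLinkDemo and its methods are hypothetical names made up for illustration; only the method shape (String, int) -> boolean is borrowed from the trace.

```java
// Hypothetical demo class; it is not part of Hadoop.
class NativeLinkDemo {
    // Same shape as the method in the stack trace: (String, int) -> boolean.
    // No native library providing an implementation is ever loaded here.
    private static native boolean access0(String path, int requestedAccess);

    // Returns true if calling the native method fails to link.
    static boolean nativeCallFailsToLink() {
        try {
            access0("C:\\tmp", 0);
            return false; // would only happen if an implementation were loaded
        } catch (UnsatisfiedLinkError e) {
            // The message names the unresolved method; its exact format
            // differs between JDK versions.
            System.out.println("UnsatisfiedLinkError: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("link failure reproduced: " + nativeCallFailsToLink());
    }
}
```

In Hadoop's case the missing implementation lives in hadoop.dll, so the fix is to supply that library rather than to change any Java code.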
UPDATED 2. To build the "homebrew" package I did the following:
- Got the sources and unpacked them.
- Read BUILDING.txt carefully.
- Installed dependencies:
3a) Windows SDK 7.1
3b) Maven (I used 3.0.5)
3c) JDK (I used 1.7.25)
3d) ProtocolBuffer (I used 2.5.0 - http://protobuf.googlecode.com/files/protoc-2.5.0-win32.zip). It is enough just to put the compiler (protoc.exe) into one of the PATH folders.
3e) A set of UNIX command-line tools (I installed Cygwin)
- Started the Windows SDK command line: Start | All programs | Microsoft Windows SDK v7.1 | ... Command Prompt. (I modified this shortcut, adding the option /release to the command line to build release versions of the native code.) All the following steps are performed from inside the SDK command line window.
- Set up the environment:
set JAVA_HOME={path_to_JDK_root}
It seems that JAVA_HOME MUST NOT contain spaces!
set PATH={path_to_maven_bin};%PATH%
set Platform=x64
set PATH={path_to_cygwin_bin};%PATH%
set PATH={path_to_protoc.exe};%PATH%
- Changed directory to the sources root folder (BUILDING.txt warns that there are some limitations on the path length, so the sources root should have a short name; I used D:\hds).
- Ran the build:
mvn package -Pdist -DskipTests
You can try without '-DskipTests', but on my machine some tests failed and the build was terminated. It may be connected to the symbolic link issues mentioned in BUILDING.txt.
- Picked up the result in hadoop-dist\target\hadoop-2.1.0-beta (Windows executables and DLLs are in the 'bin' folder).
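Since the build takes a while, it may save time to check the environment from the setup step before starting Maven. The following is only a sketch under my own assumptions: the class BuildEnvCheck is hypothetical, and the file names mvn.bat and sh.exe are guesses at what the Maven and Cygwin bin folders contain (protoc.exe is the name used above). It checks that JAVA_HOME is set and has no space, that Platform is set, and that each tool can be found in a PATH folder.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-flight check; not part of Hadoop or its build scripts.
class BuildEnvCheck {
    // JAVA_HOME must be set and must not contain a space.
    static boolean javaHomeOk(String javaHome) {
        return javaHome != null && !javaHome.isEmpty() && !javaHome.contains(" ");
    }

    // Returns the first file named fileName found in a PATH folder, or null.
    static File findOnPath(String pathValue, String fileName) {
        if (pathValue == null) return null;
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) continue;
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) return candidate;
        }
        return null;
    }

    // Collects a human-readable message for every failed check.
    static List<String> problems(String javaHome, String pathValue,
                                 String platform, String[] tools) {
        List<String> found = new ArrayList<>();
        if (!javaHomeOk(javaHome)) {
            found.add("JAVA_HOME is unset or contains a space: " + javaHome);
        }
        if (platform == null || platform.isEmpty()) {
            found.add("Platform is not set");
        }
        for (String tool : tools) {
            if (findOnPath(pathValue, tool) == null) {
                found.add(tool + " was not found on PATH");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed file names; adjust them to match your installation.
        String[] tools = {"mvn.bat", "protoc.exe", "sh.exe"};
        List<String> found = problems(System.getenv("JAVA_HOME"),
                System.getenv("PATH"), System.getenv("Platform"), tools);
        if (found.isEmpty()) {
            System.out.println("Environment looks ready for the build.");
        }
        for (String p : found) {
            System.out.println("Problem: " + p);
        }
    }
}
```

Run it from the same SDK command prompt after the set commands, so that it sees the same environment variables Maven will see.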