
Hadoop Week 6 Exercise — Installing the Hadoop Plugin in Eclipse and Testing It (Linux)
Runtime environment
Hardware and software environment
Cluster network environment
Hadoop Location
Uploading test data
Setting run parameters
Running and viewing the results
Problem: passing a parameter
Test_2_Adjust.java
Configuring run parameters
The role of the setup() function
Runtime environment
Hardware and software environment
Host operating system: Windows 64-bit, dual-core with 4 threads, 2.2 GHz, 6 GB RAM
Virtualization software: VMware Workstation 9.0.0 build-812388
Virtual machine operating system: CentOS 64-bit, single core, 1 GB RAM
JDK: 1.7.0_55 64-bit
Hadoop: 1.1.2
Cluster network environment
The cluster contains three nodes, 1 NameNode and 2 DataNodes, and the nodes can all ping each other. The node IP addresses and roles are:
10.88.147.221 — NameNode, SecondaryNameNode, JobTracker
10.88.147.222 — DataNode, TaskTracker
10.88.147.223 — DataNode, TaskTracker
All nodes run CentOS 6.5 64-bit with the firewall disabled. A hadoop user was created on every node with home directory /usr/hadoop, and every node has a directory /usr/local/hadoop owned by the hadoop user.
Install Eclipse on Linux or Windows and connect it to the Hadoop cluster (the key step is building/installing the plugin), then run last week's max-temperature MapReduce program as a test and capture screenshots of the whole process.
Eclipse can be downloaded in several ways; below we cover downloading directly from the Eclipse website and from a Chinese mirror site, and then uploading Eclipse into the Hadoop environment.
Method 1: download from the Eclipse website:
http://www.eclipse.org/downloads/?osType=linux
Our environment is 64-bit CentOS, so choose the Linux package type and click the Linux 64-bit link to download.
The site recommends the best download location based on where you are.
In the lower part of the page you can also pick a mirror site that suits you.
Method 2: download Eclipse directly from a mirror site:
http://mirror./eclipse/technology/epp/downloads/release/luna/R/
On the mirror site select the package
( http://mirror./eclipse/technology/epp/downloads/release/luna/R/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz )
In the /home/hadoop/Downloads/ directory, unpack Eclipse with the following commands and move it to /usr/local:
cd /home/hadoop/Downloads
tar -zxvf eclipse-jee-luna-SR1-linux-gtk-x86_64.tar.gz
sudo mv eclipse /usr/local/
cd /usr/local
Log in to the virtual machine desktop, change into /usr/local/eclipse and start Eclipse with the following commands:
cd /usr/local/eclipse
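The launch command itself is not shown; for an Eclipse package unpacked this way it is simply the eclipse binary in that directory (an assumption), started in the background:
./eclipse &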
For convenience you can create a desktop shortcut for Eclipse on the virtual machine.
Copy hadoop-eclipse-plugin-1.1.2.jar into Eclipse's plugins directory:
cd /home/hadoop/Downloads
mv hadoop-eclipse-plugin-1.1.2.jar /usr/local/eclipse/plugins
cd /usr/local/eclipse/plugins
ll hadoop-eclipse-plugin-1.1.2.jar
Start Eclipse and open Window -> Preferences; set the Hadoop MapReduce installation directory, which in this environment is /usr/local/hadoop-1.1.2, as shown below:
Open the Eclipse menu Window -> Show View -> Other and select Map/Reduce Locations, as shown below:
Once added, the MapReduce view appears in the view area, with a blue-elephant "add" button in its upper-right corner, as shown below.
Start Hadoop with the following commands:
cd /usr/local/hadoop-1.1.2/bin
start-all.sh
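A quick way to confirm the daemons actually came up is the JDK's jps command (a sketch; on the node where start-all.sh was run, a Hadoop 1.1.2 cluster should show NameNode, SecondaryNameNode and JobTracker, and the workers should show DataNode and TaskTracker — the worker IP below is taken from the cluster table above):
jps
ssh 10.88.147.222 jps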
Hadoop Location
Click the blue-elephant "new" button; you are prompted for the MapReduce Master and HDFS Master settings, where:
Location Name: a name for this location, so it can be identified;
MapReduce Master: must match the mapred-site.xml settings in the Hadoop conf directory (see the check sketched below);
HDFS Master: must match the core-site.xml settings in the Hadoop conf directory (see the check sketched below);
User Name: the hadoop login user name; any value works.
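To see exactly which values the two Master fields must match, read the relevant properties straight from the configuration files; a sketch, assuming the standard Hadoop 1.x property names and the /usr/local/hadoop-1.1.2 install path used here:
grep -A1 "mapred.job.tracker" /usr/local/hadoop-1.1.2/conf/mapred-site.xml
grep -A1 "fs.default.name" /usr/local/hadoop-1.1.2/conf/core-site.xml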
Uploading test data
Once this is configured, a CentOS HDFS directory tree appears under DFS Locations on the left side of Eclipse, showing the directories of the HDFS file system:
To run the max-temperature MapReduce job, upload the test data through Eclipse: right-click the corresponding directory under DFS Locations and choose Upload file to DFS; a file-selection dialog appears, as shown below:
You can see that MaxTemperatureData.txt has been uploaded to the HDFS file system successfully.
With the plugin installed, you can create a Map/Reduce Project from the New Project page:
Note that after entering the project name Rock you must also specify the path to the Hadoop MapReduce runtime libraries; then click Finish.
In the Rock project create the package /src/chapter06 and add last week's max-temperature classes: MaxTemperature.java, MaxTemperatureMapper.java and MaxTemperatureReducer.java.
Setting run parameters
Open MaxTemperature.java and choose Run -> Run Configurations to set the run parameters. On the Arguments tab fill in the input and output path arguments for MaxTemperature; note that both must be full paths, otherwise the run fails:
Input: the data path, here hdfs://hadoop1:9000/usr/hadoop/in/MaxTemperatureData.txt
Output: the result path, here hdfs://hadoop1:9000/usr/hadoop/out_ch6_eclipse
Running and viewing the results
After setting the parameters, click the Run button:
When the run succeeds, refresh the output path out_ch6_eclipse under the CentOS HDFS location and open the part-r-00000 file to see the result:
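The same result can also be read from the command line; a sketch, using the output path configured above:
hadoop fs -cat hdfs://hadoop1:9000/usr/hadoop/out_ch6_eclipse/part-r-00000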
Problem: passing a parameter
(Optional) Read Exercise_1.java, compile it and run it. The program is adapted from Test_1; the main difference is that it lets the user prepend a user-defined string to every line of the result file, and that string is passed into the program as an argument. For example, after running
$hadoop jar Exercise_1.jar input_path output_path hadoop
the third argument "hadoop" appears in the result file, as shown in the attachment result_1.
Problem: paying close attention to the parts of Exercise_1.java marked "needs attention", rewrite the Test_2 program so that its result matches the attachment result_2 exactly, with the string hadoop passed in as an argument.
Test_2_Adjust.java
Note: in the original post, red text marked the differences from Test_2.java.
import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test_2_Adjust extends Configured implements Tool {

    // Counter used to count malformed input lines
    enum Counter {
        LINESKIP
    }

    /**
     * MAP
     */
    public static class Map extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();                      // read a source line
            try {
                // the field delimiter was unreadable in the original listing; a single space is assumed here
                String[] lineSplit = line.split(" ");
                String anum = lineSplit[0];
                String bnum = lineSplit[1];
                // fetch the user-supplied string passed in through the configuration
                String name = context.getConfiguration().get("name");
                context.write(new Text(bnum + "," + name), new Text(anum)); // output
            } catch (java.lang.ArrayIndexOutOfBoundsException e) {
                context.getCounter(Counter.LINESKIP).increment(1); // bump the counter on bad lines
                return;
            }
        }
    }

    /**
     * REDUCE
     */
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String valueString;
            String out = "";
            for (Text value : values) {
                valueString = value.toString();
                out += valueString + "|";                         // "|" separator assumed from the result format
            }
            // the key carries "bnum,name"; split it so the name ends up at the end of the output line
            String[] keySplit = key.toString().split(",");
            context.write(new Text(keySplit[0]), new Text(out + keySplit[1]));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // pick up the third command-line argument and pass it to the tasks
        conf.set("name", args[2]);

        Job job = new Job(conf, "Test_2_Adjust");                 // job name
        job.setJarByClass(Test_2_Adjust.class);                   // class that contains the job
        FileInputFormat.addInputPath(job, new Path(args[0]));     // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output path

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);                        // output key type
        job.setOutputValueClass(Text.class);                      // output value type

        job.waitForCompletion(true);

        // print job statistics
        System.out.println("Job name: " + job.getJobName());
        System.out.println("Job successful: " + (job.isSuccessful() ? "Yes" : "No"));
        System.out.println("Input lines: " + job.getCounters().findCounter(
                "org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_RECORDS").getValue());
        System.out.println("Output lines: " + job.getCounters().findCounter(
                "org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_RECORDS").getValue());
        System.out.println("Skipped lines: " + job.getCounters().findCounter(Counter.LINESKIP).getValue());

        return job.isSuccessful() ? 0 : 1;
    }

    /**
     * Entry point; prints a usage message when the argument count is wrong.
     */
    public static void main(String[] args) throws Exception {
        // check the number of arguments; if run without them, print the program description
        if (args.length != 3) {
            System.err.println("");
            System.err.println("Usage: Test_2_Adjust < input path > < output path > < name >");
            System.err.println("Example: hadoop jar ~/Test_2_Adjust.jar hdfs://localhost:9000/usr/hadoop/Test_2_Data.txt hdfs://localhost:9000/usr/hadoop/out_ch6_test2_adjust hadoop");
            System.err.println("Counter:");
            System.err.println("\t" + "LINESKIP" + "\t" + "Lines which are too short");
            System.exit(-1);
        }

        // record the start time
        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new Test_2_Adjust(), args);

        // report how long the job took
        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Job started: " + formatter.format(start));
        System.out.println("Job finished: " + formatter.format(end));
        System.out.println("Job took: " + String.valueOf(time) + " minutes");

        System.exit(res);
    }
}
Open Eclipse and create Test_2_Adjust.java in the Rock project under the src/chapter6 package:
Use an SSH tool (see the Linux file transfer tools described in section 2.1.3.1 of weeks 1-2) to upload the provided test data Test_2_Data.txt to the local directory /usr/local/hadoop-1.1.2/input, then use the Eclipse HDFS plugin to upload that file to the /usr/hadoop/in directory, as shown below:
Configuring run parameters
Create a new Java Application run configuration and, on the Arguments tab, supply the three arguments Test_2_Adjust needs: the input path, the output path and the input string (an equivalent command-line invocation is sketched below). As before, the input and output paths must be full paths, otherwise the run fails:
Input: the data path, here hdfs://hadoop1:9000/usr/hadoop/in/Test_2_Data.txt
Output: the result path, here hdfs://hadoop1:9000/usr/hadoop/out_ch6_test2_adjust
Input string: the string "hadoop"
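Equivalent to the Eclipse run configuration, the job can also be launched from a shell once the project has been exported as a jar; a sketch, assuming the jar is named Test_2_Adjust.jar:
hadoop jar ~/Test_2_Adjust.jar hdfs://hadoop1:9000/usr/hadoop/in/Test_2_Data.txt hdfs://hadoop1:9000/usr/hadoop/out_ch6_test2_adjust hadoop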
Once configured, run the MapReduce job; when it finishes, inspect the result with the Eclipse HDFS plugin tool. The figure below shows the passed-in parameter added to every line:
The role of the setup() function
(Optional) Read Exercise_2.java, a program with the same effect as Exercise_1.java. Look up the setup() function and explain how the two programs differ.
The difference is that in Exercise_2 the name is a member variable of the reducer and is initialized in the setup() method. The console output of the run in part 2 shows that no matter how many times map() and reduce() run, setup() runs only once. setup() is inherited from the parent class; both the Mapper and the Reducer class have it, and Mapper's setup() runs before map() while Reducer's setup() runs before reduce(). Likewise, Mapper and Reducer have a corresponding cleanup() method, which also runs only once over the whole job: Mapper's cleanup() runs after map() and Reducer's cleanup() runs after reduce().
A simple, fairly complete fix for missing dependency packages when installing software on Linux!
When building software from source on Linux, have you often hit missing package after missing package and not known what to do? I've seen a few folks lately stuck installing thrift or Python because of missing dependencies.
I use Red Hat, and when installing the OS I habitually select every package, needed or not, so I've basically never run into a missing package; you can try that.
If you forgot to select all the packages during installation, here's what you need.
If the machine has Internet access, try the yum command below; just copy it into a shell and run it:
yum -y install gcc gcc-c++ autoconf libjpeg libjpeg-devel libpng libpng-devel freetype freetype-devel libxml2 libxml2-devel zlib zlib-devel glibc glibc-devel glib2 glib2-devel bzip2 bzip2-devel ncurses ncurses-devel curl curl-devel e2fsprogs e2fsprogs-devel krb5-devel libidn libidn-devel openssl openssl-devel nss_ldap openldap openldap-devel openldap-clients openldap-servers libxslt-devel libevent-devel ntp libtool-ltdl bison libtool vim-enhanced
This installs basically everything you will need; several people have already solved their missing-package problems with it.
If yum doesn't work for you, see:
A complete yum configuration solution for Redhat/CentOS (with a yum install of memcached thrown in)
Biao-ge, my machine can't get online.
That's one drastic move.
Heh — once the development packages and development tools are installed, you've basically got everything you need.
Quoting sunev_yu: "Biao-ge, my machine can't get online."
Then use the install disc.
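For the no-network case mentioned here, a common fallback is a local yum repository built from the install DVD; a sketch, assuming the disc is mounted at /media/cdrom (mount point and repo id are illustrative):
mkdir -p /media/cdrom && mount /dev/cdrom /media/cdrom
cat > /etc/yum.repos.d/local-dvd.repo <<'EOF'
[local-dvd]
name=Local install DVD
baseurl=file:///media/cdrom
enabled=1
gpgcheck=0
EOF
yum --disablerepo='*' --enablerepo=local-dvd -y install gcc gcc-c++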
When you can't get online, your yum just turns into yun (dizzy).
Quoting James_Chen: "When you can't get online, your yum just turns into yun (dizzy)."
I did say it needs network access.
The most annoying case is when you can't get on the network.
Boss Wu, isn't yum exclusive to Red Hat?
RE: A simple, fairly complete fix for missing dependency packages when installing software from source on Linux!
Quoting 东东: "Boss Wu, isn't yum exclusive to Red Hat?"
Replying from my phone... plenty of distributions have yum.
Detailed installation of Hadoop 2.3.0 on CentOS 6.4
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and designed to run on low-cost hardware; it provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes some POSIX requirements to allow streaming access to the data in the file system.
The core of Hadoop's design is HDFS plus MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides the computation over it.
1. System architecture
Cluster roles:
Hostname / IP address / role
name01 192.168.52.128 NameNode, ResourceManager (JobTracker)
data01 192.168.52.129 DataNode, NodeManager (TaskTracker)
data02 192.168.52.130 DataNode, NodeManager (TaskTracker)
System environment:
CentOS 6.5 x64 on VMware virtual machines
Hadoop version: hadoop-2.3.0
2. Environment preparation
2.1 System settings
Disable iptables:
/sbin/service iptables stop
/sbin/chkconfig iptables off
Disable SELinux: setenforce 0
sed -i "s@^SELINUX=enforcing@SELINUX=disabled@g" /etc/sysconfig/selinux
Set the node names; run on all nodes:
/bin/cat <<EOF > /etc/hosts
localhost.localdomain=data01   # or name01 / data02, depending on the node
192.168.52.128 name01
192.168.52.129 data01
192.168.52.130 data02
EOF
hostname node0*   # substitute this node's actual name
sed -i "s@HOSTNAME=localhost.localdomain@HOSTNAME=node0*@g" /etc/sysconfig/network
2.2 Creating the user and directories
Create the hadoop service account:
Log in as root on all machines and create the hadoop user on each of them:
useradd hadoop   # create the hadoop user/group
passwd hadoop
#sudo useradd -s /bin/bash -d /home/hadoop -m hadoop -g hadoop -G admin   # adds a zhm user belonging to the hadoop group with admin rights
#su hadoop   # switch to the zhm user
Create the Hadoop-related directories:
Define the paths for data and for code and tools:
mkdir -p /home/hadoop/src
mkdir -p /home/hadoop/tools
chown -R hadoop.hadoop /home/hadoop/*
Define the data-node storage path under the /data/hadoop directory; this is where the data nodes store blocks, so it needs enough free space:
mkdir -p /data/hadoop/hdfs
mkdir -p /data/hadoop/tmp
mkdir -p /var/logs/hadoop
Grant write permissions:
chmod -R 777 /data/hadoop
chown -R hadoop.hadoop /data/hadoop/*
chown -R hadoop.hadoop /var/logs/hadoop
Define the Java installation path:
mkdir -p /usr/lib/jvm/
2.3 Configuring passwordless SSH login
Reference article: http://blog.csdn.net/ab198604/article/details/8250461
SSH uses the RSA algorithm to generate a public and a private key and encrypts data in transit to keep it safe and reliable. The public key is the shared part that any node on the network can access, while the private key is used to encrypt data and prevent it from being stolen; in short it is an asymmetric scheme that is very hard to break. The nodes of a Hadoop cluster need to access one another's data, and the node being accessed must verify the node accessing it, so Hadoop uses SSH key verification and encryption for secure remote logins. If every access required interactive verification, efficiency would drop sharply, which is why passwordless SSH is configured so nodes can connect directly; this greatly improves access efficiency.
The NameNode is configured to log in to the other nodes without a password. Every node generates its own key pair; id_dsa.pub is the public key and id_dsa the private key. The public key is then copied into an authorized_keys file — this step is required. The process is as follows:
2.3.1 每个节点分别产生密钥
(1):.ssh目录需要755权限,authorized_keys需要644权限;
(2):Linux防火墙开着,hadoop需要开的端口需要添加,或者关掉防火墙;
(3):数据节点连不上主服务器还有可能是使用了机器名的缘故,还是使用IP地址比较稳妥。
On the name01 (192.168.52.128) master:
Create the server login key pair for the hadoop account on the NameNode:
mkdir -p /home/hadoop/.ssh
chown hadoop.hadoop -R /home/hadoop/.ssh
chmod 755 /home/hadoop/.ssh
su - hadoop
cd /home/hadoop/.ssh
ssh-keygen -t dsa -P '' -f id_dsa
[hadoop@name01 .ssh]$ ssh-keygen -t dsa -P '' -f id_dsa
Generating public/private dsa key pair.
open id_dsa failed: Permission denied.
Saving the key failed: id_dsa.
[hadoop@name01 .ssh]$
This fails; the fix is: setenforce 0
[root@name01 .ssh]# setenforce 0
su - hadoop
[hadoop@name01 .ssh]$ ssh-keygen -t dsa -P '' -f id_dsa
Generating public/private dsa key pair.
Your identification has been saved in id_dsa.
Your public key has been saved in id_dsa.pub.
The key fingerprint is:
52:69:9a:ff:07:f4:fc:28:1e:48:18:fe:93:ca:ff:1d hadoop@name01
The key's randomart image is:
+--[ DSA 1024]----+
| * S. o |
| = o. o |
| * ..Eo |
| . . o.oo.. |
| o..o+o. |
+-----------------+
[hadoop@name01 .ssh]$ ll
-rw-------. 1 hadoop hadoop 668 Aug 20 23:58 id_dsa
-rw-r--r--. 1 hadoop hadoop 603 Aug 20 23:58 id_dsa.pub
drwxrwxr-x. 2 hadoop hadoop 4096 Aug 20 23:48 touch
[hadoop@name01 .ssh]$
id_dsa.pub is the public key and id_dsa the private key; next, copy the public key into an authorized_keys file — this step is required:
[hadoop@name01 .ssh]$ cat id_dsa.pub >> authorized_keys
[hadoop@name01 .ssh]$ ll
-rw-rw-r--. 1 hadoop hadoop 603 Aug 21 00:00 authorized_keys
-rw-------. 1 hadoop hadoop 668 Aug 20 23:58 id_dsa
-rw-r--r--. 1 hadoop hadoop 603 Aug 20 23:58 id_dsa.pub
drwxrwxr-x. 2 hadoop hadoop 4096 Aug 20 23:48 touch
[hadoop@name01 .ssh]$
Do the same on the remaining two nodes.
data01 (192.168.52.129)
2.3.2 Run on data01 (192.168.52.129):
useradd hadoop   # create the hadoop user/group
passwd hadoop    # set the hadoop password to hadoop
setenforce 0
su - hadoop
mkdir -p /home/hadoop/.ssh
cd /home/hadoop/.ssh
ssh-keygen -t dsa -P '' -f id_dsa
cat id_dsa.pub >> authorized_keys
2.3.3 Run on data02 (192.168.52.130):
useradd hadoop   # create the hadoop user/group
passwd hadoop    # set the hadoop password to hadoop
setenforce 0
su - hadoop
mkdir -p /home/hadoop/.ssh
cd /home/hadoop/.ssh
ssh-keygen -t dsa -P '' -f id_dsa
cat id_dsa.pub >> authorized_keys
2.3.4 Build one shared authorized_keys for all three nodes
On name01 (192.168.52.128):
su - hadoop
cd /home/hadoop/.ssh
scp hadoop@data01:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data01
scp hadoop@data02:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data02
cat id_dsa.pub.data01 >> authorized_keys
cat id_dsa.pub.data02 >> authorized_keys
As shown below:
[hadoop@name01 .ssh]$ scp hadoop@data01:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data01
The authenticity of host 'data01 (192.168.52.129)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data01,192.168.52.129' (RSA) to the list of known hosts.
hadoop@data01's password:
Permission denied, please try again.
hadoop@data01's password:
id_dsa.pub 100% 603 0.6KB/s 00:00
[hadoop@name01 .ssh]$
[hadoop@name01 .ssh]$ scp hadoop@data02:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data02
The authenticity of host 'data02 (192.168.52.130)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data02,192.168.52.130' (RSA) to the list of known hosts.
hadoop@data02's password:
id_dsa.pub 100% 603 0.6KB/s 00:00
[hadoop@name01 .ssh]$
[hadoop@name01 .ssh]$ cat id_dsa.pub.data01 >> authorized_keys
[hadoop@name01 .ssh]$ cat id_dsa.pub.data02 >> authorized_keys
[hadoop@name01 .ssh]$ cat authorized_keys
ssh-dss ssh-dss
[hadoop@name01 .ssh]$
authorized_keys now contains three lines, the public keys used to access name01, data01 and data02. Copy this authorized_keys file into the same directory on data01 and data02.
After that, the hadoop user can ssh between name01, data01 and data02 without a password.
scp authorized_keys hadoop@data01:/home/hadoop/.ssh/
scp authorized_keys hadoop@data02:/home/hadoop/.ssh/
Then, as the hadoop user, set the permissions on name01, data01 and data02:
su - hadoop
chmod 600 /home/hadoop/.ssh/authorized_keys
chmod 700 -R /home/hadoop/.ssh
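Before the manual session below, a quick scripted pass (a sketch) can confirm that all three hosts now accept key-based login:
# run as the hadoop user; each line should print the remote hostname without a password prompt
for h in name01 data01 data02; do ssh hadoop@$h hostname; done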
Test the passwordless SSH login. On the first connection you have to type yes; after that you can ssh straight through without a password.
[hadoop@name01 .ssh]$ ssh hadoop@data01
Last login: Thu Aug 21 01:53:24 2014 from name01
[hadoop@data01 ~]$ ssh hadoop@data02
The authenticity of host 'data02 (192.168.52.130)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data02,192.168.52.130' (RSA) to the list of known hosts.
[hadoop@data02 ~]$ ssh hadoop@name01
The authenticity of host 'name01 (::1)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'name01' (RSA) to the list of known hosts.
Last login: Thu Aug 21 01:56:12 2014 from data01
[hadoop@data02 ~]$ ssh hadoop@name01
Last login: Thu Aug 21 01:56:22 2014 from localhost.localdomain
[hadoop@data02 ~]$
This shows the problem: ssh from data01 and data02 to name01 did not go straight through. Where is the issue?
2.3.5 Fixing the failed ssh to name01
[Hadoop@data01 ~]$ ssh name01
Last login: Thu Aug 21 02:25:28 2014 from localhost.localdomain
[hadoop@data01 ~]$
It really did not work; exit and look at the /etc/hosts settings:
[hadoop@data01 ~]$ exit
[root@data01 ~]#
[root@data01 ~]# vim /etc/hosts
#127.0.0.1 localhost.localdomain localhost.localdomain localhost4 localhost4.localdomain4 localhost name01
#::1 localhost.localdomain localhost.localdomain localhost6 localhost6.localdomain6 localhost name01
localhost.localdomain=data01
192.168.52.128 name01
192.168.52.129 data01
192.168.52.130 data02
[root@data01 ~]# su - hadoop
[hadoop@data02 ~]$ ssh name01
Warning: Permanently added the RSA host key for IP address '192.168.52.128' to the list of known hosts.
Last login: Thu Aug 21 02:32:32 2014 from data01
[hadoop@name01 ~]$
OK, remote ssh to name01 now succeeds. The fix was to comment out the first two lines in /etc/hosts with vim, as shown below:
[root@data01 ~]# vim /etc/hosts
#127.0.0.1 localhost.localdomain localhost.localdomain localhost4 localhost4.localdomain4 localhost name01
#::1 localhost.localdomain localhost.localdomain localhost6 localhost6.localdomain6 localhost name01
2.3.6 Verify passwordless ssh between name01, data01 and data02 in every direction
[hadoop@data02 ~]$ ssh name01
Last login: Thu Aug 21 02:38:46 2014 from data02
[hadoop@name01 ~]$ ssh data01
Last login: Thu Aug 21 02:30:35 2014 from localhost.localdomain
[hadoop@data01 ~]$ ssh data02
Last login: Thu Aug 21 02:32:57 2014 from localhost.localdomain
[hadoop@data02 ~]$ ssh data01
Last login: Thu Aug 21 02:39:55 2014 from name01
[hadoop@data01 ~]$ ssh name01
Last login: Thu Aug 21 02:39:51 2014 from data02
[hadoop@name01 ~]$ ssh data02
Last login: Thu Aug 21 02:39:58 2014 from data01
[hadoop@data02 ~]$
3. Installing and deploying the Hadoop environment
3.1 Preparing the Java environment
As root, deploy the Java environment on all nodes:
Install JDK 7; see: http://blog.itpub.net//viewspace-1256321/
3.2 Installing Hadoop
3.2.1 Installing version 2.3.0
Download the package:
chown -R hadoop.hadoop /soft
Copy it from the local machine to the Linux virtual machine:
su - hadoop
cd /soft/hadoop
tar zxvf hadoop-2.3.0-x64.tar.gz -C /home/hadoop/src/
Configure environment variables:
Set the environment variables as root:
cat <<EOF >> /etc/profile
export HADOOP_HOME=/home/hadoop/src/hadoop-2.3.0
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_60/
export PATH=/home/hadoop/src/hadoop-2.3.0/bin:/home/hadoop/src/hadoop-2.3.0/sbin:$PATH
EOF
source /etc/profile
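A quick sanity check that the variables took effect (a sketch):
echo $HADOOP_HOME
hadoop version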
3.3 Hadoop configuration files
The cluster involves these configuration files: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-env.sh, slaves, yarn-site.xml
Seven configuration files are involved; they live in:
cd /home/hadoop/src/hadoop-2.3.0/etc/hadoop
hadoop-env.sh
yarn-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
Some of these files do not exist by default; copy them from the corresponding .template files.
a. Edit hadoop-env.sh:
vim hadoop-env.sh
Add the Java environment variable:
export JAVA_HOME="/usr/lib/jvm/jdk1.7.0_60"
b. Edit yarn-env.sh:
vim yarn-env.sh
Change JAVA_HOME to: export JAVA_HOME="/usr/lib/jvm/jdk1.7.0_60"
c. Edit slaves and list all the slave hostnames (a sketch of its contents follows the command below):
vim slaves
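Instead of editing interactively, the same result can be produced with a heredoc; for this cluster the worker (DataNode) hosts are data01 and data02:
cat > slaves <<'EOF'
data01
data02
EOF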
d. Edit core-site.xml:
vim core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://name01:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
</configuration>
e. Edit hdfs-site.xml:
Create the relevant directories first:
mkdir -p /data/hadoop/name
chown -R hadoop.hadoop /data/hadoop/name
mkdir -p /data/hadoop/data
chown -R hadoop.hadoop /data/hadoop/data
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>name01:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hadoop/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hadoop/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>storage copy number</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
f. Edit mapred-site.xml
# This file does not exist by default; create it yourself with vim
vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>name01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>name01:19888</value>
  </property>
  <!--
  <property>
    <name>mapred.job.tracker</name>
    <value>name01:9001</value>
    <description>JobTracker visit path</description>
  </property>
  -->
</configuration>
g. Edit yarn-site.xml:
vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>name01:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>name01:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>name01:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>name01:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>name01:8088</value>
  </property>
</configuration>
All nodes use the same configuration files and installation directories, so just copy the whole directories. Copy all the hadoop directories from name01 to data02:
scp -r /home/hadoop/* hadoop@data02:/home/hadoop/
scp -r /data/hadoop/* hadoop@data02:/data/hadoop/
Copy all the hadoop directories from name01 to data01:
scp -r /home/hadoop/* hadoop@data01:/home/hadoop/
scp -r /data/hadoop/* hadoop@data01:/data/hadoop/
3.3 Formatting the file system
Run hadoop namenode -format on the name01 master to format the HDFS file system.
su - hadoop
[hadoop@localhost ~]$ hadoop namenode -format
[hadoop@name01 bin]$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
14/08/21 04:51:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = name01/192.168.52.128
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.3.0
STARTUP_MSG: classpath = /home/hadoop/src/hadoop-2.3.0/etc/hadoop:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/servlet-api-2.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-configuration-1.6.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/mockito-all-1.8.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-httpclient-3.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jetty-6.1.26.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/httpcore-4.2.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/junit-4.8.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-el-1.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-digester-1.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-lang-2.6.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/httpclient-4.2.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/zookeeper-3.4.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/hadoop-auth-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/asm-3.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/xmlenc-0.52.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-net-3.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jetty-util-6.1.26.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-collections-3.2.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-codec-1.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jets3t-0.9.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jersey-server-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/xz-1.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/hadoop-annotations-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-io-2.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/netty-3.6.2.Final.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/avro-1.7.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hadoop/
src/hadoop-2.3.0/share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jersey-json-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jersey-core-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/activation-1.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/stax-api-1.0-2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/jsch-0.1.42.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/hadoop-common-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/hadoop-nfs-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/hadoop-common-2.3.0-tests.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/commons-el-1.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/asm-3.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/commons-io-2.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/hadoop-hdfs-2.3.0-tests.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/hadoop-hdfs-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/hdfs/hadoop-hdfs-nfs-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/servlet-api-2.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/commons-httpclient-3.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jetty-6.1.26.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jackson-xc-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jackson-m
apper-asl-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/commons-lang-2.6.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/zookeeper-3.4.5.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jline-0.9.94.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/guice-3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/asm-3.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/log4j-1.2.17.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jsr305-1.3.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/guava-11.0.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/commons-codec-1.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jersey-client-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jersey-server-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/xz-1.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/commons-io-2.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/commons-cli-1.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jersey-json-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jersey-core-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/activation-1.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jettison-1.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/lib/jackson-jaxrs-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-server-tests-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-common-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-client-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-api-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-server-common-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/junit-4.10.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/guice-3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/asm-3.2.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/hamcrest-core-1.1.jar:/home/hadoop/src/hadoop-2.3.0/share/had
oop/mapreduce/lib/log4j-1.2.17.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/javax.inject-1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/xz-1.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/hadoop-annotations-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0.jar:/home/hadoop/src/hadoop-2.3.0/contrib/capacity-scheduler/*.jar:/home/hadoop/src/hadoop-2.3.0/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = Unknown -r U compiled by 'root' on T02:27Z
STARTUP_MSG: java = 1.7.0_60
************************************************************/
14/08/21 04:51:20 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
Formatting using clusterid: CID-9a8-4f79-a5bb-b
14/08/21 04:51:24 INFO namenode.FSNamesystem: fsLock is fair:true
14/08/21 04:51:24 INFO namenode.HostFileManager: read includes:
14/08/21 04:51:24 INFO namenode.HostFileManager: read excludes:
14/08/21 04:51:24 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
14/08/21 04:51:24 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
14/08/21 04:51:24 INFO util.GSet: Computing capacity for map BlocksMap
14/08/21 04:51:24 INFO util.GSet: VM type = 64-bit
14/08/21 04:51:24 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
14/08/21 04:51:24 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/08/21 04:51:24 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
14/08/21 04:51:24 INFO blockmanagement.BlockManager: defaultReplication = 3
14/08/21 04:51:24 INFO blockmanagement.BlockManager: maxReplication = 512
14/08/21 04:51:24 INFO blockmanagement.BlockManager: minReplication = 1
14/08/21 04:51:24 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
14/08/21 04:51:24 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
14/08/21 04:51:24 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
14/08/21 04:51:24 INFO blockmanagement.BlockManager: encryptDataTransfer = false
14/08/21 04:51:24 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
14/08/21 04:51:25 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
14/08/21 04:51:25 INFO namenode.FSNamesystem: supergroup = supergroup
14/08/21 04:51:25 INFO namenode.FSNamesystem: isPermissionEnabled = true
14/08/21 04:51:25 INFO namenode.FSNamesystem: HA Enabled: false
14/08/21 04:51:25 INFO namenode.FSNamesystem: Append Enabled: true
14/08/21 04:51:26 INFO util.GSet: Computing capacity for map INodeMap
14/08/21 04:51:26 INFO util.GSet: VM type = 64-bit
14/08/21 04:51:26 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
14/08/21 04:51:26 INFO util.GSet: capacity = 2^20 = 1048576 entries
14/08/21 04:51:26 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/08/21 04:51:26 INFO util.GSet: Computing capacity for map cachedBlocks
14/08/21 04:51:26 INFO util.GSet: VM type = 64-bit
14/08/21 04:51:26 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
14/08/21 04:51:26 INFO util.GSet: capacity = 2^18 = 262144 entries
14/08/21 04:51:26 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.6033
14/08/21 04:51:26 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
14/08/21 04:51:26 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
14/08/21 04:51:26 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/08/21 04:51:26 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
14/08/21 04:51:26 INFO util.GSet: Computing capacity for map Namenode Retry Cache
14/08/21 04:51:26 INFO util.GSet: VM type = 64-bit
14/08/21 04:51:26 INFO util.GSet: 0.447746% max memory 966.7 MB = 297.0 KB
14/08/21 04:51:26 INFO util.GSet: capacity = 2^15 = 32768 entries
14/08/21 04:51:27 INFO common.Storage: Storage directory /data/hadoop/name has been successfully formatted.
14/08/21 04:51:27 INFO namenode.FSImage: Saving image file /data/hadoop/name/current/fsimage.ckpt_0000000 using no compression
14/08/21 04:51:27 INFO namenode.FSImage: Image file /data/hadoop/name/current/fsimage.ckpt_0000000 of size 218 bytes saved in 0 seconds.
14/08/21 04:51:27 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/08/21 04:51:27 INFO util.ExitUtil: Exiting with status 0
14/08/21 04:51:27 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at name01/192.168.52.128
************************************************************/
Note: as long as "successfully formatted" appears, the format succeeded. Format only on the first startup, not every time you start the cluster — think of it like formatting a newly bought external drive before first use. If you really do need to format again, first delete all the files under the "$HADOOP_HOME/tmp" directory (hadoop.tmp.dir, which this configuration sets to /data/hadoop/tmp).
You can watch how the "$HADOOP_HOME/tmp" directory changes before and after formatting. Formatting rarely fails; if it really does, check whether the configuration is correct.
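If a re-format ever becomes necessary, clear the storage directories used by this configuration first (hadoop.tmp.dir plus the dfs name/data directories from the XML above), with the cluster stopped; a sketch:
rm -rf /data/hadoop/tmp/* /data/hadoop/name/* /data/hadoop/data/*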
3.4 Managing Hadoop
3.4.1 After formatting, start Hadoop. The startup scripts all live under $HADOOP_HOME/sbin/; the commands below no longer carry the full path:
distribute-exclude.sh hdfs-config.sh slaves.sh start-dfs.cmd start-yarn.sh stop-dfs.cmd stop-yarn.sh
hadoop-daemon.sh httpfs.sh start-all.cmd start-dfs.sh stop-all.cmd stop-dfs.sh yarn-daemon.sh
hadoop-daemons.sh mr-jobhistory-daemon.sh start-all.sh start-secure-dns.sh stop-all.sh stop-secure-dns.sh yarn-daemons.sh
hdfs-config.cmd refresh-namenodes.sh start-balancer.sh start-yarn.cmd stop-balancer.sh stop-yarn.cmd
There are three ways to start Hadoop:
3.4.2 Method 1: start everything at once
Run start-all.sh to start Hadoop and watch the console output. You can see it starting the processes namenode, datanode, secondarynamenode, jobtracker and tasktracker, five in total. When the script finishes it does not mean those five processes started successfully; it only means the system tried to start them. Use the JDK's jps command to check whether they really started: if jps shows all five processes, Hadoop really is up. If one or more are missing, go to the "Common Hadoop startup errors" section to find the cause.
Stop the cluster:
/home/hadoop/src/hadoop-2.3.0/sbin/stop-all.sh
Start the cluster:
/home/hadoop/src/hadoop-2.3.0/sbin/start-all.sh
[hadoop@name01 hadoop]$ /home/hadoop/src/hadoop-2.3.0/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [name01]
name01: starting namenode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-namenode-name01.out
data01: starting datanode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-datanode-name01.out
data02: starting datanode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-datanode-name01.out
Starting secondary namenodes [name01]
name01: starting secondarynamenode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-secondarynamenode-name01.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-resourcemanager-name01.out
data02: starting nodemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-nodemanager-name01.out
data01: starting nodemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-nodemanager-name01.out
[hadoop@name01 bin]$
3.4.2.1 Check the Hadoop processes running on each node
[hadoop@name01 hadoop]$ jps
8601 ResourceManager
8458 SecondaryNameNode
8285 NameNode
[hadoop@name01 hadoop]$
[hadoop@name01 ~]$ jps
-bash: jps: command not found
[hadoop@name01 ~]$
[hadoop@name01 ~]$ /usr/lib/jvm/jdk1.7.0_60/bin/jps
5812 NodeManager
5750 DataNode
[hadoop@name01 ~]$
[root@data01 ~]# jps
5812 NodeManager
5750 DataNode
[root@data01 ~]
3.4.2.2 Why does jps work as root but not after su to hadoop? A quick search shows the cause: I added the JDK path with
vim ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_60
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Save and exit, then run the following to make it take effect:
source ~/.bashrc
This approach only affects the current user. The JDK was installed as root, so after su to hadoop it no longer applies. What to do? Use /etc/profile and append the JDK path at the end of the file:
[root@data01 ~]# vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_60
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
# Save and exit, then run the following to make it take effect:
[root@data01 ~]# source /etc/profile
su - hadoop
[hadoop@data01 ~]$ jps
6891 DataNode
7025 NodeManager
[hadoop@data01 ~]$
OK — jps now works under the hadoop account as well.
3.4.2.3 Now check the data02 node
[hadoop@data02 ~]$ jps
10609 NodeManager
10540 DataNode
[hadoop@data02 ~]$
Both data nodes' processes are up — congratulations!
3.4.2.4 Check the Hadoop cluster through the web interface
In a browser open http://192.168.52.128:50030/dfshealth.html, where the address is the IP of the name01 node (the master):
The result is a blank page:
In a browser open http://192.168.1.100:50070, the IP of the name01 node (the master).
Open http://192.168.52.128:50070/dfshealth.html#tab-overview to see basic cluster information, as shown below:
Open http://192.168.52.128:50070/dfshealth.html#tab-datanode to see datanode information, as shown below:
Open http://192.168.52.128:50070/logs/ to see all the log files, as shown below:
Note: the screenshots above were uploaded to the Honglian (红联) channel.
At this point the fully distributed Hadoop cluster installation is complete — time for a good night's sleep.
3.4.2.5 The command to shut Hadoop down is stop-all.sh, as shown below:
[hadoop@name01 src]$ /home/hadoop/src/hadoop-2.3.0/sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [name01]
name01: stopping namenode
data01: stopping datanode
data02: stopping datanode
Stopping secondary namenodes [name01]
name01: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
data02: stopping nodemanager
data01: stopping nodemanager
no proxyserver to stop
[hadoop@name01 src]$
The commands above are the simplest approach: they start or stop every node in one go.
3.4.3 Method 2: start HDFS and YARN separately
Run start-dfs.sh to start HDFS on its own. Afterwards jps shows the NameNode, DataNode and SecondaryNameNode processes. This suits the case where only HDFS storage is needed and YARN is not used for resource management. The corresponding stop command is stop-dfs.sh.
3.4.3.1 Start HDFS first
[hadoop@name01 sbin]$ jps
[hadoop@name01 sbin]$ pwd
/home/hadoop/src/hadoop-2.3.0/sbin
[hadoop@name01 sbin]$ start-dfs.sh
Starting namenodes on [name01]
name01: starting namenode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-namenode-name01.out
data01: starting datanode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-datanode-data01.out
data02: starting datanode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-datanode-data02.out
Starting secondary namenodes [name01]
name01: starting secondarynamenode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-secondarynamenode-name01.out
On the name01 node, jps shows the following background processes:
[hadoop@name01 sbin]$ jps
3800 NameNode
3977 SecondaryNameNode
[hadoop@name01 sbin]$
[root@hadoop03 src]# jps
13859 DataNode
On the data01 node, jps shows the following background processes:
[hadoop@data01 ~]$ jps
2863 DataNode
[hadoop@data01 ~]$
3.4.3.2 Then start YARN
Run start-yarn.sh to start the resource manager's server and client processes on their own; the corresponding stop command is stop-yarn.sh.
[hadoop@name01 sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-resourcemanager-name01.out
data01: starting nodemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-nodemanager-data01.out
data02: starting nodemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-nodemanager-data02.out
On the name01 node, jps now also shows a ResourceManager process, as follows:
[hadoop@name01 sbin]$ jps
4601 ResourceManager
3800 NameNode
3977 SecondaryNameNode
[hadoop@name01 sbin]$
On the data01 node, jps now also shows a NodeManager process, as follows:
[hadoop@data01 ~]$ jps
3249 NodeManager
2863 DataNode
[hadoop@data01 ~]$
3.4.3.3 Shut down in order: YARN first, then HDFS
[hadoop@name01 sbin]$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
data01: stopping nodemanager
data02: stopping nodemanager
no proxyserver to stop
[hadoop@name01 sbin]$
[hadoop@name01 sbin]$ stop-dfs.sh
Stopping namenodes on [name01]
name01: stopping namenode
data01: stopping datanode
data02: stopping datanode
Stopping secondary namenodes [name01]
name01: stopping secondarynamenode
[hadoop@name01 sbin]$
PS: of course, you can also start MapReduce (YARN) first and HDFS afterwards. The HDFS and MapReduce processes are independent of each other and have no dependency.
3.4.4 Method 3: start each daemon individually:
[root@book0 bin]# jps
[root@book0 bin]# hadoop-daemon.sh start namenode
[root@book0 bin]# hadoop-daemon.sh start datanode
[root@book0 bin]# hadoop-daemon.sh start secondarynamenode
[root@book0 bin]# hadoop-daemon.sh start jobtracker
[root@book0 bin]# hadoop-daemon.sh start tasktracker
[root@book0 bin]# jps
14855 NameNode
14946 DataNode
15043 SecondaryNameNode
15196 TaskTracker
15115 JobTracker
The command is "hadoop-daemon.sh start [daemon name]". This style of startup suits adding or removing individual nodes and comes up when growing the cluster.
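The daemon names shown above (jobtracker, tasktracker) are Hadoop 1.x names; on this 2.3.0 cluster the per-daemon equivalents use the sbin scripts listed earlier — a sketch:
hadoop-daemon.sh start namenode            # on name01
hadoop-daemon.sh start secondarynamenode   # on name01
hadoop-daemon.sh start datanode            # on data01 / data02
yarn-daemon.sh start resourcemanager       # on name01
yarn-daemon.sh start nodemanager           # on data01 / data02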
3.5 Another way to check the Hadoop cluster's status
Use "hadoop dfsadmin -report" to check the cluster status.
[hadoop@name01 sbin]$ "hadoop dfsadmin -report"
-bash: hadoop dfsadmin -report: command not found
[hadoop@name01 sbin]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Configured Capacity: (54.66 GB)
Present Capacity: (45.11 GB)
DFS Remaining: (45.11 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Live datanodes:
Name: 192.168.52.130:50010 (data02)
Hostname: data02
Decommission Status : Normal
Configured Capacity: (27.33 GB)
DFS Used: 24576 (24 KB)
Non DFS Used:
DFS Remaining: (22.56 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.53%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Fri Aug 22 00:04:58 PDT 2014
Name: 192.168.52.129:50010 (data01)
Hostname: data01
Decommission Status : Normal
Configured Capacity: (27.33 GB)
DFS Used: 24576 (24 KB)
Non DFS Used:
DFS Remaining: (22.56 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.53%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Fri Aug 22 00:04:58 PDT 2014
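Since the output above flags this form as deprecated, the equivalent command it recommends is:
hdfs dfsadmin -report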
3.5 Testing the Hadoop cluster
3.5.1 Run a simple MapReduce computation
Under $HADOOP_HOME there is a jar named hadoop-example-2.2.0.jar; if it is not there, look for the test jar of your version.
Run it as follows; the usage is: hadoop jar hadoop-example-1.1.2.jar
[root@name01 ~]# find / -name hadoop-example-1.1.2.jar
[root@name01 ~]#
The jar does not exist, so search for the test jar with a wildcard: find / -name hadoop-*examp*.jar, as shown below:
[root@name01 ~]# find / -name hadoop-*examp*.jar
/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.3.0-sources.jar
/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.3.0-test-sources.jar
/home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
[root@name01 ~]#
hadoop jar /home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
[root@name01 ~]# su - hadoop
[hadoop@name01 ~]$ hadoop jar /home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@name01 ~]$
Verify that HDFS can be accessed:
hadoop fs -ls hdfs://192.168.52.128:9000/
hadoop fs -mkdir hdfs://192.168.1.201:9000/testfolder
Test counting the words in a text file:
hadoop jar hadoop-examples-0.20.2-cdh3u5.jar wordcount /soft/BUILDING.txt /wordcountoutput
[hadoop@hadoop01 hadoop-2.3.0]$ hadoop jar hadoop-examples-0.20.2-cdh3u5.jar wordcount /soft/hadoop-2.3.0/release-2.3.0/BUILDING.txt /wordcountoutput
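Using the 2.3.0 examples jar actually found above, an equivalent wordcount run might look like the sketch below; the HDFS input directory name is illustrative, and the input file has to be uploaded to HDFS first:
hadoop fs -mkdir -p /wordcountinput
hadoop fs -put /soft/hadoop-2.3.0/release-2.3.0/BUILDING.txt /wordcountinput/
hadoop jar /home/hadoop/src/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /wordcountinput/BUILDING.txt /wordcountoutput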
Check the result of the run:
[hadoop@hadoop01 hadoop-2.2.0]$ hadoop fs -ls /wordcountoutput
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 11:30 /wordcountoutput/_SUCCESS
-rw-r--r-- 3 hadoop supergroup -02 11:30 /wordcountoutput/part-r-00000
[hadoop@hadoop01 hadoop-2.2.0]$ hadoop fs -text /wordcountoutput/part-r-00000
"PLATFORM" 1
"Platform", 1
"platform". 1
'-nsu' 1
'deploy' 1
'install', 1