
[Big Data Notes] Reading the hadoop Command Script - flyfoxs - ITeye
Below is the source of the hadoop command found in the bin directory of the Hadoop release. The hadoop command supports a lot of sub-commands and options that I can never keep straight, so I am reading this part of the code closely in the hope that some of them will stick.
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script runs the hadoop core commands.
# The next three lines work out the directory where the hadoop script itself lives.
bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin"; pwd`
# Locate hadoop-config.sh, which pulls in most of the configuration the hadoop commands need.
# Use HADOOP_LIBEXEC_DIR if it is defined; otherwise fall back to the default libexec directory under the Hadoop home.
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
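# Side note (not part of the original script): ${VAR:-default} expands to the value
# of VAR when it is set and non-empty, and to "default" otherwise; the paths below
# are only illustrative:
#   unset HADOOP_LIBEXEC_DIR
#   echo "${HADOOP_LIBEXEC_DIR:-/opt/hadoop/libexec}"    # prints /opt/hadoop/libexec
#   HADOOP_LIBEXEC_DIR=/custom/libexec
#   echo "${HADOOP_LIBEXEC_DIR:-/opt/hadoop/libexec}"    # prints /custom/libexec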
function print_usage(){
  echo "Usage: hadoop [--config confdir] COMMAND"
  echo "       where COMMAND is one of:"
  echo "  fs                   run a generic filesystem user client"
  echo "  version              print the version"
  echo "  jar <jar>            run a jar file"
  echo "  checknative [-a|-h]  check native hadoop and compression libraries availability"
  echo "  distcp <srcurl> <desturl> copy file or directories recursively"
  echo "  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
  echo "  classpath            prints the class path needed to get the"
  echo "                       Hadoop jar and the required libraries"
  echo "  daemonlog            get/set the log level for each daemon"
  echo " or"
  echo "  CLASSNAME            run the class named CLASSNAME"
  echo ""
  echo "Most commands print help when invoked w/o parameters."
}
# If no arguments were given, print the usage message and exit.
if [ $# = 0 ]; then
  print_usage
  exit
fi
# Pick up the first argument; argument 0 is the name of the script itself.
COMMAND=$1
case $COMMAND in
  # usage flags
  --help|-help|-h)
    print_usage
    exit
    ;;

  #hdfs commands
  namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups)
    echo "DEPRECATED: Use of this script to execute hdfs command is deprecated." 1>&2
    echo "Instead use the hdfs command for it." 1>&2
    echo "" 1>&2
    #try to locate hdfs and if present, delegate to it.
    shift
    if [ -f "${HADOOP_HDFS_HOME}"/bin/hdfs ]; then
      exec "${HADOOP_HDFS_HOME}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
    elif [ -f "${HADOOP_PREFIX}"/bin/hdfs ]; then
      exec "${HADOOP_PREFIX}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
    else
      echo "HADOOP_HDFS_HOME not found!"
      exit 1
    fi
    ;;
  #mapred commands for backwards compatibility
  pipes|job|queue|mrgroups|mradmin|jobtracker|tasktracker|mrhaadmin|mrzkfc|jobtrackerha)
    echo "DEPRECATED: Use of this script to execute mapred command is deprecated." 1>&2
    echo "Instead use the mapred command for it." 1>&2
    echo "" 1>&2
    #try to locate mapred and if present, delegate to it.
    shift
    if [ -f "${HADOOP_MAPRED_HOME}"/bin/mapred ]; then
      exec "${HADOOP_MAPRED_HOME}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
    elif [ -f "${HADOOP_PREFIX}"/bin/mapred ]; then
      exec "${HADOOP_PREFIX}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
    else
      echo "HADOOP_MAPRED_HOME not found!"
      exit 1
    fi
    ;;
  # Print the classpath hadoop runs with; handy for tracking down classpath problems.
  classpath)
    # $cygwin is set earlier in the full script when running under the Cygwin emulator.
    if $cygwin; then
      CLASSPATH=`cygpath -p -w "$CLASSPATH"`
    fi
    echo $CLASSPATH
    exit
    ;;
  #core commands
  *)
    # the core commands
if [ "$COMMAND" = "fs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
elif [ "$COMMAND" = "version" ] ; then
CLASS=org.apache.hadoop.util.VersionInfo
elif [ "$COMMAND" = "jar" ] ; then
CLASS=org.apache.hadoop.util.RunJar
elif [ "$COMMAND" = "checknative" ] ; then
CLASS=org.apache.hadoop.util.NativeLibraryChecker
elif [ "$COMMAND" = "distcp" ] ; then
CLASS=org.apache.hadoop.tools.DistCp
CLASSPATH=${CLASSPATH}:${TOOL_PATH}
elif [ "$COMMAND" = "daemonlog" ] ; then
CLASS=org.apache.hadoop.log.LogLevel
elif [ "$COMMAND" = "archive" ] ; then
CLASS=org.apache.hadoop.tools.HadoopArchives
CLASSPATH=${CLASSPATH}:${TOOL_PATH}
elif [[ "$COMMAND" = -*
# class and package names cannot begin with a -
echo "Error: No command named \`$COMMAND' was found. Perhaps you meant \`hadoop ${COMMAND#-}'"
#如果上面的都没匹配上,那么第一个参数作为classname 来解析,比如下面就是一个示例
#hadoop org.apache.hadoop.examples.WordCount /tmp/15 /tmp/46
CLASS=$COMMAND
# shift drops the first entry of "$@". For "hadoop org.apache.hadoop.examples.WordCount /tmp/15 /tmp/46":
# before shift: $@ = org.apache.hadoop.examples.WordCount /tmp/15 /tmp/46
# after shift:  $@ = /tmp/15 /tmp/46
shift
# Always respect HADOOP_OPTS and HADOOP_CLIENT_OPTS
# Both variables get their default definitions in hadoop-config.sh; to change the JVM
# startup options (for example, to enable remote debugging) you can also edit that file.
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
#make sure security appender is turned off
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"
# Cygwin compatibility: convert the classpath to Windows path form.
if $cygwin; then
  CLASSPATH=`cygpath -p -w "$CLASSPATH"`
fi
# Nothing special here; exporting CLASSPATH in one place also makes it easy to extend.
export CLASSPATH=$CLASSPATH
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
;;
esac
[Work in progress] Personal notes: the pitfalls I hit on the big-data/Hadoop road
Environment:
CentOS-6.4-x86_64-bin-DVD1.iso
hadoop-2.4.1.tar.gz
hbase-0.98.3-hadoop2-bin.tar.gz
jdk-7u79-linux-x64.tar.gz
scala-2.10.4.tgz
spark-1.2.0-bin-hadoop2.4.tgz.tar
zookeeper-3.4.6.tar.gz
1. Hadoop and HBase version dependencies [pick the wrong combination and you can look forward to tearing everything down and starting over]
See the Apache HBase Reference Guide, section 4.1 "Hadoop", for the support matrix.
2. The NameNode will not start
The log complains around "ulimit -a for user root". Fix: re-format the NameNode and start Hadoop again; jps then shows a NameNode process.
3. The DataNode may also fail to start
The log again shows "ulimit -a for user root". Cause: at runtime the DataNode opens more files than the system limit allows. The current limit:

[root@centos-FI hadoop-2.4.1]# ulimit -n
1024

Fix: raise the maximum number of open files:

[root@centos-FI hadoop-2.4.1]# ulimit -n 65536
[root@centos-FI hadoop-2.4.1]# ulimit -n
65536

Start Hadoop again:

[root@centos-FI hadoop-2.4.1]# jps
6330 WebAppProxyServer
6097 NameNode
6214 ResourceManager
6148 DataNode
6441 Jps
6271 NodeManager
6390 JobHistoryServer

Note: ulimit only changes the limit temporarily; it resets to the default after a reboot. For a permanent change, edit the nofile limit in /etc/security/limits.conf.
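A minimal sketch of the permanent fix mentioned above, assuming the daemons run as root (adjust the user name and the value as needed); the higher limit applies to sessions started after the change:

# persist a higher open-file limit for root across reboots
echo "root soft nofile 65536" >> /etc/security/limits.conf
echo "root hard nofile 65536" >> /etc/security/limits.conf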
4. hadoop fs -ls warns "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
Walked into this one anyway: a 64-bit operating system with a 32-bit JDK. Fortunately the warning has no real impact and can be ignored.
5. hadoop fs -ls reports "ls: `.': No such file or directory"
This is one place where hadoop 2.x differs from 1.x: the bare command works in 1.x, but in 2.x you have to pass a path, e.g. hadoop fs -ls / [what a trap].
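The bare hadoop fs -ls resolves '.' to the current user's home directory in HDFS, which does not exist on a fresh cluster. A minimal sketch of creating it, assuming the commands run as root so the home directory is /user/root:

hadoop fs -mkdir -p /user/root   # create the HDFS home directory for root
hadoop fs -ls                    # '.' now resolves to /user/root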
6. When starting HBase, the HMaster process appears and then disappears, and after bin/hbase shell, list fails with "ERROR: can't get master address from ZooKeeper; znode data == null"
[root@centos-FI hbase-0.98.3-hadoop2]# jps
28406 NameNode
28576 ResourceManager
32196 HRegionServer
32079 HMaster
28464 DataNode
32253 Jps
28748 JobHistoryServer
28635 NodeManager
24789 QuorumPeerMain
[root@centos-FI hbase-0.98.3-hadoop2]# jps
28406 NameNode
28576 ResourceManager
32196 HRegionServer
28464 DataNode
32293 Jps
28748 JobHistoryServer
28635 NodeManager
24789 QuorumPeerMain
The log shows:

13:25:53,904 DEBUG [main-EventThread] master.ActiveMasterManager: A master is now available
13:25:53,912 INFO  [master:centos-FI:60000] master.ActiveMasterManager: Registered Active Master=centos-FI,9147113
13:25:53,921 INFO  [master:centos-FI:60000] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
13:25:54,250 FATAL [master:centos-FI:60000] master.HMaster: Unhandled exception. Starting shutdown.
java.net.ConnectException: Call From centos-FI/127.0.0.1 to centos-FI:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Fix: the ConnectionRefused page referenced in the exception suggests "Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this)", so check the host name resolution:
[root@centos-FI ~]# hostname -i
127.0.0.1 192.168.128.120
[root@centos-FI hbase-0.98.3-hadoop2]# cat /etc/hosts
127.0.0.1   #localhost localhost.localdomain localhost4 localhost4.localdomain4
#localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.128.120 master
192.168.128.120 slave
192.168.128.120 centos-FI

After fixing the 127.0.0.1 mapping in /etc/hosts:

[root@centos-FI hbase-0.98.3-hadoop2]# hostname -i
192.168.128.120
Once the HMaster process stays up, running list in bin/hbase shell works again. Of course, the "ERROR: can't get master address from ZooKeeper; znode data == null" error after bin/hbase shell and list can also have other causes.
7. Starting HBase reports "localhost: ssh: Could not resolve hostname localhost: Temporary failure in name resolution"

[root@centos-FI hbase-0.98.3-hadoop2]# bin/start-hbase.sh
starting master, logging to /opt/program/hbase-0.98.3-hadoop2/bin/../logs/hbase-root-master-centos-FI.out
localhost: ssh: Could not resolve hostname localhost: Temporary failure in name resolution

For this error, /etc/hosts must contain the entry "127.0.0.1 localhost".
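A quick check-and-repair sketch for that entry (run as root; the alias list can be extended as needed):

grep -q "^127\.0\.0\.1.*localhost" /etc/hosts || echo "127.0.0.1   localhost" >> /etc/hosts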
8. Backspace does not work in the hbase shell under SecureCRT
In SecureCRT, open Options, Session Options, Terminal, Emulation, and set the terminal type on the right to Linux. After that, when you mistype in the hbase shell, Ctrl+Backspace deletes the character.
9. Running the Spark example HBaseTest.scala fails with:

WARN TableInputFormatBase: Cannot resolve the host name for master/192.168.128.120 because of CommunicationException: DNS error [Root exception is java.net.PortUnreachableException: ICMP Port Unreachable]; remaining name '120.128.168.192.in-addr.arpa'

This happens because the DNS server has no record for the node (no DNS server is actually in use), so the reverse lookup fails; manually add a record for master (192.168.128.120) to /etc/hosts.
10. Using a Flume interceptor fails with "java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null"

The sink is HDFS and the path uses the automatic, time-based directory escapes. The error above appears because, according to the documentation, every event needs a timestamp header for those escapes to resolve. Since getting the timestamp into the events can be fiddly, the simpler route is to set hdfs.useLocalTimeStamp. For one directory per hour, the configuration looks like this:

a1.sinks.k1.hdfs.path = hdfs://ubuntu:9000/flume/events/%y-%m-%d/%H
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = hour
a1.sinks.k1.hdfs.useLocalTimeStamp = true

After this change Flume does create the HDFS directories automatically; the Flume log shows:

15:16:28,405 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:261)] Creating hdfs://master:9000/flume_test/15-07-23/events-.0.tmp
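An alternative to hdfs.useLocalTimeStamp is to let Flume's timestamp interceptor stamp every event on the source side. A sketch in the same style, assuming the source is named r1 (the source name does not appear in the original configuration):

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp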
Web UIs worth checking: ResourceManager, HDFS (NameNode), JobTracker/JobHistory, HBase Master, HBase RegionServer.
Spark monitoring via the web UI: each SparkContext starts its own UI, on port 4040 by default (additional SparkContexts take 4041, 4042, and so on). For the running application it shows:
the scheduler's stage and task lists
an overview of RDD sizes and memory usage
information about the running executors
This information is only shown while the application is running. To inspect the web UI after the application has finished, set spark.eventLog.enabled to true in spark-defaults.conf before starting the application; Spark then records the information shown in the UI as Spark events and writes them to persistent storage:

spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://centos-FI:9000/spark_eventLog
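With event logging enabled, a finished application's UI can also be served by the Spark history server. A sketch, assuming a Spark 1.x layout and the same HDFS log directory; the history UI listens on port 18080 by default:

# point the history server at the event log directory and start it
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://centos-FI:9000/spark_eventLog"
./sbin/start-history-server.sh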
Hadoop Quick Start
The purpose of this document is to help you get a single-node Hadoop installation up and running quickly, so that you can get a feel for the Hadoop Distributed File System (HDFS) and the Map-Reduce framework, for example by running example programs or simple jobs on HDFS.
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
Required software for Linux and Windows:
Java 1.5.x must be installed; a Java release from Sun is recommended.
ssh must be installed and sshd must be running, so that the Hadoop scripts can manage the remote Hadoop daemons.
Additional requirement for Windows:
Cygwin - provides the shell support needed on top of the software above.
If your cluster does not yet have the required software, you will need to install it first.
For example, on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
On Windows, if you did not install all the required software when you installed Cygwin, start the Cygwin installer and select the following package:
openssh - from the Net category
To obtain a Hadoop distribution, download a recent stable release from one of the Apache download mirrors.
Preparing to run a Hadoop cluster
Unpack the downloaded Hadoop distribution. Edit the file conf/hadoop-env.sh and, at a minimum, set JAVA_HOME to the root of your Java installation.
Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
Now you can start your Hadoop cluster in one of three supported modes:
local (standalone) mode
pseudo-distributed mode
fully-distributed mode
Standalone operation
By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is very useful for debugging.
The example below copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the specified output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Pseudo-distributed operation
Hadoop can also run on a single node in so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.
Use the following conf/hadoop-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Passphraseless ssh setup
Now check that you can ssh to localhost without entering a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, run the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Format a new distributed filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
The Hadoop daemon logs are written to the ${HADOOP_LOG_DIR} directory (${HADOOP_HOME}/logs by default).
Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or view the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you are done, stop the daemons with:
$ bin/stop-all.sh
Fully-distributed operation
Information on setting up fully-distributed, non-trivial clusters can be found in the Hadoop cluster setup documentation.
Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
