
Alex's Hadoop Beginner Tutorial, Lesson 11: Calling Hive from Java

Category: Server | Posted: 2015-02-02 09:12:02

Notes

  • This article is based on CentOS 6.x + CDH 5.x

Any discussion of Hive has to cover how to call Hive from your own programs. Below I use an example to show how to query data in Hive from Java.

Preparing the data

Create a text file called a.txt with the following contents:
1,terry
2,alex
3,jimmy
4,mike
5,kate

and upload it to the /data/ directory on the Hive server.

Calling Hive via JDBC

Loading the driver

Load the driver (only the HiveServer2 JDBC driver is covered here):
Class.forName("org.apache.hive.jdbc.HiveDriver");

Connecting to the database

Connection con = DriverManager.getConnection("jdbc:hive2://host1:10000/default", "hive", "");

  • host1 is the hostname of the Hive server
  • 10000 is Hive's default port
  • default is the default database
  • hive is the default username, and the default password is empty
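
For completeness, here is a minimal sketch of the same connection using try-with-resources, so that the connection, statement, and result set are always closed. It assumes the same host1 hostname, default port, and default credentials described above; the class name and the show databases query are only illustrative.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectionSketch {
    public static void main(String[] args) throws Exception {
        // Same driver class as above; older hive-jdbc versions may not auto-register it,
        // so load it explicitly.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Assumes host1 resolves to your HiveServer2 machine and the default port 10000.
        String url = "jdbc:hive2://host1:10000/default";
        try (Connection con = DriverManager.getConnection(url, "hive", "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("show databases")) {
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        } // Connection, Statement, and ResultSet are closed automatically here.
    }
}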

Database operations

Drop a table
stmt.execute("drop table if exists " + tableName);

Create a table
stmt.execute("create table " + tableName + " (key int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '54'");

Query data (iterating over the result set is sketched after this list)
ResultSet res = stmt.executeQuery("select * from " + tableName);

Load data
stmt.execute("load data local inpath '" + filepath + "' into table " + tableName);


Example

Create the project

First open Eclipse and create a Maven project.


pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.crazycake</groupId>
  <artifactId>play-hive</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>play-hive</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>0.14.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.2.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.0.2</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
          <encoding>UTF-8</encoding>
          <optimize>true</optimize>
          <compilerArgument>-nowarn</compilerArgument>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <configuration>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer" />
          </transformers>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>



The most important part is these two dependencies:
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>0.14.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.2.0</version>
</dependency>

The rest doesn't really matter.

Creating the table, loading data, and querying

Create a class called HiveJdbcClient:
package org.crazycake.play_hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Test client for connecting to Hive.
 * @author alexxiyang (https://github.com/alexxiyang)
 */
public class HiveJdbcClient {

  /**
   * Note: hive-server2 uses the driver org.apache.hive.*, while hive-server uses org.apache.hadoop.hive.*
   */
  private static String driverName = "org.apache.hive.jdbc.HiveDriver";

  /**
   * @param args
   * @throws SQLException
   */
  public static void main(String[] args) throws SQLException {
    try {
      Class.forName(driverName);
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
      System.exit(1);
    }

    // Hive's default port is 10000; to change it, edit the hive.server2.thrift.port property in hive-site.xml.
    // The default username is hive and the default password is empty.
    Connection con = DriverManager.getConnection("jdbc:hive2://host1:10000/default", "hive", "");
    Statement stmt = con.createStatement();

    // Name of the test table: testhivedrivertable
    String tableName = "testhivedrivertable";

    // Drop the table if it already exists
    stmt.execute("drop table if exists " + tableName);

    // Create the table
    stmt.execute("create table " + tableName + " (key int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

    // Check whether the table was created successfully
    String sql = "show tables '" + tableName + "'";
    System.out.println("Running: " + sql);
    ResultSet res = stmt.executeQuery(sql);
    if (res.next()) {
      System.out.println(res.getString(1));
    }

    // Look at the table structure
    sql = "describe " + tableName;
    System.out.println("Running: " + sql);
    res = stmt.executeQuery(sql);
    while (res.next()) {
      System.out.println(res.getString(1) + " " + res.getString(2));
    }

    // Load data into the table
    // NOTE: filepath is the location of a local file -- but "local" does not mean your own computer!
    // You must first upload the file to the server; the path here is the path of that file on the server.
    // NOTE: /data/a.txt
    String filepath = "/data/a.txt";
    sql = "load data local inpath '" + filepath + "' into table " + tableName;
    System.out.println("Running: " + sql);
    stmt.execute(sql);

    // select * query
    sql = "select * from " + tableName;
    System.out.println("Running: " + sql);
    res = stmt.executeQuery(sql);
    while (res.next()) {
      System.out.println(String.valueOf(res.getInt(1)) + " " + res.getString(2));
    }

    // Count how many rows there are
    sql = "select count(1) from " + tableName;
    System.out.println("Running: " + sql);
    res = stmt.executeQuery(sql);
    if (res.next()) {
      System.out.println(res.getString(1));
    }
  }
}


The output is:
Running: show tables 'testhivedrivertable'
testhivedrivertable
Running: describe testhivedrivertable
key  int
value  string
Running: load data local inpath '/data/a.txt' into table testhivedrivertable
Running: select * from testhivedrivertable
1 terry
2 alex
3 jimmy
4 mike
5 kate
Running: select count(1) from testhivedrivertable

Calling Hive from Java is actually very simple: you are just executing, via JDBC, the same statements you would run in the hive shell. The statements you send are executed in the environment of the Hive server machine, so any path you write is resolved on the Hive server host's filesystem. That is why a.txt has to be uploaded to the server before this code will run correctly.
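
If copying the file onto the HiveServer2 machine is inconvenient, Hive can also load from HDFS by dropping the LOCAL keyword. A minimal sketch follows; the HDFS path /user/hive/input/a.txt is hypothetical and assumes the file has already been copied into HDFS.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LoadFromHdfsSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://host1:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {
            // Without LOCAL the path is resolved against HDFS, not the HiveServer2
            // machine's local filesystem. The HDFS path below is hypothetical; the file
            // would first be copied into HDFS (e.g. hdfs dfs -put a.txt /user/hive/input/).
            stmt.execute("load data inpath '/user/hive/input/a.txt' into table testhivedrivertable");
        }
    }
}

Note that with LOAD DATA INPATH (no LOCAL), Hive moves the file from its HDFS location into the table's warehouse directory rather than copying it.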


References

  • https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC

