1. Prerequisites
1.1 Environment:
Java 8
Linux, Mac OS X, or another Unix-like operating system (Windows is not supported)
8 GB of RAM
2 vCPUs
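The requirements above can be sanity-checked from Python before going further. This is a sketch that reports, rather than enforces, each check; the 8 GB RAM check is omitted because the standard library has no portable API for it (use free, top, or Activity Monitor instead):

```python
import os
import platform
import shutil

# Report each quickstart prerequisite: a Unix-like OS, Java on the PATH,
# and at least 2 vCPUs.
checks = {
    "Unix-like OS (not Windows)": platform.system() != "Windows",
    "java on PATH": shutil.which("java") is not None,
    ">= 2 vCPUs": (os.cpu_count() or 0) >= 2,
}

for name, ok in checks.items():
    print(f"{'OK     ' if ok else 'MISSING'} {name}")
```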
1.2 Download and extract Druid
Download URL: https://www.apache.org/dyn/closer.cgi?path=/incubator/druid/0.13.0-incubating/apache-druid-0.13.0-incubating-bin.tar.gz
Extract: tar -xzf apache-druid-0.13.0-incubating-bin.tar.gz
Enter the directory: cd apache-druid-0.13.0-incubating
1.3 Download and extract ZooKeeper
Druid relies on Apache ZooKeeper for distributed coordination. In the Druid root directory, download and extract ZooKeeper by running the following commands:
curl https://archive.apache.org/dist/zookeeper/zookeeper-3.4.11/zookeeper-3.4.11.tar.gz -o zookeeper-3.4.11.tar.gz
tar -xzf zookeeper-3.4.11.tar.gz
mv zookeeper-3.4.11 zk
1.4 Start and run Druid
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
This brings up instances of ZooKeeper and the Druid services, all running on the local machine, for example:
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
[Tue Dec 25 16:11:35 2018] Running command[zk], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/zk.log]: bin/run-zk quickstart/tutorial/conf
[Tue Dec 25 16:11:35 2018] Running command[coordinator], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/coordinator.log]: bin/run-druid coordinator quickstart/tutorial/conf
[Tue Dec 25 16:11:35 2018] Running command[broker], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/broker.log]: bin/run-druid broker quickstart/tutorial/conf
[Tue Dec 25 16:11:35 2018] Running command[historical], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/historical.log]: bin/run-druid historical quickstart/tutorial/conf
[Tue Dec 25 16:11:35 2018] Running command[overlord], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/overlord.log]: bin/run-druid overlord quickstart/tutorial/conf
[Tue Dec 25 16:11:35 2018] Running command[middleManager], logging to[/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/var/sv/middleManager.log]: bin/run-druid middleManager quickstart/tutorial/conf
All persistent state, such as the cluster metadata store and segments, is kept in the var directory under apache-druid-0.13.0-incubating. Service logs are written to var/sv.
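Once the services are up, you can probe them from Python. This is a sketch; the port numbers are an assumption based on Druid's standard defaults (they are not stated in this tutorial's config), so adjust them if your configuration differs:

```python
import socket

# Default local ports for the quickstart services (assumed from Druid's
# standard defaults; verify against your own configuration).
SERVICES = {
    "zookeeper": 2181,
    "coordinator": 8081,
    "broker": 8082,
    "historical": 8083,
    "overlord": 8090,
    "middleManager": 8091,
}

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if is_listening("localhost", port) else "down"
        print(f"{name:14s} (port {port}): {state}")
```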
2. Load data
The sample data is located at quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz under the Druid package root. It contains Wikipedia page-edit events, stored as one JSON object per line of the text file.
An example event, showing a subset of its columns:
{
  "time": "2015-09-12T00:47:47.870Z",
  "channel": "#vi.wikipedia",
  "comment": "clean up using [[Project:AWB|AWB]]",
  "page": "Atractus duboisi",
  "user": "ThitxongkhoiAWB",
  ...
}
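Since each event is one JSON object per line, the file can be processed line by line with nothing beyond the standard library. A minimal sketch using the (truncated) sample event above:

```python
import json

# One line from wikiticker-2015-09-12-sampled.json, truncated to the fields
# shown above; real events carry additional columns.
line = (
    '{"time": "2015-09-12T00:47:47.870Z", "channel": "#vi.wikipedia", '
    '"comment": "clean up using [[Project:AWB|AWB]]", '
    '"page": "Atractus duboisi", "user": "ThitxongkhoiAWB"}'
)

event = json.loads(line)
print(event["page"], "was edited on", event["channel"])

# Reading the whole (gunzipped) file would follow the same pattern:
# with open("wikiticker-2015-09-12-sampled.json") as f:
#     events = [json.loads(l) for l in f]
```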
2.1 Load data via Kafka streaming ingestion
2.1.1 Kafka operations
(1) Download Kafka
curl -O https://archive.apache.org/dist/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
tar -xzf kafka_2.11-0.10.2.0.tgz
(2) Start Kafka
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-server-start.sh config/server.properties
(3) List topics
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-topics.sh --zookeeper localhost:2181 --list
(4) Create a topic
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wikipedia
(5) Delete a topic
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic wikipedia
2.1.2 Druid operations:
Once data is flowing into Kafka, you need to define a supervisor spec file that tells Druid how to ingest it: the timestamp format, the dimension columns, the metric columns, the rollup (pre-aggregation) granularity, and so on.
(1) Start the Kafka ingestion supervisor in Druid
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ curl -XPOST -H'Content-Type: application/json' -d @quickstart/tutorial/wikipedia-kafka-supervisor.json http://localhost:8090/druid/indexer/v1/supervisor
Note: if the supervisor is created successfully, the response contains the supervisor ID; in this case you should see {"id":"wikipedia-kafka"}.
Spec file (defines the datasource): wikipedia-kafka-supervisor.json, key fields excerpted below:
"dataSource": "wikipedia",
{ "name": "added", "type": "long" },
{ "name": "deleted", "type": "long" },
{ "name": "delta", "type": "long" }
"segmentGranularity": "DAY",
"queryGranularity": "NONE",
"reportParseExceptions": false
"completionTimeout": "PT20M",
"bootstrap.servers": "localhost:9092"
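The fields excerpted above slot into Druid's Kafka indexing service supervisor layout. The sketch below reconstructs that shape; the nesting (dataSchema/tuningConfig/ioConfig) and the topic field follow the standard spec layout rather than this tutorial's file, so treat any field not in the excerpt as an assumption:

```python
import json

# A minimal sketch of the shape of wikipedia-kafka-supervisor.json, built
# from the fields excerpted above. Field placement is assumed from Druid's
# standard Kafka supervisor spec layout, not copied from the actual file.
spec = {
    "type": "kafka",
    "dataSchema": {
        "dataSource": "wikipedia",
        "granularitySpec": {
            "segmentGranularity": "DAY",
            "queryGranularity": "NONE",
        },
    },
    "tuningConfig": {
        "type": "kafka",
        "reportParseExceptions": False,
    },
    "ioConfig": {
        "topic": "wikipedia",                 # assumed: matches the topic created above
        "completionTimeout": "PT20M",
        "consumerProperties": {"bootstrap.servers": "localhost:9092"},
    },
}

print(json.dumps(spec, indent=2))
```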
(2) Extract wikiticker-2015-09-12-sampled.json.gz
MacBook-Air-3:tutorial g2$ pwd
/Users/g2/myresource/druid/apache-druid-0.13.0-incubating/quickstart/tutorial
MacBook-Air-3:tutorial g2$ gunzip -k wikiticker-2015-09-12-sampled.json.gz
Sample record:
{
  "time": "2015-09-12T00:48:02.596Z",
  "channel": "#es.wikipedia",
  "cityName": "Mexico City",
  "comment": "Cambio en la redacción del texto y correción en sintaxis",
  "regionName": "Mexico City",
  "user": "189.217.75.123",
  ...
}
(3) In the Kafka directory, run the following commands to write the data into Kafka:
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ export KAFKA_OPTS="-Dfile.encoding=UTF-8"
MacBook-Air-3:kafka_2.11-0.10.2.0 g2$ ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia < /Users/g2/myresource/druid/apache-druid-0.13.0-incubating/quickstart/tutorial/wikiticker-2015-09-12-sampled.json
Now that the data has been written to Kafka, we can query it from Druid.
2.2 Load data from a file
2.2.1 Prepare the data and the ingestion task spec: wikipedia-index.json, key fields excerpted below:
"dataSource" : "wikipedia",
{ "name": "added", "type": "long" },
{ "name": "deleted", "type": "long" },
{ "name": "delta", "type": "long" }
"segmentGranularity" : "day",
"queryGranularity" : "none",
"intervals" : ["2015-09-12/2015-09-13"],
"baseDir" : "quickstart/tutorial/",
"filter" : "wikiticker-2015-09-12-sampled.json.gz"
"appendToExisting" : false
"targetPartitionSize" : 5000000,
"maxRowsInMemory" : 25000,
"forceExtendableShardSpecs" : true
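The fragments above come from the three parts of a native batch (index) task: the dataSchema, the ioConfig (with a local firehose pointing at the sample file), and the tuningConfig. A sketch of how they fit together; the nesting is reconstructed from the standard index-task spec layout, so treat the placement as an assumption:

```python
import json

# A minimal sketch of the shape of wikipedia-index.json, assembled from the
# fields excerpted above. Nesting is assumed from Druid's standard native
# batch index-task layout, not copied from the actual file.
spec = {
    "type": "index",
    "spec": {
        "dataSchema": {
            "dataSource": "wikipedia",
            "granularitySpec": {
                "segmentGranularity": "day",
                "queryGranularity": "none",
                "intervals": ["2015-09-12/2015-09-13"],
            },
        },
        "ioConfig": {
            "type": "index",
            "firehose": {
                "type": "local",
                "baseDir": "quickstart/tutorial/",
                "filter": "wikiticker-2015-09-12-sampled.json.gz",
            },
            "appendToExisting": False,
        },
        "tuningConfig": {
            "type": "index",
            "targetPartitionSize": 5000000,
            "maxRowsInMemory": 25000,
            "forceExtendableShardSpecs": True,
        },
    },
}

print(json.dumps(spec, indent=2))
```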
2.2.2 Load the batch data
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ bin/post-index-task --file quickstart/tutorial/wikipedia-index.json
Beginning indexing data for wikipedia
Task started: index_wikipedia_2018-12-25T10:00:45.744Z
Task log: http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-12-25T10:00:45.744Z/log
Task status: http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-12-25T10:00:45.744Z/status
Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
Task index_wikipedia_2018-12-25T10:00:45.744Z still running...
Task finished with status: SUCCESS
Completed indexing data for wikipedia. Now loading indexed data onto the cluster...
wikipedia loading complete! You may now query your data
MacBook-Air-3:apache-druid-0.13.0-incubating g2$
Now that the data has been loaded into Druid, we can query it.
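The post-index-task script polls the Overlord's task API: the log and status URLs it prints in the transcript above. A small helper that builds those endpoints; the Overlord address matches the quickstart default (localhost:8090) shown in the transcript:

```python
# Build the Overlord task endpoints printed by post-index-task above.
OVERLORD = "http://localhost:8090"

def task_status_url(task_id: str) -> str:
    return f"{OVERLORD}/druid/indexer/v1/task/{task_id}/status"

def task_log_url(task_id: str) -> str:
    return f"{OVERLORD}/druid/indexer/v1/task/{task_id}/log"

task_id = "index_wikipedia_2018-12-25T10:00:45.744Z"
print(task_status_url(task_id))

# With the cluster running, you could poll the status with the stdlib:
# import json, urllib.request
# status = json.load(urllib.request.urlopen(task_status_url(task_id)))
```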
3. Query data
3.1 Native JSON queries
3.1.1 Example query request (wikipedia-top-pages.json, excerpt):
"dataSource" : "wikipedia",
"intervals" : ["2015-09-12/2015-09-13"],
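The full wikipedia-top-pages.json is a Druid topN query; a sketch of its shape is below. Only the dataSource and intervals come from the excerpt above; the remaining fields are reconstructed from the query plan shown in section 3.3.4 and the standard topN query format, so treat them as an approximation of the actual file:

```python
import json

# Approximate reconstruction of wikipedia-top-pages.json (a topN query):
# top 10 pages by edit count over the sample day.
top_pages = {
    "queryType": "topN",
    "dataSource": "wikipedia",
    "intervals": ["2015-09-12/2015-09-13"],
    "granularity": "all",
    "dimension": "page",
    "metric": "count",
    "threshold": 10,
    "aggregations": [{"type": "count", "name": "count"}],
}

print(json.dumps(top_pages, indent=2))
# POST this body to http://localhost:8082/druid/v2?pretty as in 3.1.2.
```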
3.1.2 Query:
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8082/druid/v2?pretty
3.1.3 Result (excerpt):
"timestamp" : "2015-09-12T00:46:58.771Z",
"page" : "Wikipedia:Vandalismusmeldung"
"page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
"page" : "Wikipedia:Administrators' noticeboard/Incidents"
"page" : "Flavia Pennetta"
"page" : "Total Drama Presents: The Ridonculous Race"
"page" : "User talk:Dudeperson176123"
"page" : "Wikipédia:Le Bistro/12 septembre 2015"
"page" : "Wikipedia:In the news/Candidates"
"page" : "Wikipedia:Requests for page protection"
"page" : "Utente:Giulio Mainardi/Sandbox"
"page" : "Wikipedia:Administrator intervention against vandalism"
"page" : "Anthony Martial"
"page" : "Template talk:Connected contributor"
"page" : "Chronologie de la Lorraine"
3.2 Druid SQL queries
Druid also supports a SQL dialect for querying. Let's run a SQL query equivalent to the native JSON query shown above.
3.2.1 JSON body of the SQL request
"query":"SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE \"__time\" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10"
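The request body is just a JSON object with a single "query" key. Building it programmatically avoids the backslash-escaping that the hand-written file above needs; a sketch:

```python
import json

sql = """
SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00'
                   AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10
""".strip()

# json.dumps handles the quote escaping that the hand-written JSON file
# does with backslashes.
payload = json.dumps({"query": sql})
print(payload)
# POST this body to http://localhost:8082/druid/v2/sql as in 3.2.2.
```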
3.2.2 Query
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8082/druid/v2/sql
3.2.3 Result (excerpt):
"page": "Wikipedia:Vandalismusmeldung",
"page": "User:Cyde/List of candidates for speedy deletion/Subpage",
"page": "Wikipedia:Administrators' noticeboard/Incidents",
"page": "Flavia Pennetta",
"page": "Total Drama Presents: The Ridonculous Race",
"page": "User talk:Dudeperson176123",
"page": "Wikipédia:Le Bistro/12 septembre 2015",
"page": "Wikipedia:In the news/Candidates",
"page": "Wikipedia:Requests for page protection",
3.3 SQL client
For convenience, the Druid package includes a SQL command-line client at bin/dsql under the package root. Run bin/dsql; you should see the following prompt:
MacBook-Air-3:apache-druid-0.13.0-incubating g2$ bin/dsql
Welcome to dsql, the command-line client for Druid SQL.
3.3.1 Example 1:
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
│ page                                                     │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung                             │    33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │    28 │
│ Wikipedia:Administrators' noticeboard/Incidents          │    21 │
│ Total Drama Presents: The Ridonculous Race               │    18 │
│ User talk:Dudeperson176123                               │    18 │
│ Wikipédia:Le Bistro/12 septembre 2015                    │    18 │
│ Wikipedia:In the news/Candidates                         │    17 │
│ Wikipedia:Requests for page protection                   │    17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.35s.
3.3.2 Example 2: Timeseries
dsql> SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);
┌──────────────────────────┬──────────────┐
│ HourTime                 │ LinesDeleted │
├──────────────────────────┼──────────────┤
│ 2015-09-12T00:00:00.000Z │         1761 │
│ 2015-09-12T01:00:00.000Z │        16208 │
│ 2015-09-12T02:00:00.000Z │        14543 │
│ 2015-09-12T03:00:00.000Z │        13101 │
│ 2015-09-12T04:00:00.000Z │        12040 │
│ 2015-09-12T05:00:00.000Z │         6399 │
│ 2015-09-12T06:00:00.000Z │         9036 │
│ 2015-09-12T07:00:00.000Z │        11409 │
│ 2015-09-12T08:00:00.000Z │        11616 │
│ 2015-09-12T09:00:00.000Z │        17509 │
│ 2015-09-12T10:00:00.000Z │        19406 │
│ 2015-09-12T11:00:00.000Z │        16284 │
│ 2015-09-12T12:00:00.000Z │        18672 │
│ 2015-09-12T13:00:00.000Z │        30520 │
│ 2015-09-12T14:00:00.000Z │        18025 │
│ 2015-09-12T15:00:00.000Z │        26399 │
│ 2015-09-12T16:00:00.000Z │        24759 │
│ 2015-09-12T17:00:00.000Z │        19634 │
│ 2015-09-12T18:00:00.000Z │        17345 │
│ 2015-09-12T19:00:00.000Z │        19305 │
│ 2015-09-12T20:00:00.000Z │        22265 │
│ 2015-09-12T21:00:00.000Z │        16394 │
│ 2015-09-12T22:00:00.000Z │        16379 │
│ 2015-09-12T23:00:00.000Z │        15289 │
└──────────────────────────┴──────────────┘
Retrieved 24 rows in 0.25s.
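FLOOR(__time TO HOUR) buckets each event into the hour it falls in, and SUM(deleted) aggregates within each bucket. The same truncate-and-aggregate step can be sketched in plain Python; the rows below are hypothetical stand-ins for wikipedia events, not the tutorial data:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, deleted) rows standing in for wikipedia events.
rows = [
    ("2015-09-12T00:47:47.870Z", 5),
    ("2015-09-12T00:48:02.596Z", 7),
    ("2015-09-12T01:10:00.000Z", 3),
]

def floor_to_hour(ts: str) -> str:
    """Truncate an ISO timestamp to the start of its hour, like FLOOR(__time TO HOUR)."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.replace(minute=0, second=0, microsecond=0).strftime(
        "%Y-%m-%dT%H:00:00.000Z")

# GROUP BY the truncated hour, SUM(deleted) within each bucket.
lines_deleted = defaultdict(int)
for ts, deleted in rows:
    lines_deleted[floor_to_hour(ts)] += deleted

for hour, total in sorted(lines_deleted.items()):
    print(hour, total)
# 2015-09-12T00:00:00.000Z 12
# 2015-09-12T01:00:00.000Z 3
```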
3.3.3 GroupBy
dsql> SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;
┌───────────────┬─────────┐
│ channel       │  EXPR$1 │
├───────────────┼─────────┤
│ #en.wikipedia │ 3045299 │
│ #it.wikipedia │  711011 │
│ #fr.wikipedia │  642555 │
│ #ru.wikipedia │  640698 │
│ #es.wikipedia │  634670 │
└───────────────┴─────────┘
Retrieved 5 rows in 0.13s.
3.3.4 EXPLAIN PLAN FOR
dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Retrieved 1 row in 0.09s.
Reference: http://druid.io/docs/latest/tutorials/index.html