【Original】Fluentd Output Plugin: out_elasticsearch Usage in Detail

RealPython 2021-03-14

“Shipping logs to elasticsearch for business analysis is probably the most common use of log collection.”

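The out_elasticsearch output plugin writes log records to elasticsearch.

By default it uses elasticsearch's bulk API, which means out_elasticsearch performs multiple indexing operations in a single API call.

This reduces per-request overhead and significantly increases indexing speed.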

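It also means that when you first use this plugin to output logs, records are not pushed to elasticsearch immediately.

Records are sent to elasticsearch once the condition set by chunk_keys in the <buffer> directive is met.

To change how often the plugin outputs logs, specify time in chunk_keys and set the timekey parameter.

For chunk_keys settings, see: Fluentd Configuration: Buffer Options.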
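
To make the bulk behaviour described above more concrete, here is a minimal Python sketch of what a bulk request body looks like: newline-delimited JSON, with one action line followed by one source line per record, and a trailing newline. This is only an illustration and not the plugin's own code (the plugin is written in Ruby); the function name build_bulk_body and the sample records are hypothetical.

```python
import json


def build_bulk_body(index_name, records):
    """Build a newline-delimited JSON body for one bulk request.

    Hypothetical helper for illustration only: each record becomes an
    "index" action line followed by the record itself.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Two buffered records travel to elasticsearch in a single request body.
records = [{"message": "first"}, {"message": "second"}]
body = build_bulk_body("fluentd", records)
print(body)
```

Because many records share one request, nothing is sent until a buffer chunk is flushed, which is why the first records do not show up in elasticsearch right away.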

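【Installation】

If you use td-agent v3.0.1 or later, the out_elasticsearch plugin is already bundled in the td-agent package and needs no manual installation.

If your Fluentd was not installed through td-agent, use fluent-gem to install out_elasticsearch.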

$ fluent-gem install fluent-plugin-elasticsearch

【Configuration Example】

Here is a minimal configuration snippet that is enough for most users to get started.

<match my.logs>
  @type elasticsearch
  host localhost
  port 9200
  logstash_format true
</match>

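【Plugin Parameters】

@type
The plugin type; the value is elasticsearch.
host
Hostname of the elasticsearch node. Defaults to localhost.
port
Port number of the elasticsearch node. Defaults to 9200.
hosts
A list of nodes. Multiple elasticsearch nodes can be specified as follows:
hosts host1:port1,host2:port2,host3:port3
# or
hosts https://:443/path,https://username:password@host-failover.com:443
When hosts is set, the host and port options above are ignored.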
user, password
Username and password for logging in to elasticsearch. Both default to nil.
user fluent
password mysecret
scheme
Whether to use http or https. Defaults to http.
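index_name
Name of the elasticsearch index that logs are written to. Defaults to fluentd.
This parameter supports placeholders.
For example, the following setting writes logs to a different index per tag: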
index_name fluentd.${tag}
A more practical index_name value combines the tag and a timestamp:
index_name fluentd.${tag}.%Y%m%d
Because this setting uses time placeholders, chunk_keys must include both tag and time, and the timekey parameter must be specified:
<buffer tag, time>
  timekey 1h # chunks per hours ("3600" also available)
</buffer>
logstash_format
Whether to use the conventional index naming format (logstash-%Y.%m.%d).
Defaults to false (not used).
When enabled, this option supersedes index_name.
logstash_prefix
Index prefix used when logstash_format is true. Defaults to logstash.
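
The index_name placeholders described above can be pictured with a short Python sketch. It substitutes ${tag} and then applies the strftime-style time placeholders, which is roughly how a value such as fluentd.${tag}.%Y%m%d turns into a concrete index name. This is a simplified illustration, not the plugin's implementation: Fluentd resolves placeholders from the buffer chunk's keys (which is why tag and time must appear in chunk_keys), whereas this sketch takes a plain timestamp. The function name expand_index_name and the sample tag and date are hypothetical.

```python
from datetime import datetime, timezone


def expand_index_name(template, tag, chunk_time):
    """Expand ${tag} and strftime-style placeholders in an index name.

    Hypothetical helper for illustration; it assumes the tag itself
    contains no '%' characters.
    """
    with_tag = template.replace("${tag}", tag)
    return chunk_time.strftime(with_tag)


chunk_time = datetime(2021, 3, 14, 9, 30, tzinfo=timezone.utc)
name = expand_index_name("fluentd.${tag}.%Y%m%d", "my.logs", chunk_time)
print(name)  # fluentd.my.logs.20210314
```

With a date-based suffix like this, each day's logs for a given tag land in their own index.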

【Miscellaneous】

Placeholders in %{} format can be used to escape characters that need URL encoding.

The following configurations are valid:

user %{demo+}
password %{@secret}

hosts https://%{j+hn}:%{passw@rd}@host1:443/elastic/,http://host2

The following configuration is invalid:

user demo+
password @secret
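
To see why these characters need escaping, the following Python sketch percent-encodes the sample values from above using the standard library. It only illustrates URL encoding in general; it makes no claim about how the plugin implements the %{} placeholder internally, and the function name encode_credential is hypothetical.

```python
from urllib.parse import quote


def encode_credential(value):
    """Percent-encode every reserved character in a credential.

    Hypothetical helper: safe="" makes quote() encode '/', '+', '@'
    and ':' as well, since all of them are significant inside a URL.
    """
    return quote(value, safe="")


print(encode_credential("demo+"))     # demo%2B
print(encode_credential("@secret"))   # %40secret
print(encode_credential("passw@rd"))  # passw%40rd
```

An unescaped @ in a password, for example, would be read as the separator between the credentials and the host in a hosts URL.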

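【Common Problems】

Cannot send log events to elasticsearch
A common cause of failure is a version mismatch between the out_elasticsearch plugin and the elasticsearch instance.
For example, td-agent currently bundles the 6.x series of the elasticsearch-ruby library, which means your elasticsearch server should also be 6.x.
You can check the version of the client library you are using with the following commands:
# For td-agent users
$ /usr/sbin/td-agent-gem list elasticsearch
# For standalone Fluentd users
$ fluent-gem list elasticsearch
If your out_elasticsearch is v2.11.7 or later, you can have the plugin check version compatibility by setting this in the configuration file: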
validate_client_version true
If the versions are incompatible, you will see an error like this:
Detected ES 5 but you use ES client 6.1.0.
Please consider to use 5.x series ES client.
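Cannot see detailed failure logs
This can be caused by an incompatible SSL/TLS protocol version between the plugin and the elasticsearch server.
For historical reasons, out_elasticsearch sets ssl_version to TLSv1, while the modern elasticsearch ecosystem requires TLSv1.2 or later.
In this situation, out_elasticsearch hides the transport-level failure logs.
Transport logging can be turned on with the following configuration:
with_transporter_log true
@log_level debug
You will then see a failure log like this: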
2018-10-24 10:00:00 +0900 [error]: #0 [Faraday::ConnectionFailed] SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: unknown protocol (OpenSSL::SSL::SSLError) {:host=>"elasticsearch-host", :port=>80, :scheme=>"https", :user=>"elastic", :password=>"changeme", :protocol=>"https"}
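This can also happen when a reverse proxy is involved.
For example, your elasticsearch server may sit behind nginx, and modern nginx also requires TLSv1.2 or later.
Why are invalid logs reprocessed endlessly?
Sometimes you may use a configuration like this: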
<match **>
  @type elasticsearch
  host localhost
  port 9200
  type_name fluentd
  logstash_format true
  time_key @timestamp
  include_timestamp true
  reconnect_on_error true
  reload_on_failure true
  reload_connections false
  request_timeout 120s
</match>
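This configuration uses a catch-all glob pattern (**) and no @label. That is usually a problematic setup.
When Fluentd fails to process a log, the error event is sent to the section marked with the @ERROR label and is tagged fluent.*.
With a catch-all pattern, the error event is resubmitted to the out_elasticsearch pipeline over and over, flooding it.
In that case you will see error messages like these: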
2018-11-13 11:16:27 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch" location=nil tag="app.fluentcat" time=2018-11-13 11:16:17.492985640 +0000 record={"message"=>"\xFF\xAD"}
2018-11-13 11:16:38 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch" location=nil tag="fluent.warn" time=2018-11-13 11:16:27.978851140 +0000 record={"error"=>"#<Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError: 400 - Rejected by Elasticsearch>", "location"=>nil, "tag"=>"app.fluentcat", "time"=>2018-11-13 11:16:17.492985640 +0000, "record"=>{"message"=>"\xFF\xAD"}, "message"=>"dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error=\"400 - Rejected by Elasticsearch\" location=nil tag=\"app.fluentcat\" time=2018-11-13 11:16:17.492985640 +0000 record={\"message\"=>\"\\xFF\\xAD\"}"}
There are two ways to solve this problem:
1. Route events with a more specific tag
<match out.elasticsearch.**>
  @type elasticsearch
  host localhost
  port 9200
  type_name fluentd
  logstash_format true
  time_key @timestamp
  include_timestamp true
  reconnect_on_error true
  reload_on_failure true
  reload_connections false
  request_timeout 120s
</match>
2. Route events with @label
<source>
  @type forward
  @label @ES
</source>
<label @ES>
  <match out.elasticsearch.**>
    @type elasticsearch
    host localhost
    port 9200
    type_name fluentd
    logstash_format true
    time_key @timestamp
    include_timestamp true
    reconnect_on_error true
    reload_on_failure true
    reload_connections false
    request_timeout 120s
  </match>
</label>
<label @ERROR>
  <match **>
    @type stdout
  </match>
</label>
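
The reason a narrower tag breaks the loop can be checked with a small Python model of Fluentd's tag matching. It is a simplified sketch that assumes only two wildcard rules (* matches exactly one tag part, ** matches zero or more parts) and ignores other pattern syntax such as {a,b}; the function names are hypothetical and this is not Fluentd's own matcher.

```python
def match_parts(pattern_parts, tag_parts):
    """Recursively match dot-separated pattern parts against tag parts."""
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # '**' may absorb zero or more tag parts.
        return any(
            match_parts(rest, tag_parts[i:])
            for i in range(len(tag_parts) + 1)
        )
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        return match_parts(rest, tag_parts[1:])
    return False


def tag_matches(pattern, tag):
    """Hypothetical helper: does a match pattern accept this tag?"""
    return match_parts(pattern.split("."), tag.split("."))


# The catch-all pattern also accepts Fluentd's own warning events ...
print(tag_matches("**", "fluent.warn"))                    # True
# ... while the narrower pattern leaves them out of the pipeline.
print(tag_matches("out.elasticsearch.**", "fluent.warn"))  # False
print(tag_matches("out.elasticsearch.**", "out.elasticsearch.app"))  # True
```

Under this model, the fluent.warn event seen in the log above is picked up again by a ** match, but never by out.elasticsearch.**.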