
Hands-On | Collecting Tencent Cloud CDN Logs with a Python Script and Feeding Them into ELK

王文波 2017-03-09

WeChat ID: FuDanBigData

I'm responsible for building our company's log analysis platform, and I had long wanted to bring CDN logs into it as well. A few days ago I finally got it working; here is the concrete setup:

1. Collecting the logs

Tencent Cloud CDN logs are generally refreshed once an hour, which means you can only download log data that is at least one hour old. In my experience, even the previous hour's logs are sometimes not yet available, so to be safe the script downloads the data from two hours ago. The flow is: fetch the log file list through the Tencent Cloud API, then download each file.

Tencent Cloud log download API link:

Log collection script: get_cdn_log.py

[root@BJVM-2-181 bin]# cat get_cdn_log.py
#!/usr/bin/env python
# coding=utf-8
import hashlib
import requests
import hmac
import random
import time
import base64
import json
import gzip
import os
import sys
from datetime import datetime, timedelta


class Sign(object):
    def __init__(self, secretId, secretKey):
        self.secretId = secretId
        self.secretKey = secretKey

    # Build the signature for a Tencent Cloud API request
    def make(self, requestHost, requestUri, params, method='GET'):
        srcStr = method.upper() + requestHost + requestUri + '?' + \
            '&'.join(k.replace('_', '.') + '=' + str(params[k]) for k in sorted(params.keys()))
        hashed = hmac.new(self.secretKey, srcStr, hashlib.sha1)
        return base64.b64encode(hashed.digest())


class CdnHelper(object):
    SecretId = 'AKIDLsldjflsdjflsdjflsdjfpGSO5XoGiY9'
    SecretKey = 'SeaHjSDFLJSLDFJQIuFJ7rMiz0lGV'
    requestHost = 'cdn.api.qcloud.com'
    requestUri = '/v2/index.php'

    def __init__(self, host, startDate, endDate):
        self.host = host
        self.startDate = startDate
        self.endDate = endDate
        self.params = {
            'Timestamp': int(time.time()),
            'Action': 'GetCdnLogList',
            'SecretId': CdnHelper.SecretId,
            'Nonce': random.randint(10000000, 99999999),
            'host': self.host,
            'startDate': self.startDate,
            'endDate': self.endDate
        }
        self.params['Signature'] = Sign(CdnHelper.SecretId, CdnHelper.SecretKey).make(
            CdnHelper.requestHost, CdnHelper.requestUri, self.params)
        self.url = 'https://%s%s' % (CdnHelper.requestHost, CdnHelper.requestUri)

    def GetCdnLogList(self):
        ret = requests.get(self.url, params=self.params)
        return ret.json()


class GZipTool(object):
    '''gzip compress / decompress helper'''
    def __init__(self, bufSize=1024 * 8):
        self.bufSize = bufSize
        self.fin = None
        self.fout = None

    def compress(self, src, dst):
        self.fin = open(src, 'rb')
        self.fout = gzip.open(dst, 'wb')
        self.__in2out()

    def decompress(self, gzFile, dst):
        self.fin = gzip.open(gzFile, 'rb')
        self.fout = open(dst, 'wb')
        self.__in2out()

    def __in2out(self):
        while True:
            buf = self.fin.read(self.bufSize)
            if len(buf) < 1:
                break
            self.fout.write(buf)
        self.fin.close()
        self.fout.close()


def download(link, name):
    try:
        r = requests.get(link)
        with open(name, 'wb') as f:
            f.write(r.content)
        return True
    except:
        return False


def writelog(src, dst):
    # Append to a per-day log file: drop the hour from the name,
    # e.g. '2017022712-pfcdn.xxx.com.log' -> '20170227-pfcdn.xxx.com.log'
    dst = dst.split('-')[0][:-2] + '-' + dst.split('-')[1]
    with open(src, 'r') as f1:
        with open(dst, 'a+') as f2:
            for line in f1:
                f2.write(line)


if __name__ == '__main__':
    # startDate = '2017-02-23 12:00:00'
    # endDate = '2017-02-23 12:00:00'
    # previous hour:
    # startDate = endDate = time.strftime('%Y-%m-%d ', time.localtime()) + str(time.localtime().tm_hour - 1) + ':00:00'
    # two hours ago:
    tm = datetime.now() + timedelta(hours=-2)
    startDate = endDate = tm.strftime('%Y-%m-%d %H:00:00')
    # hosts = ['userface.51img1.com']
    hosts = [
        'pfcdn.xxx.com',
        'pecdn.xxx.com',
        'pdcdn.xxx.com',
        'pccdn.xxx.com',
        'pbcdn.xxx.com',
        'pacdn.xxx.com',
        'p9cdn.xxx.com',
        'p8cdn.xxx.com',
        'p7cdn.xxx.com',
    ]
    for host in hosts:
        try:
            obj = CdnHelper(host, startDate, endDate)
            ret = obj.GetCdnLogList()
            link = ret['data']['list'][0]['link']
            name = ret['data']['list'][0]['name']
            # file name for the downloaded .gz
            gzip_name = '/data/logs/cdn/cdn_log_temp/' + name + '.gz'
            # file name after decompression (this path was garbled in the source; the temp dir is assumed)
            local_name = '/data/logs/cdn/cdn_log_temp/' + name + '.log'
            # per-day file the log lines are appended to
            real_path = '/data/logs/cdn/' + name + '.log'
            print local_name, real_path
            status = download(link, gzip_name)
            if status:
                try:
                    GZipTool().decompress(gzip_name, local_name)
                    writelog(local_name, real_path)
                    # os.remove(gzip_name)
                    os.remove(local_name)
                except:
                    continue
        except Exception, e:
            print e
            continue
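For reference, the string that Sign.make hashes is just the upper-cased method, the host and URI, and the request parameters sorted by key and joined with &. With made-up Nonce and Timestamp values, the srcStr for a GetCdnLogList call looks roughly like this (wrapped here for readability; it is one line):

GETcdn.api.qcloud.com/v2/index.php?Action=GetCdnLogList&Nonce=12345678
    &SecretId=AKIDLsldjflsdjflsdjflsdjfpGSO5XoGiY9&Timestamp=1488168000
    &endDate=2017-02-27 12:00:00&host=pfcdn.xxx.com&startDate=2017-02-27 12:00:00

The HMAC-SHA1 of this string with the SecretKey, base64-encoded, becomes the Signature parameter.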

Drop it into cron so it runs once an hour:

# CDN logs
30 */1 * * * /usr/bin/python /root/bin/get_cdn_log.py &> /dev/null

After decompression, each domain's logs are saved to a file of their own, split by day.
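With the domain list above, /data/logs/cdn/ ends up containing files like the following (dates hypothetical):

/data/logs/cdn/20170227-p7cdn.xxx.com.log
/data/logs/cdn/20170227-pfcdn.xxx.com.log
/data/logs/cdn/20170228-pfcdn.xxx.com.log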

2. Filebeat configuration (see the official documentation for what each option means)

[root@BJ-2-11 bin]# cat /usr/local/app/filebeat-1.2.3-x86_64/nginx-php.yml
filebeat:
  prospectors:
    -
      paths:
        - /data/logs/cdn/*.log
      document_type: cdn-log
      input_type: log
      #tail_files: true
      multiline:
        negate: true
        match: after
output:
  logstash:
    hosts: ['10.80.2.181:5048', '10.80.2.182:5048']
shipper:
logging:
  files:

3. Logstash configuration

Log format:

20170227152116 61.135.234.125 cdn.xxx.com /game/2017/201701/20170121/57037f7fc1a0dde9091d4fe6502a6c53.jpg 17769 22 26 200 //www.xxx.com/ 5 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; NetworkBench/7.0.0.282-5004888-124025)' '(null)' GET HTTP/1.1 hit

The fields are, in order: request time, client IP, requested hostname, request path, bytes transferred, province code, carrier code, HTTP status code, referer, request time (ms), User-Agent, range, HTTP method, HTTP protocol version, and cache hit/miss.
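As a rough illustration (not part of the original pipeline), the line splits into those fields positionally; here is a minimal Python sketch, assuming single-space separators and the single-quoting shown above. The sample line and the parse_cdn_line helper are made up for this example:

import shlex

sample = ("20170227152116 61.135.234.125 cdn.xxx.com "
          "/game/2017/201701/20170121/57037f7fc1a0dde9091d4fe6502a6c53.jpg "
          "17769 22 26 200 //www.xxx.com/ 5 "
          "'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)' "
          "'(null)' GET HTTP/1.1 hit")

FIELDS = ['timestamp', 'client_ip', 'server_name', 'request', 'bytes',
          'province', 'operator', 'status', 'referrer', 'request_time',
          'agent', 'range', 'method', 'protocol', 'cache']

def parse_cdn_line(line):
    # shlex honors the single quotes around agent and range,
    # so each of the 15 fields becomes exactly one token
    return dict(zip(FIELDS, shlex.split(line)))

print(parse_cdn_line(sample)['status'])   # -> 200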

Configuration file:

# /usr/local/app/logstash-2.3.4/conf.d/logstash.conf
input {
  beats {
    port => 5048
    host => '0.0.0.0'
  }
}
filter {
  ..... (omitted)
  else if [type] == 'cdn-log' {
    grok {
      patterns_dir => ['./patterns']
      match => { 'message' => '%{DATESTAMP_EVENTLOG:timestamp} %{IPORHOST:client_ip} %{IPORHOST:server_name} %{NOTSPACE:request} %{NUMBER:bytes} %{NUMBER:province} %{NUMBER:operator} %{NUMBER:status} (?:%{URI:referrer}|%{WORD:referrer}) %{NUMBER:request_time} %{QS:agent} \'\(%{WORD:range}\)\' %{WORD:method} HTTP/%{NUMBER:protocol} %{WORD:cache}' }
    }
    date {
      match => ['timestamp', 'yyyyMMddHHmmss']
      target => '@timestamp'
    }
    alter {
      condrewrite => [
        'province', '22', '北京',
        'province', '86', '内蒙古',
        'province', '146', '山西',
        'province', '1069', '河北',
        'province', '1077', '天津',
        'province', '119', '宁夏',
        'province', '152', '陕西',
        'province', '1208', '甘肃',
        'province', '1467', '青海',
        'province', '1468', '新疆',
        'province', '145', '黑龙江',
        'province', '1445', '吉林',
        'province', '1464', '辽宁',
        'province', '2', '福建',
        'province', '120', '江苏',
        'province', '121', '安徽',
        'province', '122', '山东',
        'province', '1050', '上海',
        'province', '1442', '浙江',
        'province', '182', '河南',
        'province', '1135', '湖北',
        'province', '1465', '江西',
        'province', '1466', '湖南',
        'province', '118', '贵州',
        'province', '153', '云南',
        'province', '1051', '重庆',
        'province', '1068', '四川',
        'province', '1155', '西藏',
        'province', '4', '广东',
        'province', '173', '广西',
        'province', '1441', '海南',
        'province', '0', '其他',
        'province', '1', '港澳台',
        'province', '-1', '海外',
        'operator', '2', '中国电信',
        'operator', '26', '中国联通',
        'operator', '38', '教育网',
        'operator', '43', '长城宽带',
        'operator', '1046', '中国移动',
        'operator', '3947', '中国铁通',
        'operator', '-1', '海外运营商',
        'operator', '0', '其他运营商'
      ]
    }
  }
} # filter
output {
  if '_grokparsefailure' in [tags] {
    file { path => '/var/log/logstash/grokparsefailure-%{[type]}-%{+YYYY.MM.dd}.log' }
  }
  ...... (omitted)
  else if [type] == 'cdn-log' {
    elasticsearch {
      hosts => ['10.80.2.13:9200', '10.80.2.14:9200', '10.80.2.15:9200', '10.80.2.16:9200']
      sniffing => true
      manage_template => true
      template_overwrite => true
      template_name => 'cdn'
      template => '/usr/local/app/logstash-2.3.4/templates/cdn.json'
      index => '%{[type]}-%{+YYYY.MM.dd}'
      document_type => '%{[type]}'
    }
  }
  ...... (omitted)
} # output

4. Results (one hour of data)

(Figure: CDN usage)

(Figure: CDN request statistics)

(Figure: status code breakdown)
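To sanity-check the dashboard numbers, you can also query Elasticsearch directly. A minimal sketch, assuming the index naming from the Logstash output above (cdn-log-YYYY.MM.dd, date hypothetical) and one of the cluster nodes from the config:

import json
import requests

# terms aggregation over the grok-extracted 'status' field for one day's index
query = {
    'size': 0,
    'aggs': {
        'status_codes': {
            'terms': {'field': 'status'}
        }
    }
}

r = requests.post('http://10.80.2.13:9200/cdn-log-2017.02.27/_search',
                  data=json.dumps(query))
for bucket in r.json()['aggregations']['status_codes']['buckets']:
    print('%s: %s' % (bucket['key'], bucket['doc_count']))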

