Kibana comes with a built-in Grok Debugger tool (under Dev Tools)
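Approach to processing logs: first work out what format the log lines are in, then decide which filter plugin (or combination of plugins) in the filter section should handle them.
Official documentation (filter plugins):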
https://www.elastic.co/guide/en/logstash/7.12/filter-plugins.html
Online grok pattern tester:
https://grokdebug.herokuapp.com/
Location of Logstash's built-in grok pattern file:
[root@es-web1 ~]# ll /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns
-rw-r--r-- 1 root root 5514 Apr 21 03:50 /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns
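Each non-comment line of a grok pattern file has the layout NAME followed by whitespace and the regular expression. A minimal Python sketch of reading such a file into a dictionary is shown here; the function name load_patterns is hypothetical, and the sample text stands in for the real file.

```python
import re

def load_patterns(text):
    """Parse grok pattern-file text into a {NAME: regex} dict."""
    patterns = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no definitions
        # Split only on the first run of whitespace: the regex may contain spaces
        name, regex = re.split(r"\s+", line, maxsplit=1)
        patterns[name] = regex
    return patterns

# On the host above you could read the real file instead (path taken from the ll output):
# text = open("/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/"
#             "logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns").read()
text = """\
# Sample lines in the same NAME REGEX layout
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
"""
patterns = load_patterns(text)
print(patterns["USER"])  # %{USERNAME}
```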
Sample nginx access log line:
192.168.7.10 - - [24/May/2021:15:50:47 +0800] "GET /shijiange HTTP/1.1" 404 571 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
Extract its fields with a grok expression:
%{IP:clientip} - - \[(?<requesttime>[^ ]+ \+\d+)\] "(?<requesttype>\w+) (?<requesturl>[^ ]+) HTTP/\d\.\d" (?<status>\d+) (?<size>\d+) "[^"]+" "(?<ua>[^"]+)"
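To check the expression outside Kibana, here is a rough Python equivalent. It is a sketch under two assumptions: %{IP:clientip} is replaced by a simple IPv4 regex (grok's IP pattern also accepts IPv6), and grok's (?<name>...) groups are rewritten as (?P<name>...), the named-group syntax Python's re module uses.

```python
import re

# Python translation of the grok expression above (assumption: IPv4 clients only)
LOG_RE = re.compile(
    r'(?P<clientip>\d{1,3}(?:\.\d{1,3}){3}) - - '
    r'\[(?P<requesttime>[^ ]+ \+\d+)\] '
    r'"(?P<requesttype>\w+) (?P<requesturl>[^ ]+) HTTP/\d\.\d" '
    r'(?P<status>\d+) (?P<size>\d+) "[^"]+" "(?P<ua>[^"]+)"'
)

line = ('192.168.7.10 - - [24/May/2021:15:50:47 +0800] "GET /shijiange HTTP/1.1" '
        '404 571 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"')

m = LOG_RE.match(line)
fields = m.groupdict() if m else {}  # empty dict plays the role of a grok parse failure
for key, value in fields.items():
    print(f"{key}: {value}")
```

Like grok, every captured value is a string; numeric fields such as status and size need an explicit conversion if you want numbers.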
Common Grok patterns: descriptions and examples
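Most Linux users have used regular expressions to find files on a machine or to search the content inside them. Grok likewise uses regular expressions to identify the pieces of data in a log line. There are two ways to use them:
1. write a regular expression directly;
2. use a Grok expression that maps to a regular expression.
Rewriting regexes every time is tedious, so it is better to define an expression once and reuse it. Tip: Grok expressions work much like macro definitions in C.
To study Grok's default expressions, look at the pattern file that defines them. On Windows the path is:
[your Logstash install path]\vendor\bundle\jruby\x.x\gems\logstash-patterns-core-x.x.x\patterns\grok-patterns
The commonly used expressions are described below: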
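USERNAME or USER: a user name made of digits, upper/lowercase letters and the characters . _ -; e.g. 1234, Bob, Alex.Wong
EMAILLOCALPART: the user-name part of an email address; the first character is a letter, the rest are digits, letters or the characters _ . + - = :; e.g. stone, Gary_Lu, abc-123. Note: all-digit accounts such as Chinese QQ mail addresses do not match, so the regex must be modified for them
EMAILADDRESS: an email address; e.g. stone@abc.com, Gary_Lu@gmail.com, abc-123@163.com
HTTPDUSER: an Apache server user, either an EMAILADDRESS or a USERNAME
INT: an integer, including 0 and positive and negative integers; e.g. 0, -123, 43987
BASE10NUM or NUMBER: a decimal number, integer or fractional; e.g. 0, 18, 5.23
BASE16NUM: a hexadecimal integer; e.g. 0x0045fa2d, -0x3F8709
BASE16FLOAT: a hexadecimal number, integer or fractional
WORD: a word made of letters, digits and underscores; e.g. String, 3529345, ILoveYou
NOTSPACE: a string containing no whitespace
SPACE: a run of whitespace (possibly empty)
QUOTEDSTRING or QS: a quoted string; e.g. "This is an apple", 'What is your name?'
UUID: a standard UUID; e.g. 550E8400-E29B-11D4-A716-446655440000
MAC: a MAC address in Cisco, Windows or common (colon-separated) notation
IP: an IPv4 or IPv6 address; e.g. 127.0.0.1, FE80:0000:0000:0000:AAAA:0000:00C2:0002
HOSTNAME: a host name
IPORHOST: an IP address or a host name
HOSTPORT: a host name (or IP) plus port; e.g. 127.0.0.1:3306, api.stozen.net:8000
PATH: a Unix or Windows path; e.g. /usr/local/nginx/sbin/nginx, c:\windows\system32\clr.exe
URIPROTO: a URI scheme; e.g. http, ftp
URIHOST: a URI host, optionally with a port; e.g. www.stozen.net, 10.0.0.1:22
URIPATH: a URI path; e.g. /abc/, /api.php
URIPARAM: the GET query string of a URI; e.g. ?a=1&b=2&c=3
URIPATHPARAM: a URI path plus query string; e.g. /abc/api.php?a=1&b=2&c=3
URI: a complete URI; e.g. http://www.stozen.net/abc/api.php?a=1&b=2&c=3

Date and time expressions
MONTH: month name; e.g. Jan, January
MONTHNUM: month number; e.g. 03, 9, 12
MONTHDAY: day of the month; e.g. 03, 9, 31
DAY: day-of-week name; e.g. Mon, Monday
YEAR: year number
HOUR: hour number
MINUTE: minute number
SECOND: second number
TIME: time of day; e.g. 00:01:23
DATE_US: US date format; e.g. 10-15-1982, 10/15/1982
DATE_EU: European date format; e.g. 15-10-1982, 15/10/1982, 15.10.1982
ISO8601_TIMEZONE: an ISO 8601 time-zone offset; e.g. +10:23, -1023
TIMESTAMP_ISO8601: an ISO 8601 timestamp; e.g. 2016-07-03T00:34:06+08:00
DATE: a date, either a US date %{DATE_US} or a European date %{DATE_EU}
DATESTAMP: full date plus time; e.g. 07-03-2016 00:34:06
HTTPDATE: the date format used in HTTP server access logs; e.g. 03/Jul/2016:00:36:53 +0800

Log expressions
LOGLEVEL: log level; e.g. Alert, alert, ALERT, Error

3. Creating your own Grok expressions
In real business systems more and more log formats turn up, and Grok's default expressions cannot cover all of them (for example national ID numbers or mobile phone numbers), so we need to add expressions of our own.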
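Expression | Regular expression | Description
DATE_CHS | %{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY} | date format commonly used in China
ZIPCODE_CHS | [1-9]\d{5} | Chinese postal code
GAME_ACCOUNT | [a-zA-Z][a-zA-Z0-9_]{4,15} | game account: a letter followed by 4 to 15 letters, digits or underscores
There are many more; adapt them to your own business needs.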
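In Logstash, custom patterns like these are usually saved in a file (same NAME REGEX layout as the built-in file) inside a directory referenced by the grok filter's patterns_dir option, or declared inline with its pattern_definitions option. The Python sketch below only checks that the table's regexes behave as described; the helper name matches and the sample inputs are made up for illustration.

```python
import re

# Custom patterns from the table above. DATE_CHS normally references the
# built-in YEAR, MONTHNUM and MONTHDAY patterns; their regexes (from the
# default grok-patterns listing) are inlined by hand so this runs on its own.
CUSTOM = {
    "DATE_CHS": r"[0-9]+[./-](?:0?[1-9]|1[0-2])[./-](?:0[1-9]|[12][0-9]|3[01]|[1-9])",
    "ZIPCODE_CHS": r"[1-9]\d{5}",
    "GAME_ACCOUNT": r"[a-zA-Z][a-zA-Z0-9_]{4,15}",
}

def matches(name, text):
    """True if the whole text matches the named custom pattern."""
    return re.fullmatch(CUSTOM[name], text) is not None

print(matches("DATE_CHS", "2021.08.28"))     # True
print(matches("ZIPCODE_CHS", "100080"))      # True
print(matches("ZIPCODE_CHS", "012345"))      # False: first digit must be 1-9
print(matches("GAME_ACCOUNT", "player_01"))  # True
print(matches("GAME_ACCOUNT", "1player"))    # False: must start with a letter
```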
The default grok-patterns file is reproduced below for reference. This listing appears to come from an older release of logstash-patterns-core (for example, its IP pattern covers IPv4 only), so it differs in places from the 4.3.1 ecs-v1 file shown above:
USERNAME [a-zA-Z0-9_-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
#QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"])*"|(?:'(?:\\.|[^\\'])*')|(?:`(?:\\.|[^\\`])*`)))
QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"]+)*"|(?:'(?:\\.|[^\\']+)*')|(?:`(?:\\.|[^\\`]+)*`)))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}

# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
HOST %{HOSTNAME}
IPORHOST (?:%{HOSTNAME}|%{IP})
HOSTPORT (?:%{IPORHOST=~/\./}:%{POSINT})

# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (?:/(?:[\w_%!$@:.,-]+|\\.)*)+
LINUXTTY (?:/dev/pts/%{NONNEGINT})
BSDTTY (?:/dev/tty[pq][a-z0-9])
TTY (?:%{BSDTTY}|%{LINUXTTY})
WINPATH (?:[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=#%_-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~#%&/=:;_-]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?

# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])

# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)

# Years?
YEAR [0-9]+
# Time: HH:MM:SS
#TIME \d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?
# I'm still on the fence about using grok to perform the time match,
# since it's probably slower.
# TIME %{POSINT<24}:%{POSINT<60}(?::%{POSINT<60}(?:\.%{POSINT})?)?
HOUR (?:2[0123]|[01][0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{YEAR}[/-]%{MONTHNUM}[/-]%{MONTHDAY}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}

# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG (?:[\w._/%-]+)
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{POSINT:facility}.%{POSINT:priority}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT:ZONE}

# Shortcuts
QS %{QUOTEDSTRING}

# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
COMBINEDAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" %{QS:agent}

# Log Levels
LOGLEVEL ([D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL)
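Because definitions such as HTTPDATE refer to other patterns through %{NAME}, grok resolves them much like C macros, recursively, before matching. The Python sketch below mimics that expansion for a handful of definitions copied from the listing above and then converts the captured HTTPDATE value to UTC, roughly what Logstash's date filter does with the Joda pattern dd/MMM/yyyy:HH:mm:ss Z. It is a simplification: the names PATTERNS, REF and expand are hypothetical, it ignores type suffixes such as :int and the old =~ predicate syntax, and Python (unlike grok) rejects duplicate capture names.

```python
import re
from datetime import datetime, timezone

# A few definitions copied from the listing above (a subset for illustration)
PATTERNS = {
    "MONTHDAY": r"(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])",
    "MONTH": r"\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?"
             r"|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b",
    "YEAR": r"[0-9]+",
    "HOUR": r"(?:2[0123]|[01][0-9])",
    "MINUTE": r"(?:[0-5][0-9])",
    "SECOND": r"(?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?)",
    "TIME": r"(?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])",
    "INT": r"(?:[+-]?(?:[0-9]+))",
    "HTTPDATE": r"%{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT:ZONE}",
}

# Matches %{SYNTAX} or %{SYNTAX:SEMANTIC}
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand(expr):
    """Recursively replace %{SYNTAX[:SEMANTIC]} references with their regexes."""
    def sub(m):
        name, field = m.group(1), m.group(2)
        body = expand(PATTERNS[name])  # expand nested references first
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REF.sub(sub, expr)

m = re.fullmatch(expand("%{HTTPDATE:timestamp}"), "24/May/2021:15:50:47 +0800")
print(m.group("timestamp"), m.group("ZONE"))  # 24/May/2021:15:50:47 +0800 +0800

# %d/%b/%Y:%H:%M:%S %z plays the role of dd/MMM/yyyy:HH:mm:ss Z
# (assumption: English month abbreviations, default C locale)
ts = datetime.strptime(m.group("timestamp"), "%d/%b/%Y:%H:%M:%S %z")
print(ts.astimezone(timezone.utc).isoformat())  # 2021-05-24T07:50:47+00:00
```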
An nginx access log line written in JSON format:
{"@timestamp":"2021-08-28T21:17:31+08:00","host":"172.31.2.107","clientip":"172.31.0.1","size":0,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"172.31.2.107","url":"/web/index.html","domain":"172.31.2.107","xff":"-","referer":"-","status":"304"}
Process it with the json filter plugin:
input {
  redis {
    data_type => "list"
    key => "qq-m44-nginx-log"
    host => "172.31.2.106"
    port => "6379"
    db => "3"
    password => "123456"
    codec => json
  }
}

# filter
filter {
  json {
    source => "message"
    remove_field => ["message","@version","path","beat","input","log","offset","prospector","source","tags"]
  }
  # This JSON log has no "timestamp" field, so this date filter finds nothing to parse;
  # @timestamp comes from the "@timestamp" key that the json filter parses.
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}

output {
  if [fields][app] == "nginx-errorlog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-errorlog-%{+YYYY.MM.dd}"
    }
  }
  if [fields][app] == "nginx-accesslog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-accesslog-%{+YYYY.MM.dd}"
    }
  }
}
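What the json filter does to one event can be approximated in Python as follows. This is a sketch, not Logstash's implementation: the event dict is hypothetical (a shortened version of the sample JSON line nested in a Filebeat-style message field), and it ignores details such as the special handling of @timestamp and the _jsonparsefailure tag that Logstash adds when parsing fails.

```python
import json

# Hypothetical Filebeat event as read from Redis: the nginx JSON log line
# is still a string inside "message".
event = {
    "@version": "1",
    "fields": {"app": "nginx-accesslog", "group": "n125"},
    "message": '{"clientip":"172.31.0.1","size":0,"status":"304","url":"/web/index.html"}',
    "log": {"offset": 0},
}

REMOVE = ["message", "@version", "path", "beat", "input", "log",
          "offset", "prospector", "source", "tags"]

def json_filter(evt, source="message", remove_field=REMOVE):
    """Rough emulation of: json { source => "message" remove_field => [...] }."""
    out = dict(evt)
    out.update(json.loads(out[source]))  # parsed keys become top-level fields
    for name in remove_field:
        out.pop(name, None)              # fields that are absent are simply skipped
    return out

print(json_filter(event))
```

Note that JSON types survive: size stays a number while status stays a string, which matches the terminal output of the real pipeline.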
After requesting a page from nginx, the event printed to the terminal looks like this:
{
  "agent" => {
    "name" => "es-web1.example.local",
    "type" => "filebeat",
    "ephemeral_id" => "2a8806fd-48de-46e0-bdde-502aa74b4c83",
    "version" => "7.12.1",
    "hostname" => "es-web1.example.local",
    "id" => "51f9df27-4170-4844-ba12-c719de1f4410"
  },
  "domain" => "172.31.2.107",
  "status" => "304",
  "upstreamtime" => "-",
  "size" => 0,
  "xff" => "-",
  "ecs" => {
    "version" => "1.8.0"
  },
  "@timestamp" => 2021-08-29T05:31:29.000Z,
  "clientip" => "172.31.0.1",
  "referer" => "-",
  "responsetime" => 0.0,
  "upstreamhost" => "-",
  "http_host" => "172.31.2.107",
  "url" => "/web/index.html",
  "host" => "172.31.2.107",
  "fields" => {
    "group" => "n125",
    "app" => "nginx-accesslog"
  }
}
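The two if branches in the output section route events by [fields][app] and build a daily index name with %{+YYYY.MM.dd}. A hypothetical Python helper mirroring that logic (the name index_for is illustrative) makes one consequence easy to see: the date is taken from @timestamp in UTC, so with a +08:00 local time zone, events logged between 00:00 and 08:00 local time land in the previous day's index.

```python
from datetime import datetime, timezone

def index_for(event):
    """Mirror the output conditionals: choose an index by [fields][app],
    suffixed with @timestamp formatted like %{+YYYY.MM.dd} in UTC."""
    app = event.get("fields", {}).get("app")
    if app not in ("nginx-errorlog", "nginx-accesslog"):
        return None  # no branch matches, so the event is not sent to Elasticsearch
    day = event["@timestamp"].astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"qq-123test-filebeat-{app}-{day}"

# The event printed above
evt = {
    "@timestamp": datetime(2021, 8, 29, 5, 31, 29, tzinfo=timezone.utc),
    "fields": {"group": "n125", "app": "nginx-accesslog"},
}
print(index_for(evt))  # qq-123test-filebeat-nginx-accesslog-2021.08.29
```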