Logstash grok expressions
Grok is a pattern language built on regular expressions, and it is the heart of Logstash.
Thanks to Grok, each log event can be parsed and split into named fields.
Tooling
You can write your own Grok patterns and test them with the following online debugger:
http://grokdebug.herokuapp.com/
Grok setup
Grok ships with Logstash, so you don't have to install anything. :)
Put all your pattern files in /etc/logstash/grok/ with a .grok extension (/etc/logstash/grok/*.grok).
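Logstash only loads custom patterns from directories it is told about. A minimal sketch of a grok filter that reads every file in the directory above through the patterns_dir option; the LOG4J pattern name is the one defined further down this page, and the surrounding filter block is an assumption about how your pipeline is laid out:

```
filter {
  grok {
    # Directory holding the custom pattern files (assumed to be the one above)
    patterns_dir => ["/etc/logstash/grok"]
    match => { "message" => "%{LOG4J}" }
  }
}
```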
Grok usage
You can use any Grok expression in a Logstash configuration file.
In the match option of the grok filter, write:
# Match a single expression
match => [ "message", "%{LOG4J}" ]
# Try several patterns against the same field (stops at the first match)
match => [
"message", "%{LOG4J_COMMON_PATTERN_V1}",
"message", "%{LOG4J_COMMON_PATTERN_V2}",
"message", "%{LOG4J_COMMON_PATTERN_V3}",
"message", "%{LOG4J_COMMON_PATTERN_V4}",
"message", "%{LOG4J_COMMON_PATTERN_V5}",
"message", "%{LOG4J}"
]
Just reference a pattern by name: %{PATTERN_NAME}. To store the matched text in a field, use %{PATTERN_NAME:field_name}.
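For context, here is how those match lines might sit inside a complete grok filter. This is a sketch, not the only valid layout: it uses the hash form of match, which recent Logstash versions document alongside the array form above, and it assumes the pattern files live in /etc/logstash/grok.

```
filter {
  grok {
    patterns_dir => ["/etc/logstash/grok"]
    # Patterns are tried in order; grok stops at the first one that matches
    # (break_on_match defaults to true).
    match => {
      "message" => [
        "%{LOG4J_COMMON_PATTERN_V3}",
        "%{LOG4J}"
      ]
    }
    # Events that match none of the patterns get this tag (the default value).
    tag_on_failure => ["_grokparsefailure"]
  }
}
```

Searching for the _grokparsefailure tag is a quick way to find the events a pattern still misses.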
Grok expressions
Here are some Grok expressions you can use right away!
Apache2 error log
Create the pattern file:
vim /etc/logstash/grok/apache2ErrorLog.grok
Put the following content:
HTTPERRORDATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}
APACHEERRORLOG \[%{HTTPERRORDATE:timestamp}\] \[%{WORD:severity}\] \[client %{IPORHOST:clientip}\] %{GREEDYDATA:message_remainder}
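As an illustration, a line in the Apache 2.2 error log layout, such as the invented one in the comment below, should be split into timestamp, severity, clientip and message_remainder. The filter is a sketch that assumes the pattern file above is in /etc/logstash/grok; the date format string is my reading of HTTPERRORDATE and may need adjusting for your logs.

```
# Example input (invented for illustration):
# [Fri Nov 21 12:00:47 2014] [error] [client 192.0.2.10] File does not exist: /var/www/favicon.ico
filter {
  grok {
    patterns_dir => ["/etc/logstash/grok"]
    match => { "message" => "%{APACHEERRORLOG}" }
  }
  # Use the parsed timestamp as the event time
  date {
    match => ["timestamp", "EEE MMM dd HH:mm:ss yyyy"]
  }
}
```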
Iptables
Create the pattern file:
vim /etc/logstash/grok/iptables.grok
Put the following content:
NETFILTERMAC %{COMMONMAC:dst_mac}:%{COMMONMAC:src_mac}:%{ETHTYPE:ethtype}
ETHTYPE (?:(?:[A-Fa-f0-9]{2}):(?:[A-Fa-f0-9]{2}))
# Iptables generic values
IPTABLES_MAC_LAYER IN=(%{WORD:in_device})? OUT=(%{WORD:out_device})? *(MAC=(%{NETFILTERMAC})?)?
IPTABLES_SRC_DEST SRC=(%{IP:src_ip})? DST=(%{IP:dst_ip})?
IPTABLES_FLAGS LEN=%{INT:pkt_length} *(TOS=%{BASE16NUM:pkt_tos})? *(PREC=%{BASE16NUM:pkt_prec})? *(TTL=%{INT:pkt_ttl})? *(ID=%{INT:pkt_id})? (?:DF)*
IPTABLES_PROTOCOL PROTO=%{WORD:protocol}
IPTABLES_HEADER %{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME} .* %{IPTABLES_MAC_LAYER} %{IPTABLES_SRC_DEST} %{IPTABLES_FLAGS} %{IPTABLES_PROTOCOL}
# IPv6 + v4
IPTABLES_IP_SUFFIX SPT=%{INT:src_port} DPT=%{INT:dst_port} *(WINDOW=%{INT:pkt_window})? *(RES=%{BASE16NUM:pkt_res})? .* *(URGP=%{INT:pkt_urgp})?
IPTABLES_IP %{IPTABLES_HEADER} %{IPTABLES_IP_SUFFIX}
# ICMP
IPTABLES_ICMP %{IPTABLES_HEADER} *(TYPE=%{INT:icmp_type})? *(CODE=%{BASE16NUM:icmp_code})?
# Generic pattern
IPTABLES_GENERIC %{IPTABLES_HEADER} (?<content>(.|\r|\n)*)
# Error pattern
IPTABLES_ERROR %{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME} .* %{IPTABLES_MAC_LAYER} %{IPTABLES_SRC_DEST} (?<content>(.|\r|\n)*)
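These patterns overlap: every field after the header in IPTABLES_ICMP is optional, and IPTABLES_GENERIC accepts anything after the header. One way to use them, sketched here under the assumption that the file above is in /etc/logstash/grok, is to list them from most to least specific so that the first match is also the most detailed one:

```
filter {
  grok {
    patterns_dir => ["/etc/logstash/grok"]
    match => {
      "message" => [
        "%{IPTABLES_IP}",       # TCP/UDP: ports are required
        "%{IPTABLES_ICMP}",     # ICMP: type and code are optional
        "%{IPTABLES_GENERIC}",  # any packet with a complete header
        "%{IPTABLES_ERROR}"     # header without flags or protocol
      ]
    }
  }
}
```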
Fail2ban
Create the pattern file:
vim /etc/logstash/grok/fail2ban.grok
Put the following content:
FAIL2BAN %{TIMESTAMP_ISO8601:timestamp} %{JAVACLASS:criteria}: %{LOGLEVEL:level} \[%{WORD:service}\] Ban %{IPV4:clientip}
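A sketch of a filter built on this pattern. The sample line in the comment is invented but follows the layout the pattern expects; the fail2ban_ban tag name and the patterns_dir location are assumptions.

```
# Example input (invented for illustration):
# 2014-11-21 12:00:47,922 fail2ban.actions: WARNING [ssh] Ban 192.0.2.10
filter {
  grok {
    patterns_dir => ["/etc/logstash/grok"]
    match => { "message" => "%{FAIL2BAN}" }
    # Only added when the pattern matches
    add_tag => ["fail2ban_ban"]
  }
  # Use the parsed timestamp as the event time
  date {
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS"]
  }
}
```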
Log4j
We use a few common log4j layouts, so extracting the overall log message is easy. Each group of patterns below is headed by the log4j conversion pattern it targets:
###### %d %5p %t %c - %m%n
LOG4J ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - *(%{GREEDYDATA:content})
# Some logs might start with spaces :'S ...
LOG4J_COMMON_PATTERN_V1 .* %{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)
LOG4J_COMMON_PATTERN_V2 .* %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)
# Nominal cases
LOG4J_COMMON_PATTERN_V3 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)
LOG4J_COMMON_PATTERN_V4 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)
# When log is split on many lines right away
LOG4J_COMMON_PATTERN_V5 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} (?<content>(.|\r|\n)*)
LOG4J_COMMON_PATTERN_V6 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} (?<content>(.|\r|\n)*)
###### %d %5p %c{1} - %m%n
# Some logs might start with spaces :'S ...
LOG4J_ALT_PATTERN_V1 .* %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_PATTERN_V2 .* %{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)
# Nominal cases
LOG4J_ALT_PATTERN_V3 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_PATTERN_V4 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)
# When log is split on many lines right away
LOG4J_ALT_PATTERN_V5 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{WORD:logger} (?<content>(.|\r|\n)*)
LOG4J_ALT_PATTERN_V6 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{WORD:logger} (?<content>(.|\r|\n)*)
###### %d %5p %t %c{1} - %m%n
# Some logs might start with spaces :'S ...
LOG4J_ALT_2_PATTERN_V1 .* %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_2_PATTERN_V2 .* %{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)
# Nominal cases
LOG4J_ALT_2_PATTERN_V3 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_2_PATTERN_V4 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)
# When log is split on many lines right away
LOG4J_ALT_2_PATTERN_V5 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} (?<content>(.|\r|\n)*)
LOG4J_ALT_2_PATTERN_V6 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} (?<content>(.|\r|\n)*)
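The (.|\r|\n)* capture only sees several lines if they arrive as one event, for example a stack trace joined to its first line. A sketch of one way to do that with the multiline codec, assuming every new log entry starts with an ISO 8601 timestamp; the file path is a placeholder:

```
input {
  file {
    path => "/var/log/myapp/app.log"   # placeholder path
    codec => multiline {
      # Any line that does NOT start with a timestamp belongs to the previous event
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}
```

With this setting, lines that carry leading text before the timestamp (the V1/V2 cases) would be appended to the previous event, so adjust the codec pattern if your logs contain them.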
Super strong expression
A single expression can match all of the following layouts at once:
- %d %5p %t %c - %m%n
- %d %5p %t %c{1} - %m%n
- %d %5p %c - %m%n
- %d %5p %c{1} - %m%n
^\s*%{TIMESTAMP_ISO8601:timestamp}\s*%{LOGLEVEL:level} (?:(%{USERNAME:thread} %{JAVACLASS:logger}|%{USERNAME:thread} %{WORD:logger}|%{JAVACLASS:logger}|%{WORD:logger})) (?<content>(.|\r|\n)*)
VEHCO specific patterns
My company, VEHCO, has its own application-specific logs, like every company. The following example shows how to write Grok patterns for them.
Logs
2014-11-21 12:00:47,922 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.business.AuthClient \
- Replying to OBC auth data DONE. Smart-card --> OBC | smartcardId 02951DA314000000
2014-11-21 12:38:26,981 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.dao.ampq.JmsTopicListener \
- [x] Received message 'startAuthentication' for smart-card: 02667AA314000000, consumer smartcardId: 02667AA314000000
2014-11-21 12:38:27,033 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.cardreaderlisthandler.cardreader.ReaderLocker \
- Terminal: OMNIKEY AG CardMan 3121 02 00 | Smart-card ID: 02667AA314000000 # locked
2014-11-21 12:38:30,920 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.cardreaderlisthandler.cardreader.ReaderLocker \
- Terminal: OMNIKEY AG CardMan 3121 02 00 | Smart-card ID: 02667AA314000000 # unlocked
Grok patterns
LOG_SENTENCE (?:[A-Za-z0-9\s\-><\\/.+*\[\]&%'#]+)*
RTD_TERMINAL_SUFFIX Terminal: %{LOG_SENTENCE:rtd_terminal_id} .* *(Smart-card ID: %{WORD:rtd_smartcard_id}) # %{WORD:rtd_terminal_state}
RTD_AUTH_START_SUFFIX %{LOG_SENTENCE:rtd_action}: %{WORD:rtd_smartcard_id}
RTD_AUTH_DONE_SUFFIX %{LOG_SENTENCE:rtd_action}. *(smartcardId %{WORD:rtd_smartcard_id})?
RTD_TERMINAL ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_TERMINAL_SUFFIX}
RTD_AUTH_START ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_AUTH_START_SUFFIX}
RTD_AUTH_DONE ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_AUTH_DONE_SUFFIX}
Just put all these patterns inside a dedicated file: /etc/logstash/grok/vehco_rtd.grok
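To tie it together, here is a sketch of a filter that tries the three RTD patterns first and falls back to the generic LOG4J pattern from the Log4j section. It assumes both sets of patterns are in files under /etc/logstash/grok, and that the trailing backslashes in the sample logs are display line breaks rather than part of the log.

```
filter {
  grok {
    patterns_dir => ["/etc/logstash/grok"]
    match => {
      "message" => [
        "%{RTD_TERMINAL}",
        "%{RTD_AUTH_START}",
        "%{RTD_AUTH_DONE}",
        # Fallback: keep timestamp, level, thread and logger even when
        # the message itself is not one of the RTD cases
        "%{LOG4J}"
      ]
    }
  }
  # Use the parsed timestamp as the event time
  date {
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS"]
  }
}
```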