Revision as of 16:21, 5 February 2015


Logstash grok expressions

Grok is a pattern language built on regular expressions, and it is the heart of Logstash.

Thanks to Grok, each log event can be parsed and split into named fields.


Tooling

You can create your own grok patterns and test them with the following online debugger:

http://grokdebug.herokuapp.com/


Setup

Grok is installed with Logstash, so you don't have to install anything. :)


Put all your pattern files in /etc/logstash/grok/ and give them the .grok extension (/etc/logstash/grok/*.grok).
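
As a minimal sketch of how Logstash can be pointed at this directory: the grok filter has a patterns_dir option, and each file found there is read as a list of "NAME expression" lines. The field name "message" is the usual default for the raw log line, and MY_PATTERN is only a placeholder, not something defined on this page.

```
filter {
  grok {
    # Directory holding the custom *.grok pattern files (path taken from this page)
    patterns_dir => ["/etc/logstash/grok"]
    # MY_PATTERN is a hypothetical name: replace it with a pattern defined in one of the files
    match => { "message" => "%{MY_PATTERN}" }
  }
}
```

Events that do not match are tagged _grokparsefailure by default, which is a handy thing to search for while tuning patterns.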



Grok expressions

Here are some Grok expressions you can use right away!


Apache2 error log

Create the pattern file:

vim /etc/logstash/grok/apache2ErrorLog.grok


Add the following content:

HTTPERRORDATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}
APACHEERRORLOG \[%{HTTPERRORDATE:timestamp}\] \[%{WORD:severity}\] \[client %{IPORHOST:clientip}\] %{GREEDYDATA:message_remainder}
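
One possible way to apply this pattern, sketched under a few assumptions: the events carry the type "apache-error" (a hypothetical label that would be set on the input), and the raw line is in the "message" field. The date filter then turns the captured timestamp into the event time. Its format string is derived from the layout of HTTPERRORDATE and may need adjusting to your logs.

```
filter {
  if [type] == "apache-error" {    # hypothetical type label
    grok {
      patterns_dir => ["/etc/logstash/grok"]
      match => { "message" => "%{APACHEERRORLOG}" }
    }
    date {
      # Day name, month name, day of month, time, year (same order as HTTPERRORDATE)
      match => ["timestamp", "EEE MMM dd HH:mm:ss yyyy"]
    }
  }
}
```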


IpTables

Create the pattern file:

vim /etc/logstash/grok/iptables.grok


Add the following content:

NETFILTERMAC %{COMMONMAC:dst_mac}:%{COMMONMAC:src_mac}:%{ETHTYPE:ethtype}
ETHTYPE (?:(?:[A-Fa-f0-9]{2}):(?:[A-Fa-f0-9]{2}))

# Iptables generic values
IPTABLES_MAC_LAYER IN=(%{WORD:in_device})? OUT=(%{WORD:out_device})? *(MAC=(%{NETFILTERMAC})?)?
IPTABLES_SRC_DEST SRC=(%{IP:src_ip})? DST=(%{IP:dst_ip})?
IPTABLES_FLAGS LEN=%{INT:pkt_length} *(TOS=%{BASE16NUM:pkt_tos})? *(PREC=%{BASE16NUM:pkt_prec})? *(TTL=%{INT:pkt_ttl})? *(ID=%{INT:pkt_id})? (?:DF)*
IPTABLES_PROTOCOL PROTO=%{WORD:protocol}
IPTABLES_HEADER %{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME} .* %{IPTABLES_MAC_LAYER} %{IPTABLES_SRC_DEST} %{IPTABLES_FLAGS} %{IPTABLES_PROTOCOL}

# IPv6 + v4
IPTABLES_IP_SUFFIX SPT=%{INT:src_port} DPT=%{INT:dst_port} *(WINDOW=%{INT:pkt_window})? *(RES=%{BASE16NUM:pkt_res})? .* *(URGP=%{INT:pkt_urgp})?
IPTABLES_IP %{IPTABLES_HEADER} %{IPTABLES_IP_SUFFIX}

# ICMP
IPTABLES_ICMP %{IPTABLES_HEADER} *(TYPE=%{INT:icmp_type})? *(CODE=%{BASE16NUM:icmp_code})?

# Generic pattern
IPTABLES_GENERIC %{IPTABLES_HEADER} (?<content>(.|\r|\n)*)

# Error pattern
IPTABLES_ERROR %{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME} .* %{IPTABLES_MAC_LAYER} %{IPTABLES_SRC_DEST} (?<content>(.|\r|\n)*)
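
The file above defines several top-level patterns, from specific to generic. A sketch of how they might be combined: list them in one match array, most specific first. Grok tries them in order and, with the default break_on_match setting, stops at the first one that matches. The type "iptables" is a hypothetical label, and the order shown is only one plausible choice.

```
filter {
  if [type] == "iptables" {    # hypothetical type label
    grok {
      patterns_dir => ["/etc/logstash/grok"]
      match => {
        "message" => [
          "%{IPTABLES_IP}",       # needs SPT= and DPT=
          "%{IPTABLES_ICMP}",     # TYPE= and CODE= are optional
          "%{IPTABLES_GENERIC}",
          "%{IPTABLES_ERROR}"     # no flags or protocol required
        ]
      }
    }
  }
}
```

Because everything after the header is optional in IPTABLES_ICMP, it accepts most lines that have a valid header followed by a space, so with this ordering IPTABLES_GENERIC will rarely be reached. It is worth checking with the debugger which pattern catches each of your lines.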


Fail2ban

Create the pattern file:

vim /etc/logstash/grok/fail2ban.grok


Add the following content:

FAIL2BAN %{TIMESTAMP_ISO8601:timestamp} %{JAVACLASS:criteria}: %{LOGLEVEL:level} \[%{WORD:service}\] Ban %{IPV4:clientip}
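
A small end-to-end sketch, assuming fail2ban writes to /var/log/fail2ban.log (adjust to your system) and using a hypothetical type label. Since FAIL2BAN only matches "Ban" lines, every other line fails the match. The tag_on_failure option lets you give those lines a dedicated tag instead of the default one; the tag name below is made up.

```
input {
  file {
    path => "/var/log/fail2ban.log"    # assumed location of the fail2ban log
    type => "fail2ban"                 # hypothetical type label
  }
}

filter {
  if [type] == "fail2ban" {
    grok {
      patterns_dir => ["/etc/logstash/grok"]
      match => { "message" => "%{FAIL2BAN}" }
      # Lines that are not "Ban" events end up here (hypothetical tag name)
      tag_on_failure => ["_fail2ban_not_ban"]
    }
  }
}
```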


Log4j

We use a few common log4j layouts. With the patterns below, it is easy to extract the overall log message:

###### %d %5p %t %c - %m%n 

LOG4J ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - *(%{GREEDYDATA:content})

# Some logs might start with spaces :'S ...
LOG4J_COMMON_PATTERN_V1 .* %{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)
LOG4J_COMMON_PATTERN_V2 .* %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)

# Nominal cases
LOG4J_COMMON_PATTERN_V3 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)
LOG4J_COMMON_PATTERN_V4 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)

# When log is split on many lines right away
LOG4J_COMMON_PATTERN_V5 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} (?<content>(.|\r|\n)*)
LOG4J_COMMON_PATTERN_V6 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} (?<content>(.|\r|\n)*)


###### %d %5p %c{1} - %m%n 

# Some logs might start with spaces :'S ...
LOG4J_ALT_PATTERN_V1 .* %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_PATTERN_V2 .* %{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)

# Nominal cases
LOG4J_ALT_PATTERN_V3 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_PATTERN_V4 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)

# When log is split on many lines right away
LOG4J_ALT_PATTERN_V5 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{WORD:logger} (?<content>(.|\r|\n)*)
LOG4J_ALT_PATTERN_V6 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{WORD:logger} (?<content>(.|\r|\n)*)


###### %d %5p %t %c{1} - %m%n 

# Some logs might start with spaces :'S ...
LOG4J_ALT_2_PATTERN_V1 .* %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_2_PATTERN_V2 .* %{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)

# Nominal cases
LOG4J_ALT_2_PATTERN_V3 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_2_PATTERN_V4 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)

# When log is split on many lines right away
LOG4J_ALT_2_PATTERN_V5 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} (?<content>(.|\r|\n)*)
LOG4J_ALT_2_PATTERN_V6 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} (?<content>(.|\r|\n)*)
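
Most of these patterns capture the content with (.|\r|\n)*, which only pays off if one event really contains several lines. A sketch of one way to get there is the multiline codec on the input: any line that does not start with a timestamp is appended to the previous event. The path and the type label are hypothetical, the patterns are assumed to be saved in a file under /etc/logstash/grok/, and lines that start with spaces before the timestamp would be treated as continuations by this codec setting.

```
input {
  file {
    path => "/var/log/myapp/app.log"    # hypothetical path
    type => "log4j"                     # hypothetical type label
    codec => multiline {
      # A line without a leading timestamp belongs to the previous event
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => "previous"
    }
  }
}

filter {
  if [type] == "log4j" {
    grok {
      patterns_dir => ["/etc/logstash/grok"]
      # Nominal case first, then the variant without the " - " separator
      match => { "message" => ["%{LOG4J_COMMON_PATTERN_V3}", "%{LOG4J_COMMON_PATTERN_V5}"] }
    }
    date {
      # Assumes the log4j %d default layout, e.g. 2014-11-21 12:00:47,922
      match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS"]
    }
  }
}
```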


Super strong expression

To match multiple cases at once:

  •  %d %5p %t %c - %m%n
  •  %d %5p %t %c{1} - %m%n
  •  %d %5p %c - %m%n
  •  %d %5p %c{1} - %m%n

^\s*%{TIMESTAMP_ISO8601:timestamp}\s*%{LOGLEVEL:level} (?:(%{USERNAME:thread} %{JAVACLASS:logger}|%{USERNAME:thread} %{WORD:logger}|%{JAVACLASS:logger}|%{WORD:logger})) (?<content>(.|\r|\n)*)
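
Since this expression has no name, one option is to use it directly in the match setting rather than through a pattern file. The sketch below assumes the raw line is in the "message" field. It only relies on patterns shipped with Logstash (TIMESTAMP_ISO8601, LOGLEVEL, USERNAME, JAVACLASS, WORD), so no patterns_dir is needed.

```
filter {
  grok {
    match => {
      # One expression for the four log4j layouts listed above
      "message" => "^\s*%{TIMESTAMP_ISO8601:timestamp}\s*%{LOGLEVEL:level} (?:(%{USERNAME:thread} %{JAVACLASS:logger}|%{USERNAME:thread} %{WORD:logger}|%{JAVACLASS:logger}|%{WORD:logger})) (?<content>(.|\r|\n)*)"
    }
  }
}
```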


VEHCO specific patterns

Having a generic "content" field is not enough! You need to extract information from it.

Here are some examples:

Logs

2014-11-21 12:00:47,922 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.business.AuthClient \ 
   - Replying to OBC auth data DONE. Smart-card --> OBC   |   smartcardId 02951DA314000000
2014-11-21 12:38:26,981 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.dao.ampq.JmsTopicListener \
   -  [x] Received message 'startAuthentication' for smart-card: 02667AA314000000, consumer smartcardId: 02667AA314000000
2014-11-21 12:38:27,033 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.cardreaderlisthandler.cardreader.ReaderLocker \
   - Terminal: OMNIKEY AG CardMan 3121 02 00 | Smart-card ID: 02667AA314000000 # locked
2014-11-21 12:38:30,920 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.cardreaderlisthandler.cardreader.ReaderLocker \
   - Terminal: OMNIKEY AG CardMan 3121 02 00 | Smart-card ID: 02667AA314000000 # unlocked


Grok patterns

LOG_SENTENCE (?:[A-Za-z0-9\s\-><\\/.+*\[\]&%'#]*)
RTD_TERMINAL_SUFFIX Terminal: %{LOG_SENTENCE:rtd_terminal_id} .* *(Smart-card ID: %{WORD:rtd_smartcard_id}) # %{WORD:rtd_terminal_state}
RTD_AUTH_START_SUFFIX %{LOG_SENTENCE:rtd_action}: %{WORD:rtd_smartcard_id}
RTD_AUTH_DONE_SUFFIX %{LOG_SENTENCE:rtd_action}. *(smartcardId %{WORD:rtd_smartcard_id})?


RTD_TERMINAL ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_TERMINAL_SUFFIX}
RTD_AUTH_START ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_AUTH_START_SUFFIX}
RTD_AUTH_DONE ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_AUTH_DONE_SUFFIX}
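
A sketch of how the specific patterns and a generic one might work together: try the RTD_* patterns first and fall back to the generic LOG4J pattern from the Log4j section, so that every line at least gets its "content" field. It assumes all the patterns are saved in files under /etc/logstash/grok/ and that the raw line is in "message". The order is a guess.

```
filter {
  grok {
    patterns_dir => ["/etc/logstash/grok"]
    match => {
      "message" => [
        "%{RTD_TERMINAL}",
        "%{RTD_AUTH_START}",
        "%{RTD_AUTH_DONE}",
        "%{LOG4J}"    # generic fallback, only yields timestamp/level/thread/logger/content
      ]
    }
  }
}
```

RTD_AUTH_START and RTD_AUTH_DONE are fairly permissive, so it is worth running your own log lines through the debugger to see which pattern actually catches them.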