The HF runs Splunk 6.2.2 on RHEL 6 x86_64. It is a VM with 8 cores and 16 GB of memory.
We have a heavy forwarder (HF) that reads standard syslog from a file written by rsyslog, using a standard monitor input. There is nothing special about the data. In the DMC, the syslog indexing rate is normally steady at around 70-90 KB/s and everything is fine. Occasionally the indexing rate jumps to double or triple the normal rate, and at around 170 KB/s the HF can no longer consume the log in a timely manner and starts to fall behind. During these periods the median fill ratio of the following queues gets very high:
Parsing queue
Aggregation queue
Typing queue - its fill ratio is fairly high all the time but gets much worse during these periods.
I see this in splunkd.log:
INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
Is this the result of the parsing queue being full?
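To check whether these retry messages line up with the volume spikes, one option is to bucket them per minute. This is a minimal sketch, assuming the usual splunkd.log line layout (MM-DD-YYYY HH:MM:SS.mmm, timezone offset, level, component, message); LINE_RE and count_blocked_per_minute are hypothetical helpers, and the pattern may need adjusting if your timestamps differ.

```python
import re
from collections import Counter

# Assumed splunkd.log layout:
# "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component - message"
# Group 1 captures the timestamp truncated to the minute, group 2 the queue name.
LINE_RE = re.compile(
    r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}):\d{2}\.\d{3} \S+\s+INFO\s+TailingProcessor - "
    r"Could not send data to output queue \((\w+)\), retrying"
)


def count_blocked_per_minute(lines):
    """Count TailingProcessor retry messages per (minute, queue name)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

For example, pass an open file handle for $SPLUNK_HOME/var/log/splunk/splunkd.log to count_blocked_per_minute and print the sorted items, then compare the busy minutes with the indexing-rate spikes in the DMC.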
The estimated CPU usage per processor is normally about 50%, and it doubles during periods of high indexing volume. The main CPU consumer is the regexreplacement processor.
My assumption is that the regexreplacement load comes from the regexes in my props. After some research, I removed all of my regexes from props except those in the Cisco IOS TA. After that change, the fill ratios of all three queues are not as bad as before, but they are still high, and log consumption still slows down during higher-volume periods. The regexreplacement processor still jumps to 75-85% CPU.
The only change to the Cisco IOS TA is an added stanza that routes IOS data to its own index:
props.conf
#
# Force the sourcetype
#
[syslog]
TRANSFORMS-force_sourcetype_for_cisco_ios = force_sourcetype_for_cisco_ios, force_sourcetype_for_cisco_ios-xr, force_sourcetype_for_cisco_ios-xe, route_to_cisco_ios_index
transforms.conf
[route_to_cisco_ios_index]
SOURCE_KEY = MetaData:Sourcetype
REGEX = cisco:ios
DEST_KEY = _MetaData:Index
FORMAT = test_net_device
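The routing stanza matches on the sourcetype metadata rather than on the raw event. The sketch below mimics that single transform in Python, assuming that the MetaData:Sourcetype value carries a "sourcetype::" prefix, that REGEX is an unanchored search, and that this transform runs after the force_sourcetype transforms listed before it in the TRANSFORMS- setting. The routed_index helper is hypothetical, and "cisco:ios-xr" in the test is an illustrative sourcetype name, not one confirmed from the TA.

```python
import re

# REGEX from the [route_to_cisco_ios_index] stanza; re.search mimics an
# unanchored match, so any sourcetype containing "cisco:ios" is routed.
ROUTE_RE = re.compile(r"cisco:ios")


def routed_index(sourcetype_key, default_index="test_syslog"):
    """Return the index an event would land in under this one transform.

    sourcetype_key is assumed to look like "sourcetype::cisco:ios".
    default_index mirrors the index set on the monitor input.
    """
    if ROUTE_RE.search(sourcetype_key):
        return "test_net_device"  # FORMAT value written to _MetaData:Index
    return default_index
```

Because the match is unanchored, any sourcetype that merely contains "cisco:ios" would also be routed.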
The input that consumes the file (inputs.conf):
[monitor:///opt/test/all_logs]
blacklist = \.gz$
disabled = false
followTail = 0
sourcetype = syslog
index = test_syslog
ignoreOlderThan = 6h
The log rotates at 1 GB.
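For scale, the time each 1 GB file stays the active file can be worked out from the observed rates. This is a minimal sketch, assuming "1 GB" means 1024 × 1024 KB and a constant write rate; hours_to_rotate is a hypothetical helper.

```python
KB_PER_GB = 1024 * 1024  # assumption: binary units (1 GB = 1,048,576 KB)
SECONDS_PER_HOUR = 3600


def hours_to_rotate(rate_kb_per_s, file_size_kb=KB_PER_GB):
    """Hours until a file of file_size_kb fills at a constant rate in KB/s."""
    return file_size_kb / rate_kb_per_s / SECONDS_PER_HOUR
```

At 80 KB/s a file lasts about 3.6 hours; at 170 KB/s it lasts about 1.7 hours.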
limits.conf
[thruput]
# throughput limiting at index time
maxKBps = 2048
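As a sanity check on limits.conf, the observed rates can be compared with the maxKBps cap. This is a minimal sketch, assuming KB means 1024 bytes and a constant rate; daily_gb and cap_fraction are hypothetical helpers.

```python
MAX_KBPS = 2048  # [thruput] maxKBps from limits.conf
SECONDS_PER_DAY = 86400
KB_PER_GB = 1024 * 1024  # assumption: binary units


def daily_gb(rate_kb_per_s):
    """Daily volume in GB for a constant rate in KB/s."""
    return rate_kb_per_s * SECONDS_PER_DAY / KB_PER_GB


def cap_fraction(rate_kb_per_s, cap_kbps=MAX_KBPS):
    """Fraction of the thruput cap used at the given rate."""
    return rate_kb_per_s / cap_kbps
```

At 170 KB/s the forwarder uses about 8% of the 2048 KBps cap (about 14 GB/day against about 169 GB/day), so on these numbers maxKBps does not look like the bottleneck.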
Is this expected behavior? Is there a limit to how much log volume a single HF can handle?
Thanks in advance for any thoughts!