Troubleshooting SOC-Syslog Hardware
Context: What is SOC-Syslog?
SOC-Syslog is a dedicated server in the SOC Network serving as the central collection point for logs from all of our Linux-based servers. We use RSyslog, an open-source program for UNIX and Unix-like computer systems, to forward log messages in our IP network. After logs are ingested to SOC-Syslog, an application called the Splunk Universal Forwarder forwards all these logs to our Splunk Server for indexing.
Problem Description
The Security Operations Center was completing a routine search on logs being ingested to our Splunk Server. We noticed a decrease in the number of daily ingested logs. The SOC discovered that on September 17, 2025, the Splunk Universal Forwarder stopped forwarding logs from SOC-Syslog to the Splunk Server.
We SSH’ed into the SOC-Syslog server to check whether logs were still arriving from the nodes, and confirmed that they had stopped on September 17, 2025. By the time we noticed the ingestion problem, SOC-Syslog had already been non-operational for days. The server was online but essentially in a read-only state.
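Both symptoms described above, logs no longer being written and a server that behaves as if it were read-only, can be checked from a shell. The following is a minimal sketch rather than the exact commands we ran. The function names last_write and can_write are hypothetical, and stat -c assumes GNU coreutils as shipped with RHEL. A nearly full filesystem may still accept a write this small, so a successful probe does not prove the disk is healthy.

```shell
# Print the last modification time of a file (GNU stat syntax).
last_write() {
    stat -c '%y' "$1"
}

# Return 0 if a small file can be written in the given directory, 1 otherwise.
# A read-only or completely full filesystem makes the write fail.
can_write() (
    probe="$1/.write_probe.$$"
    { echo probe > "$probe"; } 2>/dev/null
    status=$?
    rm -f "$probe"          # clean up even if only an empty file was created
    [ "$status" -eq 0 ]
)
```

Running last_write /var/log/messages shows when the collected log was last updated, and can_write /var/log reports whether that directory still accepts new data.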

Initial Remediation Attempts
In an attempt to remediate the issue, the SOC decided to try three things in the following order:
Restart RSyslog Service
systemctl restart rsyslog
Attempted to restart the service in case RSyslog was the issue.
Restart Splunk Universal Forwarder
cd /opt/splunkforwarder/bin && ./splunk restart
Attempted to restart the forwarding process from SOC-Syslog to the Splunk Server.
Reboot the Server
reboot
Attempted to restart the entire SOC-Syslog server.
Boot Loop Issue
After running the reboot command, the server did not come back up. Because the SOC and the Student Data Center (SDC) are in different locations, it took about 20 minutes before the SOC realized that SOC-Syslog was stuck in a boot loop.
Entering Emergency Shell
In an attempt to break out of this boot loop, the SOC researched how to enter the Red Hat Enterprise Linux (RHEL) 9.5 Emergency Shell.
During the initial boot sequence, the SOC edited the GRUB Loader with the following modifications:
# Removed parameters:
quiet rhgb
# Added parameters:
systemd.unit=multi-user.target
Removing quiet and rhgb made the boot messages visible during startup, and they showed many services failing to start.

After entering the emergency shell, we were able to boot into the operating system and run commands.
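To make the edit concrete: at the GRUB menu, pressing e opens the selected entry for editing, the kernel parameters sit on the line beginning with linux, and Ctrl+X boots the modified entry. The sketch below applies the same change to a command-line string so the before and after are easy to compare. The function name edit_cmdline and the sample line in the note that follows are hypothetical. The function only transforms text and does not touch the bootloader.

```shell
# Apply the edit described above to a kernel command line string:
# drop "quiet" and "rhgb", then append the systemd target.
edit_cmdline() (
    set -f                      # disable globbing while word-splitting $1
    out=""
    for word in $1; do
        case "$word" in
            quiet|rhgb) ;;      # removed: these hide the boot messages
            *) out="$out$word " ;;
        esac
    done
    printf '%s\n' "${out}systemd.unit=multi-user.target"
)
```

For example, passing the hypothetical line "linux /vmlinuz root=/dev/mapper/rhel-root ro rhgb quiet" yields "linux /vmlinuz root=/dev/mapper/rhel-root ro systemd.unit=multi-user.target".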
Root Cause Analysis
One of the most concerning findings came from running:
df -h
This command lists every mounted filesystem with its size, used space, and available space, and the problem was immediately visible. The root filesystem holding the operating system (RHEL 9.5) is on /dev/mapper/rhel-root, which had 70 GB allocated but only 20 KB available.

Initially, we assumed this was reserved system space and did not realize that it was the actual OS partition, and that it was full.
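Because we misread the df output at first, here is a small sketch of how the same check could be made explicit. It reads df output in POSIX format (df -P) and prints only the filesystems at or above a usage threshold. The function name nearly_full and the threshold are assumptions for illustration, and mount points containing spaces are not handled.

```shell
# Read `df -P` output on stdin and print the mount points whose usage is at
# or above the given percentage. Columns in -P format are:
# Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
nearly_full() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)       # "100%" -> "100"
        if (pct + 0 >= limit + 0)
            print $6 " " $5 " used, " $4 " KB available"
    }'
}
```

It can be used as df -P | nearly_full 90, which would have reported the root filesystem in our case.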
Identifying the Culprit
Once we realized this was the culprit, we scanned the filesystem to find the largest files on the operating system partition.
We discovered that:
The /var/log directory contained more than half of the allotted storage of the entire OS partition (40+ GB)
Specifically, the file /var/log/messages was taking up all the space on the operating system partition, as it contained all the logs ingested from all our Linux-based nodes in a single file
Interestingly, we noticed that a dedicated directory called /mnt/storage was not being used at all.
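One way to run this kind of scan is sketched below. It is not necessarily the command we used. The function name largest_files is hypothetical, and -printf requires GNU find, which RHEL provides.

```shell
# List the N largest regular files under a directory, biggest first,
# without crossing into other mounted filesystems (-xdev).
# Output per line: size in bytes, a tab, then the path.
largest_files() {
    find "$1" -xdev -type f -printf '%s\t%p\n' 2>/dev/null |
        sort -rn |
        head -n "${2:-10}"
}
```

Calling largest_files / 10 as root lists the ten largest files on the root filesystem only, so a mounted directory such as /mnt/storage is left out of the results.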
Resolution
To get SOC-Syslog back online, our solution was to move the problematic log file:
mv /var/log/messages /mnt/storage
This command immediately relieved the storage issue on the rhel-root volume.
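A move like this only frees space when the destination is on a different filesystem. Within a single filesystem, mv is just a rename. The sketch below shows two checks that could be run first. The function names same_filesystem and available_kb are hypothetical, and stat -c assumes GNU coreutils.

```shell
# Return 0 if both paths exist and live on the same filesystem (same device ID).
same_filesystem() {
    [ -e "$1" ] && [ -e "$2" ] &&
        [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}

# Print the space available, in KB, on the filesystem holding the given path.
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

Before moving the file, same_filesystem /var/log/messages /mnt/storage should return non-zero, and available_kb /mnt/storage should report more than the size of the file.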
Results
After moving the file:
We restarted the server without the GRUB loader modification
Services started successfully once again
Backlogged logs began being ingested to SOC-Syslog
Logs were successfully forwarded to the Splunk Server
Lessons Learned
Monitor disk space on critical logging infrastructure
Utilize dedicated storage partitions (like /mnt/storage) for log collection
In a future write-up we will showcase how we revamped our logging infrastructure entirely!