RHEL Advanced #2: Kernel Tuning — sysctl, tuned, kdump
In #1 Boot Process we saw how the kernel gets loaded into memory. This post picks up from there. Once boot finishes, we tune kernel parameters to fit the workload, switch between whole workload profiles with a single command, and, for the rare occasion when the kernel dies in a panic, capture a memory dump at that exact moment for post-mortem analysis.
Position of this post in the RHEL Advanced series:
- #1 Boot Process — GRUB2, dracut, Recovery Mode
- #2 Kernel Tuning — sysctl, tuned, kdump ← this post
- #3 Performance Analysis — sar, top/htop, iostat, vmstat, perf
- #4 SELinux Advanced — Writing Policy, audit2allow
- #5 Security Hardening — auditd, OpenSCAP, FIPS
- #6 Subscription / Satellite / Insights
- #7 Cockpit for GUI Management and Web Console
Where Each of the Three Tools Sits #
| Tool | What it adjusts | When applied |
|---|---|---|
| sysctl | Kernel parameters (vm, net, kernel, fs) | Immediately at runtime + on every boot |
| tuned | Predefined workload profiles (sysctl + cpufreq + io-scheduler bundles) | Immediately when a profile is applied |
| kdump | Capture a memory dump at the moment of kernel panic | At the moment of panic |
sysctl is the per-line tool you touch directly, tuned is the abstraction that bundles those lines into workload-sized profiles and applies them at once, and kdump is the safety net that only kicks in when the kernel dies. Together they form a coherent operational set.
sysctl — Runtime Kernel Parameters #
The Linux kernel exposes its parameters as files under /proc/sys/. sysctl is the command that reads and writes those files.
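Every key is just a file, so a dotted name maps to a path mechanically. Below is a minimal bash sketch of that mapping. sysctl_path and sysctl_read are hypothetical helper names. The sketch assumes that no single component of the key contains a dot of its own. VLAN interface names such as eth0.100 break that assumption and need extra care.

```shell
#!/usr/bin/env bash
# Map a dotted sysctl key to its file under /proc/sys (hypothetical helper).
# Assumes no individual component of the key contains a dot.
sysctl_path() {
  local key=$1
  printf '/proc/sys/%s\n' "${key//.//}"   # every "." becomes "/"
}

# Read a key straight from its file: the same value `sysctl -n <key>` prints.
sysctl_read() {
  local path
  path=$(sysctl_path "$1")
  if [ ! -r "$path" ]; then
    echo "no such key: $1" >&2
    return 1
  fi
  cat "$path"
}
```

With these loaded, `sysctl_read vm.swappiness` should print the same number as `sysctl -n vm.swappiness`.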
# list every parameter
$ sudo sysctl -a | less
# read a specific parameter
$ sudo sysctl vm.swappiness
vm.swappiness = 30
# change immediately (lost on reboot)
$ sudo sysctl -w vm.swappiness=10
A dotted key like vm.swappiness is just notation for the file /proc/sys/vm/swappiness:
$ sudo sysctl vm.swappiness
$ cat /proc/sys/vm/swappiness # same value
Permanent settings — /etc/sysctl.d/ #
To apply settings on every boot, write them to a file. The standard on RHEL 9 is to drop modular files into /etc/sysctl.d/*.conf, for example /etc/sysctl.d/99-tune.conf:
# memory / swap
vm.swappiness = 10
vm.dirty_ratio = 20
vm.dirty_background_ratio = 5
# network
net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
# file descriptors
fs.file-max = 2097152
# only this file
$ sudo sysctl -p /etc/sysctl.d/99-tune.conf
# every standard location (the action that runs automatically at boot)
$ sudo sysctl --system
The single file /etc/sysctl.conf still exists for backward compatibility, but the operational standard is split files under /etc/sysctl.d/. The filename tells you where a change came from, and a single drop-in file is easy for an Ansible role or a package to ship as one unit.
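A later file, a tuned profile, or a stray sysctl -w can all overwrite a key. It therefore helps to compare a drop-in file against the running kernel. Below is a bash sketch built on these assumptions:

- one key = value per line
- comments start with # or ;
- keys carry no leading -

sysctl_drift is a hypothetical name. The optional second argument exists only so the logic can be tried against a scratch directory instead of /proc/sys.

```shell
#!/usr/bin/env bash
# Report keys whose live value differs from a sysctl.d-style file.
# Usage: sysctl_drift FILE [PROC_SYS_ROOT]   (hypothetical helper, not a RHEL tool)
sysctl_drift() {
  local conf=$1 root=${2:-/proc/sys}
  local line key want have path rc=0
  local -a words
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%%[#;]*}                       # simplified: drop everything after # or ;
    [[ $line == *=* ]] || continue            # skip blank and non-assignment lines
    key=${line%%=*}; want=${line#*=}
    read -r key <<< "$key"                    # trim surrounding whitespace
    read -r -a words <<< "$want"; want="${words[*]}"   # normalize inner spacing
    path="$root/${key//.//}"
    if [ ! -r "$path" ]; then
      printf 'MISSING %s\n' "$key"; rc=1; continue
    fi
    read -r -a words < "$path" || true        # live multi-value keys are tab-separated
    have="${words[*]}"
    if [ "$have" != "$want" ]; then
      printf 'DRIFT %s: file=%s live=%s\n' "$key" "$want" "$have"; rc=1
    fi
  done < "$conf"
  return "$rc"
}
```

For example, run `sysctl_drift /etc/sysctl.d/99-tune.conf` right after `sudo sysctl --system`. It should print nothing and return 0. Any DRIFT line means something else wrote that key afterwards.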
Precedence and the filename convention #
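sysctl --system reads configuration from these locations:
/etc/sysctl.d/*.conf
/run/sysctl.d/*.conf
/usr/lib/sysctl.d/*.conf
/etc/sysctl.conf
Among the drop-in directories, a file shadows any file with the same name in a directory listed below it. For example, /etc/sysctl.d/99-tune.conf masks a /usr/lib/sysctl.d/99-tune.conf. The surviving files are then sorted by filename across all directories, not directory by directory, and applied in that order. /etc/sysctl.conf is applied last. Because the last write to a key wins, operational settings usually carry a 99- prefix (e.g., 99-tune.conf) so they land after the vendor defaults. Anything left in /etc/sysctl.conf still overrides them.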
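The bash sketch below prints the order in which sysctl --system would apply files on a given machine. It reproduces the procps-ng rules:

- files are collected from the directories listed in sysctl(8)
- a filename found in an earlier directory shadows later ones
- the survivors are sorted by filename
- /etc/sysctl.conf goes last

Two assumptions apply. First, the directory list follows the procps-ng sysctl(8) man page, which also includes /usr/local/lib/sysctl.d/ and /lib/sysctl.d/. Second, sysctl_apply_order and its optional root-prefix argument are hypothetical. On a real host, `sudo sysctl --system` prints an "* Applying <file> ..." line per file, and that output is the authoritative answer.

```shell
#!/usr/bin/env bash
# Print the files `sysctl --system` would apply, in application order.
# ROOT (optional, hypothetical) prefixes every path so the logic can be
# tried on a scratch tree instead of the real filesystem.
sysctl_apply_order() {
  local root=${1:-} dir f base
  local -A winner=()      # basename -> full path of the file that survives
  local -a names=()
  for dir in /etc/sysctl.d /run/sysctl.d /usr/local/lib/sysctl.d \
             /usr/lib/sysctl.d /lib/sysctl.d; do
    for f in "$root$dir"/*.conf; do
      [ -f "$f" ] || continue                   # also skips an unmatched glob
      base=${f##*/}
      [ -n "${winner[$base]:-}" ] && continue   # shadowed by a higher directory
      winner[$base]=$f
      names+=("$base")
    done
  done
  # Survivors are applied in byte-wise filename order, whatever their directory.
  if [ "${#names[@]}" -gt 0 ]; then
    printf '%s\n' "${names[@]}" | LC_ALL=C sort | while IFS= read -r base; do
      printf '%s\n' "${winner[$base]}"
    done
  fi
  # /etc/sysctl.conf comes last, so its values override every drop-in.
  if [ -f "$root/etc/sysctl.conf" ]; then
    printf '%s\n' "$root/etc/sysctl.conf"
  fi
}
```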
Frequently touched keys #
A bundle of keys you reach for often in operations.
| Key | Meaning | Recommended |
|---|---|---|
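| vm.swappiness | Page cache vs swap preference (0 = avoid swap, 100 = swap aggressively) | 10 for servers, 1 to 5 for DBs |
| vm.dirty_ratio | Dirty page limit (% of available memory); past it, writing processes must flush synchronously | 20 |
| vm.overcommit_memory | Memory overcommit policy (0 = heuristic, the default; 1 = always allow; 2 = strict accounting) | 1 for Redis; for databases follow the vendor's guidance (PostgreSQL documents 2) |
| net.core.somaxconn | Max listen() queue size | 4096+ |
| net.ipv4.tcp_max_syn_backlog | Max half-open connections (SYN queue) | 4096+ |
| net.ipv4.ip_local_port_range | Ephemeral port range | 1024 65535 (reserve ports your services listen on via net.ipv4.ip_local_reserved_ports) |
| fs.file-max | System-wide file descriptor limit | 2 million+ |
| kernel.pid_max | Maximum PID | 4194304 with many containers |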
DB workloads, web server workloads, and container hosts each call for different settings. Pin one standard file for each role, drop it under /etc/sysctl.d/, and keep consistency by copying the same file to every new machine.
Cases where a change does not stick #
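- Read-only parameters: keys like kernel.osrelease cannot change after boot. Trying gives permission denied.
- Boot-time-only settings: some behavior is fixed while the kernel boots and has no sysctl key at all. CPU isolation with isolcpus=, for example, must be passed as a kernel command-line argument, the same way crashkernel= is set later in this post. sysctl cannot change it afterwards.
- Inside a container: /proc/sys/ is typically mounted read-only. Non-namespaced keys such as vm.* are shared with the host, so change those on the host. Namespaced keys such as most net.* are per container and are set when the container starts (e.g., podman run --sysctl).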
tuned — Workload Profiles #
tuned is the daemon that bundles many tuning items — sysctl values + CPU governor + I/O scheduler + disk readahead and so on — into named profiles and applies them at once. RHEL 9 ships it pre-installed and starts it automatically at boot.
$ sudo systemctl status tuned
$ sudo systemctl enable --now tuned
Listing and applying profiles #
# list available profiles
$ sudo tuned-adm list
Available profiles:
- accelerator-performance
- balanced - General non-specialized tuned profile
- desktop
- hpc-compute
- latency-performance - Optimize for low latency at the cost of throughput
- network-latency
- network-throughput
- powersave
- throughput-performance - Broadly applicable tuning that provides excellent...
- virtual-guest - Optimize for running inside a virtual guest
- virtual-host - Optimize for running KVM guests
Current active profile: throughput-performance
# current profile
$ sudo tuned-adm active
# change profile
$ sudo tuned-adm profile virtual-guest
# recommended profile (RHEL inspects the machine and suggests one)
$ sudo tuned-adm recommend
virtual-guest
tuned-adm recommend automatically distinguishes bare metal, virtual machines, and laptops, and proposes a profile accordingly. Boot RHEL on a virtualized instance like EC2, and it suggests virtual-guest.
Frequently used profiles #
| Profile | Where it fits |
|---|---|
| throughput-performance | General server default. CPU governor performance, relaxed dirty ratio |
| latency-performance | Response time first: trading systems, real-time processing |
| network-latency | latency-performance + network queue tuning |
| network-throughput | High-throughput networks (10G+ NICs) |
| virtual-guest | KVM/AWS/GCP guest default |
| virtual-host | KVM hypervisor host |
| powersave | Power saving (laptops, etc.) |
| accelerator-performance | GPU / accelerator workloads |
For DB machines start from throughput-performance or latency-performance; for cloud guests, virtual-guest.
What is inside a profile #
$ ls /usr/lib/tuned/throughput-performance/
tuned.conf
$ cat /usr/lib/tuned/throughput-performance/tuned.conf
[main]
summary=...
[cpu]
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[disk]
readahead=>4096
[sysctl]
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
vm.swappiness=10
The listing is an excerpt. Comments and some version-specific sections are omitted, and exact contents vary between tuned releases. Look at the [sysctl] section and you can see it is, in the end, just a bundle of sysctl keys; [cpu] and [disk] are handled by other tuned plugins. While the profile is active those keys hold the profile's values, and switching to a different profile reapplies the new values immediately.
Custom profiles #
It is common to inherit an existing profile and override only what you need.
$ sudo mkdir -p /etc/tuned/myapp-throughput
$ sudo vi /etc/tuned/myapp-throughput/tuned.conf
[main]
summary=Custom throughput profile for myapp
include=throughput-performance
[sysctl]
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 16384
vm.swappiness = 1
[vm]
transparent_hugepages=never
$ sudo tuned-adm profile myapp-throughput
$ sudo tuned-adm active
Current active profile: myapp-throughput
/etc/tuned/ is the user-defined area; /usr/lib/tuned/ is where the package ships its defaults. If a user profile and a system profile share the same name, the user profile wins.
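A custom profile layers on top of its parent. The [sysctl] values that end up active are therefore the parent's values with the child's overrides applied. Below is a bash sketch of that merge, under simplifying assumptions:

- one profile name per include= line
- no ${...} variables in the file
- profiles are looked up in /etc/tuned/ before /usr/lib/tuned/

The function names are hypothetical. So is the TUNED_ROOT prefix, which exists only so the logic can be tried on scratch files. tuned's own merge logic handles more cases than this.

```shell
#!/usr/bin/env bash
# Sketch: effective [sysctl] values of a tuned profile after include= inheritance.
declare -A TUNED_SYSCTL=()

tuned_conf() {   # print the tuned.conf path for a profile; the user dir wins
  local d
  for d in "${TUNED_ROOT:-}/etc/tuned" "${TUNED_ROOT:-}/usr/lib/tuned"; do
    if [ -f "$d/$1/tuned.conf" ]; then printf '%s\n' "$d/$1/tuned.conf"; return 0; fi
  done
  return 1
}

tuned_collect() {   # load the parent first, then let this profile override it
  local conf line section="" key val parent=""
  conf=$(tuned_conf "$1") || { echo "profile not found: $1" >&2; return 1; }
  parent=$(sed -n 's/^[[:space:]]*include[[:space:]]*=[[:space:]]*//p' "$conf" | head -n 1)
  read -r parent <<< "$parent"
  if [ -n "$parent" ]; then tuned_collect "$parent" || return 1; fi
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      \[*\]*) section=${line#\[}; section=${section%%\]*}; continue ;;
      ''|\#*) continue ;;
    esac
    [ "$section" = sysctl ] && [[ $line == *=* ]] || continue
    key=${line%%=*}; val=${line#*=}
    read -r key <<< "$key"; read -r val <<< "$val"   # trim whitespace
    TUNED_SYSCTL[$key]=$val                          # later (child) values win
  done < "$conf"
}

tuned_effective_sysctl() {   # print "key = value", sorted by key
  local k
  TUNED_SYSCTL=()
  tuned_collect "$1" || return 1
  for k in "${!TUNED_SYSCTL[@]}"; do
    printf '%s = %s\n' "$k" "${TUNED_SYSCTL[$k]}"
  done | LC_ALL=C sort
}
```

On a host with the myapp-throughput profile above, `tuned_effective_sysctl myapp-throughput` would list inherited keys such as vm.dirty_ratio alongside the overridden vm.swappiness = 1.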
Relationship between tuned and sysctl.d #
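Values applied by tuned and values written under /etc/sysctl.d/ can collide, and as always the last write wins. At boot, systemd-sysctl.service applies the sysctl.d files early, and tuned starts later and applies its profile. By default, though, tuned then reapplies the system sysctl files (/run/sysctl.d/, /etc/sysctl.d/ and /etc/sysctl.conf) on top of its own values. This behavior is controlled by reapply_sysctl = 1 in /etc/tuned/tuned-main.conf. So for a key set in both places, the sysctl.d value normally ends up in effect, both at boot and after tuned-adm profile X at runtime. With reapply_sysctl = 0 the tuned value wins instead, until something runs sysctl --system again.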
Operational guidance:
- System-wide policy: /etc/sysctl.d/
- Workload bundles: tuned profiles
- If both touch the same key, unify on one side. Usually it is cleaner to move the key into the tuned profile and remove it from sysctl.d.
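To find candidates for unifying, list the keys that a tuned profile and the sysctl.d files both set. Here is a bash sketch. sysctl_tuned_overlap is a hypothetical name. It reads only the given tuned.conf and does not follow include= parents. It also treats everything after # or ; in sysctl.d lines as a comment.

```shell
#!/usr/bin/env bash
# List keys set both in a tuned profile's [sysctl] section and in sysctl.d files.
# Usage: sysctl_tuned_overlap TUNED_CONF SYSCTL_FILE...   (hypothetical helper)
sysctl_tuned_overlap() {
  local tuned_conf=$1; shift
  local -A in_tuned=()
  local line key f section=""
  # Collect the keys from the [sysctl] section only.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      \[*\]*) section=${line#\[}; section=${section%%\]*}; continue ;;
    esac
    [ "$section" = sysctl ] || continue
    line=${line%%#*}
    [[ $line == *=* ]] || continue
    read -r key <<< "${line%%=*}"
    in_tuned[$key]=1
  done < "$tuned_conf"
  # Report every sysctl.d assignment that touches one of those keys.
  for f in "$@"; do
    while IFS= read -r line || [ -n "$line" ]; do
      line=${line%%[#;]*}
      [[ $line == *=* ]] || continue
      read -r key <<< "${line%%=*}"
      if [ -n "${in_tuned[$key]:-}" ]; then printf '%s: %s\n' "$key" "$f"; fi
    done < "$f"
  done
}
```

For example: `sysctl_tuned_overlap /etc/tuned/myapp-throughput/tuned.conf /etc/sysctl.d/*.conf`. Every line it prints is a key to move to one side.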
kdump — Memory Dump at Kernel Panic #
If the kernel dies in a panic and you have a dump (vmcore) of memory captured at that moment, you can analyze it post-mortem with tools like crash after rebooting. kdump handles that capture.
How it works #
The core idea of kdump is to keep two kernels in memory.
- At boot time, in addition to the normal kernel, a crash kernel is preloaded into memory (via the kexec mechanism).
- When the normal kernel panics, control jumps without a hardware reset to that preloaded crash kernel.
- The crash kernel writes the normal kernel’s memory region to disk as a vmcore file.
- Then a normal reboot.
Memory for the crash kernel is reserved at boot time, so a slice of RAM (typically 256 MB to a few GB) is unavailable during normal operation. It is not free, but it is nearly mandatory on production machines where panic analysis matters.
Enablement #
RHEL 9 usually ships with it enabled. Verify:
$ sudo systemctl status kdump
$ sudo kdumpctl status
# memory load status
$ sudo cat /sys/kernel/kexec_crash_loaded
1 # 1 means loaded
If disabled:
$ sudo dnf install -y kexec-tools
$ sudo systemctl enable --now kdump
The crashkernel parameter #
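The memory to reserve is set with the crashkernel= kernel argument on the GRUB command line. RHEL 9 no longer uses the crashkernel=auto value of earlier releases. Instead, the command line carries an explicit default, typically the RAM-dependent range form shown below. Some workloads still require adjusting it.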
$ cat /proc/cmdline
... crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M ...
# change to a specific value
$ sudo grubby --update-kernel=ALL --args="crashkernel=512M"
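$ sudo reboot
crashkernel=512M always reserves 512 MB, whatever the RAM size. The range form crashkernel=1G-4G:192M,4G-64G:256M,... chooses the reservation by installed RAM. It reserves 192 MB from 1 GB up to (but not including) 4 GB, 256 MB from 4 GB up to 64 GB, and so on. With less RAM than the first range, nothing is reserved. Leaving the RHEL 9 default is usually safe.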
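The range rule is easy to check by hand, but a small bash sketch makes it concrete. Given a crashkernel= value and a RAM size in bytes, it prints how many bytes would be reserved. Each range is treated as start-inclusive and end-exclusive, as described in the kernel's kdump documentation. crashkernel_bytes and to_bytes are hypothetical names. The ,high/,low variants are not handled, and an @offset suffix is simply ignored.

```shell
#!/usr/bin/env bash
# Convert a size such as 192M, 4G or 524288K to bytes (hypothetical helper).
to_bytes() {
  local v=$1 n unit
  n=${v%[KkMmGgTt]}; unit=${v#"$n"}
  case $unit in
    K|k) echo $(( n * 1024 )) ;;
    M|m) echo $(( n * 1024 * 1024 )) ;;
    G|g) echo $(( n * 1024 * 1024 * 1024 )) ;;
    T|t) echo $(( n * 1024 * 1024 * 1024 * 1024 )) ;;
    '')  echo "$n" ;;
    *)   return 1 ;;
  esac
}

# Usage: crashkernel_bytes SPEC RAM_BYTES  -> bytes reserved (0 if no range matches)
crashkernel_bytes() {
  local spec=${1%@*} ram=$2 entry range size start end   # drop any @offset
  if [[ $spec != *:* ]]; then          # simple form, e.g. 512M
    to_bytes "$spec"; return
  fi
  local IFS=,
  for entry in $spec; do               # e.g. "4G-64G:256M"
    range=${entry%%:*}; size=${entry#*:}
    start=$(to_bytes "${range%%-*}")
    end=${range#*-}
    if [ -z "$end" ]; then end=-1; else end=$(to_bytes "$end"); fi   # "64G-" = open-ended
    if (( ram >= start )) && (( end < 0 || ram < end )); then
      to_bytes "$size"; return
    fi
  done
  echo 0
}
```

To try it against the running machine, set `ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)` and then run `crashkernel_bytes '1G-4G:192M,4G-64G:256M,64G-:512M' $(( ram_kb * 1024 ))`. MemTotal is somewhat smaller than the memory size the kernel compares against, so results right at a range boundary can differ.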
Where vmcore is stored #
Configure where to store vmcore in /etc/kdump.conf.
# default: local disk
path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31
# send to NFS
# nfs nfs.example.com:/srv/crash
# send via SSH
# ssh user@dump-server.example.com
# sshkey /root/.ssh/kdump_id_rsa
# if the dump fails, just reboot
# failure_action reboot
| Key | Meaning |
|---|---|
| path | vmcore storage path |
| core_collector makedumpfile -d 31 | Exclude zero, page-cache, user-data and free pages from the dump to shrink it (-d 31 recommended; -l adds LZO compression) |
| nfs / ssh | Remote storage. Hedges against a broken local disk |
| failure_action | Action on dump failure (reboot, halt, poweroff, shell, dump_to_rootfs). Older configs use the now-obsolete name default |
After changing settings:
$ sudo kdumpctl rebuild
$ sudo systemctl restart kdump
Test — trigger a panic on purpose #
Never run this on a production machine — only on an isolated test machine:
$ sudo sysctl -w kernel.sysrq=1
$ echo c | sudo tee /proc/sysrq-trigger
The machine panics immediately and a vmcore is written to /var/crash/<address>-<date>/vmcore. Verify after reboot.
$ ls /var/crash/
127.0.0.1-2026-04-27-10:30:00/
$ ls /var/crash/127.0.0.1-2026-04-27-10:30:00/
vmcore vmcore-dmesg.txt
vmcore-dmesg.txt alone often contains the dmesg output from just before the panic and is enough for a first-pass diagnosis.
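For that first pass, a short bash sketch can pick the newest dump directory and print the end of its vmcore-dmesg.txt. It assumes the <address>-YYYY-MM-DD-HH:MM:SS directory naming shown above and sorts on the timestamp suffix, so dumps from different addresses still order by time. latest_vmcore_dmesg is a hypothetical name. Its optional arguments are the crash directory and the number of lines to show.

```shell
#!/usr/bin/env bash
# Print the tail of vmcore-dmesg.txt from the newest dump (hypothetical helper).
# Usage: latest_vmcore_dmesg [CRASH_DIR] [LINES]
latest_vmcore_dmesg() {
  local crash_dir=${1:-/var/crash} lines=${2:-40} latest d
  latest=$(
    for d in "$crash_dir"/*/; do
      [ -f "$d/vmcore-dmesg.txt" ] || continue
      d=${d%/}
      # The last 19 characters are the YYYY-MM-DD-HH:MM:SS timestamp.
      printf '%s %s\n' "${d: -19}" "$d"
    done | LC_ALL=C sort | tail -n 1 | cut -d' ' -f2-
  )
  if [ -z "$latest" ]; then
    echo "no vmcore-dmesg.txt under $crash_dir" >&2
    return 1
  fi
  echo "== $latest"
  tail -n "$lines" "$latest/vmcore-dmesg.txt"
}
```

`latest_vmcore_dmesg /var/crash 60` shows the last 60 lines of the newest dump. That is usually where the panic message and the call trace sit.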
Analyze with crash #
$ sudo dnf install -y crash
$ sudo dnf install -y kernel-debuginfo-$(uname -r) --enablerepo=rhel-9-for-x86_64-baseos-debug-rpms
$ sudo crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux \
/var/crash/127.0.0.1-2026-04-27-10:30:00/vmcore
crash> bt # stack trace at the moment of panic
crash> log # kernel log
crash> ps # process list
crash> mod # loaded modules
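crash> sys # system info
A single bt (backtrace) shows which function the panic occurred in. The debuginfo vmlinux must match the kernel that crashed. That is not necessarily the running kernel's $(uname -r) if the machine has since booted a different kernel. Deep analysis is its own topic, but simply having a vmcore available is itself an operational safety net.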
Common Pitfalls #
- Using only sysctl -w and not writing to a file: the change is lost at reboot. Permanent settings must go under /etc/sysctl.d/.
- Cramming everything into the single /etc/sysctl.conf: you lose the ability to trace where a change came from. Split per topic into 99-app.conf, 99-network.conf, etc.
- tuned and sysctl.d colliding on the same key: last applied wins. Unify on one side.
- Insufficient kdump disk space: vmcore can be several GB. Keep room on the filesystem holding /var/crash.
- Forgetting kdumpctl rebuild after editing kdump.conf: changes do not take effect. Always rebuild → restart.
- Removing the crashkernel= argument: someone tidying up GRUB args drops it, and kdump stops working from the next boot onward. Periodically verify with cat /proc/cmdline.
- Setting vm.overcommit_memory=1 carelessly on a container host: it changes OOM patterns for some workloads. Validate per workload.
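Several of these pitfalls can be caught by a periodic check. The bash sketch below tests three things:

- a crashkernel= argument on the kernel command line
- a loaded crash kernel in /sys/kernel/kexec_crash_loaded
- free space on the local filesystem holding /var/crash

Remote NFS/SSH targets are not checked. kdump_ready is a hypothetical name. The 4096 MB default threshold is an assumption; size it against your RAM and dump level. The environment variables exist only so the checks can be tried on scratch files.

```shell
#!/usr/bin/env bash
# Periodic kdump readiness check (hypothetical). Keep the defaults on a real host.
kdump_ready() {
  local cmdline=${CMDLINE_FILE:-/proc/cmdline}
  local loaded=${CRASH_LOADED_FILE:-/sys/kernel/kexec_crash_loaded}
  local crash_dir=${CRASH_DIR:-/var/crash}
  local min_free_mb=${MIN_FREE_MB:-4096}   # assumed threshold
  local rc=0 free_mb

  # 1. crashkernel= must still be on the command line. Padding with spaces
  #    lets the match work when it is the first or last argument.
  if [[ " $(cat "$cmdline" 2>/dev/null) " != *" crashkernel="* ]]; then
    echo "FAIL: no crashkernel= on the kernel command line"; rc=1
  fi
  # 2. The crash kernel must actually be loaded ("1" in kexec_crash_loaded).
  if [ "$(cat "$loaded" 2>/dev/null)" != 1 ]; then
    echo "FAIL: crash kernel not loaded ($loaded)"; rc=1
  fi
  # 3. Enough free space (in MB) where vmcore will be written.
  free_mb=$(df -Pm "$crash_dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  if [ -z "$free_mb" ] || [ "$free_mb" -lt "$min_free_mb" ]; then
    echo "FAIL: less than ${min_free_mb} MB free for $crash_dir"; rc=1
  fi
  if [ "$rc" -eq 0 ]; then echo "OK: kdump looks ready"; fi
  return "$rc"
}
```

kdump_ready prints one FAIL line per problem and returns non-zero, so it can be called from cron or a monitoring agent.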
Commands Worth Remembering #
| Task | Command |
|---|---|
| Read sysctl value / change temporarily | sysctl <key> / sysctl -w <key>=<v> |
| Apply sysctl.d | sudo sysctl --system |
| Apply a single file | sudo sysctl -p /etc/sysctl.d/99-tune.conf |
| tuned active profile | sudo tuned-adm active |
| Switch tuned profile | sudo tuned-adm profile <name> |
| tuned recommendation | sudo tuned-adm recommend |
| kdump status | sudo kdumpctl status |
| Crash kernel loaded? | cat /sys/kernel/kexec_crash_loaded |
| Rebuild kdump | sudo kdumpctl rebuild && sudo systemctl restart kdump |
| Analyze vmcore | sudo crash <vmlinux> <vmcore> |
Wrap-up #
- sysctl: adjusts runtime kernel parameters, with /etc/sysctl.d/*.conf for permanent, separated settings. A 99- prefix makes a file apply after the vendor defaults; only /etc/sysctl.conf comes later.
- tuned: bundles workload profiles. Use throughput-performance as the server default and virtual-guest for cloud guests. Custom profiles live in /etc/tuned/, inherit an existing profile with include=, and override only what you need.
- kdump: captures a memory dump at kernel panic. Reserve memory for the crash kernel via crashkernel=, store vmcore at /var/crash or on NFS / SSH, and analyze it post-mortem with crash.
- The roles: sysctl is per-line, tuned is the workload bundle, kdump is the panic safety net. When the same key collides, unify on one side.
The next post looks at what is eating the time on a machine where the kernel is running smoothly: performance analysis. We cover which tool to reach for — sar, top/htop, iostat, vmstat, perf — and which signals to read with each.