ELK Index Cleanup

From PheonixSolutions
Revision as of 10:06, 11 August 2025 by Tech team

Elasticsearch Index Cleanup & Cron Job Automation

This page documents the automated Elasticsearch index cleanup process, along with the associated cron jobs used for system maintenance and certificate renewal.

Overview

Over time, Elasticsearch accumulates large amounts of index data. Old indices that are no longer needed can consume significant disk space and slow down operations. To address this, we use a **Bash script** that automatically deletes Elasticsearch indices older than a set retention period (default: **10 days**).

In addition to index cleanup, the server runs other automated tasks via `cron`:

  • **User activity audit script**
  • **SSL certificate renewal using Certbot**

Cron Jobs

The following cron job is configured on the server for index cleanup:

<syntaxhighlight lang="bash"> 30 19 * * * /root/delete-old-indices.sh >> /var/log/es_index_cleanup.log 2>&1 </syntaxhighlight>

  • `30 19 * * * /root/delete-old-indices.sh >> /var/log/es_index_cleanup.log 2>&1` – Cleans up old Elasticsearch indices daily at 19:30 and logs the output.
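As a reading aid for the schedule above, the sketch below splits a crontab line into its five time fields and the command. The function name `describe_cron_entry` is hypothetical and only serves this illustration; it is not part of the server setup.

```shell
#!/bin/bash
# Split a crontab line into its five schedule fields and the command.
# `read` does not glob-expand its input, so the literal * fields survive.
describe_cron_entry() {
  local minute hour dom month dow command
  read -r minute hour dom month dow command <<< "$1"
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$dom" "$month" "$dow"
  printf 'command=%s\n' "$command"
}

# The entry documented above: minute 30, hour 19, every day
describe_cron_entry '30 19 * * * /root/delete-old-indices.sh >> /var/log/es_index_cleanup.log 2>&1'
```

The `>>` in the command appends to the log rather than overwriting it, and `2>&1` sends error output to the same file.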

Index Cleanup Script

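The script below finds Elasticsearch indices older than **10 days**. As shown, it runs in dry-run mode: the `curl -XDELETE` line is commented out, so matching indices are only logged as "Would delete" until that line is uncommented.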

<syntaxhighlight lang="bash">

#!/bin/bash

ES_HOST="http://localhost:9200"
ES_USER="admin"
ES_PASS="QjFEEb0ZdjXYm"
RETENTION_DAYS=10
TODAY_EPOCH=$(date +%s)
NOW=$(date "+%Y-%m-%d %H:%M:%S")

echo -e "\n\n========= Index Cleanup Started at $NOW ========="

curl -s -u "$ES_USER:$ES_PASS" "$ES_HOST/_cat/indices?h=index" | while read -r index; do

 echo "Checking index: $index"
 # Extract date pattern (YYYY.MM.DD)
 date_part=$(echo "$index" | grep -oE '[0-9]{4}\.[0-9]{2}\.[0-9]{2}' | tail -n1)
 if [[ -z "$date_part" ]]; then
   echo "  → Skipped: no valid date found"
   continue
 fi
 index_date_epoch=$(date -d "${date_part//./-}" +%s 2>/dev/null)
 if [[ -z "$index_date_epoch" ]]; then
   echo "  → Skipped: invalid date format in $index"
   continue
 fi
 age_days=$(( (TODAY_EPOCH - index_date_epoch) / 86400 ))
 if [ "$age_days" -gt "$RETENTION_DAYS" ]; then
   echo "🗑️  Would delete: $index (Age: $age_days days)"
   # Uncomment to delete:
   # curl -s -u "$ES_USER:$ES_PASS" -XDELETE "$ES_HOST/$index"
 else
   echo "✅ Keeping: $index (Age: $age_days days)"
 fi

done

NOW_DONE=$(date "+%Y-%m-%d %H:%M:%S")
echo "========= Index Cleanup Finished at $NOW_DONE ========="
echo -e "\n\nScript done."
</syntaxhighlight>

How It Works

  1. **Get the list of indices** from Elasticsearch using the `_cat/indices` API.
  2. **Extract the date** from the index name (format: `YYYY.MM.DD`).
  3. **Convert the date to epoch time** for comparison.
  4. **Calculate the index age** in days.
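  5. If the index age is **greater than the retention period** (10 days), the index is flagged for deletion. It is actually deleted only once the `curl -XDELETE` line in the script is uncommented.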
  6. Otherwise, the index is kept.
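Steps 2 to 4 can be sketched as a standalone function, which makes the date handling easy to try without contacting Elasticsearch. This is a minimal sketch, assuming GNU `date` (for the `-d` option) as in the script above. The function name `index_age_days`, the sample index name, and the fixed reference date are hypothetical and used only for illustration.

```shell
#!/bin/bash
# Print the age in whole days of a date-suffixed index name.
# Returns 1 if the name carries no parseable YYYY.MM.DD date.
# $1 = index name, $2 = reference time in epoch seconds (defaults to now)
index_age_days() {
  local index="$1" now_epoch="${2:-$(date +%s)}"
  local date_part index_epoch
  # Step 2: take the last YYYY.MM.DD pattern found in the name
  date_part=$(grep -oE '[0-9]{4}\.[0-9]{2}\.[0-9]{2}' <<< "$index" | tail -n1)
  [[ -n "$date_part" ]] || return 1
  # Step 3: convert 2025.08.01 to 2025-08-01 and then to epoch seconds
  index_epoch=$(date -d "${date_part//./-}" +%s 2>/dev/null) || return 1
  # Step 4: integer division gives the age in whole days
  echo $(( (now_epoch - index_epoch) / 86400 ))
}

# Hypothetical index name checked against a fixed reference date
ref=$(date -d "2025-08-11" +%s)
index_age_days "logstash-2025.08.01" "$ref"
```

Because the function returns a non-zero status for names without a valid date, a caller can skip such indices with `index_age_days "$index" || continue`, which mirrors the two "Skipped" branches in the script.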

Notes

  • Ensure your Elasticsearch user (`ES_USER`) has privileges to delete indices.
  • Test the script first with the `curl -XDELETE` command commented out (the state shown above) to avoid accidental data loss. Review the "Would delete" lines in the log before enabling deletion.
  • Adjust `RETENTION_DAYS` to your desired value.
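One possible alternative to commenting and uncommenting the delete line is a dry-run switch. The sketch below assumes a `DRY_RUN` variable and a `delete_index` helper, neither of which exists in the original script; both are hypothetical names. It also assumes `ES_USER` and `ES_PASS` are set as in the script.

```shell
#!/bin/bash
ES_HOST="http://localhost:9200"
# Hypothetical switch, not in the original script: 1 = log only, 0 = really delete
DRY_RUN="${DRY_RUN:-1}"

# Delete one index, or only report it when DRY_RUN is 1.
# ES_USER and ES_PASS are expected to be set by the surrounding script.
delete_index() {
  local index="$1"
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "Would delete: $index"
  else
    curl -s -u "$ES_USER:$ES_PASS" -XDELETE "$ES_HOST/$index"
  fi
}

# Hypothetical index name; with the default DRY_RUN=1 nothing is sent to Elasticsearch
delete_index "logstash-2025.08.01"
```

Defaulting to dry-run means a forgotten setting leads to a harmless log line rather than to data loss.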

Logs

All cleanup operations are logged to:

<syntaxhighlight lang="bash"> /var/log/es_index_cleanup.log </syntaxhighlight>
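To review what the script reported, the log can be summarized with `grep`. The sketch below is one way to do it, assuming the log lines keep the "Would delete:" and "Keeping:" wording used by the script above; the function name `summarize_cleanup_log` is hypothetical.

```shell
#!/bin/bash
LOG_FILE="/var/log/es_index_cleanup.log"

# Print how many log lines report an index as flagged and as kept.
# The log is appended to on each run, so the counts cover every run in the file.
summarize_cleanup_log() {
  local log="$1" flagged kept
  flagged=$(grep -c "Would delete:" "$log")
  kept=$(grep -c "Keeping:" "$log")
  echo "flagged=$flagged kept=$kept"
}

# Only summarize when the log exists and is readable
if [[ -r "$LOG_FILE" ]]; then
  summarize_cleanup_log "$LOG_FILE"
fi
```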