ELK Index Cleanup
Elasticsearch Index Cleanup & Cron Job Automation
This page documents the automated Elasticsearch index cleanup process, along with the associated cron jobs used for system maintenance and certificate renewal.
Overview
Over time, Elasticsearch accumulates large amounts of index data. Old indices that are no longer needed consume disk space and add cluster overhead, which can slow down operations. To address this, we use a **Bash script** that deletes Elasticsearch indices whose name-embedded date (`YYYY.MM.DD`) is older than a set retention period (default: **10 days**).
In addition to index cleanup, the server runs other automated tasks via `cron`:
- **User activity audit script**
- **SSL certificate renewal using Certbot**
Cron Jobs
The index cleanup is scheduled with the following cron entry:
<syntaxhighlight lang="bash"> 30 19 * * * /root/delete-old-indices.sh >> /var/log/es_index_cleanup.log 2>&1 </syntaxhighlight>
- Runs `/root/delete-old-indices.sh` every day at 19:30 (server time zone) and appends both standard output and standard error (`2>&1`) to `/var/log/es_index_cleanup.log`.
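If the entry ever needs to be reinstalled, one way to add it to root's crontab without creating duplicates is sketched below. The function name `ensure_cron_line` is hypothetical and not part of the current setup. It reads the existing crontab on stdin and prints the updated version, appending the entry only when an identical line is not already present.

<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical helper: read a crontab on stdin and print it back with LINE
# appended, unless an identical line is already present.
ensure_cron_line() {
  local line="$1" current
  current=$(cat)
  if [ -z "$current" ]; then
    printf '%s\n' "$line"                    # empty crontab: just the entry
  elif printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$current"                 # already installed: unchanged
  else
    printf '%s\n%s\n' "$current" "$line"     # append to existing entries
  fi
}

ENTRY='30 19 * * * /root/delete-old-indices.sh >> /var/log/es_index_cleanup.log 2>&1'
# Usage (as root):
#   crontab -l 2>/dev/null | ensure_cron_line "$ENTRY" | crontab -
</syntaxhighlight>

Running the usage pipeline twice leaves the crontab unchanged, so it is safe to repeat from a provisioning script.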
Index Cleanup Script
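The script below targets Elasticsearch indices older than **10 days**. In this version the `curl -XDELETE` request is commented out, so the script runs as a dry run and only reports which indices it would delete.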
<syntaxhighlight lang="bash">
#!/bin/bash
# Report, and optionally delete, indices whose name-embedded date (YYYY.MM.DD) is older than RETENTION_DAYS.
ES_HOST="http://localhost:9200"
ES_USER="admin"
ES_PASS="QjFEEb0ZdjXYm"
RETENTION_DAYS=10
TODAY_EPOCH=$(date +%s)
NOW=$(date "+%Y-%m-%d %H:%M:%S")

echo -e "\n\n========= Index Cleanup Started at $NOW ========="

curl -s -u "$ES_USER:$ES_PASS" "$ES_HOST/_cat/indices?h=index" | while read -r index; do
  echo "Checking index: $index"
  # Extract the last date pattern (YYYY.MM.DD) in the index name
  date_part=$(echo "$index" | grep -oE '[0-9]{4}\.[0-9]{2}\.[0-9]{2}' | tail -n1)
  if [ -z "$date_part" ]; then echo " → Skipped: no valid date found"; continue; fi
  # YYYY.MM.DD -> YYYY-MM-DD -> seconds since epoch (requires GNU date)
  index_date_epoch=$(date -d "${date_part//./-}" +%s 2>/dev/null)
  if [ -z "$index_date_epoch" ]; then echo " → Skipped: invalid date format in $index"; continue; fi
  age_days=$(( (TODAY_EPOCH - index_date_epoch) / 86400 ))
  if [ "$age_days" -gt "$RETENTION_DAYS" ]; then
    echo "🗑️ Would delete: $index (Age: $age_days days)"
    # Uncomment to delete:
    # curl -s -u "$ES_USER:$ES_PASS" -XDELETE "$ES_HOST/$index"
  else
    echo "✅ Keeping: $index (Age: $age_days days)"
  fi
done

NOW_DONE=$(date "+%Y-%m-%d %H:%M:%S")
echo "========= Index Cleanup Finished at $NOW_DONE ========="
echo -e "\n\nScript done."
</syntaxhighlight>
How It Works
- **Get the list of indices** from Elasticsearch using the `_cat/indices` API.
- **Extract the date** from the index name (format: `YYYY.MM.DD`).
- **Convert the date to epoch time** for comparison.
- **Calculate the index age** in days.
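- If the index age is **greater than the retention period** (10 days), it is deleted. While the `curl -XDELETE` line is commented out, it is only reported as "Would delete".
- Otherwise, the index is kept.
- Indices whose name contains no parseable `YYYY.MM.DD` date are skipped.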
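To check the date extraction and age arithmetic outside of cron, those steps can be isolated into a small function. This is a sketch, not the production script: the function name `index_age_days` is hypothetical, it assumes GNU `date` (for `-d`), and unlike the script it interprets dates in UTC (`-u`), so day boundaries can differ by a few hours from the script's local-time calculation.

<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical helper: print the age in whole days of the last YYYY.MM.DD date
# found in an index name, relative to TODAY_EPOCH (seconds since the epoch).
# Prints nothing if the name has no parseable date. Requires GNU date.
index_age_days() {
  local index="$1" today_epoch="$2" date_part index_epoch
  date_part=$(printf '%s\n' "$index" | grep -oE '[0-9]{4}\.[0-9]{2}\.[0-9]{2}' | tail -n1)
  [ -n "$date_part" ] || return 0
  # YYYY.MM.DD -> YYYY-MM-DD, interpreted as midnight UTC
  index_epoch=$(date -u -d "$(printf '%s' "$date_part" | tr . -)" +%s 2>/dev/null) || return 0
  echo $(( (today_epoch - index_epoch) / 86400 ))
}

# Usage: apply the same "strictly greater than RETENTION_DAYS" rule as the script
today=$(date -u -d 2024-05-20 +%s)
age=$(index_age_days "filebeat-7.17.0-2024.05.01" "$today")
if [ -n "$age" ] && [ "$age" -gt 10 ]; then echo "delete (age $age)"; fi   # prints: delete (age 19)
</syntaxhighlight>

An index exactly 10 days old yields an age of 10 and is kept, which matches the script's `-gt` comparison.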
Notes
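- Ensure your Elasticsearch user (`ES_USER`) can both list and delete the target indices (for example, `monitor` privileges to read `_cat/indices` and the `delete_index` index privilege to remove them).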
- Test the script first with the `curl -XDELETE` command commented out to avoid accidental data loss.
- Adjust `RETENTION_DAYS` to your desired value.
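Instead of commenting the `curl -XDELETE` line in and out, the delete step could be controlled by an environment variable. The sketch below is a hypothetical alternative: neither `DRY_RUN` nor `delete_index` exists in the current script. It defaults to a dry run and uses curl's `--fail` option so that HTTP errors (for example, a 403 when `ES_USER` lacks delete privileges) are reported instead of being silently ignored.

<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical dry-run switch: DRY_RUN=1 (the default here) only reports;
# DRY_RUN=0 sends the DELETE request to Elasticsearch.
ES_HOST="${ES_HOST:-http://localhost:9200}"
DRY_RUN="${DRY_RUN:-1}"

delete_index() {
  local index="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "Would delete: $index"
    return 0
  fi
  # --fail: non-zero exit status on HTTP errors such as 403 or 404
  if curl -s --fail -u "$ES_USER:$ES_PASS" -XDELETE "$ES_HOST/$index" >/dev/null; then
    echo "Deleted: $index"
  else
    echo "Failed to delete: $index" >&2
    return 1
  fi
}
</syntaxhighlight>

If the script called `delete_index` in place of the commented-out `curl` line, a real run would be started explicitly with `DRY_RUN=0`, after the dry-run log has been reviewed.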
Logs
All cleanup operations are logged to:
<syntaxhighlight lang="bash"> /var/log/es_index_cleanup.log </syntaxhighlight>
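Because cron appends every run to the same file, reviewing only the most recent run means printing from the last "Index Cleanup Started" line onward. A minimal sketch, assuming the log format produced by the script above; `last_run` is a hypothetical function name.

<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical helper: print the log lines of the most recent cleanup run,
# i.e. everything from the last "Index Cleanup Started" line to the end of the file.
last_run() {
  local log="${1:-/var/log/es_index_cleanup.log}"
  awk '/Index Cleanup Started/ { n = 0 }
       { buf[++n] = $0 }
       END { for (i = 1; i <= n; i++) print buf[i] }' "$log"
}

# Usage: list the indices flagged for deletion in the latest run
#   last_run | grep "Would delete"
</syntaxhighlight>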
Related Links
- [Elasticsearch _cat APIs Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html)
- [Certbot Documentation](https://certbot.eff.org/docs/)