I had a problem where I needed to find out which users had accessed certain API endpoints, based on Nginx logs, so I wrote a handy script for that.
Kalle Tolonen
Feb. 27, 2025
Last updated on March 3, 2025
Update: I have since revised the script below.
zcat is nice, since it lets us read gz-compressed logs and pipe the output straight into awk.
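As a minimal sketch of that zcat-into-awk idea (the log lines and file names here are made up, mimicking Nginx's combined log format where the request path is the 7th whitespace-separated field):

```shell
# Create a tiny fake access log and compress it, so the pipeline below
# has something to read (both lines are hypothetical sample entries)
printf '%s\n' \
  '1.2.3.4 - svc_billing [27/Feb/2025:10:00:00 +0000] "GET /api/v1/users/42 HTTP/1.1" 200 123' \
  '1.2.3.4 - svc_billing [27/Feb/2025:10:00:01 +0000] "GET /api/v1/users/42 HTTP/1.1" 200 123' \
  > sample.log
gzip -f sample.log

# zcat streams the decompressed lines straight into awk;
# field 7 is the request path, and uniq -c counts repeats
zcat sample.log.gz | awk '{ print $7 }' | sort | uniq -c
```

This shows `/api/v1/users/42` with a count of 2, without ever writing the decompressed log to disk.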
The script for your pleasure:
#!/bin/bash
# Check if the log directory is passed as an argument
if [ -z "$1" ]; then
    echo "Usage: $0 /path/to/log/directory/"
    exit 1
fi

LOG_DIR="$1"

# Generate an output file name by removing any trailing slash and
# incorporating the directory name
DIR_NAME=$(basename "${LOG_DIR%/}")
OUTPUT_FILE="${DIR_NAME}_unique_api_user_combinations.txt"

# Truncate (or create) the output file
> "$OUTPUT_FILE"

process_log() {
    # gensub() is a gawk extension, so invoke gawk explicitly
    gawk '{
        api = $7;  # The request path is the 7th field of the combined log format
        svc_user = "";
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^svc/) {  # Find the field starting with "svc"
                svc_user = $i;
                break;
            }
        }
        if (svc_user != "") {
            # Normalize the endpoint: collapse everything from the first
            # numeric path segment onward into /[ID] ...
            normalized_api = gensub(/\/[0-9]+[^[:space:]]*/, "/[ID]", "g", api);
            # ... and anything after ?, = or : into [PARAMS]
            normalized_api = gensub(/\?.*/, "?[PARAMS]", "g", normalized_api);
            normalized_api = gensub(/=.*/, "=[PARAMS]", "g", normalized_api);
            normalized_api = gensub(/:.*/, ":[PARAMS]", "g", normalized_api);
            print normalized_api, svc_user;
        }
    }' "$1" >> "$OUTPUT_FILE"
}

# Process each regular log file in the directory
for log_file in "$LOG_DIR"/*.log; do
    [ -e "$log_file" ] || continue  # skip the literal pattern when nothing matches
    process_log "$log_file"
done

# Process each compressed (rotated) log file in the directory
for gz_log_file in "$LOG_DIR"/*.log-*.gz; do
    [ -e "$gz_log_file" ] || continue
    zcat "$gz_log_file" | process_log "/dev/stdin"
done

# Remove duplicates from the output file
sort "$OUTPUT_FILE" | uniq > "${OUTPUT_FILE}.tmp"
mv "${OUTPUT_FILE}.tmp" "$OUTPUT_FILE"
echo "Unique API/user combinations written to $OUTPUT_FILE"
Mix & match for your use.
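One possible remix, assuming the script has already produced its output file (the file name and its two lines below are hypothetical): instead of only listing unique endpoint/user pairs, count how many distinct endpoints each service user touched.

```shell
# Fake a (hypothetical) output file of the shape the script produces:
# one "normalized_endpoint svc_user" pair per line, already deduplicated
printf '%s\n' \
  '/api/v1/orders/[ID] svc_billing' \
  '/api/v1/orders/[ID] svc_reports' \
  > logs_unique_api_user_combinations.txt

# Tally distinct endpoints per service user (field 2 is the user)
awk '{ users[$2]++ } END { for (u in users) print u, users[u] }' \
  logs_unique_api_user_combinations.txt | sort
```

Because the input is already unique pairs, the per-user counts are distinct endpoints, not raw request counts.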