Bash script to limit the size of a directory by deleting the least recently accessed files

Previously, I used a simple find command to delete tar files that had not been accessed in the last x days (in this example, 3 days):

find /PATH/TO/FILES -type f -name "*.tar" -atime +3 -exec rm {} \;

Now I need to improve this script so that it deletes files in access-date order, and my bash skills are a bit rusty. Here is what I need it to do:

  • check the size of the directory /PATH/TO/FILES
  • if that size is larger than X, get a list of files sorted by access date
  • delete the least recently accessed files until the size is smaller than X

The advantage, for cache and backup directories, is that I only delete what is needed to stay within the limit, whereas the simple method can exceed the size limit if one day is especially large. I assume I need to use stat and a bash for loop?

3 answers

Here is a simple, easy to read and understand method that I came up with for this:

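# LIMITSIZE is the size limit in KiB and must be set before this runs.
DIRSIZE=$(du -sk /PATH/TO/FILES | awk '{print $1}')   # directory size in KiB
if [ "$DIRSIZE" -gt "$LIMITSIZE" ]; then
    # Least recently accessed first. Parsing ls breaks on file names
    # that contain whitespace, so this only suits simple names.
    for f in $(ls -rt --time=atime /PATH/TO/FILES/*.tar); do
        FILESIZE=$(stat -c "%s" "$f")     # size in bytes (GNU stat)
        FILESIZE=$((FILESIZE / 1024))     # convert to KiB
        rm -f "$f"
        DIRSIZE=$((DIRSIZE - FILESIZE))
        if [ "$DIRSIZE" -lt "$LIMITSIZE" ]; then
            break
        fi
    done
fi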

I improved the brunner314 example and fixed the problems in it.

Here is the working script I am using:

#!/bin/bash
DELETEDIR="$1"
MAXSIZE="$2"
if [[ -z "$DELETEDIR" || -z "$MAXSIZE" || "$MAXSIZE" -lt 1 ]]; then
    echo "usage: $0 [directory] [maxsize in megabytes]" >&2
    exit 1
fi
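# Emit "mtime::path::size" per file, newest first; awk prints every file past the limit.
# %T@ is the modification time; use %A@ instead to order by access time.
find "$DELETEDIR" -type f -printf "%T@::%p::%s\n" \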
| sort -rn \
| awk -v maxbytes="$((1024 * 1024 * $MAXSIZE))" -F "::" '
  BEGIN { curSize=0; }
  { 
  curSize += $3;
  if (curSize > maxbytes) { print $2; }
  }
  ' \
  | tac | awk '{printf "%s\0",$0}' | xargs -0 -r rm
# delete empty directories
find "$DELETEDIR" -mindepth 1 -depth -type d -empty -exec rmdir "{}" \;
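
For file names that may contain "::" or newlines, the same idea can be sketched with NUL-terminated records. This is only a sketch under assumptions: trim_dir is a hypothetical helper name, the limit is given in bytes, ordering is by access time (%A@), and GNU find and GNU sort are required for -printf and -z.

```shell
#!/bin/bash
# Hypothetical helper: trim_dir DIR MAX_BYTES
# Keeps the most recently accessed files whose cumulative size fits in
# MAX_BYTES and deletes the rest. Assumes GNU find and GNU sort.
trim_dir() {
    local dir="$1" max_bytes="$2" total=0 rec size path
    # Each record is "atime size path", NUL-terminated, newest access first.
    while IFS= read -r -d '' rec; do
        size=${rec#* }       # drop the atime field
        size=${size%% *}     # keep only the size field
        path=${rec#* * }     # everything after the second space
        total=$((total + size))
        if [ "$total" -gt "$max_bytes" ]; then
            rm -f -- "$path"
        fi
    done < <(find "$dir" -type f -printf '%A@ %s %p\0' | LC_ALL=C sort -z -rn)
}
```

Calling trim_dir /PATH/TO/FILES $((500 * 1024 * 1024)) would keep roughly the newest 500 MiB. Because the path is taken as the remainder of the record, spaces in file names need no escaping.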

This needs no loop, just some careful use of stat and awk. The code first, then the explanation:

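# X_SIZE is the size limit in bytes. stat -f is the BSD/macOS syntax;
# with GNU stat use: stat -c "%X::%s::%n"
find /PATH/TO/FILES -name '*.tar' -type f \
| sed 's/ /\\ /g' \
| xargs stat -f "%a::%z::%N" \
| sort -rn \
| awk -v maxSize="$X_SIZE" '
  BEGIN{curSize=0; FS="::"}
  {curSize += $2}
  curSize > maxSize {print $3}
  ' \
| sed 's/ /\\ /g' \
| xargs rm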

Note that this is one logical command line, split across several lines for readability.

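It starts with a find command based on the one in the question, without the part that limits it to files older than 3 days. The output is piped to sed, which escapes any spaces in the file names, and then to xargs, which runs stat on all the results. The -f "%a::%z::%N" option gives stat the output format: access time in the first field, file size in the second, and file name in the third. "::" separates the fields, which keeps file names with spaces easy to handle. sort then orders the records by the first field, with -r to reverse the order.

Now we have a list of all the files we are interested in, ordered from most recently accessed to least recently accessed. The awk script then sums the sizes as it walks down the list and starts printing file names once the running total exceeds $X_SIZE. Files that are not printed are kept; the printed names are piped through sed again to escape any spaces, and then to xargs, which runs rm on them.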
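
To see the selection step in isolation, here is a small sketch that runs the sort and awk stages on made-up records without deleting anything. select_over_limit is a hypothetical helper name, and the sample access times, sizes, and file names are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads "atime::size::name" records on stdin and prints
# the names that no longer fit within max_bytes, keeping the newest files.
select_over_limit() {
    local max_bytes="$1"
    sort -rn | awk -v maxSize="$max_bytes" -F '::' '
        { curSize += $2 }                  # running total, newest first
        curSize > maxSize { print $3 }     # past the limit: candidate for rm
    '
}

# Made-up sample records: access time (epoch seconds), size in bytes, name.
printf '%s\n' \
    '1700000300::400::c.tar' \
    '1700000100::500::a.tar' \
    '1700000200::300::b.tar' \
| select_over_limit 800
```

With a limit of 800 bytes, the running totals are 400 (c.tar), 700 (b.tar), and 1200 (a.tar), so only a.tar, the least recently accessed file, is printed.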
