Importing ALL THE FILETYPES Using Bash
A while ago, I wrote a Python script to automate importing images from my cell phone’s camera onto my computer. It worked, but the performance was lacking and, more importantly, it relied on the file name of each image being a timestamp in `YYYYMMDD_hhmmss` format. I recently bought a proper camera, which names files according to a prefix and an incrementing sequence, so that assumption was no longer valid. Furthermore, I struggled to find a Python library capable of reading the EXIF data from the formats my camera produces (particularly HEIF and Canon’s proprietary CR3 raw format), so I decided to rewrite the script in Bash.
Boilerplate: Argument Parsing and Validation
Basic argument parsing in Bash is very easy. Check out this Stack Overflow post for a detailed explanation. As we don’t take any positional arguments, the following is all we need:
src=""
dest=""
dry_run=0
# parse command line arguments
while [[ $# -gt 0 ]]
do
case $1 in
-s|--source)
src="$2"
shift
shift
;;
-d|--destination)
dest="$2"
shift
shift
;;
--dry-run)
dry_run=1
shift
;;
*)
printf '\033[91merror:\033[0m %s is not a valid argument\n' "$1" && exit
esac
done
# resolve paths
src=$(realpath "$src")
dest=$(realpath "$dest")
`realpath` canonicalizes each path, resolving any symlinks and removing extra slashes. If it fails (e.g. if the user passes a directory that doesn’t exist), that’s handled during validation.
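For instance (the path here is hypothetical, and assumes the parent directories exist):

```bash
realpath '/home/user//photos/./import'
# -> /home/user/photos/import
```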
Bash lets us do really concise one-liner validations like so:
```bash
# validate command line arguments
[[ -z "$src" ]] && printf '\033[91merror:\033[0m source directory not provided\n' && exit 1
[[ -z "$dest" ]] && printf '\033[91merror:\033[0m destination directory not provided\n' && exit 1
[[ ! -d "$src" ]] && printf '\033[91merror:\033[0m %s is not a directory\n' "$src" && exit 1
[[ ! -d "$dest" ]] && printf '\033[91merror:\033[0m %s is not a directory\n' "$dest" && exit 1
```
Dry Run Features
I wanted the script to have a dry run feature so that I can make sure everything is parsed correctly before any files are actually copied. The dry run counts files by date and progressively writes those counts to standard output. We use an associative array (the Bash equivalent of a dictionary) with dates as keys and file counts as values:
```bash
# if dry run, declare an associative array to hold our dates
[[ $dry_run -gt 0 ]] && declare -A dates
```
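If you haven’t used one before, here’s a minimal sketch of how an associative array behaves (the date and counts are illustrative):

```bash
declare -A dates                 # keys are strings, values are stored as strings
dates["2023-10-05"]=0            # assign by key
dates["2023-10-05"]=$(( ${dates["2023-10-05"]} + 1 ))   # increment a count
echo "${dates["2023-10-05"]}"    # -> 1
echo "${#dates[@]}"              # -> 1 (number of keys)
```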
We also declare an array to hold all the file names for which we can’t find a timestamp. Explicitly declaring the array isn’t necessary (as it is for an associative array), but it’s good for readability to have it defined with a comment explaining what it’s used for:
```bash
# declare an array to hold files without timestamps
no_timestamp=()
```
Processing Files
Now for the actual work. We want to find out when each picture or video was taken and name all of the output files accordingly. First things first, we skip directories (I could add support for recursing into them, but as of right now I haven’t had the need) and get the base name of the file:
for path in "$src"/*;
do
# skip directories. we'll allow for recursing later
[[ -d "$path" ]] && printf '\033[93mwarning:\033[0m skipping directory %s\n' "$path" && continue
# get just the file name, without extensions
filename=$(basename "$path" | cut -d'.' -f 1)
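For example, with a hypothetical file straight off the card:

```bash
basename '/media/sdcard/IMG_0042.CR3' | cut -d'.' -f 1
# -> IMG_0042
```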
To get the timestamps, we look at every file with the matching base name (without extension) and see if we can find one with an EXIF timestamp. This way, if we come across a file that `exiv2` can’t figure out, hopefully there will be a related file that it can read:
```bash
    # extract timestamp
    # look through all files with the same name to find one with exif data
    timestamp_raw=""
    for sibling in "$src"/"$filename"*; do
        timestamp_raw=$(exiv2 -g Exif.Image.DateTime pr "$sibling" 2> /dev/null | tr -s ' ' | cut -d' ' -f 4-5)
        [[ -n "$timestamp_raw" ]] && break
    done
```
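To see what the `tr` and `cut` calls are doing, it helps to look at `exiv2`’s output; the exact spacing varies, but it looks roughly like this (the file name and timestamp are illustrative):

```bash
exiv2 -g Exif.Image.DateTime pr IMG_0042.CR3
# -> Exif.Image.DateTime     Ascii      20  2023:10:05 14:30:22

# squeezing the repeated spaces makes the field positions predictable,
# so fields 4-5 are the date and the time
exiv2 -g Exif.Image.DateTime pr IMG_0042.CR3 | tr -s ' ' | cut -d' ' -f 4-5
# -> 2023:10:05 14:30:22
```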
If we can’t find an EXIF timestamp for any of the files, we assume the file is a video and use `ffprobe` to extract its timestamp. Then we parse out all of the different components of the timestamp to rebuild it our way later:
if [[ -n "$timestamp_raw" ]]; then
# parse timestamp
date_raw=$(echo "$timestamp_raw" | cut -d' ' -f 1)
year=$(echo "$date_raw" | cut -d':' -f 1)
month=$(echo "$date_raw" | cut -d':' -f 2)
day=$(echo "$date_raw" | cut -d':' -f 3)
time=$(echo "$timestamp_raw" | cut -d' ' -f 2)
hour=$(echo "$time" | cut -d':' -f 1)
min=$(echo "$time" | cut -d':' -f 2)
sec=$(echo "$time" | cut -d':' -f 3)
else
# try to extract video timestamp
timestamp_raw=$(ffprobe -v quiet "$path" -show_entries format_tags=creation_time | sed 2!d | cut -d'=' -f 2)
date_raw=$(echo "$timestamp_raw" | cut -d'T' -f 1)
year=$(echo "$date_raw" | cut -d'-' -f 1)
month=$(echo "$date_raw" | cut -d'-' -f 2)
day=$(echo "$date_raw" | cut -d'-' -f 3)
time_raw=$(echo "$timestamp_raw" | cut -d'T' -f 2 | cut -d'.' -f 1)
hour=$(echo "$time_raw" | cut -d':' -f 1)
min=$(echo "$time_raw" | cut -d':' -f 2)
sec=$(echo "$time_raw" | cut -d':' -f 3)
fi
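As with `exiv2`, the parsing is easier to follow with a sample of `ffprobe`’s output in front of you (the file name and timestamp are illustrative): `sed '2!d'` deletes every line except the second, and `cut` takes everything after the `=`.

```bash
ffprobe -v quiet MVI_0042.MP4 -show_entries format_tags=creation_time
# -> [FORMAT]
# -> TAG:creation_time=2023-10-05T14:30:22.000000Z
# -> [/FORMAT]
```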
We reconstruct the timestamp and use a regular expression to check that it’s in the format we expect:
date="${year}-${month}-${day}"
timestamp="${year}${month}${day}_${hour}${min}${sec}"
if ! (echo "$timestamp" | grep -Pq '^\d{8}_\d{6}$'); then
no_timestamp+=("$path")
continue
fi
This way, if there’s some parsing error that made it through to this stage, it’ll be caught here.
If we’re doing a dry run, we just want to count the files, grouped by date. To make it clear that the script isn’t hanging, we produce output while this happens, using a carriage return to update the count for the current date in place as it increases:
```bash
    # if dry run, display the number of files per date as we count them
    if [[ $dry_run -gt 0 ]]; then
        if [[ -z ${dates["$date"]} ]]; then
            # start a new output line for each new date
            (( ${#dates[@]} != 0 )) && printf '\n'
            dates["$date"]=0
        fi
        printf "\r%s: %s file(s)" "$date" $((dates["$date"] + 1))
        dates["$date"]=$((dates["$date"] + 1))
        continue
    fi
```
This has the disadvantage of producing multiple lines for the same date if the files aren’t processed in chronological order. However, as pictures from my phone are named with a timestamp, and pictures from my camera are named with an increasing sequence, the directories I throw at this script should be sorted in chronological order.
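For illustration, a dry run over a card holding two days of shooting might end up looking like this (dates and counts made up):

```bash
./import.sh -s /media/sdcard/DCIM/100CANON -d ~/Pictures --dry-run
# -> 2023-10-05: 142 file(s)
# -> 2023-10-06: 87 file(s)
```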
All that remains is to generate the output file name and directory and copy the file. We extract the file extension and convert it to lower case, construct the new file name, and copy the file to the destination directory. We use `--backup=t` to handle the case in which multiple pictures were taken in the same second (e.g. with a continuous shooting mode), and `--no-preserve=mode,ownership` so that the output files match the permissions and ownership of their destination (the files show up as mode 777 when my system reads them from the SD card, which is not desirable):
```bash
    # convert extension to lower case
    extension=$(basename "$path" | cut -d'.' -f 2- | tr '[:upper:]' '[:lower:]')

    # create new file name
    new_filename="${timestamp}.${extension}"

    # make directory and copy
    mkdir -p "${dest}/${date}"
    cp -v --backup=t --no-preserve=mode,ownership "$path" "${dest}/${date}/${new_filename}"
done
```
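With numbered backups, a second file that resolves to the same timestamp doesn’t overwrite the first: `cp` renames the existing file to something like `20231005_143022.cr3.~1~` before copying the new one in, so nothing is silently lost.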
Finally, we list out the files with no timestamp. If we did a dry run, we first need to print an extra newline, since the progress counter leaves the cursor at the end of its last line:
```bash
# if dry run, print a final newline, since the counter doesn't print one after the last date
if [[ $dry_run -gt 0 ]]; then
    printf '\n'
fi

# list files with no timestamp
if (( ${#no_timestamp[@]} != 0 )); then
    printf 'files with no timestamp:\n'
    for path in "${no_timestamp[@]}"; do
        printf '%s\n' "$(basename "$path")"
    done
fi
```
And that’s it!
Final Remarks
One reason I like writing shell scripts is the ease with which I can verify that a line of code does what I expect it to do. If I want to see what the output of `ffprobe` looks like, I can pop open a terminal and run it. Then, if I want to see what `cut -d'=' -f 2` does to that output, I can hit the up arrow and tack it on with a pipe. Errors are quickly caught and fixed, as I can rapidly iterate on the commands being run and see their output.
The goal of scripting is to take something you would otherwise do manually (by typing commands into the shell) and automate it. As a shell script is nothing more than a list of shell commands, it’s the natural choice for such automation, which is why I think I’ll be using Bash for all my scripting needs from now on.