Importing ALL THE FILETYPES Using Bash
A while ago, I wrote a Python script to automate importing images from my cell phone’s camera onto my computer. It worked, but the performance was lacking and, more importantly, it relied on the file name of each image being a timestamp in `YYYYMMDD_hhmmss` format. I recently bought a proper camera, which names files according to a prefix and an incrementing sequence, so that assumption was no longer valid. Furthermore, I struggled to find a Python library capable of reading the EXIF data from the formats my camera produces (particularly HEIF and Canon’s proprietary CR3 raw format), so I decided to rewrite the script in Bash.
Boilerplate: Argument Parsing and Validation
Basic argument parsing in Bash is very easy. Check out this Stack Overflow post for a detailed explanation. As we don’t take any positional arguments, the following is all we need:
src=""
dest=""
dry_run=0
# parse command line arguments
while [[ $# -gt 0 ]]
do
case $1 in
-s|--source)
src="$2"
shift
shift
;;
-d|--destination)
dest="$2"
shift
shift
;;
--dry-run)
dry_run=1
shift
;;
*)
printf '\033[91merror:\033[0m %s is not a valid argument\n' "$1" && exit
esac
done
# resolve paths
src=$(realpath "$src")
dest=$(realpath "$dest")
`realpath` canonicalizes each path, resolving any symlinks and removing extra slashes. If it fails (e.g. if the user passes a directory that doesn’t exist), that’s handled during validation.
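For instance (the path here is hypothetical, and assumes the parent directories exist):

```bash
realpath '/home/user//photos/./import'
# -> /home/user/photos/import
```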
Bash lets us do really concise one-liner validations like so:
```bash
# validate command line arguments
[[ -z "$src" ]] && printf '\033[91merror:\033[0m source directory not provided\n' && exit 1
[[ -z "$dest" ]] && printf '\033[91merror:\033[0m destination directory not provided\n' && exit 1
[[ ! -d "$src" ]] && printf '\033[91merror:\033[0m %s is not a directory\n' "$src" && exit 1
[[ ! -d "$dest" ]] && printf '\033[91merror:\033[0m %s is not a directory\n' "$dest" && exit 1
```
Dry Run Features
I wanted the script to have a dry run feature so that I can make sure everything is parsed correctly before any files are actually copied. The dry run counts files by date and progressively writes those counts to standard output. We use an associative array (the Bash equivalent of a dictionary) with dates as keys and file counts as values:
```bash
# if dry run, declare an associative array to hold our dates
[[ $dry_run -gt 0 ]] && declare -A dates
```
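If you haven’t used one before, here’s a minimal sketch of how an associative array behaves (the date and counts are illustrative):

```bash
declare -A dates                 # keys are strings, values are stored as strings
dates["2023-10-05"]=0            # assign by key
dates["2023-10-05"]=$(( ${dates["2023-10-05"]} + 1 ))   # increment a count
echo "${dates["2023-10-05"]}"    # -> 1
echo "${#dates[@]}"              # -> 1 (number of keys)
```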
We also declare an array to hold all the file names for which we can’t find a timestamp. Explicitly declaring the array isn’t necessary (as it is for an associative array), but it’s good for readability to have it defined with a comment explaining what it’s used for:
```bash
# declare an array to hold files without timestamps
no_timestamp=()
```
Processing Files
Now for the actual work. We want to find out when each picture or video was taken and name all of the output files accordingly. First things first, we skip directories (I could add support for recursing into them, but as of right now I haven’t had the need) and get the base name of the file:
for path in "$src"/*;
do
# skip directories. we'll allow for recursing later
[[ -d "$path" ]] && printf '\033[93mwarning:\033[0m skipping directory %s\n' "$path" && continue
# get just the file name, without extensions
filename=$(basename "$path" | cut -d'.' -f 1)
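For example, with a hypothetical file straight off the card:

```bash
basename '/media/sdcard/IMG_0042.CR3' | cut -d'.' -f 1
# -> IMG_0042
```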
To get the timestamps, we look at every file with the matching base name (without extension) and see if we can find one with an EXIF timestamp. This way, if we come across a file that `exiv2` can’t figure out, hopefully there will be a related file that it can read:
```bash
    # extract timestamp
    # look through all files with the same name to find one with exif data
    timestamp_raw=""
    for sibling in "$src"/"$filename"*; do
        timestamp_raw=$(exiv2 -g Exif.Image.DateTime pr "$sibling" 2> /dev/null | tr -s ' ' | cut -d' ' -f 4-5)
        [[ -n "$timestamp_raw" ]] && break
    done
```
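To see what the `tr` and `cut` calls are doing, it helps to look at `exiv2`’s output; the exact spacing varies, but it looks roughly like this (the file name and timestamp are illustrative):

```bash
exiv2 -g Exif.Image.DateTime pr IMG_0042.CR3
# -> Exif.Image.DateTime     Ascii      20  2023:10:05 14:30:22

# squeezing the repeated spaces makes the field positions predictable,
# so fields 4-5 are the date and the time
exiv2 -g Exif.Image.DateTime pr IMG_0042.CR3 | tr -s ' ' | cut -d' ' -f 4-5
# -> 2023:10:05 14:30:22
```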
If we can’t find an EXIF timestamp for any of the files, we assume the file is a video and use `ffprobe` to extract its timestamp. Then we parse out all of the different components of the timestamp to rebuild it our way later:
if [[ -n "$timestamp_raw" ]]; then
# parse timestamp
date_raw=$(echo "$timestamp_raw" | cut -d' ' -f 1)
year=$(echo "$date_raw" | cut -d':' -f 1)
month=$(echo "$date_raw" | cut -d':' -f 2)
day=$(echo "$date_raw" | cut -d':' -f 3)
time=$(echo "$timestamp_raw" | cut -d' ' -f 2)
hour=$(echo "$time" | cut -d':' -f 1)
min=$(echo "$time" | cut -d':' -f 2)
sec=$(echo "$time" | cut -d':' -f 3)
else
# try to extract video timestamp
timestamp_raw=$(ffprobe -v quiet "$path" -show_entries format_tags=creation_time | sed 2!d | cut -d'=' -f 2)
date_raw=$(echo "$timestamp_raw" | cut -d'T' -f 1)
year=$(echo "$date_raw" | cut -d'-' -f 1)
month=$(echo "$date_raw" | cut -d'-' -f 2)
day=$(echo "$date_raw" | cut -d'-' -f 3)
time_raw=$(echo "$timestamp_raw" | cut -d'T' -f 2 | cut -d'.' -f 1)
hour=$(echo "$time_raw" | cut -d':' -f 1)
min=$(echo "$time_raw" | cut -d':' -f 2)
sec=$(echo "$time_raw" | cut -d':' -f 3)
fi
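As with `exiv2`, the parsing is easier to follow with a sample of `ffprobe`’s output in front of you (the file name and timestamp are illustrative): `sed '2!d'` deletes every line except the second, and `cut` takes everything after the `=`.

```bash
ffprobe -v quiet MVI_0042.MP4 -show_entries format_tags=creation_time
# -> [FORMAT]
# -> TAG:creation_time=2023-10-05T14:30:22.000000Z
# -> [/FORMAT]
```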
We reconstruct the timestamp and use a regular expression to check that it’s in the format we expect:
date="${year}-${month}-${day}"
timestamp="${year}${month}${day}_${hour}${min}${sec}"
if ! (echo "$timestamp" | grep -Pq '^\d{8}_\d{6}$'); then
no_timestamp+=("$path")
continue
fi
This way, if there’s some parsing error that made it through to this stage, it’ll be caught here.
If we’re doing a dry run, we just want to count the files, grouped by date. To make it clear that the script isn’t hanging, we produce output while this happens, using a carriage return to update the count for the current date in place as it increases:
```bash
    # if dry run, display the number of files per date as we count them
    if [[ $dry_run -gt 0 ]]; then
        if [[ -z ${dates["$date"]} ]]; then
            # start a new output line for each new date
            (( ${#dates[@]} != 0 )) && printf '\n'
            dates["$date"]=0
        fi
        printf "\r%s: %s file(s)" "$date" $((dates["$date"] + 1))
        dates["$date"]=$((dates["$date"] + 1))
        continue
    fi
```
This has the disadvantage of producing multiple lines for the same date if the files aren’t processed in chronological order. However, as pictures from my phone are named with a timestamp, and pictures from my camera are named with an increasing sequence, the directories I throw at this script should be sorted in chronological order.
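For illustration, a dry run over a card holding two days of shooting might end up looking like this (dates and counts made up):

```bash
./import.sh -s /media/sdcard/DCIM/100CANON -d ~/Pictures --dry-run
# -> 2023-10-05: 142 file(s)
# -> 2023-10-06: 87 file(s)
```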
All that remains is to generate the output file name and directory and copy the file. We extract the file extension and convert it to lower case, construct the new file name, and copy the file to the destination directory. We use `--backup=t` to handle the case in which multiple pictures were taken in the same second (e.g. with a continuous shooting mode), and `--no-preserve=mode,ownership` so that the output files match the permissions and ownership of their destination (the files show up as mode 777 when my system reads them from the SD card, which is not desirable):
```bash
    # convert extension to lower case
    extension=$(basename "$path" | cut -d'.' -f 2- | tr '[:upper:]' '[:lower:]')

    # create new file name
    new_filename="${timestamp}.${extension}"

    # make directory and copy
    mkdir -p "${dest}/${date}"
    cp -v --backup=t --no-preserve=mode,ownership "$path" "${dest}/${date}/${new_filename}"
done
```
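With numbered backups, a second file that resolves to the same timestamp doesn’t overwrite the first: `cp` renames the existing file to something like `20231005_143022.cr3.~1~` before copying the new one in, so nothing is silently lost.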
Finally, we list out the files with no timestamp. If we did a dry run, we first need to print an extra newline, since the progress counter leaves the cursor at the end of its last line:
```bash
# if dry run, print a final newline, since the counter doesn't print one after the last date
if [[ $dry_run -gt 0 ]]; then
    printf '\n'
fi

# list files with no timestamp
if (( ${#no_timestamp[@]} != 0 )); then
    printf 'files with no timestamp:\n'
    for path in "${no_timestamp[@]}"; do
        printf '%s\n' "$(basename "$path")"
    done
fi
```
And that’s it!
Final Remarks
One reason I like writing shell scripts is the ease with which I can verify that a line of code does what I expect it to do. If I want to see what the output of `ffprobe` looks like, I can pop open a terminal and run it. Then, if I want to see what `cut -d'=' -f 2` does to that output, I can hit the up arrow and tack it on with a pipe. Errors are quickly caught and fixed, as I can rapidly iterate on the commands being run and see their output.
The goal of scripting is to take something you would otherwise do manually (by typing commands into the shell) and automate it. As a shell script is nothing more than a list of shell commands, it’s the natural choice for such automation, which is why I think I’ll be using Bash for all my scripting needs from now on.