Find And Remove Duplicate Files, Similar Images And More With Czkawka
Czkawka is a fast (multi-threaded) application to find and remove duplicate files, invalid symlinks, similar images, and more.
Original story by Logix from the Linux Uprising Blog. Published 2020-05-13, Originally published 2020-05-12.
This work is available under the Creative Commons Attribution (CC BY) license.
Czkawka. Screenshot by Logix, licensed under the Creative Commons Attribution license.
Czkawka is similar in both user interface and functionality to FSlint, a duplicate file finder for Linux that has not been ported from Python 2 and is therefore no longer available in many Linux distributions.
The application is written in Rust, comes with both GUI (GTK3) and CLI frontends, and is available for Linux, macOS and Microsoft Windows.
Using Czkawka, you can remove unnecessary files from your computer such as:
- duplicate files
- similar images (with image previews)
- music duplicates
- big files
- temporary files
- zeroed files
- invalid symlinks
- broken files
- empty files
- empty directories
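To make a few of these categories concrete, here is a minimal Python sketch that walks a directory tree and reports invalid symlinks, empty files and empty directories. It is only an illustration, not Czkawka's implementation (which is written in Rust); the function name find_clutter and the layout of the returned dictionary are made up for this example.

```python
import os


def find_clutter(root):
    """Collect invalid symlinks, empty files and empty directories under `root`.

    Hypothetical helper for illustration only; it is not part of Czkawka.
    """
    found = {"invalid_symlinks": [], "empty_files": [], "empty_dirs": []}
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        # Only directories with no entries at all are counted as empty.
        if not dirnames and not filenames:
            found["empty_dirs"].append(dirpath)
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists() follows the link, so False means the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                found["invalid_symlinks"].append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path) and os.path.getsize(path) == 0:
                found["empty_files"].append(path)
    return found
```

Calling find_clutter on a directory returns three lists of paths that you could review before removing anything.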
From its simple user interface, you can include or exclude directories (with the option to scan either only the top directory or recursively) and items, and optionally add a list of allowed extensions.
Some "unnecessary files" categories have their own options. For example, when searching for duplicate files you can specify the minimum file size, the check method (Hash, HashMb, Size or Name) and the hash type (Blake3, CRC32 or XXH3). For similar images, you can specify the minimum file size and the level of similarity (ranging from minimal to very high). As for music duplicates, Czkawka lets you set the minimum file size and choose which tags to compare: song title, artist, album title, album artist and year.
From the application options you can show a confirmation dialog when deleting, move deleted files to the trash instead of deleting them permanently (this is unchecked by default, so you may want to enable it so that you can restore files removed by mistake), disable image previews when scanning for similar images, and so on.
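The Size and Hash check methods can be pictured with a short Python sketch: files are first grouped by size, and only files that share a size are hashed and compared. This is an illustration under assumptions, not Czkawka's Rust implementation. The function names are hypothetical, and BLAKE2b from Python's standard library stands in for Blake3, which the standard library does not provide.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file in chunks with BLAKE2b (a stand-in for Blake3)."""
    digest = hashlib.blake2b()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, min_size=1):
    """Return sorted lists of paths under `root` whose contents are identical."""
    # Step 1: group regular files by size (cheap, no file contents are read).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size >= min_size:
                by_size[size].append(path)

    # Step 2: hash only the files that share their size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups
```

Grouping by size first keeps the expensive hashing step limited to files that could actually be duplicates, and the min_size parameter plays the role of a minimum file size option.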
In the Czkawka options you'll also find options for saving the current configuration, loading a saved configuration, and resetting it. Here, "configuration" means the settings you've entered in Czkawka for finding duplicates, like the included and excluded directories, check method, etc.
The command line interface of Czkawka seems to be on par with the GUI feature-wise (at least at first glance), and its help is extensive, with examples. So if you're looking for a way to automate duplicate file removal, scan for and remove similar images, etc., from a script, give it a try. Note that the GUI and CLI are available to download as separate binaries though!
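As a starting point for scripting, the following Python sketch only checks that the CLI binary is on the PATH and prints its built-in help, leaving the actual subcommands and flags to that help output. The binary name czkawka-cli is an assumption (it is the name used when installing the CLI to /usr/local/bin later in this article), the --help flag is assumed to be supported, and the function name cli_help is hypothetical.

```python
import shutil
import subprocess


def cli_help(binary="czkawka-cli"):
    """Return the help text of `binary`, or None if it is not on the PATH."""
    executable = shutil.which(binary)
    if executable is None:
        return None
    # Assumption: the tool prints its usage when called with --help.
    result = subprocess.run(
        [executable, "--help"], capture_output=True, text=True, check=False
    )
    return result.stdout or result.stderr


if __name__ == "__main__":
    text = cli_help()
    print(text if text is not None else "czkawka-cli was not found on the PATH")
```

From there, the examples in the help output are the place to look for the exact options to use in your own script.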
The tool was updated to version 3.0.0 yesterday, receiving various improvements:
- Option to not ignore hardlinks
- Hardlink support for GUI
- New settings window
- Unify file removing
- Dry run in duplicates CLI
- Option to turn off cache
- Add confirmation dialog when trying to remove all files in group
- Add confirmation dialog when removing files with delete key
- Open file on double-click or using the Enter key
- Allow putting files in trash instead of fully removing them
Using Czkawka (GUI)
To search for duplicate files (or some other category from the left-hand side column, like similar images, invalid symlinks, etc.), add the directories you want to scan at the top of the application. You can also add directories or items to exclude, and allowed extensions. Then click the button on the bottom left-hand side to begin finding the duplicates (or other unnecessary files).
The first time you perform a search, Czkawka may take a while (depending on many factors, such as the number of files included in the search and your hardware), but the second and subsequent scans are a lot faster thanks to the application's caching feature (though you can disable this from its settings).
For the found duplicates, the application lets you select files using multiple filters.
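The speed-up from caching can be illustrated with a small Python sketch. It is an assumption about how such a cache might work in general, not a description of Czkawka's actual cache format: a hash is stored together with the file's size and modification time, and it is reused only while both still match. The class name HashCache and the JSON file layout are hypothetical, and SHA-256 is used simply because it ships with Python.

```python
import hashlib
import json
import os


class HashCache:
    """Remember file hashes between runs, validated by size and mtime."""

    def __init__(self, cache_file):
        self.cache_file = cache_file
        self.hits = 0  # number of hashes served from the cache
        try:
            with open(cache_file, "r", encoding="utf-8") as handle:
                self.entries = json.load(handle)
        except (FileNotFoundError, json.JSONDecodeError):
            self.entries = {}

    def digest(self, path):
        """Return the hash of `path`, re-reading the file only if it changed."""
        info = os.stat(path)
        entry = self.entries.get(path)
        if (
            entry
            and entry["size"] == info.st_size
            and entry["mtime_ns"] == info.st_mtime_ns
        ):
            self.hits += 1
            return entry["hash"]
        # Reading the whole file at once keeps the sketch short.
        with open(path, "rb") as handle:
            value = hashlib.sha256(handle.read()).hexdigest()
        self.entries[path] = {
            "size": info.st_size,
            "mtime_ns": info.st_mtime_ns,
            "hash": value,
        }
        return value

    def save(self):
        """Write the cache to disk so the next run can reuse it."""
        with open(self.cache_file, "w", encoding="utf-8") as handle:
            json.dump(self.entries, handle)
```

On a second run, unchanged files are answered from the saved entries without being read again, which is where the time is saved.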
With this selection you can then choose to delete, symlink or hardlink the found files. You can also save the duplicates to a text file.
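To show what replacing a duplicate with a hard link means on disk, here is a hedged Python sketch. The function name replace_with_hardlink is hypothetical and this is not how Czkawka itself is implemented; it assumes both paths are on the same filesystem, since hard links cannot cross filesystems.

```python
import os


def replace_with_hardlink(original, duplicate):
    """Replace `duplicate` with a hard link to `original`.

    The link is created under a temporary name first and then moved over
    the duplicate, so the duplicate stays untouched if linking fails.
    """
    if os.path.samefile(original, duplicate):
        return  # already the same file on disk, nothing to do
    temporary = duplicate + ".hardlink-tmp"
    os.link(original, temporary)      # raises OSError across filesystems
    os.replace(temporary, duplicate)  # swaps the new link into place
```

After the call, both names refer to the same data, so the space used by the duplicate's content is freed while both paths keep working.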
If you opt to delete the found duplicates, I recommend opening the Czkawka options and enabling the option to move deleted files to the trash, so you can restore them later if you delete the wrong file.
You may want to visit the Czkawka usage instructions for more details.
You can download Czkawka from github.com/qarmin/czkawka/releases. You can find Czkawka binaries for Linux, Windows and macOS. For each, there are separate GUI and command line binaries available for download.
Besides the binaries from the application's releases tab, there are also Snap, Flatpak, AUR, and PPA packages that you can use to install Czkawka. Or you can build it from source. See the application installation section for details.
The GitHub releases tab offers the Czkawka GUI both as a generic binary and as an AppImage (with the CLI being available only as a separate generic binary). They should all work on any Linux distribution, but note that in my case, the application didn't respect my system GTK theme when using the AppImage binary; this didn't happen using the generic binary.
If you opt to get the generic GUI binary (linux_czkawka_gui) from the application's GitHub releases tab, place this file in your home directory, then install it to /usr/local/bin as czkawka-gui:
sudo install ~/linux_czkawka_gui /usr/local/bin/czkawka-gui
After this, you can remove the linux_czkawka_gui file from your home directory.
With this method, you won't find Czkawka in your applications menu. So either launch it via Alt + F2 or by opening a terminal and typing czkawka-gui, or add a menu entry for it using a tool such as MenuLibre.
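If you prefer not to use a menu editor, a menu entry can also be created by hand as a .desktop file. The Python sketch below writes a minimal one into ~/.local/share/applications, the usual per-user location. The display name, the Utility category and the file name czkawka-gui.desktop are assumptions chosen for illustration, and the Exec path matches the install command above.

```python
import os


def desktop_entry(exec_path="/usr/local/bin/czkawka-gui", name="Czkawka"):
    """Build the text of a minimal freedesktop.org desktop entry."""
    lines = [
        "[Desktop Entry]",
        "Type=Application",
        f"Name={name}",
        f"Exec={exec_path}",
        "Terminal=false",
        "Categories=Utility;",  # assumed category, adjust to taste
    ]
    return "\n".join(lines) + "\n"


def write_desktop_entry(directory=None, filename="czkawka-gui.desktop"):
    """Write the entry to `directory` (default: the per-user applications folder)."""
    if directory is None:
        directory = os.path.expanduser("~/.local/share/applications")
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(desktop_entry())
    return path
```

Running write_desktop_entry() once should be enough for most desktop environments to pick up the new entry, although some only refresh their menu after you log out and back in.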
To install the Czkawka Linux CLI tool in /usr/local/bin as czkawka-cli, download linux_czkawka_cli, place it in your home directory, then run:
sudo install ~/linux_czkawka_cli /usr/local/bin/czkawka-cli
You can now remove the linux_czkawka_cli binary file from your home directory.