Czkawka 7.0 - Krokiet, a new GUI in Slint, CLI binary without needing libc, performance improvements and a handful of new features for the data cleaner.

News category: Linux Tips
Source: reddit.com

Hi,

Recently I released a new version of Czkawka and wanted to share it here with a Medium link, but automod removed it, so I am simply pasting the content of https://medium.com/@qarmin/czkawka-7-0-a465036e8788 here.


I am happy to introduce version 7.0 of Czkawka, a completely free program for finding duplicates, broken and temporary files, similar images, videos, empty files and folders, and several other things.

In the three years of the application's existence, I have thought about stopping development many times, because it already contains almost everything I use on a daily basis. Adding and fixing thousands of lines of code, testing new features, modifying and verifying the CI, or thinking about the GUI implementation can be very tedious, so I have had to take breaks so as not to burn out completely and abandon the project. But each time, I came across things that would be nice to add or change, so the application keeps being developed.

As you may have seen on GitHub, the number of open issues has long since exceeded 300 and keeps growing. Most of them concern new features, of which I consider only a small part implementable, and I implement even fewer. This is not due to laziness (although I admit I am a lazy person), but because I do not see sufficient user need for a feature, or it would take too much time to implement or maintain. At the start of this project I implemented a large number of new features without thinking too much, so over the following years I had to gradually improve and generalize the code.

This version is probably the biggest so far, mostly because of:

New GUI

The biggest feature in this version is the new user interface, "Krokiet", written completely from scratch. Unlike the existing interface, it uses the Slint library instead of GTK; Slint, like Czkawka, is written in Rust.

The reasons for the change were a number of bugs, the difficulty of packaging, and the need for a major redesign of the application, because deprecated widgets such as TreeView are used almost everywhere, and they will not be available in GTK 5 (https://docs.gtk.org/gtk4/class.TreeView.html).

I tested various libraries, such as GTK, Qt, Tauri, Slint, egui and iced, but the final choice was Slint. A cursory and subjective overview of the pros and cons is available here: https://github.com/qarmin/czkawka/tree/master/krokiet#why-slint.

Czkawka on the left and Krokiet on the right

Differences in favour of Krokiet (and Slint)

  • Almost entirely static binary files - GTK is usually dynamically linked, which makes the application depend on the installed GTK version as well as on all of its dependent libraries on the system. On Windows, because there is no GTK 4 installer, I had to manually ship all the DLL files along with the application. Krokiet, on the other hand, is distributed as a single file on every platform, which is much easier to transfer and use.
  • Operation outside Linux - GTK was primarily developed with Linux in mind. The Windows and Mac ports are horribly buggy (although there are sometimes problems on Linux too). Random crashes, failing to start, not working correctly: these are just some of the problems reported on GitHub, which I usually cannot reproduce and fix.
  • Compiling under Windows - I tried several times to compile Czkawka under Windows, but, as is usual with GTK, it failed. For this reason, I had to use a complicated cross-compilation setup because of the large number of dynamic libraries. Slint uses almost entirely Rust dependencies, which makes it very easy and pleasant to compile.
  • GUI creation without compilation - GTK 4 has Cambalache, which is great for building GUIs, but it is not an official tool, so developers do not have to follow the conventions that GTK promotes/imposes, and the tool also lacks the community support it deserves. Slint, unlike Cambalache, has no drag & drop editor, but makes up for it with built-in LSP tooling that displays the interface in real time and allows executing functions and code within .slint files. This is really useful and makes it easier to test changes, especially as compiling a Slint program takes quite long: in Krokiet's case, the Rust code generated from the .slint files is 150,000 lines long after reformatting.
  • More Rust-friendly API - in GTK, because of TreeView, I often worked with TreeIter, wrappers created by gtk-rs that contain pointers to the underlying model and were very easy to misuse. I don't use Rust only to end up hunting memory errors with AddressSanitizer/Valgrind.

Differences in favour of Czkawka (and GTK):

  • More features - I have developed Czkawka for over 3 years, so naturally it contains a significant amount of code, and I wasn't able to rewrite everything, especially since a substantial part of it is tightly coupled to GTK (mainly operations on iterators in models). Additionally, Slint doesn't have sufficiently developed equivalents for some GTK widgets.
  • More advanced default components - For example, a ListView in GTK has intuitive logic for selecting multiple items with the keyboard, while in Slint such functionality has to be implemented manually. Other GTK widgets also have many more settings that change their default behavior, which removes the need to constantly reinvent the same widgets with only minor differences.
  • Multilingualism - In Czkawka, I used the Fluent library to let users change the language; it worked very well, was easy to compile, and had a very simple key -> value format. However, the creators of Slint support only gettext for translations in their language. I have two issues with this approach. Firstly, gettext is a tool written in C, which complicates compilation and cross-compilation. Secondly, I don't like the file format; while it may better show which phrases are used in the code, that is not necessary for a small project like mine and it complicates translations.

I found a few bugs while using Slint, so it's fair to say that in return for the product, I tested it for free.

As you can see, Krokiet is not yet finished, and there are several elements that can be improved:

  • Logo - shows my graphic design skills in full glory; however, I am thinking about adding something more "official"
  • Icons - I managed to create two icons, which are not masterpieces either. I looked for an AI tool for generating icons, but found nothing.
  • Implementation of some missing elements from Czkawka
  • Progress bar during file deletion/movement
  • Improvement of the layout of certain elements - even though the GUI already exists, I still wonder whether it can be made more user-friendly
  • Moving through the results with the arrow keys
  • Mouse/keyboard selection of multiple records
  • Reversing the selection with the middle mouse button

The Future of Czkawka - GTK GUI version

Alright, but you're probably wondering: what about the future of the current interface?

In fact, nothing spectacular is going to happen; I'm simply reducing the amount of bug fixing and new features. Since czkawka_core is shared by all components, changes and improvements in that library will also show up in this GUI version, even though the GUI itself won't be directly modified.

The application will probably live for ~10-15 years, until distributions drop GTK 4 from their package repositories, so there's more than enough time for a transition. Of course, someone might want to port the application to a newer GTK version, but being a realist, I know that's almost impossible.

Performance improvements

In this version, apart from the usual fixes and improvements, while reviewing the code after a break, I noticed that some parts of the project were not as efficient as they could be, because they had not been updated since I created the app.

  • Reaching for file metadata only when necessary

While collecting files/folders to scan, I have to check whether a particular entry in the folder is a file or a folder, so that it is handled appropriately for the current mode. Previously, I fetched the entry's metadata up front and then called the is_file/is_dir methods on the resulting Metadata value, which caused unnecessary disk reads.

let Ok(read_dir) = fs::read_dir(current_folder) else {
    return;
};
for entry in read_dir.into_iter().flatten() {
    // metadata() is requested for every entry, even ones rejected later
    let Ok(metadata) = entry.metadata() else {
        continue;
    };
    if metadata.is_file() {
        if is_valid(&entry) {
            // Process record
        }
    } else if metadata.is_dir() {
        if is_valid(&entry) {
            // Process record
        }
    }
}

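While reviewing the DirEntry API, it turned out that it already contains information about the file type (according to the Rust documentation, on Windows and most Unix platforms DirEntry::file_type needs no extra system call). So fetching file metadata could be delayed and performed on a smaller number of files, already filtered by other, cheaper checks.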

let Ok(read_dir) = fs::read_dir(current_folder) else {
    return;
};
for entry in read_dir.into_iter().flatten() {
    // On most platforms the file type comes from the directory listing itself
    let Ok(file_type) = entry.file_type() else {
        continue;
    };
    if file_type.is_file() {
        if is_valid(&entry) {
            // Metadata is fetched only for entries that passed the cheaper checks
            let Ok(metadata) = entry.metadata() else {
                continue;
            };
            // Process record
        }
    } else if file_type.is_dir() {
        if is_valid(&entry) {
            let Ok(metadata) = entry.metadata() else {
                continue;
            };
            // Process record
        }
    }
}

During profiling, I found a similar issue in the fontdb library; changing the way is_file/is_dir is handled there should reduce font search time by around 12%. So, if you have the time, you can try fixing it here: https://github.com/RazrFalcon/fontdb/issues/60
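A self-contained sketch of the same idea using only std::fs: entries are classified with DirEntry::file_type, and metadata() is called only for files that already passed a cheap name check. The scan_dir function, its extension filter and its minimum-size filter are hypothetical illustrations, not Czkawka's actual code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Scans a single directory level (hypothetical helper for illustration).
/// Returns (files that passed both filters, with their sizes; subdirectories).
fn scan_dir(dir: &Path, allowed_ext: &str, min_size: u64) -> io::Result<(Vec<(PathBuf, u64)>, Vec<PathBuf>)> {
    let mut files = Vec::new();
    let mut subdirs = Vec::new();
    for entry in fs::read_dir(dir)?.flatten() {
        // Classify the entry without requesting its full metadata
        let Ok(file_type) = entry.file_type() else { continue };
        if file_type.is_dir() {
            subdirs.push(entry.path());
        } else if file_type.is_file() {
            // Cheap check on the name first
            let name = entry.file_name();
            let ext_ok = name
                .to_str()
                .and_then(|n| n.rsplit_once('.'))
                .map_or(false, |(_, ext)| ext.eq_ignore_ascii_case(allowed_ext));
            if !ext_ok {
                continue;
            }
            // Only files that survived the cheap filter pay for metadata()
            let Ok(metadata) = entry.metadata() else { continue };
            if metadata.len() >= min_size {
                files.push((entry.path(), metadata.len()));
            }
        }
        // Symlinks and special files are neither is_file() nor is_dir() here, so they are skipped
    }
    Ok((files, subdirs))
}
```

For example, in a folder containing `a.txt` (5 bytes), an empty `c.txt`, `b.bin` and a subfolder, `scan_dir(dir, "txt", 1)` would fetch metadata only for the two `.txt` files and return just `a.txt` plus the subfolder.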

  • Optimization of excluded elements

To verify whether a file can be used by a particular tool, it has to pass several checks, such as not being in an excluded folder, having an appropriate size, and not having an excluded name.

Name exclusion uses manually parsed wildcards like "/home/*/.cache", which ignore paths like "/home/user/.cache/otherapp". The issue became more noticeable as I started using more of them, and, after other optimizations, this step became much more visible on the flame graph.

The problem was that the function checking whether a given name is excluded allocated, for every file and every wildcard, a String (converted from the PathBuf), a vector of indexes used to traverse the path, and a vector of wildcard parts. Caching the strings and eliminating unnecessary allocations significantly improved the performance of this step, which, alongside fetching file metadata from disk, was the most time-consuming operation.
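Czkawka's own matcher is not shown here, so the following is only a sketch of the allocation-free approach described above, assuming `*` is the only wildcard and that it may span `/`. Patterns are stored once, each path is borrowed as `&str` once per file, and matching walks the bytes with the classic greedy backtracking algorithm, so no String or Vec is allocated per file/pattern pair. This sketch uses whole-string matching, so a pattern needs a trailing `*` (as in `/home/*/.cache*`) to also cover subpaths; `ExcludedItems` and `wildcard_match` are hypothetical names.

```rust
use std::path::Path;

/// Hypothetical exclusion list: patterns are stored once, up front.
struct ExcludedItems {
    patterns: Vec<String>,
}

impl ExcludedItems {
    fn new(patterns: &[&str]) -> Self {
        ExcludedItems { patterns: patterns.iter().map(|p| p.to_string()).collect() }
    }

    fn is_excluded(&self, path: &Path) -> bool {
        // Borrow the path as &str (no allocation); non-UTF-8 paths are never excluded here
        let Some(path_str) = path.to_str() else { return false };
        self.patterns
            .iter()
            .any(|pattern| wildcard_match(pattern.as_bytes(), path_str.as_bytes()))
    }
}

/// Whole-string match where `*` matches any run of bytes, including `/`.
/// Greedy backtracking: O(pattern * text) worst case, no heap allocation.
fn wildcard_match(pattern: &[u8], text: &[u8]) -> bool {
    let (mut p, mut t) = (0, 0);
    let mut star: Option<usize> = None; // index of the last `*` seen in the pattern
    let mut star_t = 0; // text index where that `*` started matching
    while t < text.len() {
        if p < pattern.len() && pattern[p] == b'*' {
            star = Some(p);
            star_t = t;
            p += 1;
        } else if p < pattern.len() && pattern[p] == text[t] {
            p += 1;
            t += 1;
        } else if let Some(s) = star {
            // Let the last `*` swallow one more byte and retry from there
            p = s + 1;
            star_t += 1;
            t = star_t;
        } else {
            return false;
        }
    }
    // Any remaining pattern characters must all be `*`
    while p < pattern.len() && pattern[p] == b'*' {
        p += 1;
    }
    p == pattern.len()
}
```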

  • Speeding up the search for empty folders

If you've ever wondered which mode I use most often, it's searching for empty folders. Strange? Perhaps, but even stranger is that it was very slow, with delays of several seconds, visible in the GUI when the bar showing the number of checked folders stopped updating.

When searching for empty folders, all folders are initially considered potentially empty. Then, whenever a non-folder element is found in a folder, that folder and all its ancestors are marked as non-empty. As a result, a folder that contains only empty folders is itself reported as empty, which avoids needing multiple scans (deleting an empty folder can make its parent empty) to remove all empty folders.
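A sketch of the propagation described above, using a HashMap keyed by String. The input format (plain lists of `/`-separated folder and file paths), `parent_of` and `find_empty_folders` are hypothetical simplifications; real scanning code would fill the map while walking the disk.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum FolderEmptiness {
    No,
    Maybe,
}

struct FolderEntry {
    parent_path: Option<String>,
    is_empty: FolderEmptiness,
}

/// Parent of a `/`-separated path; `None` for top-level entries such as "/a".
fn parent_of(path: &str) -> Option<String> {
    path.rsplit_once('/')
        .map(|(parent, _)| parent.to_string())
        .filter(|parent| !parent.is_empty())
}

/// Hypothetical simplification: `folders` and `files` stand in for a directory walk.
fn find_empty_folders(folders: &[&str], files: &[&str]) -> Vec<String> {
    // 1. Every folder starts as potentially empty
    let mut entries: HashMap<String, FolderEntry> = folders
        .iter()
        .map(|f| (f.to_string(), FolderEntry { parent_path: parent_of(f), is_empty: FolderEmptiness::Maybe }))
        .collect();

    // 2. Each file marks its folder and all ancestors as non-empty
    for file in files {
        let mut current = parent_of(file);
        while let Some(path) = current {
            let Some(entry) = entries.get_mut(&path) else { break };
            if entry.is_empty == FolderEmptiness::No {
                break; // already marked, so its ancestors are marked too
            }
            entry.is_empty = FolderEmptiness::No;
            current = entry.parent_path.clone();
        }
    }

    // 3. Whatever is still `Maybe` contains no files at any depth
    let mut empty: Vec<String> = entries
        .into_iter()
        .filter(|(_, e)| e.is_empty == FolderEmptiness::Maybe)
        .map(|(path, _)| path)
        .collect();
    empty.sort();
    empty
}
```

For example, with folders `/a`, `/a/b`, `/a/b/c`, `/d`, `/d/e` and a single file `/d/file.txt`, every folder except `/d` is reported as empty in one pass, including `/a`, which contains only empty folders.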

I had always thought that the issue was caused by folders on disk with thousands of files directly inside them. Imagine my surprise when I saw what hotspot revealed (by the way, a great tool: https://github.com/KDAB/hotspot).

Flamegraph from searching for empty folders in Czkawka 6.1

As can be seen, the code responsible for reading metadata entries from the disk is highly parallelized and, despite using ~25% of the instructions, is responsible for less than 10% of the total execution time.

The app spent most of its time in the std::path::prepare_components function, which of course was not called directly anywhere, so I had no idea where it came from.

Eventually I tracked it down to this function, which, however, doesn't look like it compares any paths:

fn set_as_not_empty_folder(folder_entries: &mut BTreeMap<PathBuf, FolderEntry>, current_folder: &Path) {
    let mut d = folder_entries.get_mut(current_folder).unwrap();
    d.is_empty = FolderEmptiness::No;
    // Loop to set this folder and all its parent folders as non-empty
    loop {
        d.is_empty = FolderEmptiness::No;
        if d.parent_path.is_some() {
            let cf = d.parent_path.clone().unwrap();
            d = folder_entries.get_mut(&cf).unwrap();
        } else {
            break;
        }
    }
}

However, more experienced programmers probably know that a BTreeMap works internally by comparing keys with each other, so a single get/get_mut can compare keys up to a dozen times before it finds the right one. That explains why comparisons show up at all; what remains is to find out why they are so slow. So let's look at how PathBuf is compared (strictly speaking, BTreeMap lookups call Ord::cmp rather than PartialEq::eq, but for PathBuf both work by comparing path components, so the equality implementation illustrates the cost):

impl PartialEq for PathBuf {
    #[inline]
    fn eq(&self, other: &PathBuf) -> bool {
        self.components() == other.components()
    }
}

So the components are compared. What are they, and how are they compared?

pub fn components(&self) -> Components<'_> {
    let prefix = parse_prefix(self.as_os_str());
    Components {
        path: self.as_u8_slice(),
        prefix,
        has_physical_root: has_physical_root(self.as_u8_slice(), prefix)
            || has_redox_scheme(self.as_u8_slice()),
        front: State::Prefix,
        back: State::Body,
    }
}

impl<'a> PartialEq for Components<'a> {
    #[inline]
    fn eq(&self, other: &Components<'a>) -> bool {
        let Components { path: _, front: _, back: _, has_physical_root: _, prefix: _ } = self;

        // Fast path for exact matches, e.g. for hashmap lookups.
        // Don't explicitly compare the prefix or has_physical_root fields since they'll
        // either be covered by the `path` buffer or are only relevant for `prefix_verbatim()`.
        if self.path.len() == other.path.len()
            && self.front == other.front
            && self.back == State::Body
            && other.back == State::Body
            && self.prefix_verbatim() == other.prefix_verbatim()
        {
            // possible future improvement: this could bail out earlier if there were a
            // reverse memcmp/bcmp comparing back to front
            if self.path == other.path {
                return true;
            }
        }

        // compare back to front since absolute paths often share long prefixes
        Iterator::eq(self.clone().rev(), other.clone().rev())
    }
}

That's a lot of code and comparisons, right? It turned out that I was using the ordering provided by the binary tree only for displaying results in the CLI. So I replaced the BTreeMap with a HashMap, sorting the results manually instead, and, just in case, replaced PathBuf with String. The function now looks like this:

pub(crate) fn set_as_not_empty_folder(folder_entries: &mut HashMap<String, FolderEntry>, current_folder: &str) {
    let mut d = folder_entries.get_mut(current_folder).unwrap();
    if d.is_empty == FolderEmptiness::No {
        return; // Already set as non-empty by one of its children
    }
    // Loop to set this folder and all its parent folders as non-empty
    loop {
        d.is_empty = FolderEmptiness::No;
        if d.parent_path.is_some() {
            let cf = d.parent_path.clone().unwrap();
            d = folder_entries.get_mut(&cf).unwrap();
            if d.is_empty == FolderEmptiness::No {
                break; // Already set as non-empty, so one of its children already marked the ancestors
            }
        } else {
            break;
        }
    }
}
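One side effect of swapping PathBuf keys for String keys is worth keeping in mind: PathBuf equality compares components, so `/a//b/` and `/a/b` are equal as paths but not as strings. If all keys come from the same directory walk this is unlikely to matter, but when keys may come from different sources, a hypothetical helper like `normalized_key` below could canonicalize them first. It relies only on Path::components, which drops repeated separators, trailing slashes and interior `.` segments; it does not resolve `..` or symlinks, and the example assumes Unix-style paths.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: rebuilds the path from its components, then converts it to a
/// String key. Returns None for non-UTF-8 paths.
fn normalized_key(path: &Path) -> Option<String> {
    // components() skips "//", a trailing "/" and "." segments that are not at the start
    let normalized: PathBuf = path.components().collect();
    normalized.into_os_string().into_string().ok()
}
```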

What about performance? Previously, scanning 3 million files and folders, with filesystem caching (done automatically when scanning multiple times), took 48 seconds. Now, it takes only 1.75 seconds.

Flamegraph from searching for empty folders in Czkawka 7.0
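To make the BTreeMap versus HashMap difference concrete, the following sketch wraps a String in a hypothetical `CountedKey` type whose Ord and PartialEq implementations increment a thread-local counter, then looks up the same key in both maps. A BTreeMap lookup calls Ord::cmp on keys in every node it visits, while a HashMap lookup hashes once and usually needs only a single equality check. The keys only mimic paths with long shared prefixes; the exact counts depend on the map size and the std version, so none are claimed here.

```rust
use std::cell::Cell;
use std::cmp::Ordering;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

thread_local! {
    // Number of key comparisons (cmp or eq) since the last reset
    static COMPARISONS: Cell<usize> = Cell::new(0);
}

fn bump() {
    COMPARISONS.with(|c| c.set(c.get() + 1));
}

/// Hypothetical key type that counts every comparison made on it.
struct CountedKey(String);

impl PartialEq for CountedKey {
    fn eq(&self, other: &Self) -> bool {
        bump();
        self.0 == other.0
    }
}
impl Eq for CountedKey {}

impl PartialOrd for CountedKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for CountedKey {
    fn cmp(&self, other: &Self) -> Ordering {
        bump();
        self.0.cmp(&other.0)
    }
}

// Hashing is not counted, and it is consistent with Eq (both use the inner String)
impl Hash for CountedKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.hash(state);
    }
}

/// Runs `f` and returns how many comparisons it performed.
fn count_comparisons(f: impl FnOnce()) -> usize {
    COMPARISONS.with(|c| c.set(0));
    f();
    COMPARISONS.with(|c| c.get())
}

/// Returns (comparisons for one BTreeMap lookup, comparisons for one HashMap lookup).
fn lookup_costs(n: usize) -> (usize, usize) {
    let key = |i: usize| CountedKey(format!("/home/user/some/long/shared/prefix/folder_{i:07}"));
    let btree: BTreeMap<CountedKey, ()> = (0..n).map(|i| (key(i), ())).collect();
    let hash: HashMap<CountedKey, ()> = (0..n).map(|i| (key(i), ())).collect();
    let probe = key(n / 2);
    let in_btree = count_comparisons(|| assert!(btree.contains_key(&probe)));
    let in_hash = count_comparisons(|| assert!(hash.contains_key(&probe)));
    (in_btree, in_hash)
}
```

With PathBuf keys, each of those BTreeMap comparisons additionally parses both paths into components, which is why the per-lookup cost added up so quickly.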

  • Speeding up cache loading/saving

To avoid unnecessary conversions, the structure into which file information such as name and size was read was the same for every mode. The issue was that it also contained the hash and symlink information, even though these were not used anywhere outside the duplicate and symlink modes.

pub struct FileEntry {
    pub path: PathBuf,
    pub size: u64,
    pub modified_date: u64,
    pub hash: String,
    pub symlink_info: Option<SymlinkInfo>,
}

Therefore, I concluded that it's better to create basic structures and then convert them into more specialized types. This may put a bit more burden on the CPU, but I believe the compiler will do its best to optimize it. Thanks to this, almost every mode now uses generic file searching, and the FileEntry structure shrank from 96 bytes to 40 bytes (and it was usually created thousands or millions of times).

pub struct FileEntry {
    pub path: PathBuf,
    pub size: u64,
    pub modified_date: u64,
}

Unfortunately, I didn't run any benchmarks, but it should help and make better use of the CPU cache; I based this on https://youtu.be/2EWejmkKlxs

Also, the cache files on disk are slightly smaller because some fields were removed. Unfortunately, this changes the cache format, so files will have to be scanned again to repopulate the cache.
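A sketch of the "basic structure first, specialized type later" design, assuming a hypothetical `DuplicateEntry` type for one mode (its fields are illustrative, not Czkawka's). The conversion moves the PathBuf, so the path buffer is not reallocated. On a 64-bit Linux target PathBuf is 24 bytes, so the base struct is 24 + 8 + 8 = 40 bytes, consistent with the size quoted above.

```rust
use std::path::PathBuf;

/// Basic record shared by all modes (same fields as the new FileEntry above).
#[derive(Debug, Clone)]
pub struct FileEntry {
    pub path: PathBuf,
    pub size: u64,
    pub modified_date: u64,
}

/// Hypothetical mode-specific record for duplicate search.
#[derive(Debug)]
pub struct DuplicateEntry {
    pub path: PathBuf,
    pub size: u64,
    pub modified_date: u64,
    pub hash: String,
}

impl From<FileEntry> for DuplicateEntry {
    fn from(entry: FileEntry) -> Self {
        DuplicateEntry {
            path: entry.path, // moved, not copied
            size: entry.size,
            modified_date: entry.modified_date,
            hash: String::new(), // filled in later, only by the duplicate mode
        }
    }
}
```

Only the mode that needs a hash pays for the extra String field, and only for the files that reach that stage.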

  • Micro-optimizations

In addition to the major changes, while reviewing the code I made various small optimizations where I thought appropriate. In one place I removed an if statement, and elsewhere unnecessary clones (the compiler would probably have handled some of these automatically, but as you can see in https://youtu.be/V6ug3e3jC54, that does not always happen).

In hotspot, I noticed that the function check_if_entry_have_valid_extension was using a bit more CPU than I thought necessary. It initially looked like this:

let Some(extension) = entry_data.path().extension() else {
    return false;
};
let extension_str = extension.to_string_lossy();
// Logic to check if extension_str is excluded/allowed

Knowing that a PathBuf consists of several elements, I checked what the `path()` method actually does. To my surprise, every call concatenates the directory with the file name, which showed up in the flame graph with thousands of samples. Until then, I thought it was a completely free function that simply returned something already in memory.

pub fn path(&self) -> PathBuf {
    self.dir.root.join(self.file_name_os_str())
}

So my goal became to avoid calling it wherever possible. In this case, the check could be made faster at the cost of a somewhat less elegant solution; the benchmark I used showed a 5x performance improvement.

let file_name = entry_data.file_name();
let Some(file_name_str) = file_name.to_str() else {
    return false;
};
let Some(extension_idx) = file_name_str.rfind('.') else {
    return false;
};
let extension = &file_name_str[extension_idx + 1..];
// Logic to check if extension is excluded/allowed

As you can see, sometimes you need to be flexible and not assume that simply using Rust will automatically fix performance issues or make code as fast as possible (though even suboptimal Rust code is usually orders of magnitude faster than similar code in Python).
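A small, testable version of the faster check, working on the bare file name instead of a joined path. `extension_of` and `has_allowed_extension` are hypothetical helpers, and the case-insensitive comparison is an assumption. One subtle difference is worth knowing: a plain `rfind('.')` treats `.bashrc` as having the extension `bashrc`, whereas Path::extension returns None for it, so this sketch skips a dot at index 0 to stay consistent with the standard library.

```rust
use std::ffi::OsStr;

/// Extension of a bare file name (the text after the last '.'), or None.
/// A leading dot, as in ".bashrc", is not treated as an extension, matching Path::extension.
fn extension_of(file_name: &OsStr) -> Option<&str> {
    let name = file_name.to_str()?; // non-UTF-8 names are rejected, like in the snippet above
    let idx = name.rfind('.')?;
    if idx == 0 {
        return None;
    }
    Some(&name[idx + 1..])
}

/// Hypothetical allow-list check; comparing case-insensitively is an assumption.
fn has_allowed_extension(file_name: &OsStr, allowed: &[&str]) -> bool {
    extension_of(file_name).map_or(false, |ext| allowed.iter().any(|a| a.eq_ignore_ascii_case(ext)))
}
```

Neither helper allocates: both borrow slices of the existing file name, which is the point of avoiding `path()`.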

Other changes

The new version also brings a number of minor changes:

  • Support for dragging folders to scan in Czkawka - unfortunately, Slint does not support this feature yet.
  • Generating (almost) fully static czkawka_cli binaries on Linux - no dynamic linking to libc.
  • Predefined stack size for threads - this should help when using musl, which by default used ridiculously small values (or maybe I used ridiculously large stack sizes, but in my defense, I couldn't find where). The app used to crash in random places (the musl build was mainly used by the Docker image https://github.com/jlesage/docker-czkawka).
  • Generalization of parts of the code - mainly in the file search area, but the changes also affected CLI arguments and the reporting of file processing progress.
  • Adding a progress bar in the CLI - ugly and inconsistent, but still a good starting point.
  • Handling excluded file extensions - previously, you could only choose allowed extensions.
  • Compilation with Link-Time Optimization (LTO) - files built in CI are now compiled with fat LTO, which typically results in a 25-50% reduction in file size and a 5-10% increase in performance.
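The post does not say which stack size Czkawka uses or how it sets it, so the following is only a sketch of setting an explicit stack size with std::thread::Builder::stack_size instead of relying on the default. `WORKER_STACK_SIZE`, `spawn_with_stack` and the recursive `depth_sum` workload are hypothetical.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Hypothetical stack size for scanning threads (the post does not state Czkawka's value).
const WORKER_STACK_SIZE: usize = 8 * 1024 * 1024;

/// Spawns a named thread with an explicit stack size instead of the default one.
fn spawn_with_stack<F, T>(f: F) -> io::Result<JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("scan-worker".to_string())
        .stack_size(WORKER_STACK_SIZE)
        .spawn(f)
}

/// Deliberately recursive workload whose stack use grows with `n`.
fn depth_sum(n: u64) -> u64 {
    if n == 0 { 0 } else { n + depth_sum(n - 1) }
}
```

Thread-pool crates usually expose the same knob; for example, rayon's ThreadPoolBuilder has a stack_size setting.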

Etymology of the name

First Czkawka and Szyszka, and now Krokiet. Stupid names, misleading and not associated with anything specific, so why do I choose them?

From the very beginning of the project, there have been quite a few voices suggesting that the application's name should be changed to something like "Another Duplicate Cleaner."

I understand that entirely, because it's much easier to remember a commonly used name than one that isn't even among the simplest words in its original language.

This application is my side project, which I create in my free time outside of work, and I don't really want to impose any strict rules on it (very high test coverage, very detailed checking of PRs/new elements, etc.) because I don't have the time or energy for that, and it would also slow down the development of the application.

So, the name suggests that itโ€™s not an application for companies, created for profit, but just a simple application made purely for the joy of creating.

Ending words

This project began as a clumsy attempt to revive fslint and to learn Rust on a useful example. After all these years and versions, I can confidently say that it worked. It was my first larger project; before that, I mostly wrote small scripts in Python and C++ for my studies and personal use.

The one thing that continues to amaze me is how popular this app has become. According to https://github.com/EvanLi/Github-Ranking/blob/master/Top100/Rust.md, it is among the 100 most popular Rust programs/libraries on GitHub, with more than 14,000 stars.

Price - free, MIT/GPL license (GPL for the GUI code in Slint, MIT for everything else, so the entire app is GPL); no ads, no internet connection required and no statistics collection

Repository - https://github.com/qarmin/czkawka
Files to download - https://github.com/qarmin/czkawka/releases

submitted by /u/krutkrutrar