shattr
A shatag clone in Ceylon.
Status
Basic features implemented:
- --lookup: Look up duplicates.
- --scrub: Recompute checksums to detect silent corruption.
- --tag: Compute checksums for files that lack one or whose checksum is outdated.
Why
shatag -rl is slow because it queries the SQL database for every file.
shattr reads all SHA-256 checksums into memory instead.
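The difference can be sketched as follows. This is a minimal Python illustration, not shattr's Ceylon code; `load_hashes` and the sample digests are hypothetical. The hash list is read once into a counter, so every later per-file check is an in-memory lookup rather than a database query.

```python
from collections import Counter
from typing import Iterable


def load_hashes(lines: Iterable[str]) -> Counter:
    """Read one hex digest per line into an in-memory counter."""
    return Counter(line.strip() for line in lines if line.strip())


# Hypothetical hash list content: one SHA-256 hex digest per line.
sample = ["aa" * 32, "bb" * 32, "aa" * 32, ""]
known = load_hashes(sample)

# Each check is now a dictionary lookup, with no SQL round trip per file.
print(known["aa" * 32])      # occurrences of the first digest
print(("cc" * 32) in known)  # a digest that was never loaded
```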
Installation
With ceylon
If you have Ceylon installed, you can download the .car archive (< 4K) from
Releases and put it into your Ceylon module repository.
If your Ceylon is recent enough, you can package it into a jar file via ceylon --fat-jar.
Running with java -jar starts faster than ceylon run.
With java directly
If you have a Java Runtime (7+) installed but not Ceylon,
you can download the fat jar file (3.2M).
Compile manually
Clone this repository and run ceylon compile.
Tested with Ceylon 1.2+.
May work with older versions.
Usage
Lookup
$SHATTR_COMMAND -l PATH_TO_HASHLIST
will print the status of each file under the current directory:
N empty file
D duplicated file
U unique file
? unknown file (without `sha256` xattr, no read permission, etc)
$SHATTR_COMMAND is one of:
- ceylon run io.github.weakish.shattr if using Ceylon;
- java -jar /path/to/io.github.weakish.shattr-0.2.0.jar if using java directly.
If PATH_TO_HASHLIST is not specified,
shattr will use ~/.shatagdb-hash-list.txt.
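How the four status letters might be assigned can be sketched in Python. This is not shattr's actual logic: the README does not state the exact rules, so the sketch assumes that a zero-byte file counts as empty, that a file counts as duplicated when its stored hash appears in the hash list, and that a missing hash is checked first. `classify` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Optional


def classify(size: int, stored_hash: Optional[str], known: Counter) -> str:
    """Return a git-style status letter for one file.

    Assumptions (not confirmed by the README): a zero-byte file is
    "empty", and a file is "duplicated" when its hash is in the list.
    """
    if stored_hash is None:
        return "?"  # no sha256 xattr, or the file could not be read
    if size == 0:
        return "N"  # empty file
    if known[stored_hash] > 0:
        return "D"  # hash already known
    return "U"      # hash not seen before


# Hypothetical in-memory hash list with a single known digest.
known = Counter({"aa" * 32: 1})
print(classify(10, "aa" * 32, known))
print(classify(10, "bb" * 32, known))
```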
Hash list format
PATH_TO_HASHLIST is a text file,
containing all SHA256 hashes of known files, one per line.
For example, if using shatag with an sqlite3 backend,
PATH_TO_HASHLIST can be produced via:
sqlite3 -noheader -csv ~/.shatagdb "select hash from contents;" > hashlist.csv
Customize output
By default, output follows the style of git status.
You can change the output format with --format FORMAT.
FORMAT is one of git, inotifywait, and csv.
--format FORMAT should be specified before the hash list file.
--format inotifywait
EMPTY empty file
DUMPLICATED duplicated file
UNIQUE unique file
UNKNOWN unknown file (without `sha256` xattr, no read permission, etc)
--format csv
Like --format inotifywait, but the fields are separated by a comma (,) and the path name is quoted.
EMPTY,"empty_file.txt"
UNIQUE,"A file containing spaces and ""double quotes"""
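The quoting in the two lines above follows the usual CSV convention of doubling embedded double quotes. A minimal Python sketch that reproduces them (`format_csv` is a hypothetical name, not shattr's function):

```python
def format_csv(status: str, path: str) -> str:
    """Format one result as STATUS,"path", doubling embedded double quotes."""
    quoted = path.replace('"', '""')
    return f'{status},"{quoted}"'


print(format_csv("EMPTY", "empty_file.txt"))
print(format_csv("UNIQUE", 'A file containing spaces and "double quotes"'))
```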
--format your_own
To add one, write a formatting function of type String(Status, Path),
then register it in the command line option parsing code in run().
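The general shape of such a plug-in point, sketched in Python rather than Ceylon. The `tsv` format, `FORMATTERS`, and `pick_formatter` are all hypothetical; in shattr itself the function would have the Ceylon type String(Status, Path) and be wired up inside run().

```python
from typing import Callable, Dict

# A formatter takes (status, path) and returns the line to print.
Formatter = Callable[[str, str], str]


def format_tsv(status: str, path: str) -> str:
    """Hypothetical custom format: status and path separated by a tab."""
    return f"{status}\t{path}"


# Hypothetical registry consulted when parsing --format FORMAT.
FORMATTERS: Dict[str, Formatter] = {"tsv": format_tsv}


def pick_formatter(name: str) -> Formatter:
    """Look up a formatter by name, rejecting unknown format names."""
    try:
        return FORMATTERS[name]
    except KeyError:
        raise ValueError(f"unknown format: {name}") from None


print(pick_formatter("tsv")("UNIQUE", "notes.txt"))
```

Keeping each formatter a plain function of status and path means a new output style does not touch the lookup code.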
scrub/tag
$SHATTR_COMMAND -s
$SHATTR_COMMAND -t
Both compute checksums for all files under the current directory (recursively).
Unlike shatag, -t will warn if a checksum changes.
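The comparison behind that warning can be sketched in Python. This is an illustration only: `sha256_of` and `check` are hypothetical names, the stored checksum is passed in as a plain argument instead of being read from an xattr, and timestamps are ignored.

```python
import hashlib
from typing import Optional


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check(path: str, stored: Optional[str]) -> str:
    """Compare a fresh checksum with the previously stored one."""
    current = sha256_of(path)
    if stored is None:
        return "new"      # no checksum recorded yet
    if stored != current:
        return "changed"  # worth a warning: edited or corrupted
    return "ok"
```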
Contribute
Send pull requests at https://github.com/weakish/shattr.
Coding style
Prefer if . then . else . to . then . else .
We feel A then B else C is confusing.
Readers may think A then B else C is A ? B : C in other languages, but they are not the same:
- A then B else C is actually (A then B) else C:
  - A then B evaluates to B if A is true, otherwise evaluates to null.
  - X else Y evaluates to X if X is not null, otherwise evaluates to Y.
- Thus the type of B is T given T satisfies Object, i.e. B is required to not be null.
I think if (A) then B else C is much cleaner.
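To see why the two forms differ, here is a small Python model of the two Ceylon operators. `then_` and `else_` are hypothetical stand-ins, with Python's None playing the role of null; the last call shows what would go wrong if B were allowed to be null, which is why Ceylon's type system rules that case out.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def then_(a: bool, b: T) -> Optional[T]:
    """Model of Ceylon's `A then B`: B when A is true, otherwise null."""
    return b if a else None


def else_(x: Optional[T], y: T) -> T:
    """Model of Ceylon's `X else Y`: X unless X is null, otherwise Y."""
    return x if x is not None else y


# With a non-null B the two-step form agrees with a ternary.
print(else_(then_(True, "B"), "C"))   # B
print(else_(then_(False, "B"), "C"))  # C

# If B were allowed to be null, the forms would disagree:
# a ternary would yield None, but the two-step form falls through to C.
print(else_(then_(True, None), "C"))  # C
```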
Only use i++ to increase i.
y=i++ and y=++i are really confusing to me.
So I prefer to use i++ only to increase i, e.g. in a while loop.
I think the only meaningful value for i++ to evaluate to is void,
if a programming language allows ++ at all.
Same applies to i-- and --i.
Prefer functions to classes
We declare classes only for new types (or type aliases).
Other
If you disagree with the above, file an issue.
Send pull requests to add new coding style rules.
Please do not add formatting rules such as "use two spaces" or "closing braces on their own line".
Formatting style is unlikely to affect the readability of code,
and can be adjusted automatically via ceylon format.
License
0BSD.