A module to crawl ZeroMe hubs.

Given hubs A and B, we say A links to B iff some user in A follows some user in B.

To crawl a hub for links to other hubs:

// Assuming the current directory is the ZeroNet data dir (`zeronet.py --data_dir`).
Hub hub = crawl("HUB_ID");

To crawl a hub in /path/to/data/dir recursively (also crawl hubs hub links to):

Hubs hubs = crawl("HUB_ID", parsePath("/path/to/data/dir"), true);

Only one level of recursion is supported, and hubs you are not seeding are skipped.

Hub is an Entry, and Hubs is a Set<Hub>. For the precise definitions, see the Aliases section below.

This module can also be used as a command line tool.

For example, assuming the current directory is the ZeroNet data directory, crawl RedHub:

java -jar zerome-crawler.jar --hub 1RedkCkVaXuVXrqCMpoXQS29bwaqsuFdL

Crawl all hubs, with the data dir at /var/zeronet/data:

java -jar zerome-crawler.jar --all --data_dir /var/zeronet/data

List all hubs where at least one user has registered:

java -jar zerome-crawler.jar --all --list-only

Back up all seeding hubs where at least one user has registered, assuming the current directory is data_dir:

git init
java -jar zerome-crawler.jar --all -1 --seeding | xargs git add
# `-1` lists one hub_id per line. It implies `--list-only`.
git commit -m 'Auto snapshot of ZeroMe Hubs.'
Platform: Java
By: Jakukyo Friel
License: 0BSD
Packages
io.github.weakish.zeronet.zerome.crawler

Given a hub, produce a category of hubs with Link(A, B) as the morphism.

Dependencies
ceylon.collection 1.2.2
ceylon.file 1.2.2
ceylon.json 1.2.2
ceylon.logging 1.2.2
ceylon.test 1.2.2
java.base 7

Given a hub, produce a category of hubs with Link(A, B) as the morphism.

Link(A, B) holds iff some user a in A follows some user b in B.

Aliases
shared Hub => String->[HubMeta, HubLinks]

hub_id -> [HubMeta, HubLinks]

shared HubLinks => HashSet<String>

A set of hub IDs.

shared HubMeta => Map<String,String?>

title, description, zeronet_version

shared Hubs => Set<Hub>

A set of Hubs.
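
As a rough illustration of how these aliases nest, the shape can be mirrored in plain Java. This is only a sketch: the class `HubAliasSketch`, the nested `HubValue` and the helper `hub` are hypothetical names, not part of this module, and the sample IDs are placeholders.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class HubAliasSketch {
    /** Hypothetical stand-in for the [HubMeta, HubLinks] pair. */
    static final class HubValue {
        final Map<String, String> meta; // HubMeta: values may be null (String?)
        final Set<String> links;        // HubLinks: a set of hub IDs

        HubValue(Map<String, String> meta, Set<String> links) {
            // Defensive copies so the value behaves like an immutable pair.
            this.meta = Collections.unmodifiableMap(new HashMap<>(meta));
            this.links = Collections.unmodifiableSet(new HashSet<>(links));
        }
    }

    /** Hypothetical stand-in for Hub: hub_id -> [HubMeta, HubLinks]. */
    static Map.Entry<String, HubValue> hub(String hubId, Map<String, String> meta, Set<String> links) {
        return new SimpleImmutableEntry<>(hubId, new HubValue(meta, links));
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("title", "Example hub");
        meta.put("description", null);      // a missing value is represented as null
        meta.put("zeronet_version", null);

        Set<String> links = new HashSet<>();
        links.add("HUB_ID_B");              // placeholder hub ID

        Map.Entry<String, HubValue> example = hub("HUB_ID_A", meta, links);
        System.out.println(example.getKey() + " links to " + example.getValue().links);
    }
}
```

A Hubs value would then correspond to a `Set` of such entries.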

Functions
shared Hub|Hubs|Null crawl(String hub_id, Directory data_dir, Boolean recursive = false)

Glue code, or the real entry point. Returns null when the hub is not seeded or when the hub's content.json fails to parse.

Parameters:
  • recursive = false

shared HubLinks|Hubs crawl_all(Directory data_dir, String user_registry, Boolean list_only)

Returns:

- HubLinks when `list_only` is true;
- {Hub*} when `list_only` is false.

shared HubLinks crawl_all_hubs(Directory data_dir, String user_registry)

Crawl the ZeroMe user registry for hubs.

shared HubLinks crawl_links(Path hub_id_path)

Given a hub site path, returns all hub IDs whose users are followed by the given hub.
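
The collection step described above might be sketched in Java as follows. It works on in-memory data rather than a hub site path, and it assumes that each follow entry stores the followed user's hub ID under a `hub` key. That key name, the class `CrawlLinksSketch` and its helpers are assumptions made for illustration, not this module's API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CrawlLinksSketch {
    /**
     * Collects the hub IDs of every followed user.
     * Each element of users is one user's follow list. Each follow entry is
     * assumed to carry the followed user's hub ID under the "hub" key.
     */
    static Set<String> hubLinks(List<List<Map<String, Object>>> users) {
        Set<String> links = new TreeSet<>();   // a set, so duplicates collapse
        for (List<Map<String, Object>> follows : users) {
            for (Map<String, Object> follow : follows) {
                Object hub = follow.get("hub");
                if (hub instanceof String) {   // skip missing or non-string values
                    links.add((String) hub);
                }
            }
        }
        return links;
    }

    /** Builds a minimal follow entry (hypothetical helper for the demo). */
    static Map<String, Object> follow(String hubId) {
        return Collections.<String, Object>singletonMap("hub", hubId);
    }

    public static void main(String[] args) {
        // Two users of the same hub; the hub IDs are placeholders.
        List<Map<String, Object>> first = Arrays.asList(follow("HUB_ID_B"), follow("HUB_ID_C"));
        List<Map<String, Object>> second = Arrays.asList(follow("HUB_ID_B"));
        System.out.println(hubLinks(Arrays.asList(first, second)));
    }
}
```

The sketch does not filter out the crawled hub's own ID; whether crawl_links does so is not stated here.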

See also HubLinks

shared JsonArray? get_follow(JsonObject user)

Returns the follow array when it is non-empty, otherwise null.

shared String? get_json_string(JsonObject json, String key)

Given a JSON object and a key, returns object[key] iff object[key] is a string. Returns null when the key is not found or object[key] is not a string.
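
The same contract can be illustrated in Java, with a plain `Map<String, Object>` standing in for the JSON object. This is a sketch of the described behaviour only; `JsonStringSketch` and `getJsonString` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class JsonStringSketch {
    /** Returns json[key] only when it is a String, otherwise null. */
    static String getJsonString(Map<String, Object> json, String key) {
        Object value = json.get(key);   // null when the key is absent
        return (value instanceof String) ? (String) value : null;
    }

    public static void main(String[] args) {
        Map<String, Object> json = new HashMap<>();
        json.put("title", "Example hub");
        json.put("modified", 1234567890);   // a number, not a string

        System.out.println(getJsonString(json, "title"));
        System.out.println(getJsonString(json, "modified"));
        System.out.println(getJsonString(json, "missing"));
    }
}
```

Folding "absent" and "wrong type" into a single null result keeps call sites to one null check, at the cost of not telling the two cases apart.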

shared Directory get_userdb_dir(Directory data_dir, String user_registry)

Returns data/REGISTRY_ID/data/userdb/.

Throws
  • FileNotFoundException

    if directory not found
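
A minimal Java sketch of this lookup, using `java.nio.file`. It reads the leading `data` in the path above as the ZeroNet data dir itself, which is an interpretation rather than something stated here. `UserdbDirSketch`, `userdbPath` and `getUserdbDir` are hypothetical names, and `REGISTRY_ID` is a placeholder.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class UserdbDirSketch {
    /** Pure path computation: DATA_DIR/REGISTRY_ID/data/userdb. */
    static Path userdbPath(Path dataDir, String userRegistry) {
        return dataDir.resolve(userRegistry).resolve("data").resolve("userdb");
    }

    /** Same path, but fails when it does not exist as a directory. */
    static Path getUserdbDir(Path dataDir, String userRegistry) throws FileNotFoundException {
        Path dir = userdbPath(dataDir, userRegistry);
        if (!Files.isDirectory(dir)) {
            throw new FileNotFoundException(dir + " is not a directory");
        }
        return dir;
    }

    public static void main(String[] args) {
        // Defaults to the current directory, as in the command line examples.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : ".");
        try {
            System.out.println(getUserdbDir(dataDir, "REGISTRY_ID"));
        } catch (FileNotFoundException e) {
            System.out.println("Not found: " + e.getMessage());
        }
    }
}
```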

shared JsonArray|JsonObject jsonify(Hubs|Hub|HubLinks|Null hub)

Returns a JsonObject for a Hub and a JsonArray of JsonObjects for Hubs.

shared JsonObject? load_json_object(File file)

Returns null if the parsed content of the file is not a JsonObject.

shared HubMeta meta(JsonObject content_json)

Parse the content.json of a hub for metadata.

See also HubMeta

shared String read_file(File file)

Read the whole file.

shared Directory? resolve_directory(Link link)

Resolve a link to a directory.

shared File? resolve_file(Link link)

Resolve a link to a file.

shared Directory? resolve_path_to_directory(Path path)

Resolve a path to a directory.

shared File? resolve_path_to_file(Path path)

Resolve a path to a file.

shared void run()

The ultimate exception handler.

Exceptions
shared LoadJsonFail

Thrown when a string fails to parse as a JSON object.

shared UsageError

Command line usage error, e.g. a missing argument for an option.