A module to crawl ZeroMe hubs.
Given hubs A and B, we say A links to B iff there exists a user in A who follows a user in B.
To crawl a hub for links to other hubs:
// Assuming the current directory is the ZeroNet data dir (`zeronet.py --data_dir`).
Hub hub = crawl("HUB_ID");
To crawl a hub in /path/to/data/dir recursively (also crawling the hubs it links to):
Hubs hubs = crawl("HUB_ID", parsePath("/path/to/data/dir"),
Only one level of recursion is supported, and hubs you are not seeding are skipped.
Hub is an Entry, and Hubs is a Set<Hub>; for the exact definitions, see the Aliases section below.
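The one-level recursion rule can be sketched in Java as follows. This is only an illustration, not the module's Ceylon implementation: the class `RecursiveCrawlSketch`, the method `crawlOneLevel`, and the `seeded` map (a hypothetical stand-in for the hubs present in the data dir) are all assumptions, and returning an empty map for an unseeded root is a choice made for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class RecursiveCrawlSketch {
    /**
     * seeded maps hub_id -> IDs of hubs it links to, for seeded hubs only
     * (hypothetical stand-in for what is on disk in the data dir).
     */
    static Map<String, Set<String>> crawlOneLevel(String root,
                                                  Map<String, Set<String>> seeded) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        Set<String> rootLinks = seeded.get(root);
        if (rootLinks == null) {
            return result; // assumption: an unseeded root yields nothing
        }
        result.put(root, rootLinks);
        for (String linked : rootLinks) {
            Set<String> links = seeded.get(linked);
            if (links != null) {
                // Crawl the linked hub, but do not follow its links any further.
                result.put(linked, links);
            }
            // Hubs that are not seeded are skipped silently.
        }
        return result;
    }
}
```

With hubs A, B and D seeded, A linking to B and C, and B linking to D, the sketch returns entries for A and B only: C is not seeded, and D lies two levels away from A.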
This module can also be used as a command line tool.
For example, assuming the current directory is the ZeroNet data directory, crawl RedHub:
java -jar zerome-crawler.jar --hub 1RedkCkVaXuVXrqCMpoXQS29bwaqsuFdL
Crawl all hubs, where the data dir is /var/zeronet/data:
java -jar zerome-crawler.jar --all --data_dir /var/zeronet/data
List all hubs where at least one user has registered:
java -jar zerome-crawler.jar --all --list-only
Back up all seeding hubs where at least one user has registered, assuming the current directory is data_dir:
git init
java -jar zerome-crawler.jar --all -1 --seeding | xargs git add # `-1` lists one hub_id per line. It implies `--list-only`.
git commit -m 'Auto snapshot of ZeroMe Hubs.'
Packages | |
io.github.weakish.zeronet.zerome.crawler | Given a hub, produce a category of hubs with Link(A, B) as the morphism. |
Dependencies | |
ceylon.collection | 1.2.2 |
ceylon.file | 1.2.2 |
ceylon.json | 1.2.2 |
ceylon.logging | 1.2.2 |
ceylon.test | 1.2.2 |
java.base | 7 |
Given a hub, produce a category of hubs with Link(A, B) as the morphism.
Link(A,B) is the existence of user a in A following user b in B.
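As a rough illustration of the Link(A, B) definition above, here is a minimal Java sketch. It is not taken from the module: the class `LinkSketch`, the method `links`, and the two in-memory maps (`hubOfUser` for user to hub, `follows` for user to followed users) are hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.Set;

class LinkSketch {
    /**
     * Returns true iff some user on hub a follows some user on hub b.
     * hubOfUser: user -> hub the user is registered on (hypothetical input).
     * follows:   user -> set of users that user follows (hypothetical input).
     */
    static boolean links(String a, String b,
                         Map<String, String> hubOfUser,
                         Map<String, Set<String>> follows) {
        for (Map.Entry<String, Set<String>> entry : follows.entrySet()) {
            if (!a.equals(hubOfUser.get(entry.getKey()))) {
                continue; // the follower is not a user of hub a
            }
            for (String followed : entry.getValue()) {
                if (b.equals(hubOfUser.get(followed))) {
                    return true; // one such pair of users is enough
                }
            }
        }
        return false;
    }
}
```

Note that the relation is directed: a user in A following a user in B gives Link(A, B) but says nothing about Link(B, A).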
Aliases | |
Hub | shared Hub=> String->[HubMeta, HubLinks] hub_id -> [HubMeta, HubLinks] |
HubLinks | shared HubLinks=> HashSet<String> A set of hub IDs. |
HubMeta | shared HubMeta=> Map<String,String?> title, description, zeronet_version |
Hubs | shared Hubs=> Set<Hub> A set of Hub. |
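For readers more at home in Java than Ceylon, the alias shapes above might be approximated as below. This is a loose analogy, not the module's types: `AliasSketch`, `HubValue` and `hub` are hypothetical names, and the Ceylon tuple [HubMeta, HubLinks] is modelled as a small holder class.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Set;

class AliasSketch {
    /** Stand-in for the Ceylon tuple [HubMeta, HubLinks]. */
    static final class HubValue {
        final Map<String, String> meta; // HubMeta: values may be null
        final Set<String> links;        // HubLinks: a set of hub IDs

        HubValue(Map<String, String> meta, Set<String> links) {
            this.meta = meta;
            this.links = links;
        }
    }

    /** Hub: an entry hub_id -> [HubMeta, HubLinks]. */
    static Map.Entry<String, HubValue> hub(String hubId,
                                           Map<String, String> meta,
                                           Set<String> links) {
        return new AbstractMap.SimpleImmutableEntry<>(hubId, new HubValue(meta, links));
    }
}
```

Under this analogy, Hubs would simply be a `Set<Map.Entry<String, HubValue>>`.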
Functions | |
crawl | shared Hub|Hubs|Null crawl(String hub_id, Directory data_dir, Boolean recursive = false) Gluing code or the real entry point.
Returns Null when Parameters:
|
crawl_all | shared HubLinks|Hubs crawl_all(Directory data_dir, String user_registry, Boolean list_only) Returns HubLinks when list_only is true; {Hub*} when list_only is false. |
crawl_all_hubs | shared HubLinks crawl_all_hubs(Directory data_dir, String user_registry) Crawl the ZeroMe user registry for hubs. |
crawl_links | shared HubLinks crawl_links(Path hub_id_path) Given a hub site path, returns all hub IDs whose users are followed by the given hub. See also HubLinks |
get_follow | shared JsonArray? get_follow(JsonObject user) Returns a non-empty follow array. |
get_json_string | shared String? get_json_string(JsonObject json, String key) Given a json object and a key, returns object[key] iff object[key] is a string. Returns null when key not found or object[key] is not a string. |
get_userdb_dir | shared Directory get_userdb_dir(Directory data_dir, String user_registry) Returns Throws
|
jsonify | shared JsonArray|JsonObject jsonify(Hubs|Hub|HubLinks|Null hub) |
load_json_object | shared JsonObject? load_json_object(File file) Returns null if parsed result of file is not JsonObject. |
meta | shared HubMeta meta(JsonObject content_json) Parse content.json of a hub for meta data. See also HubMeta |
read_file | shared String read_file(File file) Read the whole file. |
resolve_directory | shared Directory? resolve_directory(Link link) Resolve a link to a directory. See also resolve_file() |
resolve_file | shared File? resolve_file(Link link) Resolve a link to a file. See also resolve_directory() |
resolve_path_to_directory | shared Directory? resolve_path_to_directory(Path path) Resolve a path to a directory. See also resolve_directory() |
resolve_path_to_file | shared File? resolve_path_to_file(Path path) Resolve a path to a file. See also resolve_file() |
run | shared void run() The ultimate exception handler. |
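The contract described for get_json_string (a value only when object[key] is a string, null otherwise) can be illustrated in Java roughly as follows. The sketch uses a plain `Map<String, Object>` as a stand-in for a JSON object. The names `JsonStringSketch`, `getJsonString` and `metaOf` are hypothetical, and the way `metaOf` collects title, description and zeronet_version is an assumption based on the HubMeta description, not the module's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JsonStringSketch {
    /** Returns json[key] iff it is a string; null if missing or of another type. */
    static String getJsonString(Map<String, Object> json, String key) {
        Object value = json.get(key);
        return (value instanceof String) ? (String) value : null;
    }

    /** Assumed shape of the meta data: three keys, each possibly null. */
    static Map<String, String> metaOf(Map<String, Object> contentJson) {
        Map<String, String> meta = new LinkedHashMap<>();
        for (String key : new String[] {"title", "description", "zeronet_version"}) {
            meta.put(key, getJsonString(contentJson, key));
        }
        return meta;
    }
}
```

Mapping both "missing" and "wrong type" to null keeps callers to a single null check, at the cost of not being able to tell the two cases apart.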
Exceptions | |
LoadJsonFail | shared LoadJsonFail Thrown when a string fails to parse as a JSON object. |
UsageError | shared UsageError Command line usage error, e.g. missing argument for option. |