A client to zhihu.com API.
Currently it supports readonly APIs of zhuanlan.zhihu.com.
For example, to get the column (blog) info and posts count of https://zhuanlan.zhihu.com/wooyun:

    String columnName = "wooyun";
    String columnInfo = getColumn(columnName);
    if (is JsonObject column = parseJson(columnInfo)) {
        print(postsCount(column));
    }

Get all posts from a column (assuming it has 42 posts):
String posts = getPosts(columnName, 42);
There are also some fetch* functions that serve as usage examples.
This module can also be used as a command line tool to backup a column:
java -jar zhihu.jar COLUMN_NAME
It downloads the column info and all posts with their comments as JSON files. It also fetches avatar images, title images, and images embedded in posts.
Currently it uses a naive strategy to deal with name collisions: the file is renamed with a SHA256 postfix.
Possible causes of name collisions:
- images in posts have the same file name, e.g.
  http://hostA/same.jpg and http://hostB/same.jpg, or
  http://same/dirA/same.jpg and http://same/dirB/same.jpg
- rerunning zhihu.jar
Also, incremental backup when rerunning is not implemented yet.
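The renaming strategy can be illustrated with a short Java sketch. It is not the module's actual code: the class `CollisionNames` and its methods are hypothetical, and hashing the file content is an assumption, since the documentation only states that the colliding file gets a SHA256 postfix (written as `filePath_SHA256` in the `writeFile` description).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
final class CollisionNames {
    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns target unchanged if nothing exists at that path yet,
     * otherwise a sibling path named "<file name>_<sha256 of content>".
     * Assumption: the hash is computed over the content to be written.
     */
    static Path resolve(Path target, byte[] content) {
        if (!Files.exists(target)) {
            return target;
        }
        String renamed = target.getFileName() + "_" + sha256Hex(content);
        return target.resolveSibling(renamed);
    }
}
```

With this scheme, two different images that are both called same.jpg end up as same.jpg and same.jpg_ followed by the digest. A rerun that downloads identical content produces the same postfix again, so this sketch does not by itself provide incremental backup.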
Packages | |
io.github.weakish.zhihu | Readonly API client of zhuanlan.zhihu.com |
Dependencies | ||
ceylon.collection | 1.3.3 | |
ceylon.file | 1.3.3 | |
ceylon.http.client | 1.3.3 | |
ceylon.http.common | 1.3.3 | |
ceylon.json | 1.3.3 | |
ceylon.logging | 1.3.3 | |
ceylon.process | 1.3.3 | |
ceylon.random | 1.3.3 | |
ceylon.regex | 1.3.3 | |
ceylon.test | 1.3.3 | |
ceylon.time | 1.3.3 | |
ceylon.uri | 1.3.3 | |
de.dlkw.ccrypto.svc | 0.0.2 |
Readonly API client of zhuanlan.zhihu.com
Currently it supports:
zhuanlan.zhihu.com also has an endpoint for a single post, which consists of:
* lastestLikers: I do not know a way to get all likes; `/api/posts/SLUG/{like,likes}` both return 404.
* the previous and next post, without lastestLikers
* Note that there are no comments.
So except for the latest likers, it does not provide more information than posts.
This API endpoint is not implemented.
Instead, an API to get (all) likes may be implemented in the future.
Functions to fetch avatar images, title images, and images in post content are also provided.
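The endpoint paths mentioned in this package (`/api/posts/SLUG` and `/api/posts/SLUG/comments`) can be assembled as in the following Java sketch. The class `ZhuanlanUrls`, its method names, and the use of the https scheme are assumptions made for illustration; only the host and the paths come from this documentation.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
final class ZhuanlanUrls {
    // Host as documented; the https scheme is an assumption.
    static final String BASE = "https://zhuanlan.zhihu.com";

    /** URL of the comments of a single post: /api/posts/SLUG/comments */
    static URI comments(long slug) {
        return URI.create(BASE + "/api/posts/" + slug + "/comments");
    }

    /** URL of the single-post endpoint (not implemented by the module): /api/posts/SLUG */
    static URI post(long slug) {
        return URI.create(BASE + "/api/posts/" + slug);
    }
}
```

Slugs are integers, so plain string concatenation needs no escaping here. Column names may contain arbitrary characters and would need proper encoding, which this sketch deliberately leaves out.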
Aliases | |
Tasks | shared Tasks=> ArrayList<Process> task pool |
Functions | |
avatarUrl | shared Uri? avatarUrl(JsonObject json) Parse |
checkTasks | shared Tasks[2] checkTasks(Tasks tasks) Check tasks, return a tuple of [Unfinished, Failed] tasks. |
column | shared Uri column(String columnName) Return full column url. |
comments | shared Uri comments(Integer slug) Return url of comments of a single post. |
content | shared String content(JsonObject post) Extract post content. Throws |
contentImage | shared {String*} contentImage(String content) Parse post content for image urls. Also supports special syntax of zhihu hosted images. |
contents | shared {String*} contents(JsonArray posts) Map content(post) over posts. |
fetchAvatarFiles | shared Tasks fetchAvatarFiles(Uri? avatar, Uri? creatorAvatar) Non blocking. |
fetchColumnInfo | shared String fetchColumnInfo(String columnName) Returns column info json string. |
fetchComments | shared void fetchComments(JsonArray posts) Fetch comments json files, named `POST_SLUG_comments. |
fetchContentImages | shared {[Integer, String]*} fetchContentImages(JsonArray posts) Return a list of Content images may be hosted outside zhihu, which may cause file name collisions.
A name collision is resolved in the same way as an already existing file name,
i.e. rename with |
fetchFile | shared Process fetchFile(Uri url) Fetch file via wget, non blocking. |
fetchFileSync | shared [Integer, String] fetchFileSync(Uri url) Like |
fetchPosts | shared String? fetchPosts(Integer? count, String columnName) Returns a json string and saves to |
fetchTitleImages | shared Tasks fetchTitleImages(JsonArray posts) Fetch title images, non blocking. |
getColumn | shared String getColumn(String name) Non ascii characters are always 16 bit encoded, e.g. |
getComments | shared String? getComments(Integer slug) Returns comments json string of a single post. |
getContent | shared String getContent(Uri url, Boolean redirected = false) Given a Uri, get the Response content, following at most one redirect. Parameters:
Throws
|
getJsonValue | shared Value getJsonValue(JsonObject json, String key) Returns JsonObject[key] else null. |
getPosts | shared String getPosts(String name, Integer limit = ...) Get the posts belonging to a column, including post content, without comments. The result is an array of post entries.
Every post entry has a post id, i.e.
- `"url": "/u/SLUG"`, which redirects (301) to `/p/SLUG` (the html url)
- `"comments": "/api/posts/SLUG/comments"`, the comments
- `"href": "/api/posts/SLUG"`, a single post, consisting of
  * lastestLikers: I do not know a way to get all likes; `/api/posts/SLUG/{like,likes}` both return 404.
  * the previous post, without lastestLikers
  * the next post, without lastestLikers
  * Note that there are no comments.
Use [[getComments]] to fetch comments. Parameters:
|
parseColumnInfo | shared [Integer?, Uri?, Uri?] parseColumnInfo(String columnInfo) Parse columninfo json string for postsCount, urls of avatar and creatorAvatar. |
posts | shared Uri posts(String name, Integer limit = ...) Return posts url. Parameters:
Throws
|
postsCount | shared Integer postsCount(JsonObject json) Given a column info, returns postsCount. Throws
|
run | shared void run() Run the module |
slug | shared Integer slug(JsonObject post) Zhihu uses Integer for post slug. Throws
|
slugs | shared {Integer*} slugs(JsonArray posts) Map slug(post) over posts. |
titleImageUrl | shared Uri? titleImageUrl(JsonObject post)
|
titleImagesUrls | shared {Uri*} titleImagesUrls(JsonArray posts) Skips posts which cannot be parsed as a JsonObject or do not contain a title image. |
urlify | shared Uri urlify(String imagePath, Integer cdnNumber = ...) Urlify images hosted by zhihu CDN. Parameters:
|
writeFile | shared void writeFile(String content, Path filePath) Write a String to filePath; if filePath already exists, write to filePath_SHA256 instead. |
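The `checkTasks` entry in the table above splits a pool of non blocking download processes into unfinished and failed ones. A rough Java sketch of that idea follows. The names `TaskPool` and `Checked` are hypothetical, and treating a non-zero exit status as failure is an assumption; the documentation only states that a tuple of [Unfinished, Failed] tasks is returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class TaskPool {
    /** Result pair mirroring the documented [Unfinished, Failed] tuple. */
    static final class Checked {
        final List<Process> unfinished = new ArrayList<>();
        final List<Process> failed = new ArrayList<>();
    }

    /**
     * Sorts tasks into still-running and failed ones.
     * Tasks that finished with exit status 0 appear in neither list.
     * Assumption: any non-zero exit status counts as a failure.
     */
    static Checked checkTasks(List<Process> tasks) {
        Checked result = new Checked();
        for (Process task : tasks) {
            if (task.isAlive()) {
                result.unfinished.add(task);
            } else if (task.exitValue() != 0) {
                result.failed.add(task);
            }
        }
        return result;
    }
}
```

A caller could poll `checkTasks` on the unfinished list until it is empty and then report or retry the failed downloads.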
Exceptions | |
KeyNotFound | shared KeyNotFound When not satisfied with just returning null. |
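The difference between a lenient lookup such as `getJsonValue` (which returns null) and a strict one that raises `KeyNotFound` can be sketched in Java as follows. Everything here is illustrative: the class `JsonAccess` is hypothetical, a plain `Map` stands in for a JSON object, and both the `"postsCount"` key and the choice of `postsCount` as the throwing function are assumptions.

```java
import java.util.Map;

// Hypothetical helper, for illustration only.
final class JsonAccess {
    /** Java stand-in for the KeyNotFound exception. */
    static final class KeyNotFound extends RuntimeException {
        KeyNotFound(String key) {
            super("key not found: " + key);
        }
    }

    /** Lenient lookup: returns the value for key, or null when it is absent. */
    static Object getJsonValue(Map<String, Object> json, String key) {
        return json.get(key);
    }

    /**
     * Strict lookup: throws KeyNotFound rather than returning null.
     * Assumption: the count is stored as a number under "postsCount".
     */
    static long postsCount(Map<String, Object> column) {
        Object value = getJsonValue(column, "postsCount");
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        throw new KeyNotFound("postsCount");
    }
}
```

The strict variant suits callers that cannot continue without the value, while the lenient one lets optional fields such as a title image be skipped quietly.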