A client for the zhihu.com API.

Currently it supports the read-only APIs of zhuanlan.zhihu.com.

For example, to get the column (blog) info and posts count of https://zhuanlan.zhihu.com/wooyun:

String columnName = "wooyun";
String columnInfo = getColumn(columnName);
if (is JsonObject column = parseJson(columnInfo)) {
    print(postsCount(column));
}

Get all posts from a column (assuming it has 42 posts):

String posts = getPosts(columnName, 42);

There are also some fetch* functions that serve as usage examples.

This module can also be used as a command line tool to back up a column:

java -jar zhihu.jar COLUMN_NAME

It downloads the column info and all posts with their comments as JSON files. It also fetches avatar images, title images, and images embedded in posts.

Currently it uses a naive strategy to deal with file name collisions: the colliding file is renamed with a SHA256 suffix.

Possible causes of name collisions:

  • images in posts have the same file name, e.g.

    • http://hostA/same.jpg and http://hostB/same.jpg
    • http://same/dirA/same.jpg and http://same/dirB/same.jpg
  • rerunning zhihu.jar

Also, incremental backup by rerunning is not implemented yet.

Platform: Java
Packages
io.github.weakish.zhihu

Readonly API client of zhuanlan.zhihu.com

Dependencies
ceylon.collection 1.3.3
ceylon.file 1.3.3
ceylon.http.client 1.3.3
ceylon.http.common 1.3.3
ceylon.json 1.3.3
ceylon.logging 1.3.3
ceylon.process 1.3.3
ceylon.random 1.3.3
ceylon.regex 1.3.3
ceylon.test 1.3.3
ceylon.time 1.3.3
ceylon.uri 1.3.3
de.dlkw.ccrypto.svc 0.0.2

Readonly API client of zhuanlan.zhihu.com

Currently it supports:

  • column info
  • all posts
  • comments for post
  • comments for all posts

zhuanlan.zhihu.com also has an endpoint for a single post, which consists of:

  • lastestLikers: I do not know a way to get all likes; `/api/posts/SLUG/like` and `/api/posts/SLUG/likes` both return 404.
  • the previous and next posts, without lastestLikers
  • Note that there are no comments.

So apart from the latest likers, it provides no more information than the posts endpoint. This API endpoint is therefore not implemented. Instead, an API for (all) likes may be implemented in the future.

Functions to fetch avatar images, title images, and images in post content are also provided.

Aliases
Tasks
shared Tasks => ArrayList<Process>

task pool

Functions
avatarUrl
shared Uri? avatarUrl(JsonObject json)

Parse column info or a creator object for its avatar URL.

checkTasks
shared Tasks[2] checkTasks(Tasks tasks)

Check tasks and return a tuple of [unfinished, failed] tasks.
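A minimal Java sketch of the checkTasks idea (the module itself is written in Ceylon for the JVM; `TaskCheck` is a hypothetical name). It assumes that a task counts as failed when its process has exited with a non-zero code, and that tasks which finished successfully are simply left out of both lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java counterpart of checkTasks: split a pool of
// processes into [unfinished, failed]. Successful ones are dropped.
final class TaskCheck {
    private TaskCheck() {}

    static List<List<Process>> checkTasks(List<Process> tasks) {
        List<Process> unfinished = new ArrayList<>();
        List<Process> failed = new ArrayList<>();
        for (Process task : tasks) {
            if (task.isAlive()) {
                unfinished.add(task);          // still running
            } else if (task.exitValue() != 0) {
                failed.add(task);              // exited with an error code
            }
            // exited with 0: finished successfully, not reported
        }
        return List.of(unfinished, failed);
    }
}
```

A caller could poll this periodically and stop waiting once the first list is empty, then retry or report whatever is in the second list.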

column
shared Uri column(String columnName)

Return the full column URL.

comments
shared Uri comments(Integer slug)

Return the URL of the comments of a single post.
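A Java sketch of these two URL builders (class and constant names are hypothetical). The comments path follows the `/api/posts/SLUG/comments` link documented for post entries under getPosts; the `https://zhuanlan.zhihu.com/api` root and the `/columns/NAME` path for columns are assumptions, not confirmed by this documentation.

```java
import java.net.URI;

// Hypothetical URL builders mirroring column() and comments().
final class ZhuanlanUrls {
    private ZhuanlanUrls() {}

    // Assumed API root.
    static final String API = "https://zhuanlan.zhihu.com/api";

    static URI column(String columnName) {
        // Assumed layout: /api/columns/NAME
        return URI.create(API + "/columns/" + columnName);
    }

    static URI comments(int slug) {
        // Matches the documented "comments" link: /api/posts/SLUG/comments
        return URI.create(API + "/posts/" + slug + "/comments");
    }
}
```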

content
shared String content(JsonObject post)

Extract post content.

contentImage
shared {String*} contentImage(String content)

Parse post content for image URLs. Also supports the special syntax for zhihu-hosted images.
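A Java sketch of extracting image URLs from post HTML with a regular expression (`ContentImages` is a hypothetical name). It only handles ordinary `<img src=...>` tags with single or double quotes. The special syntax for zhihu-hosted images mentioned above is not covered, because its exact form is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collect the src attribute of every <img> tag.
final class ContentImages {
    private ContentImages() {}

    // src="..." or src='...' inside an <img ...> tag, case-insensitive.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    static List<String> contentImage(String content) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(content);
        while (m.find()) {
            urls.add(m.group(1)); // the captured URL or path
        }
        return urls;
    }
}
```

A regex is adequate for the small, regular HTML in post bodies. For arbitrary HTML, a real parser would be more robust.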

contents
shared {String*} contents(JsonArray posts)

Map content(post) over posts.

fetchAvatarFiles
shared Tasks fetchAvatarFiles(Uri? avatar, Uri? creatorAvatar)

Fetch the column avatar and the creator avatar; non-blocking.

fetchColumnInfo
shared String fetchColumnInfo(String columnName)

Returns the column info JSON string.

fetchComments
shared void fetchComments(JsonArray posts)

Fetch comments as JSON files, named `POST_SLUG_comments`.

fetchContentImages
shared {[Integer, String]*} fetchContentImages(JsonArray posts)

Return a list of [exitCode, failedUrl].

Content images may be hosted outside zhihu, which may cause file name collisions. A collision is resolved the same way as when the file name already exists, i.e. by renaming to fileName_SHA256.

fetchFile
shared Process fetchFile(Uri url)

Fetch a file via wget; non-blocking.

fetchFileSync
shared [Integer, String] fetchFileSync(Uri url)

Like fetchFile() but blocking. Returns the exit code and the URL.

fetchPosts
shared String? fetchPosts(Integer? count, String columnName)

Returns a JSON string and saves it to COLUMN_NAME_posts.json.

fetchTitleImages
shared Tasks fetchTitleImages(JsonArray posts)

Fetch title images; non-blocking.

getColumn
shared String getColumn(String name)

Non-ASCII characters are always returned as 16-bit escape sequences, e.g. \ud7ff, even with the request header Accept-Charset: utf-8.

getComments
shared String? getComments(Integer slug)

Returns the comments JSON string of a single post.

getContent
shared String getContent(Uri url, Boolean redirected = false)

Given a Uri, get the Response content, following one redirect.

Parameters:
  • redirected = false
Throws
  • Exception

    when the final response status is not 200

getJsonValue
shared Value getJsonValue(JsonObject json, String key)

Returns JsonObject[key] else null.

getPosts
shared String getPosts(String name, Integer limit = ...)

Get the posts belonging to a column, including post content but without comments.

The result is an array of post entries. Every post entry has a post id, i.e. an Integer slug. Every post also has three links:

   - `"url": "/u/SLUG"` redirects (301) to `/p/SLUG` (the HTML page)
   - `"comments": "/api/posts/SLUG/comments"`: the comments
   - `"href": "/api/posts/SLUG"`: a single post, which consists of
       * lastestLikers: I do not know a way to get all likes;
               `/api/posts/SLUG/{like,likes}` both return 404.
       * the previous post, without lastestLikers
       * the next post, without lastestLikers
       * Note that there are no comments.
           Use [[getComments]] to fetch comments.
Parameters:
  • limit = 10

parseColumnInfo
shared [Integer?, Uri?, Uri?] parseColumnInfo(String columnInfo)

Parse a column info JSON string for postsCount and the URLs of avatar and creatorAvatar.

posts
shared Uri posts(String name, Integer limit = ...)

Return the posts URL.

Parameters:
  • limit = 10
Throws
  • InvalidTypeException

    when limit <= 0
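A Java sketch of posts() with the same default limit and guard (`PostsUrl` is a hypothetical name). The endpoint layout `/api/columns/NAME/posts?limit=N` is an assumption, and Java's IllegalArgumentException stands in for Ceylon's InvalidTypeException.

```java
import java.net.URI;

// Hypothetical counterpart of posts(name, limit = 10).
final class PostsUrl {
    private PostsUrl() {}

    static final int DEFAULT_LIMIT = 10;

    static URI posts(String name) {
        return posts(name, DEFAULT_LIMIT);
    }

    static URI posts(String name, int limit) {
        if (limit <= 0) {
            // The Ceylon module documents InvalidTypeException here.
            throw new IllegalArgumentException("limit must be positive, got " + limit);
        }
        // Assumed endpoint layout.
        return URI.create("https://zhuanlan.zhihu.com/api/columns/" + name + "/posts?limit=" + limit);
    }
}
```

For example, `PostsUrl.posts("wooyun", 42)` asks for up to 42 posts in one request, matching the getPosts example above.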

postsCount
shared Integer postsCount(JsonObject json)

Given column info, returns postsCount.

Throws
  • KeyNotFound

    when there is no postsCount field

  • InvalidTypeException

    when postsCount does not have an Integer value
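A Java sketch of postsCount() that models a JsonObject as a plain `Map<String, Object>`. This is an assumption, since the module itself uses ceylon.json. NoSuchElementException stands in for KeyNotFound and IllegalArgumentException for InvalidTypeException, so the two documented failure cases stay distinguishable.

```java
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical counterpart of postsCount(JsonObject json).
final class ColumnInfo {
    private ColumnInfo() {}

    static int postsCount(Map<String, Object> json) {
        if (!json.containsKey("postsCount")) {
            // Ceylon: KeyNotFound
            throw new NoSuchElementException("no postsCount field");
        }
        Object value = json.get("postsCount");
        if (!(value instanceof Integer)) {
            // Ceylon: InvalidTypeException
            throw new IllegalArgumentException("postsCount is not an Integer: " + value);
        }
        return (Integer) value;
    }
}
```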

run
shared void run()

Run the module io.github.weakish.zhihu.

slug
shared Integer slug(JsonObject post)

Zhihu uses an Integer for the post slug.

Throws
  • InvalidTypeException

    when there is no slug field in the post

slugs
shared {Integer*} slugs(JsonArray posts)

Map slug(post) over posts.
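A Java sketch of slug() and slugs(), again assuming posts are modelled as `Map<String, Object>` rather than ceylon.json objects (`Slugs` is a hypothetical name). IllegalArgumentException plays the role of InvalidTypeException for a missing or non-Integer slug.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical counterparts of slug(post) and slugs(posts).
final class Slugs {
    private Slugs() {}

    static int slug(Map<String, Object> post) {
        Object value = post.get("slug");
        if (!(value instanceof Integer)) {
            // Covers both a missing field (null) and a wrong type.
            throw new IllegalArgumentException("post has no Integer slug: " + value);
        }
        return (Integer) value;
    }

    // Map slug(post) over posts, preserving order.
    static List<Integer> slugs(List<Map<String, Object>> posts) {
        List<Integer> result = new ArrayList<>();
        for (Map<String, Object> post : posts) {
            result.add(slug(post));
        }
        return result;
    }
}
```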

titleImageUrl
shared Uri? titleImageUrl(JsonObject post)

titleImage may be an empty string ("").

titleImagesUrls
shared {Uri*} titleImagesUrls(JsonArray posts)

Skips posts that cannot be parsed as a JsonObject or do not contain a title image.
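A Java sketch of titleImageUrl() and titleImagesUrls(), treating a JsonArray as a `List<?>` whose object entries are Maps (an assumption; `TitleImages` is a hypothetical name). It assumes an empty titleImage means "no title image", so null is returned for it just as for a missing field.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical counterparts of titleImageUrl and titleImagesUrls.
final class TitleImages {
    private TitleImages() {}

    // null when titleImage is missing, not a string, or "".
    static URI titleImageUrl(Map<?, ?> post) {
        Object value = post.get("titleImage");
        if (value instanceof String && !((String) value).isEmpty()) {
            return URI.create((String) value);
        }
        return null;
    }

    // Skip entries that are not objects or have no title image.
    static List<URI> titleImagesUrls(List<?> posts) {
        List<URI> urls = new ArrayList<>();
        for (Object post : posts) {
            if (post instanceof Map) {
                URI url = titleImageUrl((Map<?, ?>) post);
                if (url != null) {
                    urls.add(url);
                }
            }
        }
        return urls;
    }
}
```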

urlify
shared Uri urlify(String imagePath, Integer cdnNumber = ...)

Urlify images hosted on the zhihu CDN, i.e. turn an image path into a full CDN URL.

Parameters:
  • cdnNumber = randomCdnNumber()
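A Java sketch of urlify() with a random CDN number by default (`ZhimgCdn` is a hypothetical name). The host pattern `https://picN.zhimg.com/PATH` and the range 1 to 4 for N are assumptions for illustration only. This documentation does not state the real host names or how many CDN hosts randomCdnNumber() chooses from.

```java
import java.net.URI;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical counterpart of urlify(imagePath, cdnNumber = randomCdnNumber()).
final class ZhimgCdn {
    private ZhimgCdn() {}

    // Assumption: CDN hosts are numbered 1..4.
    static int randomCdnNumber() {
        return ThreadLocalRandom.current().nextInt(1, 5); // upper bound exclusive
    }

    static URI urlify(String imagePath) {
        return urlify(imagePath, randomCdnNumber());
    }

    static URI urlify(String imagePath, int cdnNumber) {
        // Drop a leading slash so the URL has exactly one separator.
        String path = imagePath.startsWith("/") ? imagePath.substring(1) : imagePath;
        // Assumed host pattern.
        return URI.create("https://pic" + cdnNumber + ".zhimg.com/" + path);
    }
}
```

Picking a random host spreads downloads across CDN hosts. Passing an explicit cdnNumber keeps the result deterministic, which is handy in tests.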

writeFile
shared void writeFile(String content, Path filePath)

Write a String to filePath. If filePath already exists, write to filePath_SHA256 instead.
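A Java sketch of this collision rule using `java.nio.file` and `MessageDigest` (`SafeWrite` is a hypothetical name). It assumes the SHA256 suffix is the lowercase hex digest of the content being written. The documentation does not say what is hashed, so treat that choice as illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical counterpart of writeFile(content, filePath).
final class SafeWrite {
    private SafeWrite() {}

    // Lowercase hex SHA-256 of the UTF-8 bytes of content (assumed hash input).
    static String sha256Hex(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Writes content and returns the path actually used.
    static Path writeFile(String content, Path filePath) throws IOException {
        Path target = filePath;
        if (Files.exists(filePath)) {
            // Collision: write next to it as NAME_SHA256 instead.
            target = filePath.resolveSibling(filePath.getFileName() + "_" + sha256Hex(content));
        }
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        return target;
    }
}
```

Because the suffix depends only on the content, rerunning with unchanged content rewrites the same suffixed file. This yields one extra copy rather than an ever-growing number of them.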

Exceptions
KeyNotFound
shared KeyNotFound

For cases where simply returning null is not satisfactory.