The advent of machine learning algorithms in publishing ushered in the era of online book recommendations. First there was Goodreads, and then came Amazon. And now, there’s Tertulia, which scrapes an excessive amount of public data to recommend books to its users. There are plenty of others out there that function similarly, whether as an app or a website. But even with their prevalence these days, these algorithms still make plenty of mistakes, and you might think twice about using them once you dig deeper and discover their limitations.
But before we get to that, let’s find out how this technology works.
Machine learning systems called recommender systems, or recommendation systems, use data to help users find new products and services. These algorithms power the suggestions we consume every day: TikTok’s “For You” page, YouTube video recommendations, Spotify playlists, Netflix film and TV series suggestions, Amazon product recommendations, Goodreads book recommendations, and other similar services.
These algorithms, however, need a decent amount of data to choose a recommendation strategy and produce meaningful, personalized recommendations. This data may include past purchase histories, contextual data, business-related data, user profiles, or content-based information about the products themselves. All of this is then combined and analyzed by artificial intelligence models so that the recommender system can predict what similar users will do in the future.
One way a user can get a recommendation is through what’s called “collaborative filtering.” Most recommender systems start off without any data, and this type of filtering needs a lot of it to produce useful suggestions. For instance, the system must wait for someone to watch a number of videos on YouTube before it can make the right recommendations to that user.
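To make the idea concrete, here’s a minimal sketch of user-based collaborative filtering in Python. The users, books, and star ratings are all invented for illustration; real systems work on millions of rows, but the core move is the same: find the user whose ratings look most like yours, then borrow their shelf.

```python
import math

# Toy ratings matrix: users -> {book: stars}. All data is made up.
ratings = {
    "alice": {"Dune": 5, "Emma": 1, "It": 4},
    "bob":   {"Dune": 4, "Emma": 2, "It": 5, "Beloved": 5},
    "carol": {"Dune": 1, "Emma": 5, "It": 2, "Beloved": 3},
}

def cosine(u, v):
    """Cosine similarity between two users, over the books both rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[b] * v[b] for b in shared)
    norm_u = math.sqrt(sum(u[b] ** 2 for b in shared))
    norm_v = math.sqrt(sum(v[b] ** 2 for b in shared))
    return dot / (norm_u * norm_v)

def recommend(user):
    """Suggest books the most similar user rated but `user` hasn't."""
    others = [name for name in ratings if name != user]
    nearest = max(others, key=lambda n: cosine(ratings[user], ratings[n]))
    return [b for b in ratings[nearest] if b not in ratings[user]]

print(recommend("alice"))  # bob's tastes are closest, so: ['Beloved']
```

Notice that `recommend` is useless for a brand-new user with an empty ratings dict — that’s the data-hunger this paragraph describes.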
Perhaps the more accurate method of handling this is “content-based filtering,” a real-time analysis of a user’s behavior. It considers product characteristics such as size, description, color, and price point. The algorithm then presents similar products that are more likely to be added to the cart and bought. You may notice this when you see “products you may like” on a checkout page.
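A “products you may like” box can be sketched the same way, this time comparing item attributes instead of user ratings. The catalog and its attributes below are fabricated; a real store would use far richer features, but the ranking logic is the same.

```python
# Toy catalog: product -> attributes. All products are invented.
catalog = {
    "blue hoodie":  {"color": "blue", "size": "M", "price_band": "mid"},
    "blue t-shirt": {"color": "blue", "size": "M", "price_band": "low"},
    "red hoodie":   {"color": "red",  "size": "L", "price_band": "mid"},
}

def similarity(a, b):
    """Fraction of attributes on which two products agree."""
    return sum(a[k] == b[k] for k in a) / len(a)

def you_may_like(cart_item, top_n=1):
    """Rank the rest of the catalog by similarity to the item in the cart."""
    others = [p for p in catalog if p != cart_item]
    ranked = sorted(others,
                    key=lambda p: similarity(catalog[cart_item], catalog[p]),
                    reverse=True)
    return ranked[:top_n]

print(you_may_like("blue hoodie"))  # ['blue t-shirt'] (same color and size)
```

Because it only needs the current item’s attributes, this approach works even for a first-time visitor — which is why it sidesteps the cold-start problem discussed later.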
In order to more accurately predict which books readers will want to read next, Goodreads says its recommendation engine combines many proprietary algorithms and claims that it analyzes 20 billion data points. By examining how frequently books are found on the same shelves and whether the same readers enjoyed them, it creates a map of the relationships between books. With its knowledge of the books stored on a user’s shelves, Goodreads can determine how one user’s tastes differ from or resemble those of other users.
Goodreads then may combine collaborative and content-based filtering techniques. Collaborative filtering draws on group knowledge to create suggestions based on users who share similar interests, while content-based filtering takes into account both book attributes and user attributes.
The fundamental objective of Goodreads’s recommender system is to get as many people to rate books as possible, allowing it to determine which books are the most popular and the kinds of books that readers would find interesting. For instance, if a user rates Atomic Habits by James Clear with five glowing stars, Goodreads may suggest another book by Clear or a book that is read by others who liked Atomic Habits.
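The “readers who liked this also liked” pattern from that example boils down to counting co-occurrences across readers’ five-star shelves. Here is a toy sketch of that idea; the shelves and titles beyond *Atomic Habits* are chosen purely for illustration, not taken from Goodreads data.

```python
from collections import Counter

# Toy data: each set is one reader's five-star shelf. Fabricated.
five_star_shelves = [
    {"Atomic Habits", "Deep Work", "The Power of Habit"},
    {"Atomic Habits", "Deep Work"},
    {"The Power of Habit", "Grit"},
]

def also_liked(book):
    """Books most often five-starred by the same readers as `book`."""
    co_counts = Counter()
    for shelf in five_star_shelves:
        if book in shelf:
            co_counts.update(shelf - {book})
    return [title for title, _ in co_counts.most_common()]

print(also_liked("Atomic Habits"))
```

Here *Deep Work* outranks *The Power of Habit* because two of the readers who five-starred *Atomic Habits* also five-starred it, versus one — the same group-knowledge signal Goodreads describes, at a vastly smaller scale.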
Goodreads also records a plethora of information, including group interactions, conversations, Ask the Author activities, quizzes, and trivia. All of this information can be useful in developing a strong book recommendation system.
In a nutshell, Goodreads users make up a sizable portion of the data.
The limitations of content-based filtering include its inability to understand user interests beyond simple preferences. It knows some basic stuff about me, but that’s as far as it can go. What if it recommends a racist book? What if it recommends a book that might trigger readers without any heads-up? What if it recommends a book that is otherwise problematic? The keyword is nuance, and algorithms can’t tell the difference between two books that have similar stories.
In Book Riot’s very own Tailored Book Recommendations, a book recommendation service, bibliologists check whether a book contains content that might be sensitive to readers. There’s a lot of careful work being done behind the scenes, and that level of sophistication cannot be matched by a machine learning system.
Content-based filtering also suggests products based on how closely descriptions and features match up, taking the user’s prior purchases into account. However, that creates a “filter bubble” and an “echo chamber” that narrows the user’s interests by suggesting only products similar to those they’ve already consumed. When it comes to books, if I rated a book by a white author with five stars, the system may lock me in that bubble by recommending more and more white authors; I won’t get exposed to authors from marginalized backgrounds.
Meanwhile, algorithms need user data to suggest products, and there is a usual issue with collaborative filtering: the “cold start.” Content-based filtering doesn’t have this issue because it only needs user preferences and product information. But with collaborative filtering, it can be difficult to recommend something useful to new users, because there’s no existing well of data to tap from. After a user signs up for a service, the algorithm takes time to learn their reading habits, remember preferences, analyze tastes, and so on. To reach its full potential, it needs a gold mine of data to pull from, so it won’t give accurate recommendations for some time, if it ever does.
Goodreads faces this problem but offers a solution, too. To improve its recommendation algorithm, it wants you to do a lot of things, such as rating books, updating your favorite genres, and creating shelves. But that’s simply admitting that it needs human oversight to intervene and actually make things work.
Lastly, collaborative filtering systems restrict recommendations of unrated items, such as new and obscure books, to users with distinctive and specific tastes. That means that, for most users, new and unheard-of books probably won’t get recommended much because they haven’t been rated yet. R.I.P. book discovery.
Machine learning algorithms are one subset of AI, but let’s dive into the others.
ChatGPT, a generative AI built on “neural networks,” has already disrupted many industries, including publishing. It can do a lot of impressive things, so naturally, many have tried asking it for book recommendations. At first, users are awe-struck by how good it seems at making suggestions. But upon closer inspection, it actually spews errors, such as making up author or book names. On this Reddit post, many were so disappointed by how bad the recommendations were that some suggested asking a librarian instead. Another Reddit user made a similar request of ChatGPT, but they were also disappointed by errors in author names, and the glorified chatbot kept repeating a book title even though it was specifically told not to.
These incidents underscore the reality that ChatGPT is really great at bullshitting when, in fact, it doesn’t really know what it’s saying. And if you insist on using it to ask for books to read, just know that it has not been fed data from October 2021 onward, so the books released from that time period won’t be mentioned at all. And here’s more bad news: Since AI tools like ChatGPT have been fed over and over with English-language content, “[they] may disproportionately offer the preferences of the English-speaking internet.” That means it’s skipping a lot of great books from other languages and other countries.
With all the pitfalls of algorithms, and AI in general, it seems like nothing beats book recommendations from an actual human being. They’re more accurate and more personal. Most of all, you can find hidden gems you’ll really like rather than the bestsellers (and whatever everyone’s reading) that these machine learning systems always spit out.
With that said, you may want to check out TBR.co for personalized recommendations.