Moving from Goodreads to my own website
One of my goals with this website is to maintain a public list of books I read, with ratings, reviews and highlights.
I have rated books on Goodreads for a few years, but I have grown tired of the platform. Additionally, it seems odd to me to store this personal information with a 3rd party through an interface I have no control over.
My problem is that I have 124 books with ratings and reviews on Goodreads, but none on my website. Recreating the list by hand would take a lot of time. But more importantly, it would be repetitive and boring.
So I decided to try to do this in an automated way.
Exporting from Goodreads
After some Googling I found a great blog post by Tom MacWright, who had already done the very same thing. He pointed out that Goodreads offers a CSV export option, and he even provided a script to convert the CSV to front matter for static site generators.
I use Eleventy for this website, so this was a perfect solution. I exported my Goodreads books to a CSV file, modified the script a little bit, and generated all the ratings and reviews.
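To give a feel for what the conversion involves, here is a minimal sketch of turning one parsed row of the export into front matter. The column names ("Title", "My Rating", and so on) are assumptions based on a typical Goodreads export; check the header row of your own CSV before relying on them.

```javascript
// Sketch: convert one parsed Goodreads CSV row into front matter.
// Column names are assumptions; verify them against your export.
function rowToFrontMatter(row) {
  const lines = [
    '---',
    `title: ${JSON.stringify(row['Title'])}`,
    `author: ${JSON.stringify(row['Author'])}`,
    `rating: ${Number(row['My Rating'])}`,
    `date: ${row['Date Read']}`,
    '---',
    '',
    row['My Review'] || ''
  ];
  return lines.join('\n');
}

// Example row, shaped like one record from the export
const example = {
  'Title': 'Thinking, Fast and Slow',
  'Author': 'Daniel Kahneman',
  'My Rating': '5',
  'Date Read': '2020/01/15',
  'My Review': 'A dense but rewarding read.'
};
console.log(rowToFrontMatter(example));
```

A real conversion would loop over every row (with a CSV parser handling quoted fields) and write each result to its own markdown file.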
Obtaining book covers
After doing this, one thing from Goodreads was still missing: the book covers. They are not provided in the CSV, so I wrote a script to save them from my book list on Goodreads.
If you want to do the same, here are the steps I took (this is a NodeJS script):
1. Save the book list
Goodreads uses JavaScript to provide infinite scrolling on the book list, so a simple HTTP request scraper would not get a list of all the books. To keep things simple, open the book list in your browser, scroll all the way to the bottom, and then save the page as an HTML file.
2. Prepare the script
In the same folder as your saved HTML file, install the required NodeJS packages and then create a JavaScript file to write the script in:
npm install got cheerio node-fetch
subl scrape.js
got and node-fetch make HTTP requests, and cheerio lets you use jQuery-like syntax on the HTML response. subl is a command to create and open a file in Sublime Text, but you can use any text editor you want.
3. Write the script
// Require packages
const cheerio = require('cheerio');
const fetch = require('node-fetch');
const fs = require('fs');
const got = require('got');

// Wrap the work in an async function so we can use await inside it
(async () => {
  // Load the saved HTML file
  let html = fs.readFileSync('./booklist.html', 'utf8');
  let $ = cheerio.load(html);

  // Create an array of links to the page for each book
  let books = $('.title > .value > a').get();

  // Open each link and save the book cover to disk
  for (let i = 0; i < books.length; i++) {
    let response = await got('https://goodreads.com'
      + books[i].attribs.href);
    $ = cheerio.load(response.body);

    let title_full = $('#bookTitle').text().trim();
    let cover = $('#coverImage').get();
    let coverUrl = cover[0].attribs.src;

    // Build a filename slug from the title (same rules as the CSV script)
    let title_first = title_full.split(':')[0];
    let filename = title_first
      .trim()
      .replace(/[\s]+/g, '-')
      .replace(/[^A-Z0-9-]/gi, '')
      .toLowerCase()
      .split('-')
      .slice(0, 5)
      .join('-');

    // Download the cover image and write it to the covers folder
    const coverImage = await fetch(coverUrl);
    const buffer = await coverImage.buffer();
    fs.writeFile(`./covers/${filename}.jpg`, buffer, () =>
      console.log('Saved a book cover'));
  }
})();
Now I have the book covers as well! The filename code is the same as in the (slightly modified) script I linked to above. This ensures that the cover filenames match the filenames generated from the CSV. That should make it easy to display the cover automatically, both in the book list and when opening an individual book review.
Next steps
I am happy with the progress so far. To consider the book list complete, however, I still want to complete the following tasks:
- Review the generated data from the two scripts above, correcting conversion errors (e.g. with non-English characters).
- Redesign the list and book review layouts so they display the book covers (I am not using them yet)
- Add my Kindle notes and highlights below each book review (this could be time-consuming as the website seems to be hard to scrape)
More on that later!