Patreon seems to find it acceptable to apply a whole bunch of security-through-obscurity measures to their site that verge on DRM. This is a symptom of the times, really. They provide an API for creators, but none for users. Let's try to scrape MP3s.

<a data-tag="post-file-download"
   href="https://c10.patreonusercontent.com/LONGSTRING" class="sc-fzpans kmqqXw">Myfile.mp3</a>

This is what the link for downloads looks like. We are after the URL in the href attribute. Fetching the page itself requires our session cookie, but the URL carries token-hash and token-time query string parameters, so the authentication details are encoded in the URL itself: once we have it, we can download the file without sending any cookie.
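So a scraped href presumably has roughly this shape (the values below are placeholders; the real strings are opaque):

https://c10.patreonusercontent.com/LONGSTRING?token-time=<timestamp>&token-hash=<signature>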

So that means that the problem reduces to the following:

  • Expand the infinite scroll of the site
  • Find all download links and correlate them with some metadata

Expand the infinite scroll

Start out at a URL whose tag filter matches what you want; filters%5Btag%5D is just filters[tag], URL-encoded.

https://www.patreon.com/somecreator/posts?filters%5Btag%5D=Patron%20Only%20Audio

We look for the 'Load more' button and click it.

<button class="sc-fzoiQi ivnVZu" tabindex="0" type="button">...</button>

Patreon uses some stupid minified CSS classes, so I don't know how stable this identifier is. In this case, ivnVZu is the relevant class, as you can confirm by testing the output of document.querySelectorAll in the inspector.
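
If the class name has rotated since this was written, a more durable trick is to find the button by its visible text instead. A sketch, assuming the label is literally 'Load more':

// Select the button by its label rather than its minified class.
// Assumes the visible text is exactly 'Load more'.
const loadMore = Array.from(document.querySelectorAll('button'))
    .find(b => b.textContent.trim() === 'Load more');

Either way, the expansion loop looks like this: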

function expandScroll() {
    // Look up the 'Load more' button by its minified class.
    const button = document.querySelector('.ivnVZu');
    if (button) {
        console.log("click");
        button.click();
        // Give the sluggish page time to load the next batch before clicking again.
        window.setTimeout(expandScroll, 30000);
    } else {
        // No button left means we've reached the bottom.
        console.log("terminating expansion");
    }
}

expandScroll();

The site is so ungodly slow that this can hang Chrome quite easily. I suggest either having an unrealistic amount of patience, or pausing script execution in the "Sources" tab in the inspector after a while.
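
If you'd rather stop the loop cleanly than pause the whole page, a small variation keeps the timer ID around so it can be cancelled from the console (a sketch; expandTimer is my own name, nothing Patreon provides):

let expandTimer = null;

function expandScroll() {
    const button = document.querySelector('.ivnVZu');
    if (button) {
        button.click();
        // Keep the timer ID so the loop can be cancelled from the console.
        expandTimer = window.setTimeout(expandScroll, 30000);
    } else {
        console.log("terminating expansion");
    }
}

expandScroll();
// To stop early: window.clearTimeout(expandTimer);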

Scrape the URLs

Scraping the URLs is a bit more interesting. We need two pieces of information:

<a data-tag="post-published-at" href="/posts/some-post-12345" class="sc-fzpans bwkMGo">May 16 at 9:28am</a>

There may be more information in the show title, but I don't think we can guarantee its uniqueness. The second piece is the download URL mentioned earlier.

From some experimentation in the inspector, I can tell that the common ancestor of these elements is the following: <div data-tag="post-card">...</div>

This at least is a bit semantic.
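
A quick sanity check in the console before committing to the selector (the count should match the number of posts you've loaded):

document.querySelectorAll('[data-tag="post-card"]').length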

Let's scrape:

// Each post card should contain both the timestamp link and the file download.
const postNodes = document.querySelectorAll('[data-tag="post-card"]');
const data = Array.from(postNodes)
    .map(el => {
        const publishedAt = el.querySelector('[data-tag="post-published-at"]');
        const postFileDownload = el.querySelector('[data-tag="post-file-download"]');

        // Guard against cards with no attached file, which would
        // otherwise throw on the missing element.
        if (!publishedAt || !postFileDownload) {
            return null;
        }

        return {
            publishedAt: publishedAt.textContent,
            href: postFileDownload.getAttribute('href')
        };
    })
    .filter(Boolean);
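
console.log(JSON.stringify(data)) works for getting the data out, but Chrome's console also provides a copy() utility that puts the string straight on the clipboard:

copy(JSON.stringify(data));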

Either way, paste the output into a text editor and save it as exported.json. Using that JSON data, this Python script downloads the files under timestamped names:

import json
import re
import unicodedata

import requests

with open('exported.json', 'r') as f:
    data = json.load(f)

def slugify(value):
    # Turn a timestamp like 'May 16 at 9:28am' into a safe filename;
    # colons are deliberately kept so the time survives.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^:\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)


for rec in data:
    timestamp = rec['publishedAt']
    href = rec['href']
    filename = "{}.mp3".format(slugify(timestamp))
    print(filename)

    # Stream the response so large files never have to fit in memory.
    r = requests.get(href, stream=True, timeout=30)
    r.raise_for_status()

    with open(filename, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=8192):
            fd.write(chunk)
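
One caveat: the token-time parameter suggests these signed URLs expire eventually, so it's probably wise to run the downloader soon after exporting the JSON.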