As a software engineer, I see automation as a natural extension of problem-solving whenever a task turns repetitive. If you ever hand me a task with even a hint of repetition, you can expect me to automate it. Automating something complex to save time is both pleasing and rewarding.
Recently, I worked on a project to automate interactions with a XenForo-based forum, saving myself countless hours of manual work. While I won’t disclose the specifics of the forum, the project involved fetching posts, extracting URLs, and automating interactions such as “liking” posts via HTTP POST requests.
What’s the point? Some of the website’s content was only available to active users, and this project scratched that itch for me.
This post will delve into the technical process of building a bot that fetches forum pages, scrapes post data, and performs automated actions. The core of the project revolved around understanding how web requests work, maintaining session state, and sending authenticated POST requests in bulk to interact with multiple posts.
Problem: Automating Tedious Web Activity
The task was straightforward: I needed to interact with hundreds of posts across multiple pages in a XenForo forum. Manually navigating through the pages and clicking on posts to perform interactions was time-consuming and error-prone.
The solution was to develop a bot that could:
- Fetch forum pages to get a list of all posts.
- Scrape the necessary data from each post.
- Send automated POST requests to perform interactions with each post (such as “liking” them).
Approach: HTTP Requests and Sessions
To automate this process, I didn’t use a browser automation tool like Selenium. Instead, I opted for a more lightweight solution that involved direct HTTP requests. Here’s a breakdown of the development process:
1. Understanding the Forum’s Request Structure
XenForo forums, like many other web applications, follow a predictable structure for their HTTP requests. Each interaction—whether it’s navigating pages or liking posts—sends a POST request with specific parameters.
Using Firefox Developer Tools, I inspected the network traffic generated when I manually liked a post. The key part of the POST request looked like this:
_xfRequestUri=%2Fthreads%2Fsome-thread.12345%2Fpage-2&_xfWithData=1&_xfToken=xyz123&_xfResponseType=json
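URL-decoded, this body is just ordinary form data. A quick way to inspect it is Python’s urllib.parse.parse_qs:

from urllib.parse import parse_qs

# The raw body captured in the developer tools
body = ('_xfRequestUri=%2Fthreads%2Fsome-thread.12345%2Fpage-2'
        '&_xfWithData=1&_xfToken=xyz123&_xfResponseType=json')

# parse_qs URL-decodes the body into a dict of parameter names to value lists
print(parse_qs(body))
# {'_xfRequestUri': ['/threads/some-thread.12345/page-2'],
#  '_xfWithData': ['1'], '_xfToken': ['xyz123'], '_xfResponseType': ['json']}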
From this, I identified several important components:
- _xfRequestUri: The URL of the thread or post.
- _xfToken: A CSRF token used to validate the request.
- _xfWithData and _xfResponseType: Standard XenForo parameters that control the format of the response.
Each “like” was essentially an HTTP POST request with these parameters, and the CSRF token (_xfToken) was refreshed periodically.
The Firefox plugin Copy as Python Requests let me translate the manually captured requests, cookies included, directly into Python code.
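The generated code is roughly shaped like the snippet below (the cookie names, header values, and post URL here are placeholders rather than the forum’s real ones):

import requests

# Roughly what a captured "like" request looks like when replayed from Python;
# cookie and header values below are placeholders, not real credentials
cookies = {'xf_user': '...', 'xf_session': '...'}
headers = {'User-Agent': 'Mozilla/5.0', 'X-Requested-With': 'XMLHttpRequest'}
data = {
    '_xfRequestUri': '/threads/some-thread.12345/page-2',
    '_xfWithData': '1',
    '_xfToken': 'xyz123',
    '_xfResponseType': 'json',
}

response = requests.post('https://example.com/posts/12345/like',
                         cookies=cookies, headers=headers, data=data)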
2. Fetching Forum Pages and Extracting URLs
The first step was fetching forum pages to identify which posts I needed to interact with. Each page in the thread contained multiple posts, and I needed to scrape the page to extract the URLs of these posts.
Here’s a rough outline of the Python code I used to fetch the HTML of a page (cookie and header setup omitted):
import requests
from bs4 import BeautifulSoup
# Set up a session to maintain cookies and headers
session = requests.Session()
# URL of the forum page
url = 'https://example.com/threads/some-thread.12345/page-1'
# Send GET request to fetch the page
response = session.get(url)
# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
# Find all post elements (XenForo renders each post as an <article class="message">)
posts = soup.find_all('article', class_='message')
# Extract post IDs (the data-content attribute carries the post identifier)
post_ids = [post['data-content'] for post in posts]
This code:
- Opens an HTTP session to keep cookies (which includes session state).
- Fetches the HTML of the forum page.
- Uses BeautifulSoup to parse the page and extract the URLs or post IDs of each post.
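The thread spanned multiple pages, so in practice I looped over the page URLs rather than fetching just one. Here’s a minimal sketch building on the code above; last_page is a hypothetical stand-in for whatever the pagination links report:

# Loop over thread pages, reusing the session and parsing from the snippet above.
# last_page is a hypothetical value; in practice, read it from the page navigation.
base_url = 'https://example.com/threads/some-thread.12345/page-{}'
last_page = 10

all_post_ids = []
for page_number in range(1, last_page + 1):
    response = session.get(base_url.format(page_number))
    soup = BeautifulSoup(response.text, 'html.parser')
    posts = soup.find_all('article', class_='message')
    all_post_ids.extend(post['data-content'] for post in posts)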
3. Scraping Posts on Each Page
With the list of URLs or post IDs in hand, I could now automate the interaction process. Each post on a XenForo page had a specific element structure, and I needed to extract the relevant data to perform automated interactions.
# Function to scrape post details
def scrape_posts(page_url):
    response = session.get(page_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all post blocks on the page
    posts = soup.find_all('article', class_='message')

    post_data = []
    for post in posts:
        post_id = post['data-content']
        user = post.find('h4', class_='message-author').get_text()
        post_data.append({
            'post_id': post_id,
            'user': user
        })

    return post_data

# Scrape a specific page
page_posts = scrape_posts('https://example.com/threads/some-thread.12345/page-1')
This function allows me to:
- Extract the post ID and user data for each post.
- Collect all relevant posts on a page for later interaction.
4. Maintaining Session and Cookies
When interacting with the forum, maintaining the session was critical. XenForo uses cookies and session tokens to track authenticated users. Fortunately, the Python requests.Session() object manages these automatically after login or initial page access.
# Login function to authenticate the session
def login(session, username, password):
    login_url = 'https://example.com/login'
    login_data = {
        'login': username,
        'password': password,
        '_xfToken': 'your_csrf_token_here'
    }
    session.post(login_url, data=login_data)
    return session

# Example of logging in
session = login(session, 'my_username', 'my_password')
This way, the session remains authenticated as I move across different pages and posts, allowing the bot to send requests without constantly needing to log in.
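One caveat: the '_xfToken' placeholder above has to be replaced with a real token before the login will succeed. XenForo embeds the current token in the pages it serves (typically as a hidden _xfToken input), so a hedged sketch of scraping it looks like this:

def get_csrf_token(session, url='https://example.com/login'):
    # Fetch a page with the current session and pull the CSRF token from the HTML.
    # Assumption: XenForo renders it as a hidden <input name="_xfToken">;
    # adjust the lookup if the forum's markup differs.
    response = session.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    token_input = soup.find('input', {'name': '_xfToken'})
    return token_input['value'] if token_input else None

# Scrape a fresh token and use it in place of the hard-coded placeholder
token = get_csrf_token(session)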
5. Sending POST Requests to Interact with Posts
Now that I had the list of post IDs and the session was maintained, it was time to automate the “like” action. I emulated the manual process of liking a post by sending POST requests directly.
Here’s the code that handles sending a POST request for each post:
def like_post(post_id):
    # URL and data payload for liking a post
    like_url = f'https://example.com/posts/{post_id}/like'
    post_data = {
        '_xfToken': 'your_csrf_token_here',
        '_xfRequestUri': f'/threads/some-thread/post-{post_id}',
        '_xfWithData': '1',
        '_xfResponseType': 'json'
    }

    # Send POST request to like the post
    response = session.post(like_url, data=post_data)

    if response.status_code == 200:
        print(f'Successfully liked post {post_id}')
    else:
        print(f'Failed to like post {post_id}: {response.text}')

# Interact with all posts on the current page
for post in page_posts:
    like_post(post['post_id'])
In this step:
- I dynamically built the URL for each post, appending the post ID.
- I constructed the POST request data with the required CSRF token and request URI.
- Each POST request was sent using the session, ensuring the bot was authenticated.
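Because the request asks for a JSON response (_xfResponseType=json), the forum can return HTTP 200 even when the action fails, so checking only the status code may hide errors. Here is a hedged refinement that inspects the JSON body instead; the status and errors field names are assumptions about XenForo’s responses:

def like_post_checked(post_id, token):
    # Same request as above, but inspect the JSON body rather than only the HTTP status.
    # The 'status' and 'errors' keys are assumptions about XenForo's JSON replies.
    response = session.post(
        f'https://example.com/posts/{post_id}/like',
        data={
            '_xfToken': token,
            '_xfRequestUri': f'/threads/some-thread/post-{post_id}',
            '_xfWithData': '1',
            '_xfResponseType': 'json',
        },
    )
    payload = response.json()
    if response.ok and payload.get('status') != 'error':
        print(f'Successfully liked post {post_id}')
    else:
        print(f'Failed to like post {post_id}: {payload.get("errors", payload)}')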
6. Delays to Mimic Human Interaction
To avoid being detected as a bot, I added randomized delays between requests to mimic human activity. Without these delays, rapid requests could trigger rate limits or anti-bot mechanisms on the forum.
import time
import random

# Add a delay between each request
for post in page_posts:
    like_post(post['post_id'])
    delay = random.uniform(1, 3)  # Random delay between 1 and 3 seconds
    time.sleep(delay)
Outcome: Efficient Web Automation
This bot saved me hours of manual work by automating the entire process of interacting with posts. The key takeaway is that by understanding how HTTP requests work, maintaining session state, and carefully managing tokens and delays, you can automate complex web tasks without the overhead of browser automation tools like Selenium.
Within just a couple of hours, the bot had generated enough activity that the forum considered my account a trusted and experienced member, and I gained access to all of the locked content.
Whether you’re scraping data or automating repetitive tasks, building bots like this can greatly enhance your productivity while giving you more control over how the automation works.
If you’re interested in building similar automation for your projects, feel free to reach out, and I’d be happy to help guide you through the process!