
Testing Your Playwright Scraper

To ensure your Playwright scraper is effectively fortified against bot detection, it’s essential to test it using fingerprinting tools.

Earlier, we demonstrated using websites like bot.sannysoft.com to test how stealthy your Playwright code is.

Another excellent tool for this purpose is bot.incolumitas.com, which provides a comprehensive analysis of your browser's fingerprint and highlights potential leaks that might reveal your browser identity.

1. Without Fortification

First, let's look at a basic Playwright script without any fortification techniques. We will test this script using the tools mentioned above to get a behavioralClassificationScore.

This score will allow us to determine if we are being detected as human or a bot.

const playwright = require("playwright");

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/**
 * This is obviously not the best approach to
 * solve the bot challenge. Here comes your creativity.
 *
 * @param {*} page
 */
async function solveChallenge(page) {
  // wait for form to appear on page
  await page.waitForSelector("#formStuff");
  // overwrite the existing text by selecting it
  // with the mouse with a triple click
  const userNameInput = await page.$('[name="userName"]');
  await userNameInput.click({ clickCount: 3 });
  await userNameInput.type("bot3000");
  // same stuff here
  const emailInput = await page.$('[name="eMail"]');
  await emailInput.click({ clickCount: 3 });
  await emailInput.type("[email protected]");
  await page.selectOption('[name="cookies"]', "I want all the Cookies");
  await page.click("#smolCat");
  await page.click("#bigCat");
  // register the dialog handler before submitting, so the
  // dialog triggered by the click reaches this handler
  page.on("dialog", async (dialog) => {
    console.log(dialog.message());
    await dialog.accept();
  });

  // submit the form
  await page.click("#submit");

  // wait for results to appear
  await page.waitForSelector("#tableStuff tbody tr .url");
  // just in case
  await sleep(100);

  // now update both prices
  // by clicking on the "Update Price" button
  await page.waitForSelector("#updatePrice0");
  await page.click("#updatePrice0");
  await page.waitForFunction(
    '!!document.getElementById("price0").getAttribute("data-last-update")'
  );

  await page.waitForSelector("#updatePrice1");
  await page.click("#updatePrice1");
  await page.waitForFunction(
    '!!document.getElementById("price1").getAttribute("data-last-update")'
  );

  // now scrape the response
  let data = await page.evaluate(function () {
    let results = [];
    document.querySelectorAll("#tableStuff tbody tr").forEach((row) => {
      results.push({
        name: row.querySelector(".name").innerText,
        price: row.querySelector(".price").innerText,
        url: row.querySelector(".url").innerText,
      });
    });
    return results;
  });

  console.log(data);
}

(async () => {
  const browser = await playwright["chromium"].launch({
    headless: false,
    args: ["--start-maximized"],
  });
  const context = await browser.newContext({ viewport: null });
  const page = await context.newPage();

  await page.goto("https://bot.incolumitas.com/");

  await solveChallenge(page);

  await sleep(6000);

  const new_tests = JSON.parse(
    await page.$eval("#new-tests", (el) => el.textContent)
  );
  const old_tests = JSON.parse(
    await page.$eval("#detection-tests", (el) => el.textContent)
  );

  console.log(new_tests);
  console.log(old_tests);

  //await page.close();
  await browser.close();
})();
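The two objects logged at the end of the script can be long, so it may help to condense them before reading. The following is a minimal sketch, not part of the original script: summarizeTests is a hypothetical helper, and it assumes that each test result is a string and that a passing test is reported as "OK". Adjust passValue if the page reports results differently.

```javascript
// Hypothetical helper: condense a (possibly nested) object of test verdicts.
// Assumption: every leaf value is a verdict and "OK" means the test passed.
function summarizeTests(tests, passValue = "OK") {
  const summary = { total: 0, passed: 0, flagged: [] };

  const walk = (node, path) => {
    for (const [key, value] of Object.entries(node)) {
      const name = path ? `${path}.${key}` : key;
      if (value !== null && typeof value === "object") {
        walk(value, name); // nested group of tests
      } else {
        summary.total += 1;
        if (value === passValue) {
          summary.passed += 1;
        } else {
          summary.flagged.push(name); // anything that is not a pass
        }
      }
    }
  };

  walk(tests, "");
  return summary;
}
```

In the script above it could be called as console.log(summarizeTests(new_tests)) in place of logging the raw object.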

Test

As you can see from the result above, we get a behavioralClassificationScore of 22% human, indicating that the script is likely to be detected as a bot.

This low score reflects that the script's behavior does not closely mimic that of a real human user, making it more susceptible to bot detection mechanisms.

2. With Fortification

Now, let's fortify the Playwright script.

We will be implementing different enhancements to make the script's behavior appear more like that of a real human user.

Here are the key fortification techniques implemented:

  1. Random Mouse Movements:
     • Adding random mouse movements simulates the unpredictable nature of human mouse behavior.
     • This is achieved by moving the mouse cursor to random positions on the screen with slight pauses between movements.
  2. Delays Between Actions:
     • Introducing delays between key presses, clicks, and dialog interactions mimics the natural time a human takes to perform these actions.
     • This includes delays between typing characters, clicking elements, and handling dialog boxes.
  3. User Agent and Context Configuration:
     • Setting a common user agent helps the browser's identity blend in with real user patterns.
     • The browser context is configured with typical user settings, such as geolocation, permissions, and locale.
  4. Browser Launch Arguments:
     • The browser is launched with --disable-blink-features=AutomationControlled, --disable-extensions, --disable-infobars, --enable-automation, and --no-first-run to change features that can reveal automation. Note that --enable-automation is the odd one out: in Chromium this switch turns automation indicators on rather than off.
     • These arguments help make the browser appear more like a standard user browser.
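As a small illustration of the first two techniques, here is a sketch of a mouse-movement helper with randomized targets and pauses. It is one possible variant, not the exact helper used in the script: randomInt is a hypothetical utility, and the 1000x1000 area, the 5 to 15 steps, and the 50 to 200 ms pauses are illustrative assumptions. The steps option of page.mouse.move makes Playwright send intermediate mousemove events instead of jumping straight to the target.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical utility: random integer in the inclusive range [min, max].
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Move the cursor to `moves` random points inside a width x height area.
// `page` is expected to be a Playwright Page.
async function randomMouseMove(
  page,
  { width = 1000, height = 1000, moves = 5 } = {}
) {
  for (let i = 0; i < moves; i++) {
    const x = randomInt(0, width - 1);
    const y = randomInt(0, height - 1);
    // `steps` interpolates the path so the cursor does not teleport
    await page.mouse.move(x, y, { steps: randomInt(5, 15) });
    // short pause of varying length between movements
    await sleep(randomInt(50, 200));
  }
}
```

Passing the real viewport size instead of the default area keeps the targets inside the visible page.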

Here's the fortified version of the Playwright script:

const playwright = require("playwright");

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// move the cursor to five random positions, pausing briefly between moves
async function randomMouseMove(page) {
  for (let i = 0; i < 5; i++) {
    await page.mouse.move(Math.random() * 1000, Math.random() * 1000);
    await sleep(100);
  }
}

async function solveChallenge(page) {
  await page.evaluate(() => {
    const form = document.querySelector("#formStuff");
    form.scrollIntoView();
  });
  await sleep(1000);

  const userNameInput = await page.$('[name="userName"]');
  await userNameInput.click({ clickCount: 3 });
  for (let char of "bot3000") {
    await userNameInput.type(char);
    await sleep(500);
  }

  const emailInput = await page.$('[name="eMail"]');
  await emailInput.click({ clickCount: 3 });
  for (let char of "[email protected]") {
    await emailInput.type(char);
    await sleep(600);
  }

  await page.selectOption('[name="cookies"]', "I want all the Cookies");
  await sleep(1200);
  await page.click("#smolCat");

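  // register the dialog handler before submitting, so the
  // dialog triggered by the click reaches this handler
  page.on("dialog", async (dialog) => {
    console.log(dialog.message());
    await sleep(2000); // add delay before accepting the dialog
    await dialog.accept();
  });

  await sleep(900);
  await page.click("#submit");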

  await page.waitForSelector("#tableStuff tbody tr .url");
  await sleep(100);

  await page.waitForSelector("#updatePrice0");
  await page.click("#updatePrice0");
  await page.waitForFunction(
    '!!document.getElementById("price0").getAttribute("data-last-update")'
  );
  await sleep(1000);

  await page.waitForSelector("#updatePrice1");
  await page.click("#updatePrice1");
  await page.waitForFunction(
    '!!document.getElementById("price1").getAttribute("data-last-update")'
  );
  await sleep(800);

  let data = await page.evaluate(function () {
    let results = [];
    document.querySelectorAll("#tableStuff tbody tr").forEach((row) => {
      results.push({
        name: row.querySelector(".name").innerText,
        price: row.querySelector(".price").innerText,
        url: row.querySelector(".url").innerText,
      });
    });
    return results;
  });

  console.log(data);
}

(async () => {
  const browser = await playwright["chromium"].launch({
    headless: false,
    args: [
      "--start-maximized",
      "--disable-blink-features=AutomationControlled",
      "--disable-extensions",
      "--disable-infobars",
      "--enable-automation",
      "--no-first-run",
    ],
  });
  const context = await browser.newContext({
    viewport: null,
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
    geolocation: { longitude: 12.4924, latitude: 41.8902 },
    permissions: ["geolocation"],
    locale: "en-US",
  });
  const page = await context.newPage();

  await page.goto("https://bot.incolumitas.com/");

  await randomMouseMove(page); // add random mouse movements before solving the challenge

  await solveChallenge(page);

  await sleep(6000);

  const new_tests = JSON.parse(
    await page.$eval("#new-tests", (el) => el.textContent)
  );
  const old_tests = JSON.parse(
    await page.$eval("#detection-tests", (el) => el.textContent)
  );

  console.log(new_tests);
  console.log(old_tests);

  await browser.close();
})();
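The script above pauses for a fixed time after every character (500 ms and 600 ms). Perfectly regular intervals can themselves look mechanical, so one possible refinement is to vary the pause per keystroke. The sketch below is an assumption-laden variant, not the code that produced the results in this article: humanType is a hypothetical helper, and the 80 to 250 ms range is an illustrative guess rather than a measured typing speed.

```javascript
// Hypothetical helper: type `text` one character at a time, waiting a
// random interval between minDelay and maxDelay (ms) after each key.
// `input` is expected to expose an async type(char) method, as the
// element handles in the script above do.
async function humanType(input, text, { minDelay = 80, maxDelay = 250 } = {}) {
  for (const char of text) {
    await input.type(char);
    const pause = minDelay + Math.random() * (maxDelay - minDelay);
    await new Promise((resolve) => setTimeout(resolve, pause));
  }
}
```

It would replace the for loops in solveChallenge, for example await humanType(userNameInput, "bot3000").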

Test

We’ve made several enhancements to a script that interacts with a webpage using the Playwright library. The goal was to make the script’s behavior appear more like that of a real human user on the webpage.

  • Firstly, we introduced delays between key presses and clicks to simulate the time a real user might take to perform these actions. We also added random mouse movements across the page to mimic a user’s cursor movements.
  • Secondly, we added a delay before accepting any dialog boxes that appear on the webpage, simulating the time a real user might take to read and respond to the dialog.
  • Lastly, we set a common user agent for the browser to make it appear more like a typical user’s browser.

These enhancements significantly improved the script’s ability to mimic a real user’s behavior. In our case, the behavioralClassificationScore comes out to be 70% human, which is quite good considering the simplicity of our script.

Our script is now less likely to be detected as a bot by the webpage’s bot detection mechanisms.

Please note that this is a basic approach, and some websites may have more sophisticated bot detection mechanisms. You might need to consider additional strategies like integrating residential and mobile proxies.
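As a rough sketch of how a proxy could be wired in, Playwright's launch options accept a proxy object with server, username, and password fields. buildLaunchOptions is a hypothetical helper, and the endpoint and credentials in the usage comment are placeholders; a real provider supplies its own values.

```javascript
// Hypothetical helper: build options for chromium.launch() that route
// traffic through a proxy. Credentials are only added when provided.
function buildLaunchOptions({ server, username, password } = {}) {
  const options = {
    headless: false,
    args: ["--disable-blink-features=AutomationControlled"],
  };
  if (server) {
    options.proxy = { server };
    if (username) {
      options.proxy.username = username;
      options.proxy.password = password;
    }
  }
  return options;
}

// Usage (placeholder endpoint and credentials):
// const { chromium } = require("playwright");
// const browser = await chromium.launch(
//   buildLaunchOptions({
//     server: "http://proxy.example.com:8000",
//     username: "user",
//     password: "pass",
//   })
// );
```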
