Build a CloakBrowser Automation Workflow with Stealth Chromium, Persistent Profiles, and Browser Signal Inspection
In this tutorial, we explore CloakBrowser, a Python-friendly browser automation tool that uses Playwright-style APIs inside a stealth Chromium environment. We begin by setting up CloakBrowser, preparing the required browser binary, and resolving the common Colab asyncio loop issue by running the sync browser workflow in a separate worker thread. We then move through practical automation steps, including launching a browser, creating customized browser contexts, inspecting browser-visible signals, interacting with a local test page, saving session state, restoring localStorage, using persistent browser profiles, capturing screenshots, and extracting rendered page content for parsing.
import os
import sys
import json
import time
import shutil
import base64
import subprocess
import concurrent.futures
from pathlib import Path
from datetime import datetime
from textwrap import dedent


def run_cmd(cmd, check=True, capture=False):
    print(f"\n$ {' '.join(cmd)}")
    result = subprocess.run(
        cmd,
        check=check,
        text=True,
        stdout=subprocess.PIPE if capture else None,
        stderr=subprocess.STDOUT if capture else None,
    )
    if capture and result.stdout:
        print(result.stdout[:4000])
    return result


print("Installing CloakBrowser and helper packages...")
run_cmd([
    sys.executable, "-m", "pip", "install", "-q", "-U",
    "cloakbrowser", "playwright", "pandas", "beautifulsoup4"
])

print("\nInstalling Chromium runtime dependencies for Colab...")
try:
    run_cmd([sys.executable, "-m", "playwright", "install-deps", "chromium"], check=False)
except Exception as e:
    print("Dependency installer warning:", repr(e))

from cloakbrowser import (
    launch,
    launch_context,
    launch_persistent_context,
    ensure_binary,
    binary_info,
)
import pandas as pd
from bs4 import BeautifulSoup
from IPython.display import display, Image

WORKDIR = Path("/content/cloakbrowser_advanced_tutorial")
WORKDIR.mkdir(parents=True, exist_ok=True)
SCREENSHOT_PATH = WORKDIR / "cloakbrowser_result.png"
STORAGE_STATE_PATH = WORKDIR / "storage_state.json"
PROFILE_DIR = WORKDIR / "persistent_profile"

print("\nPreparing CloakBrowser binary...")
try:
    ensure_binary()
except Exception as e:
    print("Binary setup warning:", repr(e))

print("\nCloakBrowser binary info:")
try:
    info = binary_info()
    print(json.dumps(info, indent=2, default=str))
except Exception as e:
    print("Could not read binary info:", repr(e))
We start by installing CloakBrowser, Playwright, pandas, and BeautifulSoup so the Colab environment has everything needed for browser automation and result analysis. We also install Chromium runtime dependencies, import the main CloakBrowser launch utilities, and define the working paths for screenshots, storage state, and persistent profiles. We then prepare the CloakBrowser binary and print its details to confirm the browser engine is installed correctly before running automation.
def make_data_url(html: str) -> str:
    encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return f"data:text/html;base64,{encoded}"


def print_section(title):
    print("\n" + "=" * 80)
    print(title)
    print("=" * 80)


def safe_close(obj, label="object"):
    try:
        if obj:
            obj.close()
    except Exception as e:
        print(f"Warning while closing {label}: {e}")


def run_sync_browser_job_in_thread(fn, *args, **kwargs):
    """
    Google Colab and Jupyter already run an asyncio event loop.
    CloakBrowser currently exposes Playwright-style sync helpers such as:
      - launch()
      - launch_context()
      - launch_persistent_context()
    Playwright's sync API cannot run inside an already-running event loop.
    Hence, we run the full browser automation job inside a separate thread.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(fn, *args, **kwargs)
        return future.result()
test_page_html = dedent("""
<!doctype html>
<html>
<head>
  <meta charset="utf-8">
  <title>CloakBrowser Local Automation Lab</title>
  <style>
    body {
      font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
      max-width: 900px;
      margin: 40px auto;
      padding: 24px;
      line-height: 1.5;
      background: #f7f7f7;
      color: #222;
    }
    .card {
      background: white;
      border-radius: 18px;
      padding: 24px;
      box-shadow: 0 8px 30px rgba(0,0,0,0.08);
      margin-bottom: 18px;
    }
    label {
      display: block;
      margin-top: 12px;
      font-weight: 600;
    }
    input, textarea, button {
      width: 100%;
      box-sizing: border-box;
      padding: 12px;
      margin-top: 8px;
      border: 1px solid #ccc;
      border-radius: 12px;
      font-size: 15px;
    }
    button {
      cursor: pointer;
      background: #111;
      color: white;
      font-weight: 700;
    }
    pre {
      background: #111;
      color: #00ff99;
      padding: 16px;
      overflow-x: auto;
      border-radius: 12px;
    }
  </style>
</head>
<body>
  <div class="card">
    <h1>CloakBrowser Local Automation Lab</h1>
    <p>
      This page runs locally from a data URL. We use it to inspect browser-visible
      properties and demonstrate Playwright-style interaction safely.
    </p>
  </div>
  <div class="card">
    <h2>Interaction Form</h2>
    <label>Name</label>
    <input id="name" placeholder="Type your name here">
    <label>Message</label>
    <textarea id="message" rows="4" placeholder="Type a short message"></textarea>
    <button id="submit">Submit Local Form</button>
    <p id="status">Waiting for interaction...</p>
  </div>
  <div class="card">
    <h2>Browser Signals</h2>
    <pre id="signals"></pre>
  </div>
  <script>
    async function gatherSignals() {
      const canvas = document.createElement("canvas");
      const gl = canvas.getContext("webgl") || canvas.getContext("experimental-webgl");
      let webglVendor = null;
      let webglRenderer = null;
      if (gl) {
        const debugInfo = gl.getExtension("WEBGL_debug_renderer_info");
        if (debugInfo) {
          webglVendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
          webglRenderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
        }
      }
      const signals = {
        title: document.title,
        userAgent: navigator.userAgent,
        webdriver: navigator.webdriver,
        platform: navigator.platform,
        languages: navigator.languages,
        language: navigator.language,
        hardwareConcurrency: navigator.hardwareConcurrency,
        deviceMemory: navigator.deviceMemory || null,
        pluginsLength: navigator.plugins ? navigator.plugins.length : null,
        chromeObjectPresent: typeof window.chrome === "object",
        timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
        screen: {
          width: screen.width,
          height: screen.height,
          colorDepth: screen.colorDepth,
          pixelDepth: screen.pixelDepth
        },
        viewport: {
          innerWidth: window.innerWidth,
          innerHeight: window.innerHeight,
          devicePixelRatio: window.devicePixelRatio
        },
        webglVendor,
        webglRenderer,
        localStorageWorks: (() => {
          try {
            localStorage.setItem("cloakbrowser_test", "ok");
            return localStorage.getItem("cloakbrowser_test") === "ok";
          } catch (e) {
            return false;
          }
        })()
      };
      document.getElementById("signals").textContent = JSON.stringify(signals, null, 2);
      return signals;
    }
    document.getElementById("submit").addEventListener("click", () => {
      const name = document.getElementById("name").value;
      const message = document.getElementById("message").value;
      localStorage.setItem("tutorial_name", name);
      localStorage.setItem("tutorial_message", message);
      document.getElementById("status").textContent =
        `Saved locally for ${name}: ${message}`;
    });
    gatherSignals();
  </script>
</body>
</html>
""").strip()
TEST_PAGE_URL = make_data_url(test_page_html)
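Because the whole test page travels inside the URL itself, the helper's output can be verified offline with a quick round trip. A minimal stdlib-only sanity check (the helper is repeated here so the snippet stands alone):

```python
import base64

def make_data_url(html: str) -> str:
    # Same helper as above: embed the page as a base64 data URL.
    encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return f"data:text/html;base64,{encoded}"

url = make_data_url("<h1>hello</h1>")
prefix = "data:text/html;base64,"
assert url.startswith(prefix)

# Decoding the payload recovers the original markup byte-for-byte.
decoded = base64.b64decode(url[len(prefix):]).decode("utf-8")
print(decoded)  # -> <h1>hello</h1>
```

This is also a handy way to confirm that no URL-unsafe characters leak into the address the browser is asked to load.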
We define helper functions for creating data URLs, printing section headers, safely closing browser objects, and running synchronous browser jobs inside a separate thread. We use the thread wrapper because Google Colab already runs an asyncio loop, and this prevents Playwright's sync API from failing. We also create a safe local HTML test page that collects browser-visible signals, supports form interaction, and stores test values in localStorage.
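The thread-wrapper pattern is independent of CloakBrowser and can be exercised with any blocking function. A small sketch, where `blocking_job` is a stand-in for the real browser job:

```python
import concurrent.futures
import time

def blocking_job(n):
    # Stand-in for a sync Playwright-style job that must not run
    # on the notebook's asyncio event-loop thread.
    time.sleep(0.1)
    return n * 2

def run_in_thread(fn, *args, **kwargs):
    # Submit the blocking call to a single worker thread and block
    # on its result, leaving the caller's event loop untouched.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(fn, *args, **kwargs).result()

print(run_in_thread(blocking_job, 21))  # -> 42
```

`max_workers=1` keeps all browser calls on one thread, which matters because Playwright-style sync objects are not thread-safe and must be used from the thread that created them.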
def cloakbrowser_tutorial_job():
    results = {
        "basic_launch": None,
        "advanced_context": None,
        "storage_restore": None,
        "persistent_profile": None,
        "rendered_extraction": None,
        "static_parsing": None,
        "errors": [],
    }

    print_section("1. Basic CloakBrowser launch")
    browser = None
    try:
        browser = launch(
            headless=True,
            humanize=True,
            args=[
                "--no-sandbox",
                "--disable-dev-shm-usage",
            ],
        )
        page = browser.new_page()
        page.goto("https://example.com", wait_until="domcontentloaded", timeout=60000)
        results["basic_launch"] = {
            "title": page.title(),
            "body_preview": page.locator("body").inner_text(timeout=15000)[:300],
            "url": page.url,
        }
        print(json.dumps(results["basic_launch"], indent=2))
    except Exception as e:
        error = {
            "section": "basic_launch",
            "error": repr(e),
        }
        results["errors"].append(error)
        print(error)
    finally:
        safe_close(browser, "basic browser")

    print_section("2. Advanced context launch with custom browser context")
    context = None
    try:
        context = launch_context(
            headless=True,
            humanize=True,
            viewport={"width": 1365, "height": 768},
            locale="en-US",
            timezone_id="America/New_York",
            color_scheme="light",
            extra_http_headers={
                "Accept-Language": "en-US,en;q=0.9",
                "X-Tutorial-Run": "cloakbrowser-colab",
            },
            args=[
                "--no-sandbox",
                "--disable-dev-shm-usage",
            ],
        )
        page = context.new_page()
        page.goto(TEST_PAGE_URL, wait_until="domcontentloaded", timeout=60000)
        page.locator("#name").fill("CloakBrowser Colab User")
        page.locator("#message").fill(
            "We are testing safe local browser automation in Google Colab."
        )
        page.locator("#submit").click()
        page.wait_for_timeout(1000)
        signals = page.evaluate("() => gatherSignals()")
        status_text = page.locator("#status").inner_text()
        page.screenshot(path=str(SCREENSHOT_PATH), full_page=True)
        context.storage_state(path=str(STORAGE_STATE_PATH))
        results["advanced_context"] = {
            "status_text": status_text,
            "signals": signals,
            "screenshot_path": str(SCREENSHOT_PATH),
            "storage_state_path": str(STORAGE_STATE_PATH),
        }
        print(json.dumps(results["advanced_context"], indent=2, default=str))
    except Exception as e:
        error = {
            "section": "advanced_context",
            "error": repr(e),
        }
        results["errors"].append(error)
        print(error)
    finally:
        safe_close(context, "advanced context")

    print_section("3. Restore localStorage using storage_state")
    restored_context = None
    try:
        restored_context = launch_context(
            headless=True,
            humanize=True,
            storage_state=str(STORAGE_STATE_PATH),
            viewport={"width": 1365, "height": 768},
            locale="en-US",
            timezone_id="America/New_York",
            args=[
                "--no-sandbox",
                "--disable-dev-shm-usage",
            ],
        )
        restored_page = restored_context.new_page()
        restored_page.goto(TEST_PAGE_URL, wait_until="domcontentloaded", timeout=60000)
        restored_values = restored_page.evaluate("""
            () => ({
                tutorial_name: localStorage.getItem("tutorial_name"),
                tutorial_message: localStorage.getItem("tutorial_message"),
                cloakbrowser_test: localStorage.getItem("cloakbrowser_test")
            })
        """)
        results["storage_restore"] = restored_values
        print(json.dumps(restored_values, indent=2))
    except Exception as e:
        error = {
            "section": "storage_restore",
            "error": repr(e),
        }
        results["errors"].append(error)
        print(error)
    finally:
        safe_close(restored_context, "restored context")
We define the main tutorial job and begin by launching CloakBrowser in headless mode to open a simple public page and extract its title, body preview, and URL. We then create a customized browser context with viewport, locale, timezone, color scheme, and custom headers to simulate a more controlled browser session. We interact with the local test form, collect browser signals, save a screenshot, store session state, and then test whether localStorage can be restored in a fresh context.
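Assuming CloakBrowser writes the same `storage_state` layout that Playwright documents (a `cookies` list plus per-origin `localStorage` name/value pairs), the saved JSON can be inspected directly without relaunching a browser. A small sketch with hypothetical values:

```python
# Hypothetical state dict in Playwright's documented storage_state layout;
# in the tutorial this would come from json.load(open(STORAGE_STATE_PATH)).
state = {
    "cookies": [],
    "origins": [
        {
            "origin": "https://example.com",
            "localStorage": [
                {"name": "tutorial_name", "value": "CloakBrowser Colab User"},
            ],
        }
    ],
}

def local_storage_for(state: dict, origin: str) -> dict:
    # Flatten the name/value pairs recorded for one origin.
    for entry in state.get("origins", []):
        if entry.get("origin") == origin:
            return {kv["name"]: kv["value"] for kv in entry["localStorage"]}
    return {}

print(local_storage_for(state, "https://example.com"))
```

This is useful for debugging restore failures: if the expected origin is missing from the file, the fresh context has nothing to replay.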
    print_section("4. Persistent profile demonstration")
    if PROFILE_DIR.exists():
        shutil.rmtree(PROFILE_DIR)
    try:
        ctx1 = launch_persistent_context(
            str(PROFILE_DIR),
            headless=True,
            humanize=True,
            viewport={"width": 1280, "height": 720},
            locale="en-US",
            timezone_id="America/New_York",
            args=[
                "--no-sandbox",
                "--disable-dev-shm-usage",
            ],
        )
        p1 = ctx1.new_page()
        p1.goto(TEST_PAGE_URL, wait_until="domcontentloaded", timeout=60000)
        p1.evaluate("""
            () => {
                localStorage.setItem("persistent_profile_demo", "saved_across_browser_restarts");
                localStorage.setItem("persistent_profile_timestamp", new Date().toISOString());
            }
        """)
        first_value = p1.evaluate("() => localStorage.getItem('persistent_profile_demo')")
        ctx1.close()

        ctx2 = launch_persistent_context(
            str(PROFILE_DIR),
            headless=True,
            humanize=True,
            viewport={"width": 1280, "height": 720},
            locale="en-US",
            timezone_id="America/New_York",
            args=[
                "--no-sandbox",
                "--disable-dev-shm-usage",
            ],
        )
        p2 = ctx2.new_page()
        p2.goto(TEST_PAGE_URL, wait_until="domcontentloaded", timeout=60000)
        second_value = p2.evaluate(
            "() => localStorage.getItem('persistent_profile_demo')"
        )
        second_timestamp = p2.evaluate(
            "() => localStorage.getItem('persistent_profile_timestamp')"
        )
        ctx2.close()

        persistent_results = {
            "first_run_value": first_value,
            "second_run_value": second_value,
            "second_run_timestamp": second_timestamp,
            "profile_dir": str(PROFILE_DIR),
            "persisted_successfully": first_value == second_value and second_value is not None,
        }
        results["persistent_profile"] = persistent_results
        print(json.dumps(persistent_results, indent=2))
    except Exception as e:
        error = {
            "section": "persistent_profile",
            "error": repr(e),
        }
        results["errors"].append(error)
        print(error)

    print_section("5. Browser-rendered extraction plus static parsing")
    browser = None
    try:
        browser = launch(
            headless=True,
            humanize=True,
            args=[
                "--no-sandbox",
                "--disable-dev-shm-usage",
            ],
        )
        page = browser.new_page()
        page.goto("https://example.com", wait_until="domcontentloaded", timeout=60000)
        rendered = {
            "title": page.title(),
            "h1": page.locator("h1").inner_text(timeout=15000),
            "paragraph": page.locator("p").first.inner_text(timeout=15000),
            "url": page.url,
            "captured_at": datetime.utcnow().isoformat() + "Z",
        }
        html = page.content()
        soup = BeautifulSoup(html, "html.parser")
        parsed = {
            "static_title": soup.title.get_text(strip=True) if soup.title else None,
            "static_h1": soup.find("h1").get_text(strip=True) if soup.find("h1") else None,
            "links": [
                {
                    "text": a.get_text(strip=True),
                    "href": a.get("href"),
                }
                for a in soup.find_all("a")
            ],
        }
        results["rendered_extraction"] = rendered
        results["static_parsing"] = parsed
        print("Rendered extraction:")
        print(json.dumps(rendered, indent=2))
        print("\nStatic parsing:")
        print(json.dumps(parsed, indent=2))
    except Exception as e:
        error = {
            "section": "rendered_extraction",
            "error": repr(e),
        }
        results["errors"].append(error)
        print(error)
    finally:
        safe_close(browser, "extraction browser")

    return results
We continue the same tutorial job by creating a persistent browser profile directory and using it to save localStorage across browser restarts. We launch the persistent profile twice: first to write a stored value, and then to confirm that the value persists in the second launch. We also demonstrate rendering a live page, then parsing the browser-returned HTML with BeautifulSoup for static content analysis.
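The static-parsing step needs no live browser: the same link extraction can be rehearsed on any HTML string. A stdlib-only sketch of the idea (the tutorial job itself uses BeautifulSoup, which handles messy real-world markup far better):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    # Collect {"text", "href"} dicts for every <a> tag, mirroring the
    # shape of the BeautifulSoup extraction in the tutorial job.
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({"text": "".join(self._text).strip(),
                               "href": self._href})
            self._href = None

parser = LinkCollector()
parser.feed('<p>More at <a href="https://example.com/docs">the docs</a>.</p>')
print(parser.links)  # -> [{'text': 'the docs', 'href': 'https://example.com/docs'}]
```

The key point is the division of labor: the browser produces fully rendered HTML (including anything JavaScript injected), and the parser then works on that string purely offline.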
tutorial_results = run_sync_browser_job_in_thread(cloakbrowser_tutorial_job)

print_section("6. Final tutorial summary")
if SCREENSHOT_PATH.exists():
    display(Image(filename=str(SCREENSHOT_PATH)))

rows = []
advanced = tutorial_results.get("advanced_context")
if advanced and advanced.get("signals"):
    signals = advanced["signals"]
    rows.extend([
        {
            "category": "Browser signal",
            "metric": "navigator.webdriver",
            "value": signals.get("webdriver"),
        },
        {
            "category": "Browser signal",
            "metric": "navigator.plugins.length",
            "value": signals.get("pluginsLength"),
        },
        {
            "category": "Browser signal",
            "metric": "window.chrome present",
            "value": signals.get("chromeObjectPresent"),
        },
        {
            "category": "Browser signal",
            "metric": "timezone",
            "value": signals.get("timezone"),
        },
        {
            "category": "Browser signal",
            "metric": "platform",
            "value": signals.get("platform"),
        },
        {
            "category": "Browser signal",
            "metric": "viewport",
            "value": json.dumps(signals.get("viewport")),
        },
        {
            "category": "Browser signal",
            "metric": "webglRenderer",
            "value": signals.get("webglRenderer"),
        },
    ])

persistent = tutorial_results.get("persistent_profile")
if persistent:
    rows.append({
        "category": "Persistence",
        "metric": "persistent profile restored localStorage",
        "value": persistent.get("persisted_successfully"),
    })

basic = tutorial_results.get("basic_launch")
if basic:
    rows.append({
        "category": "Navigation",
        "metric": "basic launch title",
        "value": basic.get("title"),
    })

rendered = tutorial_results.get("rendered_extraction")
if rendered:
    rows.append({
        "category": "Extraction",
        "metric": "rendered h1",
        "value": rendered.get("h1"),
    })

rows.extend([
    {
        "category": "Output",
        "metric": "screenshot_path",
        "value": str(SCREENSHOT_PATH) if SCREENSHOT_PATH.exists() else None,
    },
    {
        "category": "Output",
        "metric": "storage_state_path",
        "value": str(STORAGE_STATE_PATH) if STORAGE_STATE_PATH.exists() else None,
    },
    {
        "category": "Output",
        "metric": "working_directory",
        "value": str(WORKDIR),
    },
])

if tutorial_results.get("errors"):
    for err in tutorial_results["errors"]:
        rows.append({
            "category": "Error",
            "metric": err.get("section"),
            "value": err.get("error"),
        })

summary_df = pd.DataFrame(rows)
display(summary_df)

print("\nTutorial complete.")
print("Files created:")
for path in sorted(WORKDIR.glob("*")):
    print(" -", path)
We run the complete CloakBrowser tutorial job inside the safe thread wrapper. We display the captured screenshot and build a structured pandas summary table containing browser signals, persistence results, navigation output, extraction output, generated file paths, and any errors. We finish by printing the tutorial completion message and listing all files created in the working directory.
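The summary table is simply a DataFrame built from uniform row dicts, so the same pattern works for any category/metric/value reporting. A small stand-alone illustration with made-up values:

```python
import pandas as pd

# Hypothetical rows in the same category/metric/value shape as the tutorial.
rows = [
    {"category": "Browser signal", "metric": "navigator.webdriver", "value": False},
    {"category": "Persistence", "metric": "profile restored", "value": True},
    {"category": "Output", "metric": "screenshot_path", "value": "/tmp/shot.png"},
]

summary_df = pd.DataFrame(rows)
print(summary_df.to_string(index=False))

# Filtering by category is then a one-liner.
print(summary_df[summary_df["category"] == "Output"]["value"].tolist())  # -> ['/tmp/shot.png']
```

Keeping every row in the same three-key shape is what lets pandas infer a clean three-column table, including the error rows appended at the end.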
In conclusion, we built a full Colab-ready CloakBrowser workflow that demonstrates both basic and advanced browser-automation patterns in a safe, controlled environment. We used local test pages and simple public pages to understand how CloakBrowser handles browser contexts, storage, persistence, screenshots, and the extraction of rendered HTML. We also made the tutorial reliable for notebook environments by isolating the sync browser execution from Colab's active event loop. This provides a sturdy foundation for building more advanced, responsible browser-automation pipelines with CloakBrowser and Python.
The post Build a CloakBrowser Automation Workflow with Stealth Chromium, Persistent Profiles, and Browser Signal Inspection appeared first on MarkTechPost.
