How to Build an Intelligent AI Desktop Automation Agent with Natural Language Commands and Interactive Simulation?

In this tutorial, we walk through the process of building an advanced AI desktop automation agent that runs seamlessly in Google Colab. We design it to interpret natural language commands, simulate desktop tasks such as file operations, browser actions, and workflows, and provide interactive feedback through a virtual environment. By combining NLP, task execution, and a simulated desktop, we create a system that feels both intuitive and powerful, allowing us to experiment with automation concepts without relying on external APIs.
import re
import json
import time
import random
import threading
from datetime import datetime
from typing import Dict, List, Any, Tuple
from dataclasses import dataclass, asdict
from enum import Enum

# Notebook-only extras are optional: fall back gracefully outside Colab/Jupyter.
try:
    from IPython.display import display, HTML, clear_output
    import matplotlib.pyplot as plt
    import numpy as np
    COLAB_MODE = True
except ImportError:
    COLAB_MODE = False
We begin by importing essential Python libraries that support data handling, visualization, and simulation. We also set up Colab-specific tools so the tutorial runs interactively in a notebook, with a COLAB_MODE flag that records whether those tools are available.
class TaskType(Enum):
    FILE_OPERATION = "file_operation"
    BROWSER_ACTION = "browser_action"
    SYSTEM_COMMAND = "system_command"
    APPLICATION_TASK = "application_task"
    WORKFLOW = "workflow"


@dataclass
class Task:
    id: str
    kind: TaskType
    command: str
    standing: str = "pending"
    outcome: str = ""
    timestamp: str = ""
    execution_time: float = 0.0
We define the structure of our automation system. We create an enum to categorize task types and a Task dataclass that helps us track each command along with its details, status, and execution results.
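Because each Task is a dataclass, it can be turned into a plain dictionary for logging or export. The following is a minimal sketch of that idea, assuming a trimmed-down copy of the Task definition above; the helper name `task_to_json` is hypothetical and not part of the tutorial code.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class TaskType(Enum):
    FILE_OPERATION = "file_operation"
    SYSTEM_COMMAND = "system_command"


@dataclass
class Task:
    # Trimmed-down copy of the tutorial's Task, kept here so the sketch runs alone
    id: str
    kind: TaskType
    command: str
    standing: str = "pending"
    outcome: str = ""


def task_to_json(task: Task) -> str:
    """Hypothetical helper: serialize a Task to a JSON string."""
    record = asdict(task)              # dataclass -> plain dict
    record["kind"] = task.kind.value   # Enum members are not JSON-serializable
    return json.dumps(record, sort_keys=True)


example = Task(id="task_0001", kind=TaskType.FILE_OPERATION, command="open report.txt")
print(task_to_json(example))
```

Storing the enum by its string value keeps the JSON readable and lets `TaskType(value)` rebuild the member when the record is loaded again.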
class DigitalDesktop:
    """Simulates a desktop environment with applications and file system"""

    def __init__(self):
        # Each simulated application tracks an open/closed status plus app-specific state
        self.purposes = {
            "browser": {"status": "closed", "tabs": [], "current_url": ""},
            "file_manager": {"status": "closed", "current_path": "/home/user"},
            "text_editor": {"status": "closed", "current_file": "", "content": ""},
            "email": {"status": "closed", "unread": 3, "inbox": []},
            "terminal": {"status": "closed", "history": []}
        }
        # Nested dicts stand in for directories; string values stand in for file contents
        self.file_system = {
            "/home/user/": {
                "documents/": {
                    "report.txt": "Important quarterly report content...",
                    "notes.md": "# Meeting Notes\n- Project update\n- Budget review"
                },
                "downloads/": {
                    "data.csv": "name,age,city\nJohn,25,NYC\nJane,30,LA",
                    "image.jpg": "[Binary image data]"
                },
                "desktop/": {}
            }
        }
        self.screen_state = {
            "active_window": None,
            "mouse_position": (0, 0),
            "clipboard": ""
        }

    def get_system_info(self) -> Dict:
        # Randomized values simulate live system metrics
        return {
            "cpu_usage": random.randint(5, 25),
            "memory_usage": random.randint(30, 60),
            "disk_space": random.randint(60, 90),
            "network_status": "connected",
            "uptime": "2 hours 15 minutes"
        }
class NLPProcessor:
    """Processes natural language commands and extracts intents"""

    def __init__(self):
self.intent_patterns = directories)",
r"(download
    def extract_intent(self, command: str) -> Tuple[TaskType, float]:
        """Extract task type and confidence from natural language command"""
        command_lower = command.lower()
        best_match = TaskType.SYSTEM_COMMAND  # fallback when no pattern matches
        best_confidence = 0.0
        for task_type, patterns in self.intent_patterns.items():
            for pattern in patterns:
                if re.search(pattern, command_lower):
                    # Each match of the pattern contributes 0.3 to the confidence
                    confidence = len(re.findall(pattern, command_lower)) * 0.3
                    if confidence > best_confidence:
                        best_match = task_type
                        best_confidence = confidence
        return best_match, min(best_confidence, 1.0)

    def extract_parameters(self, command: str, task_type: TaskType) -> Dict[str, str]:
        """Extract parameters from command based on task type"""
        params = {}
        command_lower = command.lower()
        if task_type == TaskType.FILE_OPERATION:
            # A filename is any path-like token that ends in ".<extension>"
            file_match = re.search(r'[\w/.-]+\.\w+', command)
            if file_match:
                params['filename'] = file_match.group()
            path_match = re.search(r'/[\w/.-]+', command)
            if path_match:
                params['path'] = path_match.group()
        elif task_type == TaskType.BROWSER_ACTION:
            # Match a full http(s) URL or a bare domain with a common TLD
            url_match = re.search(r'https?://[\w.-]+|[\w.-]+\.(com|org|net|edu)', command)
            if url_match:
                params['url'] = url_match.group()
            search_match = re.search(r'(?:search|find|google)\s+["\']?([^"\']+)["\']?', command_lower)
            if search_match:
                params['query'] = search_match.group(1)
        elif task_type == TaskType.APPLICATION_TASK:
            app_match = re.search(r'(browser|editor|email|terminal|calculator)', command_lower)
            if app_match:
                params['application'] = app_match.group(1)
        return params
We simulate a virtual desktop with applications, a file system, and system states while also building an NLP processor. We set up rules to identify user intents from natural language commands and extract useful parameters, such as filenames, URLs, or application names. This allows us to bridge natural language input with structured automation tasks.
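To make the intent rules concrete, here is a minimal, self-contained sketch of a pattern table and the scoring rule described above (0.3 per regex match, capped at 1.0, with SYSTEM_COMMAND as the fallback). The regular expressions in `INTENT_PATTERNS` and the function name `classify` are illustrative assumptions, not the tutorial's own table.

```python
import re
from enum import Enum
from typing import Dict, List, Tuple


class TaskType(Enum):
    FILE_OPERATION = "file_operation"
    BROWSER_ACTION = "browser_action"
    SYSTEM_COMMAND = "system_command"
    APPLICATION_TASK = "application_task"
    WORKFLOW = "workflow"


# Illustrative patterns only (an assumption, not the tutorial's table).
INTENT_PATTERNS: Dict[TaskType, List[str]] = {
    TaskType.FILE_OPERATION: [
        r"\b(?:create|delete|rename|list)\b.*\b(?:files?|folder|directory)\b",
    ],
    TaskType.BROWSER_ACTION: [r"\b(?:browser|navigate|visit|website)\b"],
    TaskType.SYSTEM_COMMAND: [r"\b(?:cpu|memory|disk|system)\b"],
    TaskType.APPLICATION_TASK: [r"\b(?:launch|close)\b.*\b(?:editor|terminal|email)\b"],
    TaskType.WORKFLOW: [r"\b(?:workflow|automate|batch)\b"],
}


def classify(command: str) -> Tuple[TaskType, float]:
    """Return the best-matching task type and a confidence in [0, 1]."""
    text = command.lower()
    best_type, best_conf = TaskType.SYSTEM_COMMAND, 0.0  # fallback when nothing matches
    for task_type, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            hits = re.findall(pattern, text)
            conf = min(len(hits) * 0.3, 1.0)  # 0.3 per match, capped at 1.0
            if conf > best_conf:
                best_type, best_conf = task_type, conf
    return best_type, best_conf


for sample in ("automate data backup workflow", "list all files in the documents directory"):
    print(sample, "->", classify(sample))
```

Counting matches is a crude confidence measure: a command that mentions several keywords of one category scores higher than one that mentions a single keyword, but the score says nothing about how well the rest of the sentence fits.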
class TaskExecutor:
    """Executes tasks on the virtual desktop"""

    def __init__(self, desktop: DigitalDesktop):
        self.desktop = desktop
        self.execution_log = []

    def execute_file_operation(self, params: Dict[str, str], command: str) -> str:
        """Simulate file operations"""
        if "open" in command.lower():
            filename = params.get('filename', 'unknown.txt')
            return f"✓ Opened file: {filename}\nFile contents loaded in text editor"
        elif "create" in command.lower():
            filename = params.get('filename', 'new_file.txt')
            return f"✓ Created new file: {filename}\nFile ready for editing"
        elif "list" in command.lower():
            # The file system is nested, so walk home directory -> documents
            files = list(self.desktop.file_system["/home/user/"]["documents/"].keys())
            return "Files found:\n" + "\n".join([f" • {f}" for f in files])
        return "✓ File operation completed successfully"

    def execute_browser_action(self, params: Dict[str, str], command: str) -> str:
        """Simulate browser actions"""
        if "open" in command.lower() or "visit" in command.lower():
            url = params.get('url', 'example.com')
            self.desktop.purposes["browser"]["current_url"] = url
            self.desktop.purposes["browser"]["status"] = "open"
            return f"Navigated to: {url}\n✓ Page loaded successfully"
        elif "search" in command.lower():
            query = params.get('query', 'search term')
            return f"Searching for: '{query}'\n✓ Found 1,247 results"
        return "✓ Browser action completed"

    def execute_system_command(self, params: Dict[str, str], command: str) -> str:
        """Simulate system commands"""
        if "check" in command.lower() or "show" in command.lower():
            info = self.desktop.get_system_info()
            return (
                "System Status:\n"
                f" CPU: {info['cpu_usage']}%\n"
                f" Memory: {info['memory_usage']}%\n"
                f" Disk: {info['disk_space']}% used\n"
                f" Network: {info['network_status']}"
            )
        return "✓ System command executed"

    def execute_application_task(self, params: Dict[str, str], command: str) -> str:
        """Simulate application tasks"""
        app = params.get('application', 'unknown')
        if "open" in command.lower():
            # Only update state for applications the virtual desktop models
            if app in self.desktop.purposes:
                self.desktop.purposes[app]["status"] = "open"
            return f"Launched {app.title()}\n✓ Application ready for use"
        elif "close" in command.lower():
            if app in self.desktop.purposes:
                self.desktop.purposes[app]["status"] = "closed"
            return f"Closed {app.title()}"
        return f"✓ {app.title()} task completed"

    def execute_workflow(self, params: Dict[str, str], command: str) -> str:
        """Simulate complex workflow execution"""
        steps = [
            "Analyzing workflow requirements...",
            "Preparing automation steps...",
            "Executing batch operations...",
            "Validating results...",
            "Generating report..."
        ]
        result = "Workflow Execution:\n"
        for i, step in enumerate(steps, 1):
            result += f" {i}. {step} ✓\n"
            if COLAB_MODE:
                time.sleep(0.1)  # brief pause so steps appear progressively
        return result + "Workflow completed successfully!"
class DesktopAgent:
    """Main desktop automation agent class - coordinates all components"""

    def __init__(self):
        self.desktop = DigitalDesktop()
        self.nlp = NLPProcessor()
        self.executor = TaskExecutor(self.desktop)
        self.task_history = []
        self.active = True
        self.stats = {
            "tasks_completed": 0,
            "success_rate": 100.0,
            "average_execution_time": 0.0
        }

    def process_command(self, command: str) -> Task:
        """Process a natural language command and execute it"""
        start_time = time.time()
        task_id = f"task_{len(self.task_history) + 1:04d}"
        task_type, confidence = self.nlp.extract_intent(command)
        task = Task(
            id=task_id,
            kind=task_type,
            command=command,
            timestamp=datetime.now().strftime("%H:%M:%S")
        )
        try:
            params = self.nlp.extract_parameters(command, task_type)
            # Dispatch to the executor method that matches the detected task type
            if task_type == TaskType.FILE_OPERATION:
                result = self.executor.execute_file_operation(params, command)
            elif task_type == TaskType.BROWSER_ACTION:
                result = self.executor.execute_browser_action(params, command)
            elif task_type == TaskType.SYSTEM_COMMAND:
                result = self.executor.execute_system_command(params, command)
            elif task_type == TaskType.APPLICATION_TASK:
                result = self.executor.execute_application_task(params, command)
            elif task_type == TaskType.WORKFLOW:
                result = self.executor.execute_workflow(params, command)
            else:
                result = "Command type not recognized"
            task.standing = "completed"
            task.outcome = result
            self.stats["tasks_completed"] += 1
        except Exception as e:
            task.standing = "failed"
            task.outcome = f"Error: {str(e)}"
        task.execution_time = round(time.time() - start_time, 3)
        self.task_history.append(task)
        self.update_stats()
        return task

    def update_stats(self):
        """Update agent statistics"""
        if self.task_history:
            successful_tasks = sum(1 for t in self.task_history if t.standing == "completed")
            self.stats["success_rate"] = round((successful_tasks / len(self.task_history)) * 100, 1)
            total_time = sum(t.execution_time for t in self.task_history)
            self.stats["average_execution_time"] = round(total_time / len(self.task_history), 3)

    def get_status_dashboard(self) -> str:
        """Generate a status dashboard"""
        recent_tasks = self.task_history[-5:] if self.task_history else []
        dashboard = f"""
╭━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╮
│ AI DESKTOP AGENT STATUS │
├──────────────────────────────────────────────────────┤
│ Statistics: │
│ • Tasks Completed: {self.stats['tasks_completed']:<10} │
│ • Success Rate: {self.stats['success_rate']:<10}% │
│ • Avg Exec Time: {self.stats['average_execution_time']:<10}s │
├──────────────────────────────────────────────────────┤
│ Desktop Applications: │
"""
        for app, info in self.desktop.purposes.items():
            # Filled marker for open applications, hollow marker for closed ones
            status_icon = "●" if info["status"] == "open" else "○"
            dashboard += f"│ {status_icon} {app.title():<12} ({info['status']:<6}) │\n"
        dashboard += "├──────────────────────────────────────────────────────┤\n"
        dashboard += "│ Recent Tasks: │\n"
        if recent_tasks:
            for task in recent_tasks:
                status_icon = "✓" if task.standing == "completed" else "✗"
                dashboard += f"│ {status_icon} {task.timestamp} - {task.kind.value:<15} │\n"
        else:
            dashboard += "│ No tasks executed yet │\n"
        dashboard += "╰━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╯"
        return dashboard
We implement the executor that turns our parsed intents into concrete actions and realistic outputs on the virtual desktop. We then wire everything together in the DesktopAgent, where we process natural language, execute tasks, and continuously track success, latency, and a live status dashboard.
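One possible variation on the dispatch-and-track logic is sketched below: a dictionary maps each task type to a handler, and a small agent records the status and duration of every run. This is an alternative design under stated assumptions, not the tutorial's implementation; `MiniAgent`, `HANDLERS`, and the two handler functions are hypothetical names, and the failure in `run_workflow` is contrived so the success rate has something to measure.

```python
import time
from enum import Enum
from typing import Callable, Dict, List


class TaskType(Enum):
    SYSTEM_COMMAND = "system_command"
    WORKFLOW = "workflow"


def run_system_command(command: str) -> str:
    return "System command executed"


def run_workflow(command: str) -> str:
    if "fail" in command:  # contrived failure for demonstration purposes
        raise RuntimeError("workflow step failed")
    return "Workflow completed"


# Hypothetical dispatch table standing in for an if/elif chain
HANDLERS: Dict[TaskType, Callable[[str], str]] = {
    TaskType.SYSTEM_COMMAND: run_system_command,
    TaskType.WORKFLOW: run_workflow,
}


class MiniAgent:
    def __init__(self) -> None:
        self.history: List[dict] = []

    def run(self, task_type: TaskType, command: str) -> dict:
        start = time.perf_counter()
        try:
            outcome, standing = HANDLERS[task_type](command), "completed"
        except Exception as exc:  # any handler error marks the task as failed
            outcome, standing = f"Error: {exc}", "failed"
        record = {
            "standing": standing,
            "outcome": outcome,
            "seconds": time.perf_counter() - start,
        }
        self.history.append(record)
        return record

    def success_rate(self) -> float:
        """Percentage of completed tasks, rounded to one decimal place."""
        if not self.history:
            return 100.0
        done = sum(1 for r in self.history if r["standing"] == "completed")
        return round(done / len(self.history) * 100, 1)


agent = MiniAgent()
agent.run(TaskType.SYSTEM_COMMAND, "check cpu")
agent.run(TaskType.WORKFLOW, "automate backup")
agent.run(TaskType.WORKFLOW, "fail on purpose")
print(agent.success_rate())
```

A dispatch table keeps the routing in one data structure, so adding a new task type means adding one entry rather than another branch.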
def run_advanced_demo():
    """Run an advanced interactive demo of the AI Desktop Agent"""
    print("Initializing Advanced AI Desktop Automation Agent...")
    time.sleep(1)
    agent = DesktopAgent()
    print("\n" + "="*60)
    print("AI DESKTOP AUTOMATION AGENT - ADVANCED TUTORIAL")
    print("="*60)
    print("A sophisticated AI agent that understands natural language")
    print("commands and automates desktop tasks in a simulated environment.")
    print("\nTry these example commands:")
    print(" • 'open the browser and visit github.com'")
    print(" • 'create a new file called report.txt'")
    print(" • 'check system performance'")
    print(" • 'show me the files in documents folder'")
    print(" • 'automate email processing workflow'")
    demo_commands = [
        "check system status and show CPU usage",
        "open browser and navigate to github.com",
        "create a new file called meeting_notes.txt",
        "list all files in the documents directory",
        "launch text editor application",
        "automate data backup workflow"
    ]
    print(f"\nRunning {len(demo_commands)} demonstration commands...\n")
    for i, command in enumerate(demo_commands, 1):
        print(f"[{i}/{len(demo_commands)}] Command: '{command}'")
        print("-" * 50)
        task = agent.process_command(command)
        print(f"Task ID: {task.id}")
        print(f"Type: {task.kind.value}")
        print(f"Status: {task.standing}")
        print(f"Execution Time: {task.execution_time}s")
        print(f"Result:\n{task.outcome}")
        print()
        if COLAB_MODE:
            time.sleep(0.5)  # pace the output in notebooks
    print("\n" + "="*60)
    print("FINAL AGENT STATUS")
    print("="*60)
    print(agent.get_status_dashboard())
    return agent
def interactive_mode(agent):
    """Run interactive mode for user input"""
    print("\nINTERACTIVE MODE ACTIVATED")
    print("Type your commands below (type 'quit' to exit, 'status' for dashboard):")
    print("-" * 60)
    while True:
        try:
            user_input = input("\nAgent> ").strip()
            if user_input.lower() in ['quit', 'exit', 'q']:
                print("AI Agent shutting down. Goodbye!")
                break
            elif user_input.lower() in ['status', 'dashboard']:
                print(agent.get_status_dashboard())
                continue
            elif user_input.lower() in ['help', '?']:
                print("Available commands:")
                print(" • Any natural language command")
                print(" • 'status' - Show agent dashboard")
                print(" • 'help' - Show this help")
                print(" • 'quit' - Exit AI Agent")
                continue
            elif not user_input:
                continue  # ignore empty lines
            print(f"Processing: '{user_input}'...")
            task = agent.process_command(user_input)
            print(f"\nTask {task.id} [{task.kind.value}] - {task.standing}")
            print(task.outcome)
        except KeyboardInterrupt:
            print("\n\nAI Agent interrupted. Goodbye!")
            break
        except Exception as e:
            print(f"Error: {e}")
if __name__ == "__main__":
    agent = run_advanced_demo()
    if COLAB_MODE:
        # In a notebook, leave the interactive loop as an explicit opt-in
        print("\nTo continue with interactive mode, run:")
        print("interactive_mode(agent)")
    else:
        interactive_mode(agent)
We run a scripted demo that processes realistic commands, prints results, and finishes with a live status dashboard. We then provide an interactive loop where we type natural language tasks, check the status, and receive immediate feedback. Finally, we auto-start the demo and, in Colab, we show how to launch interactive mode with a single call.
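The interactive loop mixes control words (quit, status, help) with free-form commands. A rough sketch of how that routing could be separated from `input()` so it can be exercised with scripted lines is shown below; `classify_input` and `run_script` are hypothetical helpers, and the `process` callback stands in for the agent's command handler.

```python
from typing import Callable, Iterable, List

QUIT_WORDS = {"quit", "exit", "q"}
STATUS_WORDS = {"status", "dashboard"}
HELP_WORDS = {"help", "?"}


def classify_input(line: str) -> str:
    """Map one line of user input to a loop action."""
    word = line.strip().lower()
    if not word:
        return "skip"
    if word in QUIT_WORDS:
        return "quit"
    if word in STATUS_WORDS:
        return "status"
    if word in HELP_WORDS:
        return "help"
    return "command"


def run_script(lines: Iterable[str], process: Callable[[str], str]) -> List[str]:
    """Feed scripted lines through the loop logic and collect the outputs."""
    outputs: List[str] = []
    for line in lines:
        action = classify_input(line)
        if action == "quit":
            break                      # stop reading, like the interactive loop
        if action == "skip":
            continue                   # empty lines are ignored
        if action == "command":
            outputs.append(process(line.strip()))
        else:
            outputs.append(f"<{action}>")  # placeholder for dashboard/help output
    return outputs


# str.upper is only a stand-in for a real command handler
print(run_script(["", "status", "open browser", "quit", "never reached"], str.upper))
```

Keeping the routing in a pure function makes the loop's behavior checkable without a terminal, which is convenient in notebooks where blocking on input is awkward.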
In conclusion, we demonstrate how an AI agent can handle a wide variety of desktop-like tasks in a simulated environment using only Python. We see how natural language inputs are translated into structured tasks, executed with realistic outputs, and summarized in a visual dashboard. With this foundation, we are positioned to extend the agent with more complex behaviors, richer interfaces, and real-world integrations, making desktop automation smarter, more interactive, and easier to use.
The post How to Build an Intelligent AI Desktop Automation Agent with Natural Language Commands and Interactive Simulation? appeared first on MarkTechPost.