How to Build an Intelligent AI Desktop Automation Agent with Natural Language Commands and Interactive Simulation?

In this tutorial, we walk through the process of building an advanced AI desktop automation agent that runs seamlessly in Google Colab. We design it to interpret natural language commands, simulate desktop tasks such as file operations, browser actions, and workflows, and provide interactive feedback through a virtual environment. By combining NLP, task execution, and a simulated desktop, we create a system that feels both intuitive and powerful, letting us explore automation concepts without relying on external APIs.


import re
import json
import time
import random
import threading
from datetime import datetime
from typing import Dict, List, Any, Tuple
from dataclasses import dataclass, asdict
from enum import Enum


# Optional notebook dependencies; fall back gracefully outside Colab/Jupyter
try:
   from IPython.display import display, HTML, clear_output
   import matplotlib.pyplot as plt
   import numpy as np
   COLAB_MODE = True
except ImportError:
   COLAB_MODE = False

We begin by importing essential Python libraries that support data handling, visualization, and simulation. We also set up the Colab-specific tools, falling back to a plain mode when they are unavailable, so the tutorial runs interactively either way.


class TaskType(Enum):
   FILE_OPERATION = "file_operation"
   BROWSER_ACTION = "browser_action"
   SYSTEM_COMMAND = "system_command"
   APPLICATION_TASK = "application_task"
   WORKFLOW = "workflow"


@dataclass
class Task:
   id: str
   kind: TaskType
   command: str
   standing: str = "pending"
   outcome: str = ""
   timestamp: str = ""
   execution_time: float = 0.0

We define the structure of our automation system. We create an enum to categorize task types and a Task dataclass that helps us track each command along with its details, status, and execution results.
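As a small illustration of why a dataclass is convenient here, the sketch below serializes a task record to JSON. It redefines a trimmed-down copy of `TaskType` and `Task` so that it runs on its own; the helper name `task_to_json` is hypothetical and not part of the tutorial code.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class TaskType(Enum):
    FILE_OPERATION = "file_operation"
    WORKFLOW = "workflow"


@dataclass
class Task:
    # Trimmed copy of the tutorial's Task, kept only for this example
    id: str
    kind: TaskType
    command: str
    standing: str = "pending"
    outcome: str = ""


def task_to_json(task: Task) -> str:
    """Hypothetical helper: serialize a Task, storing the enum as its string value."""
    record = asdict(task)
    record["kind"] = task.kind.value  # Enum members are not JSON-serializable as-is
    return json.dumps(record, sort_keys=True)


demo_task = Task(id="task_0001", kind=TaskType.FILE_OPERATION, command="create notes.txt")
demo_json = task_to_json(demo_task)
print(demo_json)
```

Storing the enum's string value rather than the member itself keeps the record readable and lets it round-trip through `json.loads`.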


class DigitalDesktop:
   """Simulates a desktop environment with applications and a file system"""

   def __init__(self):
       # Simulated applications, keyed by name; each entry tracks its own state
       self.purposes = {
           "browser": {"status": "closed", "tabs": [], "current_url": ""},
           "file_manager": {"status": "closed", "current_path": "/home/user"},
           "text_editor": {"status": "closed", "current_file": "", "content": ""},
           "email": {"status": "closed", "unread": 3, "inbox": []},
           "terminal": {"status": "closed", "history": []}
       }

       # Nested dict standing in for a file system: home directory, then folders
       self.file_system = {
           "/home/user/": {
               "documents/": {
                   "report.txt": "Important quarterly report content...",
                   "notes.md": "# Meeting Notes\n- Project update\n- Budget review"
               },
               "downloads/": {
                   "data.csv": "name,age,city\nJohn,25,NYC\nJane,30,LA",
                   "image.jpg": "[Binary image data]"
               },
               "desktop/": {}
           }
       }

       self.screen_state = {
           "active_window": None,
           "mouse_position": (0, 0),
           "clipboard": ""
       }

   def get_system_info(self) -> Dict:
       # Randomized values make each status check look like a live reading
       return {
           "cpu_usage": random.randint(5, 25),
           "memory_usage": random.randint(30, 60),
           "disk_space": random.randint(60, 90),
           "network_status": "connected",
           "uptime": "2 hours 15 minutes"
       }


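class NLPProcessor:
   """Processes natural language commands and extracts intents"""

   def __init__(self):
       # NOTE: the pattern table was truncated in the source text; only the
       # fragments `directories)"` and `r"(download` survived. The patterns
       # below are a reconstructed sketch that covers the demo commands,
       # not the original list.
       self.intent_patterns = {
           TaskType.FILE_OPERATION: [
               r"\b(open|create|delete|copy|move|list|show)\b.*\b(files?|folders?|documents?|directory|directories)",
               r"(download|upload|save|backup)\s+\w+"
           ],
           TaskType.BROWSER_ACTION: [
               r"\b(browse|navigate|visit|search|google)\b",
               r"\b(browser|website|url)\b"
           ],
           TaskType.SYSTEM_COMMAND: [
               r"\b(check|show|display|monitor)\b.*\b(system|status|performance|cpu|memory|disk)"
           ],
           TaskType.APPLICATION_TASK: [
               r"\b(launch|start|close|quit)\b.*\b(application|app|editor|terminal|email|calculator)"
           ],
           TaskType.WORKFLOW: [
               r"\b(automate|schedule|workflow|batch)\b"
           ]
       }

   def extract_intent(self, command: str) -> Tuple[TaskType, float]:
       """Extract task type and confidence from a natural language command"""
       command_lower = command.lower()
       best_match = TaskType.SYSTEM_COMMAND
       best_confidence = 0.0

       for task_type, patterns in self.intent_patterns.items():
           for pattern in patterns:
               if re.search(pattern, command_lower):
                   # Each match of the pattern adds 0.3 to the confidence score
                   confidence = len(re.findall(pattern, command_lower)) * 0.3
                   if confidence > best_confidence:
                       best_match = task_type
                       best_confidence = confidence

       return best_match, min(best_confidence, 1.0)

   def extract_parameters(self, command: str, task_type: TaskType) -> Dict[str, str]:
       """Extract parameters from the command based on task type"""
       params = {}
       command_lower = command.lower()

       if task_type == TaskType.FILE_OPERATION:
           # A filename is any token with a dot-separated extension
           file_match = re.search(r'[\w/.-]+\.\w+', command)
           if file_match:
               params['filename'] = file_match.group()

           path_match = re.search(r'/[\w/.-]+', command)
           if path_match:
               params['path'] = path_match.group()

       elif task_type == TaskType.BROWSER_ACTION:
           url_match = re.search(r'https?://[\w.-]+|[\w.-]+\.(com|org|net|edu)', command)
           if url_match:
               params['url'] = url_match.group()

           search_match = re.search(r'(?:search|find|google)\s+["\']?([^"\']+)["\']?', command_lower)
           if search_match:
               params['query'] = search_match.group(1)

       elif task_type == TaskType.APPLICATION_TASK:
           app_match = re.search(r'(browser|editor|email|terminal|calculator)', command_lower)
           if app_match:
               params['application'] = app_match.group(1)

       return params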

We simulate a virtual desktop with applications, a file system, and system states, and we build an NLP processor alongside it. We set up rules to identify user intents from natural language commands and extract useful parameters, such as filenames, URLs, or application names. This lets us bridge natural language input with structured automation tasks.
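To make the scoring rule concrete, here is a minimal standalone sketch of regex-based intent matching in the same spirit: each pattern match contributes 0.3 to a confidence score, the highest-scoring label wins, and the score is capped at 1.0. The `PATTERNS` table and the `score_intent` function are hypothetical and exist only for this illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern table, for illustration only
PATTERNS: Dict[str, List[str]] = {
    "file_operation": [r"\b(open|create|list)\b.*\b(file|files|folder)\b"],
    "browser_action": [r"\b(browser|navigate|search)\b"],
    "workflow": [r"\b(automate|workflow|batch)\b"],
}


def score_intent(command: str, default: str = "system_command") -> Tuple[str, float]:
    """Return the best-matching label and its capped confidence."""
    text = command.lower()
    best_label, best_score = default, 0.0
    for label, patterns in PATTERNS.items():
        for pattern in patterns:
            # Count every non-overlapping match; each one is worth 0.3
            score = len(re.findall(pattern, text)) * 0.3
            if score > best_score:
                best_label, best_score = label, score
    return best_label, min(best_score, 1.0)


for sample in ("automate data backup workflow", "hello there"):
    print(sample, "->", score_intent(sample))
```

A command that matches nothing falls back to the default label with a confidence of 0.0, which is worth checking before acting on the result in a less forgiving setting.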


class TaskExecutor:
   """Executes tasks on the virtual desktop"""

   def __init__(self, desktop: DigitalDesktop):
       self.desktop = desktop
       self.execution_log = []

   def execute_file_operation(self, params: Dict[str, str], command: str) -> str:
       """Simulate file operations"""
       command_lower = command.lower()
       if "open" in command_lower:
           filename = params.get('filename', 'unknown.txt')
           return f"✓ Opened file: {filename}\n  File contents loaded in text editor"

       elif "create" in command_lower:
           filename = params.get('filename', 'new_file.txt')
           return f"✓ Created new file: {filename}\n  File ready for editing"

       elif "list" in command_lower:
           # The simulated file system is nested: home directory first, then the folder
           home = self.desktop.file_system.get("/home/user/", {})
           files = list(home.get("documents/", {}).keys())
           return "Files found:\n" + "\n".join(f"  • {f}" for f in files)

       return "✓ File operation completed successfully"
  
   def execute_browser_action(self, params: Dict[str, str], command: str) -> str:
       """Simulate browser actions"""
       command_lower = command.lower()
       if "open" in command_lower or "go to" in command_lower:
           url = params.get('url', 'example.com')
           self.desktop.purposes["browser"]["current_url"] = url
           self.desktop.purposes["browser"]["status"] = "open"
           return f"Navigated to: {url}\n✓ Page loaded successfully"

       elif "search" in command_lower:
           query = params.get('query', 'search term')
           return f"Searching for: '{query}'\n✓ Found 1,247 results"

       return "✓ Browser action completed"
  
   def execute_system_command(self, params: Dict[str, str], command: str) -> str:
       """Simulate system commands"""
       command_lower = command.lower()
       if "check" in command_lower or "show" in command_lower:
           info = self.desktop.get_system_info()
           # Adjacent string literals inside parentheses are joined into one string
           return (
               "System Status:\n"
               f"  CPU: {info['cpu_usage']}%\n"
               f"  Memory: {info['memory_usage']}%\n"
               f"  Disk: {info['disk_space']}% used\n"
               f"  Network: {info['network_status']}"
           )

       return "✓ System command executed"
  
   def execute_application_task(self, params: Dict[str, str], command: str) -> str:
       """Simulate application tasks"""
       app = params.get('application', 'unknown')
       command_lower = command.lower()

       if "open" in command_lower:
           # Guard against names that are not keys of the simulated application table
           if app in self.desktop.purposes:
               self.desktop.purposes[app]["status"] = "open"
               return f"Launched {app.title()}\n✓ Application ready to use"

       elif "close" in command_lower:
           if app in self.desktop.purposes:
               self.desktop.purposes[app]["status"] = "closed"
               return f"Closed {app.title()}"

       return f"✓ {app.title()} task completed"
  
   def execute_workflow(self, params: Dict[str, str], command: str) -> str:
       """Simulate complex workflow execution"""
       steps = [
           "Analyzing workflow requirements...",
           "Preparing automation steps...",
           "Executing batch operations...",
           "Validating results...",
           "Generating report..."
       ]

       outcome = "Workflow Execution:\n"
       for i, step in enumerate(steps, 1):
           outcome += f"  {i}. {step} ✓\n"
           if COLAB_MODE:
               time.sleep(0.1)  # brief pause so the steps feel sequential in a notebook

       return outcome + "Workflow completed successfully!"


class DesktopAgent:
   """Main desktop automation agent class - coordinates all components"""

   def __init__(self):
       self.desktop = DigitalDesktop()
       self.nlp = NLPProcessor()
       self.executor = TaskExecutor(self.desktop)
       self.task_history = []
       self.active = True
       self.stats = {
           "tasks_completed": 0,
           "success_rate": 100.0,
           "average_execution_time": 0.0
       }
  
   def process_command(self, command: str) -> Task:
       """Process a natural language command and execute it"""
       start_time = time.time()

       task_id = f"task_{len(self.task_history) + 1:04d}"
       task_type, confidence = self.nlp.extract_intent(command)

       job = Task(
           id=task_id,
           kind=task_type,
           command=command,
           timestamp=datetime.now().strftime("%H:%M:%S")
       )

       try:
           params = self.nlp.extract_parameters(command, task_type)

           # Route the task to the executor method that matches its type
           if task_type == TaskType.FILE_OPERATION:
               outcome = self.executor.execute_file_operation(params, command)
           elif task_type == TaskType.BROWSER_ACTION:
               outcome = self.executor.execute_browser_action(params, command)
           elif task_type == TaskType.SYSTEM_COMMAND:
               outcome = self.executor.execute_system_command(params, command)
           elif task_type == TaskType.APPLICATION_TASK:
               outcome = self.executor.execute_application_task(params, command)
           elif task_type == TaskType.WORKFLOW:
               outcome = self.executor.execute_workflow(params, command)
           else:
               outcome = "Command type not recognized"

           job.standing = "completed"
           job.outcome = outcome
           self.stats["tasks_completed"] += 1

       except Exception as e:
           # Any executor error marks the task as failed instead of crashing the agent
           job.standing = "failed"
           job.outcome = f"Error: {str(e)}"

       job.execution_time = round(time.time() - start_time, 3)
       self.task_history.append(job)
       self.update_stats()

       return job

   def update_stats(self):
       """Update agent statistics"""
       if self.task_history:
           successful_tasks = sum(1 for t in self.task_history if t.standing == "completed")
           self.stats["success_rate"] = round((successful_tasks / len(self.task_history)) * 100, 1)

           total_time = sum(t.execution_time for t in self.task_history)
           self.stats["average_execution_time"] = round(total_time / len(self.task_history), 3)
  
   def get_status_dashboard(self) -> str:
       """Generate a status dashboard"""
       recent_tasks = self.task_history[-5:] if self.task_history else []
      
       dashboard = f"""
╭━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╮
│                 AI DESKTOP AGENT STATUS            │
├──────────────────────────────────────────────────────┤
│  Statistics:                                       │
│   • Tasks Completed: {self.stats['tasks_completed']:<10}                │
│   • Success Rate:    {self.stats['success_rate']:<10}%               │
│   • Avg Exec Time:   {self.stats['average_execution_time']:<10}s               │
├──────────────────────────────────────────────────────┤
│   Desktop Applications:                            │
"""
      
       for app, info in self.desktop.purposes.items():
           # Text markers for open/closed applications
           status_icon = "●" if info["status"] == "open" else "○"
           dashboard += f"│   {status_icon} {app.title():<12} ({info['status']:<6})              │\n"

       dashboard += "├──────────────────────────────────────────────────────┤\n"
       dashboard += "│  Recent Tasks:                                    │\n"

       if recent_tasks:
           for job in recent_tasks:
               # Anything that did not fail is shown with a check mark
               status_icon = "✗" if job.standing == "failed" else "✓"
               dashboard += f"│ {status_icon} {job.timestamp} - {job.kind.value:<15} │\n"
       else:
           dashboard += "│   No tasks executed yet                              │\n"
      
       dashboard += "╰━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╯"
      
       return dashboard

We implement the executor that turns our parsed intents into concrete actions and realistic outputs on the virtual desktop. We then wire everything together in the DesktopAgent, where we process natural language, execute tasks, and continuously track the success rate and latency shown in a live status dashboard.
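The statistics are simple aggregates over the task history. The sketch below works through the same arithmetic on a small hand-written history of (status, execution time) pairs; the `summarize` function and the sample values are hypothetical, and the 100% default for an empty history is an assumption that mirrors the agent's initial stats.

```python
from typing import Dict, List, Tuple


def summarize(history: List[Tuple[str, float]]) -> Dict[str, float]:
    """Compute success rate (%) and mean execution time (s) from (status, seconds) pairs."""
    if not history:
        # Assumed default, matching the agent's initial stats before any task runs
        return {"success_rate": 100.0, "average_execution_time": 0.0}
    completed = sum(1 for status, _ in history if status == "completed")
    total_time = sum(seconds for _, seconds in history)
    return {
        "success_rate": round(completed / len(history) * 100, 1),
        "average_execution_time": round(total_time / len(history), 3),
    }


# Made-up sample: three completed tasks and one failure
sample_history = [
    ("completed", 0.002),
    ("completed", 0.504),
    ("failed", 0.001),
    ("completed", 0.001),
]
stats = summarize(sample_history)
print(stats)
```

With three of four tasks completed, the success rate is 75.0%, and the mean of the four timings is 0.508 / 4 = 0.127 seconds.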


def run_advanced_demo():
   """Run an advanced interactive demo of the AI Desktop Agent"""

   print("Initializing Advanced AI Desktop Automation Agent...")
   time.sleep(1)

   agent = DesktopAgent()

   print("\n" + "="*60)
   print("AI DESKTOP AUTOMATION AGENT - ADVANCED TUTORIAL")
   print("="*60)
   print("A sophisticated AI agent that understands natural language")
   print("commands and automates desktop tasks in a simulated environment.")
   print("\nTry these example commands:")
   print("  • 'open the browser and go to github.com'")
   print("  • 'create a new file called report.txt'")
   print("  • 'check system performance'")
   print("  • 'show me the files in documents folder'")
   print("  • 'automate email processing workflow'")

   demo_commands = [
       "check system status and show CPU usage",
       "open browser and navigate to github.com",
       "create a new file called meeting_notes.txt",
       "list all files in the documents directory",
       "launch text editor application",
       "automate data backup workflow"
   ]

   print(f"\nRunning {len(demo_commands)} demonstration commands...\n")

   for i, command in enumerate(demo_commands, 1):
       print(f"[{i}/{len(demo_commands)}] Command: '{command}'")
       print("-" * 50)

       job = agent.process_command(command)

       print(f"Task ID: {job.id}")
       print(f"Type: {job.kind.value}")
       print(f"Status: {job.standing}")
       print(f"Execution Time: {job.execution_time}s")
       print(f"Result:\n{job.outcome}")
       print()
      
       if COLAB_MODE:
           time.sleep(0.5) 
  
   print("\n" + "="*60)
   print(" FINAL AGENT STATUS")
   print("="*60)
   print(agent.get_status_dashboard())
  
   return agent


def interactive_mode(agent):
   """Run interactive mode for user input"""
   print("\nINTERACTIVE MODE ACTIVATED")
   print("Type your commands below (type 'quit' to exit, 'status' for dashboard):")
   print("-" * 60)

   while True:
       try:
           user_input = input("\nAgent> ").strip()

           if user_input.lower() in ['quit', 'exit', 'q']:
               print("AI Agent shutting down. Goodbye!")
               break

           elif user_input.lower() in ['status', 'dashboard']:
               print(agent.get_status_dashboard())
               continue

           elif user_input.lower() in ['help', '?']:
               print("Available commands:")
               print("  • Any natural language command")
               print("  • 'status' - Show agent dashboard")
               print("  • 'help' - Show this help")
               print("  • 'quit' - Exit AI Agent")
               continue

           elif not user_input:
               continue

           print(f"Processing: '{user_input}'...")
           job = agent.process_command(user_input)

           print(f"\nTask {job.id} [{job.kind.value}] - {job.standing}")
           print(job.outcome)

       except KeyboardInterrupt:
           print("\n\nAI Agent interrupted. Goodbye!")
           break
       except Exception as e:
           print(f"Error: {e}")




if __name__ == "__main__":
   agent = run_advanced_demo()
  
   if COLAB_MODE:
       print("\nTo continue with interactive mode, run:")
       print("interactive_mode(agent)")
   else:
       interactive_mode(agent)

We run a scripted demo that processes realistic commands, prints the results, and finishes with a live status dashboard. We then provide an interactive loop where we type natural language tasks, check the status, and receive immediate feedback. Finally, we auto-start the demo and, in Colab, show how to launch interactive mode with a single call.
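Because the interactive loop reads from the keyboard, it is awkward to exercise automatically. One possible approach, sketched below under the assumption that the command handler can be passed in as a function, is to feed the same control flow from a list of lines. The names `run_session` and `handle` are hypothetical and do not appear in the tutorial code.

```python
from typing import Callable, Iterable, List


def run_session(lines: Iterable[str], handle: Callable[[str], str]) -> List[str]:
    """Process lines until a quit keyword is seen; return the output lines."""
    output: List[str] = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # ignore blank input, as the interactive loop does
        if text.lower() in ("quit", "exit", "q"):
            output.append("Goodbye!")
            break
        output.append(handle(text))
    return output


# A stand-in handler; in the tutorial this role is played by the agent
transcript = run_session(
    ["check system status", "   ", "quit", "never reached"],
    handle=lambda command: f"processed: {command}",
)
print(transcript)
```

Separating the loop logic from `input()` in this way makes it possible to assert on the transcript in a test while leaving the keyboard-driven version unchanged.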

In conclusion, we demonstrate how an AI agent can handle a wide variety of desktop-like tasks in a simulated environment using only Python. We see how natural language inputs are translated into structured tasks, executed with realistic outputs, and summarized in a visual dashboard. With this foundation, we are positioned to extend the agent with more complex behaviors, richer interfaces, and real-world integrations, making desktop automation smarter, more interactive, and easier to use.


The post How to Build an Intelligent AI Desktop Automation Agent with Natural Language Commands and Interactive Simulation? appeared first on MarkTechPost.