- The Artificially Intelligent Enterprise
OpenAI Operator: A New AI Agent for the Web
ChatGPT Pro users get first shot at web-based agents for $200 a month
OpenAI has recently released a research preview of its new AI agent, Operator.
Operator is a web-based AI agent that can perform various tasks, such as filling out forms, ordering groceries, and making purchases.
It is powered by a new Computer-Using Agent (CUA) model, which combines GPT-4o’s vision capabilities with advanced reasoning through reinforcement learning.
Here’s the bottom line: Operator is a step forward for OpenAI, but it won’t revolutionize your business yet.
It does, however, foreshadow what is to come.

What is Operator?
Operator is a web-based AI agent that can perform various tasks, such as filling out forms, ordering groceries, and making purchases. It is powered by a new Computer-Using Agent (CUA) model, which combines GPT-4o’s vision capabilities with advanced reasoning through reinforcement learning. Operator is designed with safety and privacy in mind, with layers of safeguards to prevent abuse and ensure users are firmly in control.
This means that while it can perform tasks autonomously, it will stop and ask for user input should it run into something requiring human intervention.

OpenAI Operator Interface
How does Operator work?
Operator uses GPT-4o’s vision capabilities to understand the content of a webpage. It then uses advanced reasoning through reinforcement learning to determine the best course of action to complete the task. For example, if the user asks Operator to fill out a form, Operator will first identify the form fields and then use its understanding of the form to fill them out correctly. However, if a step requires human intervention, it will pause and ask you for more information.
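To make that loop concrete, here is a minimal Python sketch of the observe, decide, act, and ask-the-human cycle described above. It is a toy illustration, not OpenAI's implementation: the `Page`, `Action`, `choose_action`, and `run_agent` names are all hypothetical, the "page" is a plain dictionary rather than a screenshot, and a simple rule stands in for the model's reasoning.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str         # "type", "click", "ask_user", or "done"
    target: str = ""  # form field or button the action applies to
    value: str = ""   # text to type, or the question to ask the user


@dataclass
class Page:
    """Toy stand-in for what a vision model would read off a screenshot."""
    fields: Dict[str, str] = field(default_factory=dict)  # field name -> value
    submitted: bool = False


def choose_action(page: Page, known: Dict[str, str]) -> Action:
    """Hypothetical policy: fill the first empty field, then submit."""
    for name, value in page.fields.items():
        if value:
            continue  # already filled in
        if name in known:
            return Action("type", name, known[name])
        # Missing information: hand control back to the human.
        return Action("ask_user", name, f"What should I enter for '{name}'?")
    if not page.submitted:
        return Action("click", "submit")
    return Action("done")


def run_agent(page: Page, known: Dict[str, str],
              ask_user: Callable[[str], str], max_steps: int = 20) -> List[str]:
    """Repeat observe -> decide -> act until done or the step budget runs out."""
    known = dict(known)  # copy so the caller's data is not mutated
    log: List[str] = []
    for _ in range(max_steps):
        action = choose_action(page, known)
        log.append(action.kind)
        if action.kind == "type":
            page.fields[action.target] = action.value
        elif action.kind == "ask_user":
            known[action.target] = ask_user(action.value)
        elif action.kind == "click":
            page.submitted = True
        elif action.kind == "done":
            break
    return log


# Example: the agent knows the name and email but has to ask for the phone.
page = Page(fields={"name": "", "email": "", "phone": ""})
log = run_agent(
    page,
    known={"name": "Ada", "email": "ada@example.com"},
    ask_user=lambda question: "555-0100",  # stands in for a real prompt
)
print(log)
```

The `max_steps` cap is a design choice in this sketch: it keeps the agent from looping forever if the page never reaches a finished state.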
The Use Cases for Operator
Operator has a wide range of potential uses. It can help people with disabilities complete tasks online, automate repetitive tasks, and improve the efficiency of online shopping. The bottom line is that it automates the activities in a web browser.
You can ask Operator to handle a wide range of repetitive browser tasks, such as filling out forms, ordering groceries, and even creating memes. Because it uses the same interfaces and tools that humans interact with daily, it can save people time on routine tasks while creating new engagement opportunities for businesses.
Here is one example: find a recipe and order the ingredients for delivery through Instacart. Since the NFL Playoffs are going on, I was thinking of hosting a party, so I asked the agent to look up a recipe for Buffalo Chicken Dip and order the ingredients.

Operator Ordering Food for a Recipe
I racked my brain thinking of ways Operator could help in business, but I’m at a loss so far. Operator is probably a step towards automating jobs that can’t be automated solely via APIs. Imagine someday granting an agent access to a browser environment when a person goes on maternity or paternity leave to help cover their critical tasks, or even access to their whole desktop, as you can do with Anthropic’s Computer Use model.
Who is Operator Available to?
Operator is available to ChatGPT Pro users. This premium subscription plan offers more features and capabilities than both the free version and ChatGPT Plus.
Currently, ChatGPT Pro costs $200 per month. OpenAI plans to expand Operator to other subscription plans in the future. If cost is your deciding factor, though, it’s pretty expensive compared to Claude's Computer Use.
Here's a summary of what ChatGPT Pro offers:
Everything in ChatGPT Plus and more: This includes unlimited access to models like o1, o1-mini, GPT-4o, and advanced voice (audio only), higher limits for video and screen sharing in advanced voice, and access to o1 pro mode (which uses more computing for the best answers to the most challenging questions).
Extended access to Sora video generation: The ability to generate videos up to 1080p and 20 seconds long, with unlimited video generation.
Operator: A new AI agent that can perform tasks on the web, such as booking flights, making reservations, and ordering groceries. It uses a web browser to interact with websites and can even fill out forms and make purchases with user confirmation. OpenAI collaborates with companies like DoorDash, Instacart, and Uber to ensure Operator addresses real-world needs.
Priority Access: Pro users get priority access to ChatGPT, even during peak usage times. This ensures that you can always access the AI when you need it.

How does Operator compare to Google Mariner and Anthropic computer use?
Operator is similar to Google’s Project Mariner and Anthropic computer use in that all three are AI agents that can complete tasks online. However, there are some key differences between the three.
1. Underlying Technology:
Operator (OpenAI): Leverages GPT-4o's vision capabilities through the new Computer-Using Agent (CUA) model. Emphasis on reinforcement learning to improve task execution.
Mariner (Google): Built on Gemini 2.0, Google's latest multimodal model. It incorporates Google's extensive knowledge graph and search capabilities for information retrieval.
Claude Computer Use (Anthropic): Uses Claude 3.5 Sonnet, designed with Constitutional AI for safer, more aligned task completion. Focus on natural language understanding for following instructions.
2. Capabilities and Focus:
Operator: Seems geared towards complex, multi-step tasks involving interaction with various web elements (forms, shopping carts, etc.). Aims to be a general-purpose web agent.
Mariner: Early demos show proficiency in information gathering, summarizing web pages, and basic browser control. May excel at research and knowledge-based tasks.
Claude Computer Use: Strong at following natural language instructions for more straightforward computer interactions, like "find a document" or "compose an email." Less emphasis on visual understanding.
3. User Experience and Interface:
Operator: A web-based interface where users assign tasks through natural language or a more structured input.
Mariner: Works within a Chrome tab. The AI interacts visually with the page, and the user observes the agent's actions in real time.
Claude Computer Use: Users interact primarily through a chat-like interface, providing instructions and receiving text-based feedback or results.
4. Strengths and Limitations:
Operator: Potentially the most versatile and capable, but may be resource-intensive and require careful safety considerations.
Mariner: Strong integration with Google services, but limited to a single Chrome tab and potentially less flexible in task types.
Claude Computer Use: Safer and more aligned due to Constitutional AI, but may struggle with visually complex tasks or require more explicit instructions.
5. Availability and Access:
Operator: Currently in research preview with limited access under the expensive ChatGPT Pro plan.
Mariner: Also in limited testing, with no clear timeline for broader availability.
Claude Computer Use: Accessible through Anthropic's platform for developers and researchers.
It's still early days for all three agents. Operator seems to push the boundaries of complexity and general-purpose web interaction. Mariner leverages Google's strengths in search and knowledge. Claude Computer Use prioritizes safety and natural language understanding.
Ultimately, the "best" agent will depend on the user's specific needs and priorities. As these technologies evolve, we can expect to see even more sophisticated and specialized AI agents emerge, transforming how we interact with the digital world.
Conclusion: Cool but not Necessary So Far
OpenAI Operator is interesting, but I haven’t yet found a critical task that I want to accomplish with it. It is a versatile tool that can complete a wide range of tasks. Although Operator is still under development, it clearly has the potential to become a valuable tool for many people as it gets better at problem-solving.

I appreciate your support.
Your AI Sherpa, Mark R. Hinkle