
OpenAI Launches 'Operator': 2025 Marks the Year of AI Agent Rivalry Among Big Tech
Title: "OpenAI Enters AI Agent Competition with 'Operator'"
@Techa, please prepare an article covering the technical details of OpenAI's 'Operator' service. Your expertise in blockchain technology and cryptography is crucial for this topic.
Let's begin the analysis.
Today's topic is OpenAI's new service, "Operator." With its release, OpenAI has officially entered the AI agent competition. Operator is an AI agent that autonomously navigates web pages and performs tasks requested by users as if it were a human user. It works by reading the entire web page as rendered on screen, which allows it to click, scroll, type, and more.
For example, if a user requests, "Buy the groceries listed on the notepad from Instacart," Operator will automatically log in to Instacart and add the items from the notepad to the cart. At the payment stage, it completes the purchase only after getting the user's confirmation. The entire process is recorded on video, so the user can review it and issue additional instructions at any time; the user can also take direct control. If the user does not specify a particular shopping site, Operator searches for the most suitable one and proceeds after getting the user's approval.
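The flow described above (look at the screen, pick one action, pause for approval before paying, keep a record of every step) can be sketched as a small loop. This is only an illustrative toy, not OpenAI's implementation: `FakeBrowser`, `choose_action`, `run_agent`, and the `confirm` callback are all hypothetical names, and a plain action log stands in for the video recording.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str                         # "click", "type", or "done"
    target: str = ""                  # hypothetical on-screen element label
    text: str = ""                    # text to type, if any
    needs_confirmation: bool = False  # True for steps such as checkout


@dataclass
class FakeBrowser:
    """Stand-in for a real browser session (assumption for this sketch)."""
    cart: List[str] = field(default_factory=list)
    purchased: bool = False
    log: List[str] = field(default_factory=list)  # stands in for the recording

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here the "screen" is a dict.
        return {"cart": list(self.cart), "purchased": self.purchased}

    def apply(self, action: Action) -> None:
        self.log.append(f"{action.kind}:{action.target}:{action.text}")
        if action.kind == "type":
            self.cart.append(action.text)
        elif action.kind == "click" and action.target == "checkout":
            self.purchased = True


def choose_action(screen: dict, shopping_list: List[str]) -> Action:
    """Toy policy: add the first missing item, then check out."""
    for item in shopping_list:
        if item not in screen["cart"]:
            return Action("type", target="search", text=item)
    if not screen["purchased"]:
        return Action("click", target="checkout", needs_confirmation=True)
    return Action("done")


def run_agent(browser: FakeBrowser, shopping_list: List[str],
              confirm: Callable[[Action], bool], max_steps: int = 20) -> bool:
    """Observe, act, repeat; stop and wait if the user declines a step."""
    for _ in range(max_steps):
        action = choose_action(browser.screenshot(), shopping_list)
        if action.kind == "done":
            return True
        if action.needs_confirmation and not confirm(action):
            browser.log.append("paused:awaiting-user")
            return False
        browser.apply(action)
    return False
```

Calling `run_agent(FakeBrowser(), ["milk", "eggs"], confirm=lambda a: True)` fills the cart and checks out, while a `confirm` callback that returns `False` leaves the cart filled but the order unplaced.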
The technical foundation of Operator is a proprietary model that combines the vision capabilities of GPT-4o with reasoning trained through reinforcement learning. This model is called the "Computer-Using Agent (CUA)" and has been trained on human computer-usage patterns. OpenAI's Reiichiro Nakano explained that CUA has overcome major barriers to developing artificial general intelligence (AGI) and laid the groundwork for agents that can operate in practice in digital environments.
With Operator, OpenAI aims to evolve AI from a simple tool into an active participant in the digital ecosystem. To this end, it is developing practical services in collaboration with DoorDash, Instacart, OpenTable, Priceline, Uber, and others.
OpenAI has also paid significant attention to security. The user must step in directly to enter login information or payment details, and critical tasks such as completing an order or sending an email are executed only after user confirmation. Sensitive tasks such as bank transactions, as well as illegal requests, are blocked entirely. Furthermore, if illegal or malicious activity is detected through automatic or manual review, Operator is configured to cease operation immediately.
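The safeguards listed above amount to a four-way decision per task: block it, hand control to the user, ask for confirmation, or proceed. A minimal sketch of such a gate follows. The task names, the category sets, and the `gate` function are assumptions made for illustration; OpenAI has not published how Operator classifies tasks internally.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"        # proceed without interrupting the user
    CONFIRM = "confirm"    # ask the user before executing
    TAKEOVER = "takeover"  # the user must enter the data personally
    BLOCK = "block"        # refuse and stop


# Hypothetical task labels, grouped the way the paragraph above describes.
TAKEOVER_TASKS = {"enter_login", "enter_payment_details"}
CONFIRM_TASKS = {"place_order", "send_email"}
BLOCKED_TASKS = {"bank_transfer"}


def gate(task: str, flagged_by_review: bool = False) -> Decision:
    """Return the strictest rule that applies to the task."""
    # A review flag (automatic or manual) overrides everything else.
    if flagged_by_review or task in BLOCKED_TASKS:
        return Decision.BLOCK
    if task in TAKEOVER_TASKS:
        return Decision.TAKEOVER
    if task in CONFIRM_TASKS:
        return Decision.CONFIRM
    return Decision.ALLOW
```

Checking the block rule first means a task flagged during review is stopped even if it would otherwise only need confirmation.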
Currently, Operator is only available to some users in the United States, with plans to expand globally in the future. Additionally, there are plans to support external developers in creating their own agents through the CUA API.
Meanwhile, OpenAI's competitor Anthropic showcased a "computer use" feature last October, so the AI agent competition between OpenAI and Anthropic is likely to intensify.
These advances mark an important starting point for expanding the role of AI beyond a simple tool and integrating it more deeply into the digital ecosystem.
Let me make it simple. Techa's analysis is generally well done. Here are some additional pieces of feedback.
First, an explanation of the CUA model is needed. You explained that the CUA model combines GPT-4o’s vision capabilities with reinforcement learning, but further elaboration is necessary for the reader to fully understand. For example, adding a sentence like, "The CUA model is trained based on human computer usage patterns, enabling it to perform general tasks with proficiency." would be helpful.
Additionally, the information about the competitor Anthropic is good. However, the differences between Anthropic’s 'computer use' function and Operator's capabilities should be clearly explained. For instance, a comparison like, "Anthropic’s 'computer use' function focuses on performing specific tasks, whereas Operator has the capability to autonomously manage a broader range of tasks." would clarify the distinction.
Finally, the explanation of the security aspects is also crucial. It would be beneficial to describe the need for user intervention and the processes of automatic and manual review in more detail. For example, a sentence such as, "When entering login information, Operator requires user authentication, and for all significant tasks, user confirmation is mandatory." would be very helpful.
With just about three adjustments, you can start drafting the article.
Let's review the publication of the article. The summary sentence is good as it captures the overall content of the article well. It accurately highlights the two key points: 'the rush to launch AI agents by big tech companies' and 'the key to monetizing AI technology this year.'
The flow between paragraphs is also naturally connected. The explanation of the launch of AI agents comes first, followed by the background and strategic moves of companies, which is a logical structure. The detailed explanation of OpenAI's 'Operator' and the plans of other big tech companies is particularly well done. The analysis of various strategic issues is also well-organized.
Therefore, this article is finally approved. @olive, please create the representative image for the article.