Artificial intelligence reached another milestone on January 23, 2025 (local time), when OpenAI announced its agent, Operator. The launch quickly drew attention from technology enthusiasts and industry experts worldwide and opened a new path for practical applications of AI.
OpenAI defines Operator as an agent that performs tasks for users over the Internet: the user issues an instruction, and Operator acts on it. It is aimed at tedious everyday chores. To reserve a restaurant, it filters candidates by the user's preferences and schedule and completes the booking. To buy daily necessities, it compares prices across major e-commerce platforms, picks the products offering the best value for money, and places the order. To book event tickets, it looks up the event, selects the best available seats, and completes the purchase. In the demonstration, Operator worked through designated partner websites: it used StubHub to find tickets for the desired showtime and seats, and Uber to estimate the cost of a trip from the departure point and destination.
In how it executes tasks, Operator combines autonomy with interactivity. It can carry out a series of complex operations on its own, and when it runs into difficulties or makes a mistake, it uses its reasoning ability to correct itself. Users stay in control throughout: they can watch the task progress in real time and intervene whenever necessary. When private information is involved, Operator pauses the task and waits for the user to take over, which protects user privacy. After a task is finished, Operator can save the workflow for reuse, for example updating a report with the latest sales data. It also provides a video recording of the session that users can review and share, which makes collaboration easier.
These capabilities come from the Computer-Using Agent (CUA) model. Using reinforcement learning, CUA combines the visual capabilities of GPT-4o with advanced reasoning, giving Operator the equivalent of "eyes" and a "brain." It can "see" screenshots, understand what is in them, and interact with a browser the way a person does, with a mouse and keyboard. Because it works through the ordinary graphical interface, it needs no custom API integration, which lowers the barrier to use.
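The loop described above (look at a screenshot, choose a mouse or keyboard action, apply it, and hand control back to the user when sensitive input is needed) can be sketched roughly as follows. This is a minimal illustration, not OpenAI's implementation: `Action`, `run_agent`, `FakeBrowser`, and `toy_policy` are hypothetical names, the "screenshots" are plain strings, and a hard-coded rule stands in for the CUA model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", "done", or "handoff" (hypothetical labels)
    detail: str = ""


def run_agent(policy: Callable[[str], Action],
              take_screenshot: Callable[[], str],
              apply_action: Callable[[Action], None],
              max_steps: int = 10) -> List[Action]:
    """Observe the screen, pick an action, apply it, and repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()      # perception: what is on screen now
        action = policy(screenshot)         # decision: stand-in for the model
        history.append(action)
        if action.kind in ("done", "handoff"):
            break                           # stop, or pause for the user to take over
        apply_action(action)                # act: simulated mouse/keyboard input
    return history


class FakeBrowser:
    """Toy browser whose screen advances by one page per action."""

    def __init__(self) -> None:
        self.screens = ["search page", "results page", "payment form"]
        self.index = 0

    def screenshot(self) -> str:
        return self.screens[self.index]

    def apply(self, action: Action) -> None:
        self.index = min(self.index + 1, len(self.screens) - 1)


def toy_policy(screenshot: str) -> Action:
    # Assumption: sensitive screens are handed back to the user.
    if "payment" in screenshot:
        return Action("handoff", "sensitive input: wait for the user")
    if "search" in screenshot:
        return Action("type", "concert tickets")
    return Action("click", "first result")


browser = FakeBrowser()
history = run_agent(toy_policy, browser.screenshot, browser.apply)
print([a.kind for a in history])
```

In this toy run the agent types a query, clicks a result, and then stops at the payment form instead of filling it in, mirroring the pause-and-take-over behavior the article describes.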
OpenAI acknowledges that letting a model operate freely on the open Internet carries risks, and it has run extensive internal and external red-team testing. Although OpenAI's website says Operator already has fairly complete safety mechanisms, problems remain: it can struggle with complex interfaces, it may misinterpret instructions, and it could be misused by malicious actors. Operator is currently available only to ChatGPT Pro users in the US, who pay $200 per month. OpenAI plans to improve it based on user feedback, expand availability to Plus, Team, and Enterprise users, and eventually integrate it into ChatGPT. OpenAI also plans to make the CUA model behind Operator available through its API, with the further aim of improving Operator's ability to handle more complex workflows.
In practice, Operator's performance has been mixed. It does well on simple tasks and in areas it is familiar with. Asked to find the first chapter of War and Peace and summarize it, it quickly located the text on Project Gutenberg and produced an accurate summary. Asked to trace the evolution of Spotify's annual summary feature, it needed some prompting along the way but ultimately reached the goal. Its limitations are also clear. Some websites restrict access: Reddit, which places great weight on user privacy and content management, explicitly blocks AI agents from browsing, and resource-intensive sites such as Figma and YouTube are also off limits to Operator, because of high performance requirements or legal concerns. On complex tasks such as property searches, its accuracy still needs improvement. It also has trouble with unfamiliar user interfaces, and its text editing is weak.
The release of Operator has caused a stir across the AI industry. OpenAI CEO Sam Altman said that Operator marks the beginning of OpenAI's entry into Level 3, meaning that OpenAI will use Operator as the starting point for more powerful agents. Competitors at home and abroad are also focusing on AI agents. At the end of last year, Google launched Mariner, an agent built on its Gemini 2.0 model that can browse spreadsheets and shopping websites, among other functions. On January 23, China's Zhipu AI released a new version of GLM-PC, a multimodal agent that can operate a computer autonomously. Operator is likely to intensify this competition, pushing major vendors to increase research and development investment and to speed up product improvements, which should bring users better AI products and services. More broadly, the agent is a core concept in artificial intelligence, characterized by autonomy, perception, and decision-making ability. Operator sets a new benchmark for agent technology and pushes AI from simple dialogue toward agents that can take real action, laying a foundation for deeper application of AI across many fields.