[Article] Are You Ready to Let an AI Agent Use Your Computer?

Notice

Recent Posts

Recent Comments

Link

« 2026/01 »
일	월	화	수	목	금	토
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Tags more

Archives

Today

Total

관리 메뉴

Tenma

[Article] Are You Ready to Let an AI Agent Use Your Computer? 본문

Study

[Article] Are You Ready to Let an AI Agent Use Your Computer?

Tenma2 2025. 3. 15. 15:24

출처 : https://spectrum.ieee.org/ai-agents-computer-use

Two years after the generative AI boom really began with the launch of ChatGPT, it no longer seems that exciting to have a phenomenally helpful AI assistant hanging around in your web browser or phone, just waiting for you to ask it questions. The next big push in AI is for AI agents that can take action on your behalf. But while agentic AI has already arrived for power users like coders, everyday consumers don’t yet have these kinds of AI assistants.

That will soon change. Anthropic, Google DeepMind, and OpenAI have all recently unveiled experimental models that can use computers the way people do—searching the web for information, filling out forms, and clicking buttons. With a little guidance from the human user, they can do thinks like order groceries, call an Uber, hunt for the best price for a product, or find a flight for your next vacation. And while these early models have limited abilities and aren’t yet widely available, they show the direction that AI is going.

“This is just the AI clicking around,” said OpenAI CEO Sam Altman in a demo video as he watched the OpenAI agent, called Operator, navigate to OpenTable, look up a San Francisco restaurant, and check for a table for two at 7pm.

Zachary Lipton, an associate professor of machine learning at Carnegie Mellon University, notes that AI agents are already being embedded in specialized software for different types of enterprise customers such as salespeople, doctors, and lawyers. But until now, we haven’t seen AI agents that can “do routine stuff on your laptop,” he says. “What’s intriguing here is the possibility of people starting to hand over the keys.”

[요지]

AI 붐이 일어나고 있는 현재

Anthropic

Google DeepMind

OpenAI

이 세개의 회사가 AI 기술을 선도 하고 있다.

AI agent를 생각해보면 사람을 대신하여 상품을 주문해주고 택시를 부르고 비행까지 합리적인 가격에 예약하는 기능을

생각할 것이다.

의사, 변호사등의 일부 직군들에서는 AI agent가 소프트웨어에 장착되어 있지만

현재 초기 AI agent는 이러한 기능을 널리 수행하기에는 무리가 있다.

AI Agents from Anthropic, Google DeepMind, and OpenAI

Anthropic was the first to unveil this new functionality, with an announcement in October that its Claude chatbot can now “use computers the way humans do.” The company stressed that it was giving the models this capability as a public beta test, and that it’s only available to developers who are building tools and products on top of Anthropic’s large language models. Claude navigates by viewing screenshots of what the user sees and counting the pixels required to move the cursor to a certain spot for a click. A spokesperson for Anthropic says that Claude can do this work on any computer and within any desktop application.

Next out of the gate was Google DeepMind with its Project Mariner, built on top of Google’s Gemini 2 language model. The company showed Mariner off in December but called it an “early research prototype” and said it’s only making the tool available to “trusted testers” for now. As another precaution, Mariner currently only operates within the Chrome browser, and only within an active tab, meaning that it won’t run in the background while you work on other tasks. While this requirement seems to somewhat defeat the purpose of having a time-saving AI helper, it’s likely just a temporary condition for this early stage of development.

Finally, in January OpenAI launched its computer-use agent (CUA), called Operator. OpenAI called it a “research preview” and made it available only to users who pay US $200 per month for OpenAI’s premium service, though the company said it’s working toward broader release. Yash Kumar, an engineer on the Operator team, says the tool can work with essentially any website. “We’re starting with the browser because this is where the majority of work happens,” Kumar says. But he notes that “the CUA model is also trained to use a computer, so it’s possible we could expand it” to work with other desktop apps.

Like the others, Operator relies on chain-of-thought reasoning to take instructions and break them down into a series of tasks that it can complete. If it needs more information to complete a task—like, for example, if you prefer to buy red or yellow onions—it will pause and ask for input. It also asks for confirmation before taking a final step, like booking the restaurant table or putting in the grocery order.

[요지]

Anthropic: Claude 챗봇이 인간처럼 컴퓨터를 조작하는 기능을 개발했으며, 현재 베타 테스트 중.
Google DeepMind: Project Mariner는 Chrome 브라우저 내에서만 작동하는 초기 연구 프로토타입으로 제한적으로 테스트 중.
OpenAI: Operator를 출시하여 웹 브라우저에서 작업을 수행하는 AI 도구를 제공하며, 향후 데스크톱 애플리케이션으로 확장 가능.

chain-of-thought reasoning

: AI가 복잡한 문제를 해결할 때 단계를 나누어 논리적으로 사고하는 과정

Safety Concerns for Computer-Use Agents

Here are some things that computer-use agents can’t yet do: log in to sites, agree to terms of service, solve captchas, and enter credit card or other payment details. If an agent comes up against one of these roadblocks, it hands the steering wheel back to the human user. OpenAI notes that Operator doesn’t take screenshots of the browser while the user is entering login or payment information.

The three companies have all noted that putting an AI in charge of your computer could pose safety risks. Anthropic has specifically raised the concern of prompt injection attacks, or ways in which malicious actors can add something to the user’s prompt to make the model take an unexpected action. “Since Claude can interpret screenshots from computers connected to the internet, it’s possible that it may be exposed to content that includes prompt injection attacks,” Anthropic wrote in a blog post.

CMU’s Lipton says that the companies haven’t revealed much information about the computer-use agents and how they work, so it’s hard to assess the risks. “If someone is getting your computer operator to do something nefarious, does that mean they already have access to your computer?” he wonders, and if so, why wouldn’t the miscreant just take action directly?

Still, Lipton says, with all the actions we take and purchases we make online, “It doesn’t require a wild leap of imagination to imagine actions that would leave the user in a pickle.” For example, he says, “Who will be the first person who wakes up and says, ‘My [agent] bought me a fleet of cars?’”

[요지]

AI 기반 컴퓨터 사용 에이전트(Computer-Use Agents)의 현재 한계와 보안 위험성을 강조하고 있다.

AI는 아직 웹사이트 로그인, 약관 동의, 결제 정보 입력 같은 작업을 수행할 수 없지만

로그인 중의 사진 등을 캡쳐할 가능성도 있다.

하지만 AI agent가 어떻게 동작하는지에 대한 정보가 부족하여

악용 가능성을 정확히 평가하기 어렵다.

또한 AI가 온라인 활동을 대신 할수록 원치 않은 행동을 할 가능성이 높아진다.

ex) 실수로 여러 대의 차를 구매할 수 있다.

Prompt Injection Attack

: 악의적인 사용자가 프롬프트에 특정 명령을 삽입해 AI의 예측 불가능한 행동을 유도할 수 있음

The Future of Computer-Use Agents

While none of the companies have revealed a timeline for making their computer-use agents broadly available, it seems likely that consumers will begin to get access to them this year—either through the big AI companies or through startups creating cheaper knockoffs.

OpenAI’s Kumar says it’s an exciting time, and that Operator marks a step toward a more collaborative future for humans and AI. “It’s a stepping stone on our path to AGI,” he says, referring to the long-promised dream/nightmare of artificial general intelligence. “The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks.”

If you remember the prescient 2013 movie Her, it seems like we’re edging toward the world that existed at the beginning of the film, before the sultry-voiced Samantha began speaking into the protagonist’s ear. It’s a world in which everyone has a boring and neutral AI to help them read and respond to messages and take care of other mundane tasks. Once the AI companies solidly achieve that goal, they’ll no doubt start working on Samantha.

요지]

AI 기반 컴퓨터 에이전트가 올해 안에 대중에게 제공될 가능성이 높다!

현재 AI는 사람을 돕는 단순한 보조 역할에 집중하고 있지만,

궁극적으로 감성적 교류까지 가능한 AI로 발전할 가능성이 있다.

Her 영화 속 미래가 점점 현실로 다가오고 있으며,

앞으로 AI가 인간과 상호작용하는 방식이 더욱 자연스러워질 것.

이렇게 AI agent에 대해 다룬

IEEE 기사를 읽어보았는데요

기대가 많은 만큼 우려도 많은 것 같네요

저도 대학 교양시간에

영화 'her'을 보면서 머지않은 미래에 이런 상황이 벌어질 것 같다는 생각을 했어요

정이 넘치던 이전과 달리

점점 사람들 사이의 관계는 이익 중심이 되고

잘못 하나에도 다같이 헐뜯는 사회가 된 것 같아요

사람들은 서로를 더욱 믿지 못하게 되고

옆에서 친구가 되어주는 사만다 같은 인공지능의 수요는 점점 늘어나겠죠

다같이 이해해주고 공감해주는 사회가 됬으면 좋겠지만

경쟁과 비교의 사회이고,

스스로 마인드 컨트롤을 하면서 사는게 삶에서 중요하게 느껴지네요!

'Study' 카테고리의 다른 글

[Certicate] 네이버 부스트캠프 (0)	2025.07.06
[Certificate] 정보처리기능사 (0)	2025.03.27
[Certificate] 한국계산과학공학회(KISTI) HPC&AI 겨울학교 (0)	2025.03.22
[Coding Night] (0)	2025.03.10

'Study' Related Articles

Tenma

[Article] Are You Ready to Let an AI Agent Use Your Computer? 본문

[Article] Are You Ready to Let an AI Agent Use Your Computer?

AI Agents from Anthropic, Google DeepMind, and OpenAI

Safety Concerns for Computer-Use Agents

The Future of Computer-Use Agents

'Study' 카테고리의 다른 글

티스토리툴바