Case Study Walkthrough

Automating Routine Work with AI

How a repetitive, click-by-click office task — DGIST receipt (expense) processing — was automated end-to-end using Claude with Computer Use. Built from the actual prompts used during the session.


📝 The Task

Receipt processing on the DGIST portal is a perfect automation target: it is repetitive, rule-based, and done entirely by clicking buttons in a fixed order on a web page.

✉️ Emails & Receipts (download & extract info) → 🤖 Claude (Computer Use; clicks the portal for you) → ✅ Submitted Claim (with proof attached)
🎯 Six Principles Behind the Workflow

Before the steps, here is why the workflow is shaped the way it is. These transfer to almost any GUI-automation task.

Principle 1

Keep the human in the loop where it matters

Login and security steps stay manual. The script opens the login page; you log in yourself. Never hand credentials to automation.

Principle 2

Build incrementally with screenshots

Don't describe the whole flow at once. Drive one screen at a time: take a screenshot, tell Claude which button to click, verify, repeat.

Principle 3

Separate concerns into files

login.py handles the session, work.py does the task, config.py holds the inputs. Each piece is testable on its own.

Principle 4

Parameterize, don't hard-code

Employee numbers, approval numbers, and file paths become inputs in config.py, so the same script handles every receipt.

Principle 5

Fail loudly

If a required input can't be extracted, the script stops instead of silently submitting a wrong claim.

Principle 6

Verify end-to-end

After building the parts, run the whole pipeline — login → download → process — and watch it work before trusting it.

🗺️ The Workflow at a Glance

The session naturally broke into four phases. Each phase is covered below, together with the prompts that drove it.

A

Set up a persistent browser session

Open the login page, log in manually, and keep the session alive so the work script can reuse it.

B

Teach the click sequence, one screen at a time

Use screenshots to define exactly which button to press on each portal page until the full claim is filled in.

C

Gather the inputs automatically

Download the relevant Gmail messages & attachments, then extract the approval number from the receipt image.

D

Parameterize & verify the whole pipeline

Move inputs into config.py, fail on missing data, and run login → download → process end-to-end.

🔒 Phase A · Persistent Browser Session

Get a working, logged-in browser that the automation can drive — with the login itself done by hand.

1

State the task and keep login manual. Describe the repetitive task and explicitly say which part you'll do yourself.

💬 Opening prompt

"I want to automate my work using Computer Use. My task is receipt processing, which means pressing buttons in a fixed order in a browser window. Please write a script that automates this.
1. Open the login page in a window (I will log in manually).
2. After login, go to the work page.
Once you have built this much, I will use screenshots to explain which buttons to press next."
💡 Notice how the very first prompt sets the collaboration contract: "I log in, you build the script, and I'll guide the clicks with screenshots." This framing is what makes the rest of the session go smoothly.
2

Correct the target page. The post-login URL was wrong, so it was fixed in one short instruction.

💬 Follow-up

"I've finished logging in. Please change the script so that after login it goes to the actual work page (the receipt claim screen), not the main page."
3

Split login from work, and persist the session. A recurring early problem: after manual login, the session wasn't being saved, so the work script saw a logged-out browser.

💬 Two key fixes

"Split this into login.py for logging in and work.py for the task."
"Right now the session/cookies are not being saved after login. Keep the logged-in state so that work.py reuses the same session."
The most common Computer-Use pitfall: the login window and the automation run in different browser contexts. Persisting the user-data directory (cookies/session) so both scripts share one profile is what fixes the "nothing changes when I run it" symptom.
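The shared-profile idea can be sketched as follows. This is a minimal illustration that assumes Playwright for Python as the browser driver; the walkthrough does not say which library the generated scripts actually used. PROFILE_DIR, LOGIN_URL, WORK_URL, and the function names are placeholders, not values from the session.

```python
from pathlib import Path

# Hypothetical values: the real portal URLs are not given in this walkthrough.
PROFILE_DIR = Path.home() / ".receipt-automation" / "browser-profile"
LOGIN_URL = "https://portal.example.invalid/login"
WORK_URL = "https://portal.example.invalid/expense/new"


def open_session(playwright, profile_dir: Path = PROFILE_DIR):
    """Launch Chromium on a persistent profile shared by both scripts."""
    profile_dir.mkdir(parents=True, exist_ok=True)
    return playwright.chromium.launch_persistent_context(
        user_data_dir=str(profile_dir),
        headless=False,  # the human needs a visible window to log in
    )


def login() -> None:
    """Role of login.py: open the login page and wait for the human."""
    from playwright.sync_api import sync_playwright  # third-party package

    with sync_playwright() as p:
        context = open_session(p)
        page = context.pages[0] if context.pages else context.new_page()
        page.goto(LOGIN_URL)
        input("Log in in the browser window, then press Enter here... ")
        context.close()  # the profile directory stays on disk


def work() -> None:
    """Role of work.py: reopen the same profile and go to the work page."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        context = open_session(p)
        page = context.pages[0] if context.pages else context.new_page()
        page.goto(WORK_URL)
        # ... the click sequence for the claim form would go here ...
        context.close()
```

Both scripts call open_session with the same directory, which is the whole point. If the portal relies on session-only cookies that disappear when the browser closes, a shared profile alone may not be enough, and keeping one browser window open for both steps is a possible fallback.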
🖱️ Phase B · Teach the Click Sequence

This is the heart of the session. Each portal screen is handled with a screenshot + a precise instruction. Below are the recurring patterns rather than all 30+ clicks.

1

One screen, one screenshot, one instruction. The whole expense form was built up like this — selecting payers, project code, budget item, bank account, card-approval lookup, attaching proof, and saving.

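📷 Representative screen prompts (each sent with a screenshot)

"On this screen, click the magnifier next to Payer."
"Click the 'All' (전체) checkbox → type '250225' in the name/employee-number field → press Enter → double-click the first result."
"Click the magnifier next to 'Project name' (사업명) → double-click project code '20260515'."
"In the budget-selection pop-up, choose budget item → '기본경비_사업관리비' (basic expenses: project management) and spending plan → '신임 교원 멘토링 운영' (new faculty mentoring operations)."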
2

When a click fails, describe the UI more precisely. Several screens needed a second, sharper instruction — this is normal and expected.

🔧 Disambiguating a stubborn control

"Don't press the 'Select' (선택) button. Click the down arrow (dropdown) next to it, then choose 'Corporate card' (법인카드)."
💡 Pattern: if Claude clicks the wrong nearby element, don't just say "it didn't work" — say which element to avoid and which to use ("not the Select button, the down-arrow beside it"). Spatial, exclusionary descriptions resolve most misclicks.
3

Help Claude notice pop-ups. A few times the automation didn't realize a new window had opened. A one-line nudge fixed it.
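On the script side, "noticing" a new window usually means waiting for it explicitly and then sending later clicks to that window instead of the original page. A minimal sketch, again assuming Playwright for Python (an assumption, since the source does not name the library); the selectors in the usage comment are hypothetical.

```python
def click_and_get_popup(page, selector: str):
    """Click a control that opens a new window and return that window.

    `page` is expected to be a Playwright sync-API Page. The click happens
    inside expect_popup(), so the new window is captured even if it opens
    a moment after the click.
    """
    with page.expect_popup() as popup_info:
        page.click(selector)
    popup = popup_info.value
    popup.wait_for_load_state()  # do not act on a half-loaded pop-up
    return popup


# Usage sketch (hypothetical selectors):
#   popup = click_and_get_popup(page, "#project-search-button")
#   popup.dblclick("text=20260515")
```

If the portal's "pop-up" turns out to be an in-page dialog or an iframe rather than a separate window, this approach does not apply and the dialog has to be located inside the original page.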

💬 Pop-up awareness

"A pop-up window for the project name has opened, but you haven't noticed it. Please act on that pop-up."
4

Respect ordering constraints the UI imposes. The portal required saving first before proof could be attached — a real-world quirk discovered mid-flow.

💬 Save-then-attach

"It turns out I have to save first before I can upload proof. Press 'Save' (저장), then press the 'Proof' (증빙) button to attach the meeting-minutes file, and save again."
The card-approval lookup is the trickiest screen: type the approval number (e.g. 74955789) → Enter → double-click the matching row. A "real user" (실 사용자) field was also added later via the same magnifier → search → double-click pattern.
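Because the same magnifier → search → double-click pattern keeps recurring, one way to record what each screenshot taught is as a short list of steps that a driver replays in order. The sketch below is illustrative only: Step, Driver, lookup_steps, and the target labels are hypothetical names, and a real Driver would wrap whatever browser library the script uses.

```python
from dataclasses import dataclass
from typing import Protocol


class Driver(Protocol):
    """The few actions the portal needs (hypothetical interface)."""
    def click(self, target: str) -> None: ...
    def double_click(self, target: str) -> None: ...
    def type_text(self, target: str, text: str) -> None: ...
    def press(self, key: str) -> None: ...


@dataclass(frozen=True)
class Step:
    action: str       # "click", "double_click", "type_text" or "press"
    target: str = ""  # human-readable label of the control
    text: str = ""    # text to type, or the key to press


def lookup_steps(magnifier: str, field: str, query: str) -> list[Step]:
    """The recurring pattern: open the search, type, Enter, pick the row."""
    return [
        Step("click", magnifier),
        Step("type_text", field, query),
        Step("press", text="Enter"),
        Step("double_click", f"result row containing {query}"),
    ]


def run_steps(driver: Driver, steps: list[Step]) -> None:
    """Replay steps in order; an unknown action stops the run."""
    for i, step in enumerate(steps, start=1):
        if step.action == "click":
            driver.click(step.target)
        elif step.action == "double_click":
            driver.double_click(step.target)
        elif step.action == "type_text":
            driver.type_text(step.target, step.text)
        elif step.action == "press":
            driver.press(step.text)
        else:
            raise ValueError(f"step {i}: unknown action {step.action!r}")
```

With this shape, the card-approval lookup and the "real user" field become two calls to lookup_steps with different arguments, for example lookup_steps("magnifier next to card approval", "approval number field", "74955789").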
📥 Phase C · Gather the Inputs Automatically

Instead of typing each receipt's data by hand, the inputs are pulled straight from email and the receipt image.

1

Download the relevant emails & attachments.

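✉️ Gmail download script

"Write a Python script that downloads the bodies and attachments of the '신임교원멘토링' (new faculty mentoring) emails that arrived in Gmail on a specific date. Save each case in its own folder: case_1, case_2, ..."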
Resulting structure:
downloads/
  ├── case_1/ — one receipt claim
  │  ├── body.txt
  │  ├── receipt.jpg
  │  └── 회의록.hwp
  ├── case_2/
  └── ...
The Gmail API needs a one-time OAuth setup (a credentials.json from Google Cloud + adding yourself as a test user to clear the access_denied screen). Claude can walk you through the Cloud console screens.
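The folder layout above can be produced with the standard library once each message is available as raw bytes. The sketch below deliberately leaves out the Gmail API calls and the OAuth flow; it assumes the raw RFC 822 bytes have already been fetched (the Gmail API can return a message in a raw, base64url-encoded format). The function names save_case and save_all are hypothetical.

```python
import email
from email import policy
from pathlib import Path


def save_case(raw_message: bytes, case_dir: Path) -> list[Path]:
    """Write the plain-text body to body.txt and each attachment beside it."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    case_dir.mkdir(parents=True, exist_ok=True)
    saved = []

    body = msg.get_body(preferencelist=("plain",))
    body_path = case_dir / "body.txt"
    body_path.write_text(body.get_content() if body else "", encoding="utf-8")
    saved.append(body_path)

    for part in msg.iter_attachments():
        data = part.get_payload(decode=True)
        if data is None:  # e.g. a nested message; skipped in this sketch
            continue
        # Keep only the file name so an attachment cannot escape case_dir.
        name = Path(part.get_filename() or "attachment.bin").name
        path = case_dir / name
        path.write_bytes(data)
        saved.append(path)
    return saved


def save_all(raw_messages: list[bytes], root: Path = Path("downloads")) -> None:
    """Save message 1 to case_1/, message 2 to case_2/, and so on."""
    for n, raw in enumerate(raw_messages, start=1):
        save_case(raw, root / f"case_{n}")
```

Keeping the saving logic separate from the fetching logic means it can be tested with a hand-built message, without touching a real mailbox.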
2

Extract data from the attachments. Read images out of the .hwp meeting minutes, and pull the approval number directly from the receipt.
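For the approval number, a small amount of validation around the extraction helps enforce the fail-loudly principle. The sketch below assumes the receipt has already been turned into text (by OCR or by asking Claude to read the image), that the number is labelled 승인번호 (approval number), and that it has 8 digits like the 74955789 example in this walkthrough. All three are assumptions about the receipt format, and the names are hypothetical.

```python
import re

# Assumed format: the label "승인번호", a short separator, then exactly 8 digits.
APPROVAL_RE = re.compile(r"승인\s*번호\D{0,10}?(\d{8})(?!\d)")


class ExtractionError(ValueError):
    """Raised when a required field cannot be read from the receipt."""


def extract_approval_number(receipt_text: str) -> str:
    """Return the single approval number in the text, or raise."""
    matches = set(APPROVAL_RE.findall(receipt_text))
    if len(matches) != 1:
        raise ExtractionError(
            f"expected exactly one approval number, found {sorted(matches)}"
        )
    return matches.pop()
```

Requiring exactly one match is a design choice: zero matches and two conflicting matches are both treated as failures, so an ambiguous receipt never reaches the portal.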

🔍 Extraction prompts

"Can you extract the photos embedded in 회의록.hwp so that I can view them?"
"Extract the approval number from the receipt."
⚙️ Phase D · Parameterize & Verify

Turn the one-off script into a reusable tool, and confirm the whole pipeline works.

1

Move inputs into a config file — and fail if they're missing.

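📄 Config + fail-fast

"Make work.py take the employee number, approval number, and file path as inputs. Keep these values in config.py and have work.py read them when it runs. If a value cannot be extracted (or is malformed), treat that as a failure."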
Final project shape:
receipt-automation/
  ├── login.py: opens login window, saves session
  ├── work.py: drives the portal, reads config.py
  ├── config.py: employee number / approval number / file path
  ├── download_mail.py: Gmail → case_N folders
  └── downloads/: case_1, case_2, ...
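A config with built-in validation might look like the sketch below. The field names, the ClaimConfig class, and the 8-digit rule for the approval number are assumptions made for illustration; the session's actual config.py is not shown in this walkthrough.

```python
from dataclasses import dataclass
from pathlib import Path


class ConfigError(ValueError):
    """Raised when a required input is missing or malformed."""


def _is_digits(value: str) -> bool:
    return value.isascii() and value.isdigit()


@dataclass(frozen=True)
class ClaimConfig:
    employee_no: str   # e.g. "250225"
    approval_no: str   # e.g. "74955789"
    proof_file: Path   # e.g. the meeting-minutes .hwp for this case

    def validate(self) -> "ClaimConfig":
        """Collect every problem, then fail once with all of them listed."""
        problems = []
        if not _is_digits(self.employee_no):
            problems.append(f"employee_no must be digits, got {self.employee_no!r}")
        if not (_is_digits(self.approval_no) and len(self.approval_no) == 8):
            problems.append(f"approval_no must be 8 digits, got {self.approval_no!r}")
        if not self.proof_file.is_file():
            problems.append(f"proof_file not found: {self.proof_file}")
        if problems:
            raise ConfigError("; ".join(problems))
        return self
```

work.py would call validate() before opening the browser, so a bad input stops the run before anything is typed into the portal.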
2

Run and verify the full pipeline.

✅ End-to-end check

"Run the whole process once more to verify it: login → email download → receipt processing."
💡 "You try it yourself": once the steps are solid, let Claude run the entire flow unattended while you watch, then iterate on whatever breaks.
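The end-to-end run can also be captured in a tiny runner, so that the same check is repeatable later. This is a sketch under the assumption that each script in the project layout signals failure through a non-zero exit code; run_pipeline and its `run` parameter are hypothetical names.

```python
import subprocess
import sys

# Order taken from the walkthrough: login → download → process.
PIPELINE = ["login.py", "download_mail.py", "work.py"]


def run_pipeline(scripts=PIPELINE, run=subprocess.run) -> None:
    """Run each script in order and stop at the first failure."""
    for script in scripts:
        print(f"--> {script}")
        result = run([sys.executable, script])
        if result.returncode != 0:
            raise RuntimeError(
                f"{script} failed with exit code {result.returncode}; stopping"
            )
```

Stopping at the first failing stage keeps a failed download from being followed by a portal run on stale or missing files.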
🌟 How to Apply This to Your Own Work

A checklist for turning any of your repetitive tasks into an automation.

1
Pick a task that is repetitive and rule-based

The best first target is something you do the same way every time — forms, data entry, downloading and filing attachments.

2
Demonstrate, don't specify

Walk through the task once with screenshots and plain-language instructions. You don't need to know the page's HTML — just describe what you see.

3
Keep credentials and irreversible actions manual

Log in yourself; review before the final "submit". Automation fills the form — you stay responsible for sending it.

4
Expect iteration on tricky controls

Dropdowns, pop-ups, and look-alike buttons may take a second, more precise instruction. That back-and-forth is the normal cost of building a reliable script.

5
Generalize once it works for one case

Pull the changing values into a config file, make missing data a hard error, then reuse the same script for every case (case_1, case_2, ...).
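Reusing one script for every case mostly comes down to looping over the case folders in a predictable order. A minimal sketch, assuming the downloads/case_N layout shown earlier; case_dirs and process_all are hypothetical helper names, and `process` stands in for whatever function handles a single claim.

```python
import re
from pathlib import Path
from typing import Callable


def case_dirs(root: Path) -> list[Path]:
    """Return the case_N folders under root in numeric order."""
    found = []
    for p in root.iterdir():
        m = re.fullmatch(r"case_(\d+)", p.name)
        if m and p.is_dir():
            found.append((int(m.group(1)), p))
    return [p for _, p in sorted(found)]


def process_all(root: Path, process: Callable[[Path], None]) -> None:
    """Process every case; an exception in one case stops the batch."""
    for case in case_dirs(root):
        print(f"processing {case.name}")
        process(case)
```

Sorting by the number rather than by name keeps case_10 after case_2, and letting exceptions propagate means a case with missing data halts the batch instead of being skipped quietly.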