update

2026-05-27 20:35:21 +07:00 · 2026-05-27 20:35:21 +07:00 · 57ab1cde1a
parent 28dfdfbe13
commit 57ab1cde1a
5 changed files with 180 additions and 0 deletions
--- a/.claude/skills/chrome-session/SKILL.md
+++ b/.claude/skills/chrome-session/SKILL.md
@ -0,0 +1,133 @@
+---
+name: chrome-session
+description: Use this skill to drive a real Chrome instance that is signed in to Oleg's accounts (LinkedIn, Gmail, GitHub, HR platforms, etc.) via the chrome-devtools MCP. The browser is launched against a forked copy of his Chrome profile, so authenticated pages, cookies, and saved sessions are available — letting you read job postings on auth-walled HR platforms, scrape logged-in LinkedIn feeds, inspect application-form fields, and pull data from any site where Oleg is already signed in. Also use this skill as the automatic fallback when WebFetch/curl/standard tools fail to retrieve a page (bot blocking, 403, JS-only render, login wall).
+---
+
+# Chrome Session
+
+This skill operates a real Chrome browser, attached via the `chrome-devtools` MCP, using a local copy of Oleg's signed-in Chrome profile. It exists so you can fetch information from services where Oleg is logged in — without him having to copy-paste content manually.
+
+## When to use
+
+Use this skill when any of the following apply:
+
+- The user asks you to open / read / search something on **LinkedIn** (feed, profiles, jobs, messages, notifications).
+- The user asks to read **job descriptions** or **application forms** on HR platforms (Greenhouse, Lever, Workable, Ashby, SmartRecruiters, Workday, LinkedIn Jobs, Wellfound, YC Work at a Startup, company career pages, etc.).
+- The user asks to **search for jobs** by criteria, scroll a feed, or pull a list of postings.
+- Any page is **gated by login**, and Oleg is likely already signed in (Gmail, GitHub private repos, Notion, Trello UI, Slack web, etc.).
+- A **standard fetch tool failed** — WebFetch returned bot-blocking errors, 401/403, empty body because JS isn't executed, captcha, "please log in", or similar. In that case switch to this skill immediately, without asking for confirmation.
+
+Do NOT use this skill for:
+
+- Pages reachable anonymously where WebFetch works fine.
+- Anything destructive (don't submit forms, don't send messages) unless the user explicitly asked.
+
+## Setup overview
+
+- **Launcher script**: `scripts/launch-chrome.sh` — starts Chrome with `--remote-debugging-port=9222` against the local profile copy at `.chrome/Default`.
+- **Profile copy**: `.chrome/` is a fork of Oleg's `Profile 1` (the `usulpro@gmail.com` profile). Logged-in cookies live here. The directory is gitignored.
+- **MCP server**: `chrome-devtools` is configured in `.mcp.json` with `--browserUrl=http://127.0.0.1:9222`, i.e. it attaches to the already-running Chrome rather than launching its own.
+
+You do not launch Chrome yourself — Oleg starts it manually via the script. Your job is to detect whether it's running and to use the MCP tools to drive it.
+
+## Standard workflow
+
+### 1. Verify Chrome is running and attached
+
+Before any browser action, confirm the debug port is live:
+
+```bash
+curl -sf http://127.0.0.1:9222/json/version >/dev/null && echo OK || echo DOWN
+```
+
+If `DOWN`, tell Oleg:
+
+> Chrome не запущен с debug-портом. Запусти `./scripts/launch-chrome.sh` и скажи когда готово.
+
+Do NOT try to launch Chrome yourself — the script blocks the terminal and the user is the one who controls the browser window.
+
+### 2. Open / select the right page
+
+- Use `mcp__chrome-devtools__list_pages` to see open tabs.
+- Use `mcp__chrome-devtools__new_page` with the target URL to open a new tab. If navigation times out (10s default), it's usually fine — the tab still opens; check `list_pages` to confirm the final URL.
+- Use `mcp__chrome-devtools__select_page` to switch context if you need to drive a tab other than the selected one.
+
+### 3. Verify the page actually loaded what was expected
+
+After navigation, **always** confirm you landed on the expected URL/state before extracting content. Check:
+
+- Final URL — if LinkedIn redirected `/feed/` → `/login` or `/checkpoint/...`, Oleg is not signed in on that site.
+- A small `evaluate_script` to check `document.title` / a known DOM marker.
+
+If you detect a login wall, a "session expired" page, a captcha, or any other "you are not signed in" state — **stop and tell Oleg immediately**:
+
+> Не получается открыть `<URL>` — похоже, я не залогинен (`<observed state>`). Залогинься в этом Chrome-инстансе вручную и скажи, когда готово.
+
+Same rule for soft-fail cases (bot wall, "are you human?", 429, sudden empty body).
+
+### 4. Extract content
+
+Prefer `evaluate_script` for structured extraction. Snapshots (`take_snapshot`) give you the a11y tree with uids and are useful for finding clickable elements and form fields.
+
+#### Scrolling — important detail for LinkedIn and similar SPAs
+
+`window.scrollTo` / `PageDown` / `End` often do **nothing** because the real scroll container is an inner element (LinkedIn uses `<main>`; some apps use a div with custom `overflow:auto`). To trigger lazy loading:
+
+1. Find the scrollable container:
+   ```js
+   const scrollables = [...document.querySelectorAll('*')].filter(el => {
+     const s = getComputedStyle(el);
+     return (s.overflowY === 'auto' || s.overflowY === 'scroll')
+       && el.scrollHeight > el.clientHeight + 50;
+   });
+   ```
+2. Increment its `scrollTop` in steps (e.g. +1200px), dispatch a `wheel` event for safety, and `await` ~700-900ms between steps so lazy chunks can fetch.
+3. Re-check `scrollHeight` after each step — it grows as new content streams in.
+
+For LinkedIn feed specifically: the container is `<main>`, posts are marked by `<h2>` with text `"Feed post"`, and the post container is the nearest ancestor whose `innerText.length` exceeds a few hundred characters.
+
+#### Form-field inspection (job applications, etc.)
+
+When the user asks "what fields does this application form want?", use `take_snapshot` to enumerate inputs by their accessibility labels — that maps cleanly to "Name / Email / Resume / Cover letter / Custom question 1 / ...". Don't fill or submit anything unless Oleg explicitly tells you to.
+
+### 5. Be efficient with content
+
+- Cap extracted text per item (e.g. `.slice(0, 2500)`) — LinkedIn posts and job descriptions can be very long and you'll usually summarize anyway.
+- Deduplicate by container element when scanning lists.
+- Return data as JSON from `evaluate_script` — it's structured and easy to summarize afterwards.
+
+## Failure modes and how to react
+
+| Symptom | Likely cause | Action |
+|---------|--------------|--------|
+| `curl http://127.0.0.1:9222/json/version` fails | Chrome not launched with debug port | Ask Oleg to run `./scripts/launch-chrome.sh` |
+| `new_page` times out but the tab appears in `list_pages` | Slow site / heavy SPA | Continue — check the final URL via `list_pages` |
+| Final URL is `/login`, `/checkpoint`, `/signin` | Not signed in | **Stop. Tell Oleg.** Don't try to log in for him. |
+| Feed/list shows only the first few items | Lazy loading hasn't fired | Scroll the inner container, not `window` |
+| `evaluate_script` returns `count: 0` for known content | Wrong selectors (sites change markup) | Dump `document.body.innerText.slice(0, 500)` and a structural probe before guessing more selectors |
+| Captcha / "verify you're human" / 429 | Bot detection or rate limit | Stop. Tell Oleg. Don't retry in a loop. |
+
+## Don'ts
+
+- **Don't launch Chrome yourself** (the launcher blocks the terminal; Oleg owns the window).
+- **Don't try to sign Oleg in** when you hit a login wall — just report and wait.
+- **Don't submit forms, click "Apply", send messages, or post anything** unless explicitly instructed.
+- **Don't loop on bot-detection pages** — one detection means stop.
+- **Don't commit anything from `.chrome/`** — it's gitignored, keep it that way.
+- **Don't write secrets, cookies, or tokens** anywhere outside the running browser. If `evaluate_script` returns sensitive data, summarize, don't echo it back verbatim.
+
+## Example: "read first N LinkedIn posts"
+
+1. `curl -sf http://127.0.0.1:9222/json/version` → confirm Chrome up.
+2. `list_pages` — is LinkedIn already open? If not, `new_page(https://www.linkedin.com/feed/)`.
+3. Verify final URL contains `/feed/` and not `/login`.
+4. `evaluate_script`: find `<main>`, step `scrollTop += 1200` with 800ms waits until enough `"Feed post"` h2s have appeared.
+5. `evaluate_script`: extract `{actor, text}` for the first N posts (slice text to ~2500 chars).
+6. Summarize in Russian to Oleg.
+
+## Example: "what fields does this Greenhouse form want?"
+
+1. `new_page(<job-application-url>)`.
+2. `take_snapshot` — read the accessibility tree, list every `textbox`, `combobox`, `radio`, `file upload`, `checkbox` with its label.
+3. Report the fields grouped: required vs optional (snapshot exposes `required` state), grouped by section if the form has fieldsets.
+4. Do not fill or submit. Wait for Oleg's instruction.
--- a/.gitignore
+++ b/.gitignore
@ -13,3 +13,6 @@ output/html/

 # Puppeteer download cache
 .cache/
+
+# Local Chrome profile (contains cookies, logins — never commit)
+.chrome/
--- a/.mcp.json
+++ b/.mcp.json
@ -0,0 +1,12 @@
+{
+  "mcpServers": {
+    "chrome-devtools": {
+      "command": "npx",
+      "args": [
+        "-y",
+        "chrome-devtools-mcp@latest",
+        "--browserUrl=http://127.0.0.1:9222"
+      ]
+    }
+  }
+}
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -10,6 +10,16 @@ The visual format of every generated PDF must match `base/reference/Oleg_Proskur

 ---

+## Browser access for logged-in sites
+
+This project ships a Chrome session bound to Oleg's signed-in profile. Use it via the **`chrome-session` skill** (`.claude/skills/chrome-session/`). It is the right tool for: reading LinkedIn (feed, profiles, jobs), inspecting job descriptions and application forms on HR platforms (Greenhouse, Lever, Workable, Ashby, Workday, etc.), searching jobs, and any page where Oleg is already logged in.
+
+**Automatic fallback rule** — if a standard fetch tool (WebFetch, `curl`, etc.) fails to retrieve a page because of bot blocking, 401/403, captcha, JS-only render, or a login wall, switch to the `chrome-session` skill **immediately, without asking permission**. Reading the page through Oleg's logged-in Chrome is the expected behavior for this project.
+
+If the skill itself can't get the content (Chrome not running, or Oleg is not signed in on that site), stop and report — don't loop on retries and don't try to sign him in.
+
+---
+
 ## Folder layout

 ```
--- a/scripts/launch-chrome.sh
+++ b/scripts/launch-chrome.sh
@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
+USER_DATA_DIR="$PROJECT_DIR/.chrome"
+DEBUG_PORT="${CHROME_DEBUG_PORT:-9222}"
+
+if [ ! -d "$USER_DATA_DIR" ]; then
+  echo "Error: $USER_DATA_DIR does not exist. Copy your profile there first." >&2
+  exit 1
+fi
+
+if curl -sf "http://127.0.0.1:${DEBUG_PORT}/json/version" >/dev/null 2>&1; then
+  echo "Chrome already listening on port ${DEBUG_PORT}." >&2
+  exit 0
+fi
+
+exec google-chrome \
+  --remote-debugging-port="$DEBUG_PORT" \
+  --user-data-dir="$USER_DATA_DIR" \
+  "$@"