tracking/ — Job-search tracking and Telegram vacancy pipeline
This folder is the operational layer of the job search: the curated channel registry, the live cursor for incremental Telegram pulls, the staging area for messages awaiting triage, and the long-form logs of applications and outreach.
If you (Claude) are about to do anything related to "find vacancies in Telegram", "scan job channels", "what's new in Jobs", "triage a new channel", or similar — this is the file to read first. The main CLAUDE.md references it from the Telegram workflow section.
Files at a glance
| File | Purpose | In git? |
|---|---|---|
| telegram_channels.json | Curated source of truth: per-channel lang, priority, and filter (include/exclude). Tunable by hand. | ✅ committed |
| telegram_state.json | Per-machine cursor: last_message_id and last_seen_date per channel. Regenerated automatically. | ❌ gitignored |
| telegram_inbox.json | Output of the last fetch run: kept messages only, per channel, with lang/priority injected. Overwritten each run. | ❌ gitignored |
| telegram_pending_channels.json | Generated only when the last run had new (untriaged) channels: a keyword-frequency scan to bootstrap their curation. Deleted on the next run if no channels are pending. | ❌ gitignored |
| applications.md | One row per application: manually maintained, append-only. | ✅ committed |
| outreach.md | Cold messages, recruiter pings, follow-ups. One row per touchpoint. | ✅ committed |
Running the pipeline
Two scripts, chainable. Always run from project root.
~/.local/bin/uv run scripts/list_telegram_channels.py \
| ~/.local/bin/uv run scripts/fetch_telegram_jobs.py -
Step 1 — scripts/list_telegram_channels.py: reads the live "Jobs" folder from Telegram via Telethon and emits a JSON array of channel usernames (or numeric ids for private channels) to stdout. Always run fresh — Oleg curates the folder manually and adds new channels regularly.
Step 2 — scripts/fetch_telegram_jobs.py: pulls new messages per channel, applies the per-channel filter, and writes results to telegram_inbox.json. Accepts channels as positional args or as a JSON array on stdin (-).
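As a rough illustration of the two input modes of step 2, the sketch below resolves the channel list either from positional arguments or from a JSON array on stdin. The function name resolve_channels and the choice to normalise numeric ids to strings are assumptions of this sketch, not taken from the actual script.

```python
import io
import json


def resolve_channels(argv, stdin):
    """Return channels from positional args, or from a JSON array on stdin when argv is ["-"]."""
    if argv == ["-"]:
        channels = json.load(stdin)
        if not isinstance(channels, list):
            raise ValueError("expected a JSON array of channels on stdin")
        # Usernames arrive as strings, private channels as numeric ids;
        # normalising everything to str is an assumption of this sketch.
        return [str(c) for c in channels]
    return list(argv)


# Piped form: list_telegram_channels.py | fetch_telegram_jobs.py -
piped = io.StringIO('["example_jobs", -1001234567890]')
print(resolve_channels(["-"], piped))

# Positional form: fetch_telegram_jobs.py example_jobs another_channel
print(resolve_channels(["example_jobs", "another_channel"], io.StringIO("")))
```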
Account: both scripts connect directly via Telethon using TELEGRAM_SESSION_STRING from .env — that must be the usulsu (main) session. The "Jobs" folder lives on that account. Do not put the samuishechka session there.
Constants in the fetch script
- DEFAULT_LOOKBACK_DAYS = 30: first-time lookback window for new channels (no cursor yet).
- MAX_PER_CHANNEL = 500: hard cap on raw messages fetched per channel per run. A channel that posts >500 messages in the lookback window gets truncated: true in the output and we silently miss the tail. Tune per scenario (see "Truncation" below).
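To make the interplay of the two constants concrete, here is a minimal sketch of how a first-run cutoff and the per-channel cap could behave. The helpers first_run_cutoff and apply_cap are hypothetical, and the assumption that messages arrive newest-first is mine; the real fetch script may organise this differently.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_LOOKBACK_DAYS = 30   # value from this document
MAX_PER_CHANNEL = 500        # value from this document


def first_run_cutoff(now, lookback_days=DEFAULT_LOOKBACK_DAYS):
    """Oldest timestamp a channel without a cursor would be read back to."""
    return now - timedelta(days=lookback_days)


def apply_cap(newest_first, cap=MAX_PER_CHANNEL):
    """Keep the newest `cap` messages and report whether older ones were cut off."""
    truncated = len(newest_first) > cap
    return newest_first[:cap], truncated


now = datetime(2026, 6, 2, tzinfo=timezone.utc)
print(first_run_cutoff(now).date())

ids_newest_first = list(range(620, 0, -1))   # 620 fake message ids
kept, truncated = apply_cap(ids_newest_first)
print(len(kept), truncated, kept[0], kept[-1])
```

In this sketch the 120 oldest ids are the "tail" that never reaches the filter.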
Trigger
Vacancy scans run only when Oleg explicitly asks, e.g. "забери свежее из Jobs" (Russian for "grab the latest from Jobs") or "что нового в каналах" ("what's new in the channels"). No background polling.
telegram_channels.json — schema
Each entry is keyed by username (or numeric id for private channels) and is an object:
{
"<channel_id>": {
"lang": "ru" | "en" | "...", // required
"priority": "p1" | "p2" | "p3", // required
"include": <filter_form>, // optional — absent = trust-all (no positive constraint)
"exclude": ["kw1", "kw2", ...] // optional — absent = no negative constraint
}
}
A message passes the filter when:
- No exclude keyword (case-insensitive substring) is present, AND
- Every include OR-group contributes at least one match.
If both include and exclude are absent → trust-all (every message passes; useful for low-volume personal/digest channels).
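The pass rule above can be sketched in a few lines. This is an illustration under assumptions: the function name passes_filter is hypothetical, and include is expected to be already expressed as a list of OR-groups.

```python
def passes_filter(text, include_groups=(), exclude=()):
    """Apply the rule: no exclude hit, and every include OR-group has at least one hit."""
    haystack = text.lower()
    # Any exclude keyword rejects the message outright.
    if any(kw.lower() in haystack for kw in exclude):
        return False
    # all() over an empty sequence is True, which gives the trust-all behaviour.
    return all(
        any(kw.lower() in haystack for kw in group)
        for group in include_groups
    )


groups = [["#vacancy", "#вакансия"], ["#remote", "#удаленка"]]
excludes = ["kafka", "golang", "kotlin", "android", "swift"]

print(passes_filter("Senior React dev #Vacancy #remote", groups, excludes))
print(passes_filter("Golang backend #vacancy #remote", groups, excludes))
print(passes_filter("Frontend role #вакансия, office only", groups, excludes))
print(passes_filter("Weekly market digest"))
```

The four calls cover a pass, an exclude rejection, a missing OR-group, and trust-all respectively.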
include — the four forms
| Form | Semantics | Example |
|---|---|---|
| [] or absent | trust-all | (no constraint) |
| ["a", "b"] | flat OR: at least one matches | ["javascript", "react"] |
| [["a", "b"], ["c", "d"]] | AND of OR-groups: every group needs ≥1 hit | [["#vacancy","#вакансия"], ["#remote","#удаленка"]] |
| [["a","b"], "c"] | scalars auto-promoted to single-item groups | same as [["a","b"], ["c"]] |
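One way to read the four forms is as a normalisation step that turns every shape into a list of OR-groups. The helper normalize_include below is a hypothetical sketch of that reading, not the script's actual code.

```python
def normalize_include(include):
    """Turn any of the four include forms into a list of OR-groups."""
    if not include:                      # absent (None) or []
        return []                        # no groups = trust-all
    if all(isinstance(item, str) for item in include):
        return [list(include)]           # flat OR = one group
    # Mixed or nested form: promote bare strings to single-item groups.
    return [[item] if isinstance(item, str) else list(item) for item in include]


for form in (None, [], ["a", "b"], [["a", "b"], ["c", "d"]], [["a", "b"], "c"]):
    print(form, "->", normalize_include(form))
```

After normalisation, matching reduces to "every group has at least one hit", with an empty list of groups meaning trust-all.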
exclude — flat list
If any keyword in exclude appears in the text → the message is rejected, even if include would have matched. Used to drop wrong-stack postings from generic channels.
Standard Oleg-stack excludes for jobs feeds:
["kafka", "golang", "kotlin", "android", "swift"]
For *_jobs channels with hashtag-based filters, add resume excludes too:
["kafka", "golang", "kotlin", "android", "swift", "#резюме", "#resume", "#cv", "#ищуработу"]
Pitfalls
- Case-insensitive substring matching, no word boundaries. "go" matches "going" / "Goldbelt" / "google"; that's why we use "golang" instead. Same trap for "java" (matches "javascript"); use " java " with spaces, or "#java " for hashtag form. For other short keywords, pad the same way: " rust ", " ios ".
- "react native" in exclude would also block "react native" mentions in fullstack postings. Prefer excluding kotlin/android/swift/flutter to block mobile, and only block React Native when the channel is mobile-only.
- The same keyword can appear in include for one channel and exclude for another; they're per-channel, independent.
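The substring traps are easy to reproduce. The snippet below uses a hypothetical helper hits that mirrors case-insensitive substring matching. Its last probe shows a side effect of space padding that is my own observation rather than something the document states: a padded keyword no longer matches at the very start of the text or next to punctuation.

```python
def hits(keyword, text):
    """Case-insensitive substring test, with no word boundaries."""
    return keyword.lower() in text.lower()


probes = [
    ("go", "We are going to hire"),               # accidental hit
    ("golang", "We are going to hire"),           # safe replacement
    ("java", "JavaScript developer"),             # accidental hit
    (" java ", "JavaScript developer"),           # padding avoids it
    (" java ", "Senior Java developer"),          # padding still catches the real word
    (" java ", "Java, Kotlin and Spring stack"),  # but misses line start / punctuation
]
for keyword, text in probes:
    print(repr(keyword), "in", repr(text), "->", hits(keyword, text))
```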
Priority levels
Set on every channel. Assignment is judged by the best vacancy seen in a fresh fetch for that channel — not by volume or hashtag density.
| Level | Meaning | Triage attention |
|---|---|---|
| p1 | Very relevant — strong stack hits and global-remote culture. Posts that Oleg would actually apply to. | Read every kept message. |
| p2 | Stack OK but culture is internal market (Russian RUB/CIS-only roles), or culture OK but salary band typically misses Oleg's threshold (US-only with low pay, Netherlands on-site). Worth periodic scanning — occasional gems. Market-intel channels (recruiter content) live here too. | Skim, dive into interesting headlines. |
| p3 | Wrong stack (mobile-native, devops, QA), off-market (Nigeria with ₦ salaries, Netherlands junior on-site), pure chat/noise, founder lifestyle blogs, or dead channels (0 messages in lookback). Subscribed for completeness — Oleg may pivot or want occasional glance. | Glance only on request. |
When triaging the inbox, sort/group by priority first, then by lang.
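A possible way to produce that order from the channels object of telegram_inbox.json is sketched below. The function name triage_order is hypothetical, and placing still-pending channels (priority null) last is an assumption of this sketch.

```python
PRIORITY_RANK = {"p1": 0, "p2": 1, "p3": 2}


def triage_order(channels):
    """Channel names sorted by priority, then lang, then name."""
    def key(name):
        entry = channels[name]
        # Unknown or null priority sorts after p3 (assumption of this sketch).
        return (PRIORITY_RANK.get(entry.get("priority"), 3), entry.get("lang") or "", name)
    return sorted(channels, key=key)


channels = {
    "b_chan": {"priority": "p2", "lang": "ru"},
    "a_chan": {"priority": "p1", "lang": "ru"},
    "c_chan": {"priority": "p1", "lang": "en"},
    "new_chan": {"priority": None, "lang": None},
}
print(triage_order(channels))
```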
Language codes
Free-form short ISO-style codes — pick what fits:
- ru: Russian (most curated channels)
- en: English
- mixed: multi-language channel, when you can't pick a primary
- nl, de, etc.: for regional boards
This isn't strict; it's a hint for triage attention (Oleg reads ru and en fluently; everything else needs translation overhead).
Triaging a new channel — full procedure
A "new" channel = one that's in the Telegram "Jobs" folder but doesn't have an entry in telegram_channels.json. Detected automatically: the fetch script puts its raw messages into telegram_inbox.json unfiltered and writes a keyword-frequency scan to telegram_pending_channels.json.
Steps to graduate a channel out of pending:
1. Read telegram_pending_channels.json. For each new channel:
   - keyword_counts_from_other_channels: how often every existing keyword (include + exclude across all channels) appears in this channel's recent messages. Quick signal of stack and posting style.
   - messages_scanned, first_run, truncated: volume context.
2. Open telegram_inbox.json and sample 3–8 messages from this channel directly:

       jq -r '.channels["<channel>"].messages[:5] | .[] | "── \(.date[0:16])\n\(.text[0:400])\n"' tracking/telegram_inbox.json

   Look for: hashtag patterns, language, post structure (single role vs digest vs chat), recurring noise types.
3. Decide lang and priority using the rubrics above. Base priority on the best vacancy in the sample, not the average.
4. Decide filter shape:
   - Channel posts proper #vacancy/#вакансия + #remote/#удаленка tags → use the standard hashtag AND-of-OR + Oleg-stack excludes (most *_jobs channels).
   - Channel posts vacancy text without consistent hashtags → use positive stack include (["javascript", "typescript", "react", ...]) + the same Oleg-stack excludes.
   - Channel is low-volume personal/curated content (recruiter musings, market intel) where the value is the whole post → trust-all (omit include and exclude).
   - Channel is a digest that mixes resumes and vacancies (e.g. javascript_jobs_feed) → trust-all is usually the right call; filtering резюме would drop the whole digest.
   - Channel is mostly noise/wrong stack but worth keeping subscribed → strict positive filter, accept that most runs will return 0.
5. Add the entry to telegram_channels.json. JSON is hand-edited; keep entries ordered by priority then alphabetically for readability.
6. Rerun the chain. The channel transitions out of pending. The telegram_pending_channels.json file is automatically deleted when no pending channels remain.
7. Validate: sample the new kept messages and verify nothing wrong is passing or being dropped. If the filter is wrong, edit and rerun. Keeping the state cursor is fine for ongoing use, but incremental fetches filter only new messages, so to validate the filter against history, clear the state for that channel first: jq 'del(.["<channel>"])' tracking/telegram_state.json. Note that jq prints the result to stdout rather than editing the file, so write the output back to telegram_state.json.
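For intuition about the keyword-frequency scan used in the first step, here is a minimal sketch. The helper names are hypothetical, and counting "number of messages containing the keyword" is an assumed definition; the real script may count occurrences differently.

```python
from collections import Counter


def collect_keywords(registry):
    """Union of every include and exclude keyword across all curated channels."""
    keywords = set()
    for entry in registry.values():
        for item in entry.get("include", []):
            # include items are either bare strings or OR-groups (lists).
            keywords.update([item] if isinstance(item, str) else item)
        keywords.update(entry.get("exclude", []))
    return keywords


def keyword_counts(messages, keywords):
    """Number of messages in which each keyword appears (case-insensitive substring)."""
    counts = Counter()
    for text in messages:
        haystack = text.lower()
        for kw in keywords:
            if kw.lower() in haystack:
                counts[kw] += 1
    return counts


registry = {
    "curated_a": {"include": [["#vacancy", "#вакансия"], "react"], "exclude": ["golang"]},
    "curated_b": {"lang": "ru", "priority": "p2"},   # trust-all entry, no keywords
}
new_channel_messages = ["React dev #vacancy", "Golang lead #vacancy", "general chat"]
print(keyword_counts(new_channel_messages, collect_keywords(registry)).most_common())
```

High counts for the standard hashtags suggest the hashtag AND-of-OR shape; near-zero counts across the board point towards a positive stack include or trust-all.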
Sanity-check existing filters
When tuning, always:
- Sample kept messages: are they all valid for Oleg?
- For channels with kept == 0, verify with an unfiltered pull (temporarily remove the channel's entry and rerun for it alone) that nothing legitimate is being thrown away. Don't assume 0 = correct without checking.
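To find the channels that need the second check, a short sketch over the inbox structure can list everything that was fetched but fully filtered out. The function name zero_kept_channels is hypothetical; the seen and kept fields follow the inbox schema described in this document.

```python
def zero_kept_channels(inbox):
    """Channels where messages were fetched but the filter kept none of them."""
    return sorted(
        name
        for name, ch in inbox["channels"].items()
        if ch["seen"] > 0 and ch["kept"] == 0
    )


inbox = {
    "channels": {
        "busy_jobs": {"seen": 500, "kept": 0},     # suspicious: filter may be too strict
        "quiet_digest": {"seen": 0, "kept": 0},    # nothing posted, nothing to check
        "good_jobs": {"seen": 40, "kept": 6},
    }
}
print(zero_kept_channels(inbox))
```

In practice the inbox dict would come from json.load on tracking/telegram_inbox.json.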
Truncation — when the 500-message cap bites
A channel with "truncated": true in telegram_inbox.json had >500 raw messages in the lookback window. We see the most-recent 500 and silently miss the tail (older portion of the window).
For *_jobs Russian channels truncation typically means we covered 1–10 days of a 30-day window. Strict hashtag filters then leave 1–7 kept messages — but the missed older messages could contain relevant vacancies.
Options:
- Bump MAX_PER_CHANNEL globally (more API calls, longer run).
- Narrow lookback for the busy channel (no per-channel knob today; would require a code change).
- Tune the filter to be stricter so fewer raw messages need processing. This only helps if the filter is applied at the API level, which substring filters are not.
For now, keep the cap and accept the tail loss for very busy channels; relax only when a specific channel justifies it.
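To judge whether a truncated channel justifies relaxing the cap, it can help to estimate how much of the lookback window the fetched messages actually cover. The sketch below assumes you have the dates of the raw fetched messages (the inbox itself stores only kept ones) and that all timestamps carry a UTC offset; window_coverage is a hypothetical helper.

```python
from datetime import datetime, timedelta


def window_coverage(message_dates_iso, generated_at_iso, lookback_days=30):
    """Return (days covered, fraction of the lookback window covered)."""
    oldest = min(datetime.fromisoformat(d) for d in message_dates_iso)
    generated = datetime.fromisoformat(generated_at_iso)
    covered_days = (generated - oldest) / timedelta(days=1)
    return covered_days, min(covered_days / lookback_days, 1.0)


dates = ["2026-05-30T12:00:00+00:00", "2026-06-01T08:30:00+00:00"]
days, fraction = window_coverage(dates, "2026-06-02T12:00:00+00:00")
print(f"covered {days:.1f} of 30 days ({fraction:.0%})")
```

Note that datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 onwards, so older interpreters need an explicit offset such as +00:00.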
Output of a fetch run
telegram_inbox.json structure (overwritten each run):
{
"generated_at": "2026-06-02T...",
"lookback_days_for_new_channels": 30,
"total_in_inbox": <int>,
"channels": {
"<channel>": {
"lang": "ru" | "en" | null, // null = channel is still "new" / pending
"priority": "p1" | "p2" | "p3" | null,
"seen": <int>, // raw messages fetched
"kept": <int>, // after filter
"filtered_out": <int>,
"first_run": <bool>, // no prior state cursor
"truncated": <bool>, // hit MAX_PER_CHANNEL
"filter_mode": "filtered (...)" | "trust-all (no filter)" | "unfiltered (new channel — not yet curated)",
"messages": [
{ "id": <int>, "date": "<ISO>", "text": "...", "has_media": <bool>, "link": "https://t.me/.../id" }
]
}
}
}
Messages are chronological per channel (oldest first within each channel).
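A small consistency check over one channel entry can catch a malformed inbox before triage. Two of the invariants below are assumptions rather than documented guarantees: that seen equals kept plus filtered_out, and that the messages array holds exactly kept items. The oldest-first ordering is taken from the description above. check_channel is a hypothetical helper.

```python
from datetime import datetime


def check_channel(name, ch):
    """Return a list of human-readable problems found in one inbox channel entry."""
    problems = []
    if ch["seen"] != ch["kept"] + ch["filtered_out"]:      # assumed invariant
        problems.append(f"{name}: seen != kept + filtered_out")
    if len(ch["messages"]) != ch["kept"]:                  # assumed invariant
        problems.append(f"{name}: messages length != kept")
    dates = [datetime.fromisoformat(m["date"]) for m in ch["messages"]]
    if dates != sorted(dates):                             # documented: oldest first
        problems.append(f"{name}: messages are not oldest-first")
    return problems


entry = {
    "seen": 3, "kept": 2, "filtered_out": 1,
    "messages": [
        {"id": 11, "date": "2026-06-01T10:00:00+00:00", "text": "..."},
        {"id": 10, "date": "2026-05-31T09:00:00+00:00", "text": "..."},
    ],
}
print(check_channel("example_jobs", entry))
```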
Useful jq probes
# Per-channel summary sorted by kept desc
jq -r '.channels | to_entries | sort_by(.value.kept) | reverse | .[]
| "\(.key) → kept \(.value.kept)/\(.value.seen) [\(.value.priority // "—")/\(.value.lang // "—")]"' \
tracking/telegram_inbox.json
# All p1 kept messages
jq '.channels | to_entries | map(select(.value.priority == "p1")) | from_entries' \
tracking/telegram_inbox.json
# Truncated channels with kept/seen counts and priority
jq -r '.channels | to_entries | map(select(.value.truncated))
| .[] | "\(.key): kept \(.value.kept)/\(.value.seen), priority \(.value.priority)"' \
tracking/telegram_inbox.json
After triage
Promising postings → append a row to applications.md. Don't accumulate a "seen but skipped" log — the state cursor already prevents re-reading.
For outreach (cold DMs, recruiter conversations) → outreach.md, one row per touchpoint.
If Oleg unsubscribes from a channel in Telegram, it disappears from the live folder list, the next run won't fetch it, and its entry in telegram_channels.json becomes dead weight. Periodic cleanup is fine but not required — dead entries cost ~150 bytes.
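If a periodic cleanup is wanted, the dead entries are simply the registry keys missing from the live folder list, and the pending channels are the reverse difference. The sketch below assumes the live list is the JSON array emitted by step 1; diff_registry is a hypothetical helper.

```python
def diff_registry(registry, live_channels):
    """Return (pending, dead): live-but-uncurated channels and curated-but-gone entries."""
    live = {str(c) for c in live_channels}   # numeric ids compared as strings
    curated = set(registry)
    return sorted(live - curated), sorted(curated - live)


registry = {"good_jobs": {}, "old_channel": {}, "-1001234567890": {}}
live = ["good_jobs", -1001234567890, "brand_new_jobs"]
pending, dead = diff_registry(registry, live)
print("pending:", pending)
print("dead:", dead)
```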