Not installable via adompkg

This skill has no published release. adompkg install kyle/process-datasheets will not work until a maintainer publishes a tarball with install.sh and uninstall.sh.

See the publishing docs for the package.json schema and tarball layout required to ship this skill.


name: process-datasheets
description: "Batch-process datasheets from the shared queue. Claims items, parses PDFs into wiki markdown, publishes to the Adom Wiki, and loops until a time budget expires or the queue is empty. Use with: /process-datasheets --until 17:00"
user-invocable: true

Process Datasheets

Batch-process datasheets from the shared parsing queue. The skill claims the next highest-priority item, parses it using the datasheet-parser workflow, publishes the result to the Adom Wiki, and repeats until time runs out or the queue is empty.

Prerequisites

  • datasheet-queue MCP server configured (provides claim_datasheet, submit_result, release_datasheet, fail_datasheet, check_time_budget, heartbeat_datasheet tools)
  • poppler-utils and imagemagick installed
  • adom-wiki CLI available

Usage

/process-datasheets --until 17:00
/process-datasheets --until 17:00 --interval 30m
/process-datasheets --count 3

Arguments

  • --until <HH:MM or ISO> — Stop processing at this time (default: 1 hour from now)
  • --count <N> — Process at most N datasheets then stop
  • --interval <Nm> — (For use with /loop) Pause between runs; the skill processes one item per invocation when this is set

Agent Identity

Determine your agent ID from the container hostname:

AGENT_ID=$(hostname)

Use this as agent_id in all MCP tool calls.

Workflow

Step 0: Parse Arguments

Extract --until, --count, and --interval from the skill arguments. Defaults:

  • until: 1 hour from now
  • count: unlimited
  • interval: not set (process continuously)
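
As a rough illustration of the defaults above, the arguments could be parsed along these lines. This is a sketch only: the function name parse_skill_args is hypothetical, a bare HH:MM value is assumed to mean today, and only the "<N>m" interval form shown in the Arguments section is handled.

```python
import argparse
from datetime import datetime, timedelta


def parse_skill_args(argv, now=None):
    """Parse --until, --count, and --interval with the defaults listed above."""
    now = now or datetime.now()
    parser = argparse.ArgumentParser(prog="/process-datasheets")
    parser.add_argument("--until")             # HH:MM or ISO timestamp
    parser.add_argument("--count", type=int)   # None means unlimited
    parser.add_argument("--interval")          # e.g. "30m"; None means continuous
    args = parser.parse_args(argv)

    if args.until is None:
        until = now + timedelta(hours=1)       # default: 1 hour from now
    elif "T" in args.until or "-" in args.until:
        until = datetime.fromisoformat(args.until)
    else:
        # Assumption: a bare HH:MM refers to the current day.
        hour, minute = map(int, args.until.split(":"))
        until = now.replace(hour=hour, minute=minute, second=0, microsecond=0)

    interval = None
    if args.interval is not None:
        # Assumption: only the "<N>m" (minutes) form is supported.
        interval = timedelta(minutes=int(args.interval.rstrip("m")))

    return {"until": until, "count": args.count, "interval": interval}
```

Passing now explicitly keeps the function deterministic, which makes the default handling easy to check in isolation.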

Step 1: Check Time Budget

Before claiming anything, verify there's time left:

check_time_budget({ stop_time: "<until value>", min_minutes_needed: 10 })

If continue is false, stop immediately — do not claim an item you can't finish.
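
The server's actual logic is not documented here, but the decision it returns can be approximated locally. In the sketch below, the function name local_time_budget and the minutes_remaining field are assumptions; only the continue field comes from the step above.

```python
from datetime import datetime


def local_time_budget(stop_time, min_minutes_needed, now):
    """Local approximation: is there enough time left to start another item?"""
    minutes_remaining = (stop_time - now).total_seconds() / 60
    return {
        # Mirrors the documented "continue" flag.
        "continue": minutes_remaining >= min_minutes_needed,
        # Hypothetical convenience field, clamped at zero once the stop time has passed.
        "minutes_remaining": max(0, int(minutes_remaining)),
    }
```

With a 17:00 stop time and a 10-minute minimum, a check at 16:55 would return continue as false, so no new item is claimed.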

Step 2: Claim Next Datasheet

claim_datasheet({ agent_id: "<AGENT_ID>" })

If the queue is empty, stop — there's nothing to do.

Save the returned item.id, item.part_name, and item.pdf_url for the next steps.

Step 3: Parse the Datasheet

Follow the datasheet-parser skill workflow (Steps 1–7):

  1. Download the PDF — use item.pdf_url if provided, otherwise WebSearch for it
  2. Extract text with pdftotext
  3. Extract and classify images with pdfimages
  4. Optimize images for web upload (resize, compress)
  5. Generate wiki markdown — structured .md matching the wiki renderer format
  6. Prepare metadata JSON — manufacturer, part number, packages, etc.
  7. Publish to wiki: adom-wiki page publish, metadata API call, and asset uploads
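
For the two extraction steps in the list above, the poppler-utils command lines might be assembled as follows. This is a sketch, not the datasheet-parser skill's own implementation: the function name, the output file names, and the choice of the -layout and -png flags are assumptions.

```python
from pathlib import Path


def extraction_commands(pdf_path, work_dir):
    """Build argv lists for the text and image extraction steps (not executed here)."""
    pdf = Path(pdf_path)
    work = Path(work_dir)
    text_out = work / (pdf.stem + ".txt")
    image_prefix = work / "img"
    return [
        # -layout keeps the physical page layout, which helps with tables.
        ["pdftotext", "-layout", str(pdf), str(text_out)],
        # -png writes extracted images as PNG files named img-000.png, img-001.png, ...
        ["pdfimages", "-png", str(pdf), str(image_prefix)],
    ]
```

Each list can be passed to subprocess.run(cmd, check=True). Building the commands separately from running them makes it possible to inspect or log them first.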

During long parses, send a heartbeat every 10 minutes to prevent claim timeout:

heartbeat_datasheet({ item_id: <id>, agent_id: "<AGENT_ID>" })
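
One way to keep track of the 10-minute cadence is a small timer that is checked between parsing steps. The class below is a hypothetical helper, not part of the queue server. It only decides when a heartbeat is due and leaves the heartbeat_datasheet call to the caller.

```python
import time


class HeartbeatTimer:
    """Tracks when the next heartbeat is due during a long parse."""

    def __init__(self, interval_seconds=600, clock=time.monotonic):
        self.interval_seconds = interval_seconds  # 600 s = 10 minutes
        self.clock = clock                        # injectable for testing
        self.last_sent = clock()

    def due(self):
        """Return True once the interval has elapsed since the last heartbeat."""
        return self.clock() - self.last_sent >= self.interval_seconds

    def mark_sent(self):
        """Record that a heartbeat was just sent."""
        self.last_sent = self.clock()
```

Between steps, the agent would check timer.due(), call heartbeat_datasheet if it returns true, and then call timer.mark_sent().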

Step 4: Submit Result

After successful publish:

submit_result({ item_id: <id>, agent_id: "<AGENT_ID>", wiki_slug: "datasheets/<partname>" })
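
The exact slug rules are not specified here. As a sketch consistent with the example session (STM32F103 published as datasheets/stm32f103), a helper could lowercase the part name and collapse other characters into hyphens. The function name wiki_slug and the hyphen normalization are assumptions, so check them against the wiki's real rules before relying on them.

```python
import re


def wiki_slug(part_name):
    """Build a 'datasheets/<partname>' slug from a part name (assumed normalization)."""
    # Assumption: lowercase, with runs of non-alphanumerics collapsed to one hyphen.
    normalized = re.sub(r"[^a-z0-9]+", "-", part_name.lower()).strip("-")
    return f"datasheets/{normalized}"
```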

Step 5: Handle Failures

If parsing fails at any step:

  • Recoverable (network timeout, temp file issue) → release back to queue:

    release_datasheet({ item_id: <id>, agent_id: "<AGENT_ID>" })
    
  • Permanent (corrupt PDF, no extractable data, part doesn't exist) → mark failed:

    fail_datasheet({ item_id: <id>, agent_id: "<AGENT_ID>", reason: "Detailed error description" })
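
The split between recoverable and permanent failures can be expressed as a small dispatch function. The exception classes and the failure_action name below are hypothetical. Treating unknown errors as recoverable is also an assumption, chosen so that an item is never marked failed by accident.

```python
class RecoverableError(Exception):
    """Transient problem: network timeout, temp file issue."""


class PermanentError(Exception):
    """Unfixable problem: corrupt PDF, no extractable data, part doesn't exist."""


def failure_action(error, item_id, agent_id):
    """Map a parsing error to the queue tool call that should follow it."""
    base = {"item_id": item_id, "agent_id": agent_id}
    if isinstance(error, PermanentError):
        return "fail_datasheet", {**base, "reason": str(error)}
    # Anything not known to be permanent goes back to the queue for a retry.
    return "release_datasheet", base
```

The returned pair is the tool name and its arguments, so the caller can log the decision before making the MCP call.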
    

Step 6: Loop or Stop

After completing (or failing) one datasheet:

  1. Decrement --count if set. If count reaches 0, stop.
  2. If --interval is set, stop (the /loop scheduler will re-invoke).
  3. Call check_time_budget again. If continue is false, stop.
  4. If the queue had items and time remains, go back to Step 2.
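
Taken together, the stop conditions in Steps 1 through 6 amount to a short driver loop. In this sketch the callables claim, process, and has_time are hypothetical stand-ins for claim_datasheet, the parse-and-submit steps, and check_time_budget. They are injected so the control flow can be followed without a queue server.

```python
def run_batch(claim, process, has_time, count=None, interval_set=False):
    """Drive Steps 1-6 with injected callables; returns the number of items handled."""
    handled = 0
    if not has_time():            # Step 1: never claim without enough time
        return handled
    while True:
        item = claim()            # Step 2: None means the queue is empty
        if item is None:
            return handled
        process(item)             # Steps 3-5: parse, then submit, release, or fail
        handled += 1
        if count is not None:     # Step 6.1: honor --count
            count -= 1
            if count <= 0:
                return handled
        if interval_set:          # Step 6.2: the scheduler re-invokes the skill later
            return handled
        if not has_time():        # Step 6.3: re-check the time budget
            return handled
```

With interval_set true, the function handles at most one item per call, matching the one-item-per-invocation behavior described for --interval.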

Error Recovery

Situation                            Action
PDF download fails (404, timeout)    fail_datasheet with reason
pdftotext produces no output         fail_datasheet (likely a scanned/image PDF)
adom-wiki page publish fails         release_datasheet (might be a transient wiki issue)
Agent crashes mid-parse              Queue auto-releases after 30 min timeout
Time budget expired mid-parse        release_datasheet (let another agent finish later)

Example Session

> /process-datasheets --until 17:00

Checking time budget... 2h 45m remaining, 8 items pending.

Claiming next datasheet...
  Claimed #5 — STM32F103 [P10]
  PDF: https://www.st.com/resource/en/datasheet/stm32f103c8.pdf

Downloading PDF... done (1.2MB)
Extracting text... done (45 pages)
Extracting images... 23 images found
Classifying and optimizing... 12 key diagrams selected
Generating wiki markdown... done
Publishing to wiki... done (datasheets/stm32f103)
Uploading 12 diagram assets... done

Submitted result for #5.

Checking time budget... 2h 12m remaining, 7 items pending.
Claiming next datasheet...
  Claimed #8 — ESP32-S3 [P20]
  ...

Scheduling

Pattern A: Single session with /loop

/loop 30m /process-datasheets --until 17:00 --interval 30m

Every 30 minutes, the skill claims and processes one item. Stops when the clock hits 17:00.

Pattern B: Scheduled task (persistent)

/schedule every 30 minutes /process-datasheets --until 17:00 --interval 30m

Survives session restarts. Each trigger processes one item.

Pattern C: Continuous until done

/process-datasheets --until 17:00

Processes items back-to-back until time runs out or the queue is empty.

MCP Server Configuration

Add this entry to your .mcp.json or equivalent MCP settings file:

{
  "mcpServers": {
    "datasheet-queue": {
      "url": "https://6kgcmtonzymg.adom.cloud/mcp"
    }
  }
}
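
If an .mcp.json file already lists other servers, the entry can be merged in rather than pasted over the file. The helper below is a sketch under that assumption. The function name add_mcp_server is hypothetical, and it assumes the file uses the same mcpServers layout shown above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, url):
    """Add or update one server entry in an .mcp.json file, keeping other entries."""
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from an empty object.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {"url": url}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, add_mcp_server(".mcp.json", "datasheet-queue", "https://6kgcmtonzymg.adom.cloud/mcp") leaves any other configured servers in place.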