There's a class of operational work that doesn't fit in a CI pipeline or a scheduled Lambda. Audit 4,000 Entra app registrations for expiring credentials. Classify every SharePoint site by business ownership. Pull 13 months of AWS billing data and map it to cost centres. Catalogue every Power Platform flow, its owner, its trigger type, and its last 7 days of run history. Export a Confluence space. Reconcile Jira tickets against a separation board.
This work ends up in scripts. And if you're running an enterprise IT operation — especially one under time pressure — those scripts stop being utilities and start being the actual operational layer. The question is whether they're reliable enough for that role.
Over the course of a corporate separation programme, I built and ran 60+ Python scripts across 8 enterprise platforms. This is what the toolchain looked like, what patterns held up, and what I'd do differently.
The coordination layer
The first thing that broke was coordination. Script A discovers 400 SharePoint sites. Script B pulls membership for each. Script C classifies them by business ownership. Script D generates the separation report. Without shared state, each script writes its own CSV, and you spend more time reconciling outputs than running audits.
The fix was a local SQLite database. Every script reads from it and writes to it. The schema is simple — teams, ticket decisions, app decisions, enrichment timestamps — but it means any script can answer "what's the current state of this asset?" without re-querying the source system.
```sql
-- Core coordination tables
CREATE TABLE ticket_decisions (
    ticket_key     TEXT PRIMARY KEY,   -- board reference
    summary        TEXT,
    bucket         TEXT,               -- must / day1 / tsa / defer / done
    owner_name     TEXT,
    due_date       TEXT,
    decision_notes TEXT,
    updated_at     TEXT DEFAULT (datetime('now'))
);

CREATE TABLE app_decisions (
    app_id         TEXT PRIMARY KEY,   -- Entra appId (GUID)
    display_name   TEXT,
    decision       TEXT,               -- rotate / novate / decommission / retain
    owner_name     TEXT,
    expires        TEXT,               -- nearest credential expiry
    decision_notes TEXT,
    updated_at     TEXT DEFAULT (datetime('now'))
);
```
The `bucket` field on ticket decisions is the key coordination mechanism. A Jira ticket classified as `must` shows up in every downstream report that filters by urgency. A credential classified as `rotate` shows up in the security remediation tracker. The classification happens once, in the database, and every script that needs it reads from the same source.
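In practice that means any downstream report is a single query against the shared database. A minimal sketch of the read side, using the `ticket_decisions` schema above (the sample data is illustrative):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS ticket_decisions (
    ticket_key     TEXT PRIMARY KEY,
    summary        TEXT,
    bucket         TEXT,
    owner_name     TEXT,
    due_date       TEXT,
    decision_notes TEXT,
    updated_at     TEXT DEFAULT (datetime('now'))
);
"""

def tickets_in_bucket(conn: sqlite3.Connection, bucket: str) -> list[tuple[str, str]]:
    """Return (ticket_key, summary) for every ticket classified into a bucket."""
    rows = conn.execute(
        "SELECT ticket_key, summary FROM ticket_decisions"
        " WHERE bucket = ? ORDER BY ticket_key",
        (bucket,),
    )
    return list(rows)

# Real scripts open the shared .db file; in-memory here for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO ticket_decisions (ticket_key, summary, bucket) VALUES (?, ?, ?)",
    ("SEP-101", "Rotate shared SFTP credential", "must"),
)
```

Every script that filters by urgency calls the same function against the same file, which is the whole point of the coordination layer.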
Microsoft Graph: the 14-script estate
The largest cluster of scripts talks to Microsoft Graph. Entra ID is the identity backbone of most enterprise M365 tenants, and during a separation you need to know everything: who's in which group, which app registrations have expiring secrets, which SharePoint sites belong to which business entity, who has Copilot licences, whether litigation holds are in place, and what the DLP forwarding rules look like.
Each of these is a separate script because each has different API permissions, different rate-limit profiles, and different output consumers. The org tree walker needs `User.Read.All`. The litigation hold audit needs `eDiscovery.Read.All` plus Exchange admin access. The DLP audit needs `AuditLog.Read.All`. Bundling them into one tool would mean requesting every permission for every run, which is both a security risk and a consent headache.
The two phases: check and act
Every script in the toolchain follows the same structural rule: separate the health-check phase from the remediation phase. They feel like one thing — "find the problem and fix it" — but coupling them is how scripts cause incidents.
A health check is safe to run at any time, on any environment. It reads, aggregates, and reports. A remediation phase makes changes. There's always a gate between them: a dry-run default, a report a human reviews, an explicit `--write` or `--apply` flag.
```python
import argparse

# Every script defaults to read-only
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--write", action="store_true",
                        help="Actually insert into the database (default: dry-run)")
    args = parser.parse_args()

    findings = run_audit()
    print_report(findings)

    if args.write:
        inserted = write_to_db(findings)
        print(f"Wrote {inserted} records.")
    else:
        print(f"Dry run: {len(findings)} records would be written. Pass --write to apply.")
```
This isn't paranoia. During a separation, you're running scripts against production identity systems with real users. A script that accidentally disables 200 accounts because a filter was wrong is not a hypothetical — it's a Tuesday. The dry-run default means the first run of any script against a new dataset is always safe.
PII detection at crawl time
One of the more useful patterns in the data intelligence layer: detecting PII by column name rather than by scanning data content. When you're crawling SharePoint lists, SQL databases, or file metadata, you rarely need to read actual row data to know whether a dataset contains personal information.
```python
PII_NAME_PATTERNS = {
    "ssn", "social_security", "national_insurance", "nin",
    "salary", "wage", "payroll", "email", "phone", "mobile",
    "address", "postcode", "zip", "name", "first_name",
    "last_name", "full_name", "dob", "date_of_birth",
    "guardian", "nhs_number", "passport", "bank_account",
    "sort_code",
}

def detect_pii_columns(schema: dict[str, str]) -> list[str]:
    """Flag columns whose names match known PII patterns."""
    return [
        col for col in schema
        if any(p in col.lower() for p in PII_NAME_PATTERNS)
    ]
```
This catches about 90% of PII-bearing datasets in a typical enterprise. The remaining 10% — columns named `field_7` that happen to contain national insurance numbers — need manual review. But the automated pass reduces the manual review surface from thousands of datasets to dozens.
The classification feeds directly into a five-tier model: `public`, `internal`, `confidential`, `restricted`, `highly_restricted`. Any dataset with PII columns is automatically classified as `restricted` or above. Governed data operations — copy, move, purge, archive — require explicit approval for anything above `internal`.
Checkpointing for long-running jobs
The Power Platform health assessment is the best example of why checkpointing matters. It runs in five sequential steps — collect environments, list flows, enrich per-flow metadata, pull 7-day run history, generate report — and the middle steps can take 10–15 minutes each against a large tenant. Network interruptions, token expiry, and rate limits are not edge cases. They're the normal operating condition.
Each step writes a checkpoint file. If the script is interrupted during step 3, re-running it picks up from the last successfully enriched flow, not from the beginning. The checkpoint is a JSON file on disk — nothing sophisticated.
```python
# Chunked into 5 independent steps — run one at a time,
# each writes a checkpoint. Kill any step safely;
# re-run it and it picks up from the checkpoint.
#
# Steps
#   collect → environments + DLP policies + tenant settings (~1 min)
#   flows   → list all flows across every environment (~3–5 min)
#   enrich  → per-flow owner detail, trigger type, solution (~10–15 min, resumes)
#   runs    → 7-day run history per enabled flow (~10–15 min, resumes)
#   report  → build full markdown report from all checkpoints (<1 min)
```
The same pattern applies to the SharePoint membership pull (hundreds of sites, each requiring a separate Graph API call), the app registration catalogue (thousands of service principals with credential expiry checks), and the AWS billing analysis (13 months of daily cost data across multiple accounts).
Rate limits and backpressure
Microsoft Graph throttles at roughly 10,000 requests per 10 minutes per app. That sounds generous until you're pulling membership for 400 SharePoint sites, each requiring 3–4 API calls. The Power Platform API is more aggressive — some endpoints throttle at 60 requests per minute.
The pattern that survived: exponential backoff with jitter on 429 responses, plus a throttle counter that reports at the end of the run. If a job triggered 200 retries, that's a signal to batch differently next time — not something you want to discover by reading logs after the fact.
```python
import logging
import random
import time

import requests

logger = logging.getLogger(__name__)
throttle_count = 0  # reported at the end of the run


def call_graph(url: str, token: str, retries: int = 5) -> dict:
    """GET with exponential backoff on 429 / 503."""
    global throttle_count
    for attempt in range(retries):
        resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code in (429, 503):
            throttle_count += 1
            wait = min(2 ** attempt + random.uniform(0, 1), 60)
            retry_after = resp.headers.get("Retry-After")
            if retry_after:
                wait = max(wait, int(retry_after))
            logger.warning("Throttled (%d), waiting %.1fs", resp.status_code, wait)
            time.sleep(wait)
            continue
        resp.raise_for_status()
    raise RuntimeError(f"Failed after {retries} retries: {url}")
```
The enrichment pattern
The most common operational pattern across the toolchain: take a dataset from one system, fetch additional context from another, and write the enriched result to the coordination database. SharePoint sites get enriched with business ownership from Entra group metadata. Jira tickets get enriched with separation bucket classifications. App registrations get enriched with credential expiry dates and owner contact details.
The design choices that matter for enrichment:
- Idempotent writes — enriching the same record twice produces the same result. Upsert, don't insert.
- Freshness tracking — a `last_enriched_at` timestamp on every record. Scripts target stale records rather than re-processing everything.
- Source attribution — when a field is set by enrichment, record which system it came from. Conflicting data from multiple sources is a real situation.
- Partial enrichment is fine — if one API call fails, write what you have and mark the failed field. Don't discard a whole record because one source timed out.
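The first two choices collapse into a single SQL statement in SQLite. A sketch of the upsert-with-freshness pattern (the `app_enrichment` table is illustrative, not the real schema):

```python
import sqlite3

def upsert_app(conn: sqlite3.Connection, app_id: str, expires: str, source: str) -> None:
    """Idempotent enrichment write: insert or update, stamping freshness and source."""
    conn.execute(
        """
        INSERT INTO app_enrichment (app_id, expires, source, last_enriched_at)
        VALUES (?, ?, ?, datetime('now'))
        ON CONFLICT(app_id) DO UPDATE SET
            expires = excluded.expires,
            source = excluded.source,
            last_enriched_at = excluded.last_enriched_at
        """,
        (app_id, expires, source),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_enrichment ("
    " app_id TEXT PRIMARY KEY, expires TEXT, source TEXT, last_enriched_at TEXT)"
)
```

`ON CONFLICT ... DO UPDATE` (SQLite 3.24+) is what makes re-running an enrichment script safe: the second pass overwrites rather than duplicates.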
SharePoint classification at scale
One of the harder problems: classifying every M365 object — users, SharePoint sites, groups, teams — by business ownership during a corporate separation. The rules aren't complex individually, but applying them consistently across thousands of objects requires the classification to be codified, not manual.
```python
# Classification rules (simplified)
#   Users      → ENTITY-OWNED if accountEnabled + matching company
#                REVIEW if companyName is blank or unknown
#   SharePoint → ENTITY-OWNED if ownership percentage >= 90%
#                TSA-SHARED if ownership percentage 50-89%
#                COUNTERPARTY if ownership percentage < 50%
#   Groups     → classified by member composition
#   Teams      → classified by owning group

def classify_site(site: dict, ownership_pct: float) -> str:
    if ownership_pct >= 0.90:
        return "ENTITY-OWNED"
    elif ownership_pct >= 0.50:
        return "TSA-SHARED"
    else:
        return "COUNTERPARTY"
```
The output is a CSV that feeds directly into the legal and commercial workstreams. The classification script runs in under a minute against the full inventory because it reads from the coordination database, not from Graph. The expensive API calls happened earlier, in the inventory and membership scripts. By the time classification runs, all the data is local.
AWS: Well-Architected from billing data
A different kind of script: deriving a Well-Architected Framework review from 13 months of billing data, without access to the AWS console or resource-level APIs. This happens when you're assessing an estate you don't administer — you get a cost export and need to produce findings.
The script assigns confidence tiers to each finding:
```python
# Evidence confidence tiers
#   HIGH   — directly observable in billing data
#   MEDIUM — reasonably inferred from billing patterns
#   LOW    — cannot be assessed from billing alone
#
# Example findings from real billing analysis:
#   HIGH:   No savings plans or reserved instances (visible in line items)
#   HIGH:   NAT Gateway data processing > $X/month (specific line item)
#   MEDIUM: No multi-AZ pattern (single-AZ charges only)
#   LOW:    Backup strategy (not visible in billing — needs console access)
```
The EDP cost allocation script is related: it takes the same billing data and splits it across business entities using a configurable allocation model. Shared services get split by headcount ratio. Direct resources get attributed to the entity that owns them. The output is a PDF report with charts — rendered via Quarto — that goes directly to the finance team.
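The headcount split is simple arithmetic, but codifying it is what makes the allocation defensible to finance. A sketch of the shared-services portion (function name and entity labels are illustrative):

```python
def allocate_shared(cost: float, headcount: dict[str, int]) -> dict[str, float]:
    """Split a shared-service cost across entities in proportion to headcount."""
    total = sum(headcount.values())
    return {entity: round(cost * n / total, 2) for entity, n in headcount.items()}
```

Direct resources skip this path entirely — they're attributed to their owning entity before the shared remainder is split.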
Power Platform: the 5-step health assessment
Power Platform is the system most likely to surprise you during an enterprise audit. Business users create flows, apps, and connectors without IT involvement. The health assessment script discovers what exists, who owns it, and whether it's governed.
The credit spike audit is a companion script that monitors Power Platform consumption credits. A sudden spike usually means someone deployed a flow that runs every minute against a premium connector. The audit catches it before the monthly bill does.
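The spike detection itself doesn't need to be clever. A sketch of one workable rule — flag any day whose consumption exceeds a multiple of the trailing-week average (the window and factor here are illustrative defaults, not the audit's actual tuning):

```python
def find_spikes(daily_credits: list[float], window: int = 7, factor: float = 3.0) -> list[int]:
    """Indices of days whose consumption exceeds factor × the trailing-window average."""
    spikes = []
    for i in range(window, len(daily_credits)):
        baseline = sum(daily_credits[i - window:i]) / window
        if baseline > 0 and daily_credits[i] > factor * baseline:
            spikes.append(i)
    return spikes
```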
ITSM and Atlassian
Freshservice and Jira serve different roles in the toolchain. Freshservice is the ITSM platform — the scripts pull agent rosters and group memberships into the coordination database so that separation reports can show who handles what. Jira is the programme board — the scripts classify tickets by separation bucket and reconcile board state against the decision database.
Confluence is the publication target. Several scripts generate markdown reports locally, then push them to Confluence pages via the REST API. The local markdown is the source of truth; Confluence is the distribution channel. If Confluence goes down, the reports still exist as files.
# Auto-classify Jira tickets not yet in the decision database
# Strategy: parent epic → bucket mapping + keyword fallback
# All auto-classified rows flagged with decision_notes = "auto:[parent]"
# Override any with: db.py set-ticket XX-123 day1 "reviewed"
def classify_ticket(issue: dict, epic_map: dict) -> str:
parent = issue.get("parent_key", "")
if parent in epic_map:
return epic_map[parent]
summary = issue.get("summary", "").lower()
if any(kw in summary for kw in ("credential", "secret", "rotate", "expire")):
return "must"
if any(kw in summary for kw in ("licence", "license", "subscription")):
return "day1"
return "defer"
The data intelligence layer
Sitting above the individual scripts is a data intelligence module that crawls enterprise data sources — SharePoint, SQL databases, file shares, Salesforce — and builds a catalogue of every dataset, its schema, its PII profile, its classification, and its age. The catalogue lives in DynamoDB with a single-table design, and the crawl results export to S3 as partitioned Parquet for Athena queries.
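Single-table design means every entity type shares one table and is distinguished by key shape. A hypothetical sketch of what the key builder might look like — these PK/SK shapes are illustrative, not the real schema:

```python
def dataset_keys(source: str, dataset_id: str) -> dict[str, str]:
    """One partition per source system; datasets sort within it.

    Key shapes are assumptions for illustration — the actual catalogue
    schema may differ.
    """
    return {"PK": f"SOURCE#{source}", "SK": f"DATASET#{dataset_id}"}
```

The payoff of the pattern is that "every dataset from SharePoint" is a single-partition query rather than a scan.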
The valid source types tell the story of what the system connects to:
VALID_SOURCES = {"erp", "salesforce", "sis", "sharepoint", "onedrive", "sql", "files"}
VALID_CLASSIFICATIONS = {
"public", "internal", "confidential", "restricted", "highly_restricted"
}
VALID_ACTION_TYPES = {"copy", "move", "purge", "archive"}
VALID_ACTION_STATUSES = {
"pending", "approved", "rejected", "executing", "completed", "failed"
}
The action workflow is the governance layer: a data operation (move this dataset, purge these records, archive this SharePoint site) goes through a request → approval → execution → completion lifecycle. Nothing above `internal` classification moves without explicit approval. The audit trail is immutable — every state change is recorded with actor, timestamp, before-state, and after-state.
Reports that non-technical people can use
The scripts that matter most aren't the clever ones. They're the ones that produce output a non-technical stakeholder can read and act on. The separation report generator reads from the coordination database and produces three documents: a contract register, an IT cost model, and an asset register. Each is a markdown file that renders cleanly in Confluence or as a PDF.
The AWS EDP cost allocation report goes to the finance team as a PDF with charts. The M365 classification report goes to legal as a CSV they can filter in Excel. The Power Platform health assessment goes to the CTO as a markdown document with findings, recommendations, and risk ratings.
The pattern: scripts produce structured data. Report generators consume structured data and produce human-readable documents. The two are separate scripts, not one script that does both. This means you can re-run the report without re-running the audit, and you can change the report format without touching the data collection.
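The report-generation half of that split is mostly table rendering. A minimal sketch of the kind of helper the generators share (the helper itself is illustrative):

```python
def markdown_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render structured rows as a markdown table that Confluence and PDF both accept."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)
```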
What I'd do differently
SQLite was the right choice for a single-operator toolchain. If more than one person needed to run scripts simultaneously, it would need to be PostgreSQL or DynamoDB. The schema would be the same; the coordination pattern would be the same; the concurrency model would be different.
I'd also invest earlier in a shared config module. Too many scripts had their own `.env` loading, their own token caching, their own output directory logic. A shared `config.py` that handles all of that — database path, API tokens, output directories, tenant IDs — would have saved time on every new script.
The agentic layer — using an LLM to help write these scripts — was genuinely useful for the boilerplate: argument parsing, checkpoint logic, rate-limit handling, Graph API pagination. The domain-specific logic — what constitutes a stale record, which classification rules apply, how to split costs across entities — still needed a human to define. The script is the instrument. The judgment about what to measure is still yours.
If the articles or tools have been useful, a coffee helps keep things running.