There's a class of operational work that doesn't fit in a CI pipeline or a scheduled Lambda. Audit 4,000 Entra app registrations for expiring credentials. Classify every SharePoint site by business ownership. Pull 13 months of AWS billing data and map it to cost centres. Catalogue every Power Platform flow, its owner, its trigger type, and its last 7 days of run history. Export a Confluence space. Reconcile Jira tickets against a separation board.
This work ends up in scripts. And if you're running an enterprise IT operation — especially one under time pressure — those scripts stop being utilities and start being the actual operational layer. The question is whether they're reliable enough for that role.
Over the course of a corporate separation programme, I built and ran 60+ Python scripts across 8 enterprise platforms. This is what the toolchain looked like, what patterns held up, and what I'd do differently.
The coordination layer
The first thing that broke was coordination. Script A discovers 400 SharePoint sites. Script B pulls membership for each. Script C classifies them by business ownership. Script D generates the separation report. Without shared state, each script writes its own CSV, and you spend more time reconciling outputs than running audits.
The fix was a local SQLite database. Every script reads from it and writes to it. The schema is simple — teams, ticket decisions, app decisions, enrichment timestamps — but it means any script can answer "what's the current state of this asset?" without re-querying the source system.
```sql
-- Core coordination tables
CREATE TABLE ticket_decisions (
    ticket_key     TEXT PRIMARY KEY,   -- board reference
    summary        TEXT,
    bucket         TEXT,               -- must / day1 / tsa / defer / done
    owner_name     TEXT,
    due_date       TEXT,
    decision_notes TEXT,
    updated_at     TEXT DEFAULT (datetime('now'))
);

CREATE TABLE app_decisions (
    app_id         TEXT PRIMARY KEY,   -- Entra appId (GUID)
    display_name   TEXT,
    decision       TEXT,               -- rotate / novate / decommission / retain
    owner_name     TEXT,
    expires        TEXT,               -- nearest credential expiry
    decision_notes TEXT,
    updated_at     TEXT DEFAULT (datetime('now'))
);
```
The `bucket` field on ticket decisions is the key coordination mechanism. A Jira ticket classified as `must` shows up in every downstream report that filters by urgency. A credential classified as `rotate` shows up in the security remediation tracker. The classification happens once, in the database, and every script that needs it reads from the same source.
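In practice that means any downstream report is a single query against the shared database. A minimal sketch of the read side, using the `ticket_decisions` schema above (the sample data is illustrative):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS ticket_decisions (
    ticket_key     TEXT PRIMARY KEY,
    summary        TEXT,
    bucket         TEXT,
    owner_name     TEXT,
    due_date       TEXT,
    decision_notes TEXT,
    updated_at     TEXT DEFAULT (datetime('now'))
);
"""

def tickets_in_bucket(conn: sqlite3.Connection, bucket: str) -> list[tuple[str, str]]:
    """Return (ticket_key, summary) for every ticket classified into a bucket."""
    rows = conn.execute(
        "SELECT ticket_key, summary FROM ticket_decisions"
        " WHERE bucket = ? ORDER BY ticket_key",
        (bucket,),
    )
    return list(rows)

# Real scripts open the shared .db file; in-memory here for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO ticket_decisions (ticket_key, summary, bucket) VALUES (?, ?, ?)",
    ("SEP-101", "Rotate shared SFTP credential", "must"),
)
```

Every script that filters by urgency calls the same function against the same file, which is the whole point of the coordination layer.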
Microsoft Graph: the 14-script estate
The largest cluster of scripts talks to Microsoft Graph. Entra ID is the identity backbone of most enterprise M365 tenants, and during a separation you need to know everything: who's in which group, which app registrations have expiring secrets, which SharePoint sites belong to which business entity, who has Copilot licences, whether litigation holds are in place, and what the DLP forwarding rules look like.
Each of these is a separate script because each has different API permissions, different rate-limit profiles, and different output consumers. The org tree walker needs `User.Read.All`. The litigation hold audit needs `eDiscovery.Read.All` plus Exchange admin access. The DLP audit needs `AuditLog.Read.All`. Bundling them into one tool would mean requesting every permission for every run, which is both a security risk and a consent headache.
The two phases: check and act
Every script in the toolchain follows the same structural rule: separate the health-check phase from the remediation phase. They feel like one thing — "find the problem and fix it" — but coupling them is how scripts cause incidents.
A health check is safe to run at any time, on any environment. It reads, aggregates, and reports. A remediation phase makes changes. There's always a gate between them: a dry-run default, a report a human reviews, an explicit `--write` or `--apply` flag.
```python
import argparse

# Every script defaults to read-only
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--write", action="store_true",
                        help="Actually insert into the database (default: dry-run)")
    args = parser.parse_args()

    findings = run_audit()
    print_report(findings)

    if args.write:
        inserted = write_to_db(findings)
        print(f"Wrote {inserted} records.")
    else:
        print(f"Dry run: {len(findings)} records would be written. Pass --write to apply.")
```
This isn't paranoia. During a separation, you're running scripts against production identity systems with real users. A script that accidentally disables 200 accounts because a filter was wrong is not a hypothetical — it's a Tuesday. The dry-run default means the first run of any script against a new dataset is always safe.
PII detection at crawl time
One of the more useful patterns in the data intelligence layer: detecting PII by column name rather than by scanning data content. When you're crawling SharePoint lists, SQL databases, or file metadata, you rarely need to read actual row data to know whether a dataset contains personal information.
```python
PII_NAME_PATTERNS = {
    "ssn", "social_security", "national_insurance", "nin",
    "salary", "wage", "payroll", "email", "phone", "mobile",
    "address", "postcode", "zip", "name", "first_name",
    "last_name", "full_name", "dob", "date_of_birth",
    "guardian", "nhs_number", "passport", "bank_account",
    "sort_code",
}

def detect_pii_columns(schema: dict[str, str]) -> list[str]:
    """Flag columns whose names match known PII patterns."""
    return [
        col for col in schema
        if any(p in col.lower() for p in PII_NAME_PATTERNS)
    ]
```
This catches about 90% of PII-bearing datasets in a typical enterprise. The remaining 10% — columns named `field_7` that happen to contain national insurance numbers — need manual review. But the automated pass reduces the manual review surface from thousands of datasets to dozens.
The classification feeds directly into a five-tier model: `public`, `internal`, `confidential`, `restricted`, `highly_restricted`. Any dataset with PII columns is automatically classified as `restricted` or above. Governed data operations — copy, move, purge, archive — require explicit approval for anything above `internal`.
Checkpointing for long-running jobs
The Power Platform health assessment is the best example of why checkpointing matters. It runs in five sequential steps — collect environments, list flows, enrich per-flow metadata, pull 7-day run history, generate report — and the middle steps can take 10–15 minutes each against a large tenant. Network interruptions, token expiry, and rate limits are not edge cases. They're the normal operating condition.
Each step writes a checkpoint file. If the script is interrupted during step 3, re-running it picks up from the last successfully enriched flow, not from the beginning. The checkpoint is a JSON file on disk — nothing sophisticated.
```python
# Chunked into 5 independent steps — run one at a time,
# each writes a checkpoint. Kill any step safely;
# re-run it and it picks up from the checkpoint.
#
# Steps
#   collect → environments + DLP policies + tenant settings (~1 min)
#   flows   → list all flows across every environment (~3–5 min)
#   enrich  → per-flow owner detail, trigger type, solution (~10–15 min, resumes)
#   runs    → 7-day run history per enabled flow (~10–15 min, resumes)
#   report  → build full markdown report from all checkpoints (<1 min)
```
The same pattern applies to the SharePoint membership pull (hundreds of sites, each requiring a separate Graph API call), the app registration catalogue (thousands of service principals with credential expiry checks), and the AWS billing analysis (13 months of daily cost data across multiple accounts).
Rate limits and backpressure
Microsoft Graph throttles at roughly 10,000 requests per 10 minutes per app. That sounds generous until you're pulling membership for 400 SharePoint sites, each requiring 3–4 API calls. The Power Platform API is more aggressive — some endpoints throttle at 60 requests per minute.
The pattern that survived: exponential backoff with jitter on 429 responses, plus a throttle counter that reports at the end of the run. If a job triggered 200 retries, that's a signal to batch differently next time — not something you want to discover by reading logs after the fact.
```python
import logging
import random
import time

import requests

logger = logging.getLogger(__name__)
throttle_count = 0  # reported at the end of the run


def call_graph(url: str, token: str, retries: int = 5) -> dict:
    """GET with exponential backoff on 429 / 503."""
    global throttle_count
    for attempt in range(retries):
        resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code in (429, 503):
            throttle_count += 1
            wait = min(2 ** attempt + random.uniform(0, 1), 60)
            retry_after = resp.headers.get("Retry-After")
            if retry_after:
                wait = max(wait, int(retry_after))
            logger.warning("Throttled (%d), waiting %.1fs", resp.status_code, wait)
            time.sleep(wait)
            continue
        resp.raise_for_status()
    raise RuntimeError(f"Failed after {retries} retries: {url}")
```
The enrichment pattern
The most common operational pattern across the toolchain: take a dataset from one system, fetch additional context from another, and write the enriched result to the coordination database. SharePoint sites get enriched with business ownership from Entra group metadata. Jira tickets get enriched with separation bucket classifications. App registrations get enriched with credential expiry dates and owner contact details.
The design choices that matter for enrichment:
- Idempotent writes — enriching the same record twice produces the same result. Upsert, don't insert.
- Freshness tracking — a `last_enriched_at` timestamp on every record. Scripts target stale records rather than re-processing everything.
- Source attribution — when a field is set by enrichment, record which system it came from. Conflicting data from multiple sources is a real situation.
- Partial enrichment is fine — if one API call fails, write what you have and mark the failed field. Don't discard a whole record because one source timed out.
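The first two choices collapse into a single SQL statement in SQLite. A sketch of the upsert-with-freshness pattern (the `app_enrichment` table is illustrative, not the real schema):

```python
import sqlite3

def upsert_app(conn: sqlite3.Connection, app_id: str, expires: str, source: str) -> None:
    """Idempotent enrichment write: insert or update, stamping freshness and source."""
    conn.execute(
        """
        INSERT INTO app_enrichment (app_id, expires, source, last_enriched_at)
        VALUES (?, ?, ?, datetime('now'))
        ON CONFLICT(app_id) DO UPDATE SET
            expires = excluded.expires,
            source = excluded.source,
            last_enriched_at = excluded.last_enriched_at
        """,
        (app_id, expires, source),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_enrichment ("
    " app_id TEXT PRIMARY KEY, expires TEXT, source TEXT, last_enriched_at TEXT)"
)
```

`ON CONFLICT ... DO UPDATE` (SQLite 3.24+) is what makes re-running an enrichment script safe: the second pass overwrites rather than duplicates.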
SharePoint classification at scale
One of the harder problems: classifying every M365 object — users, SharePoint sites, groups, teams — by business ownership during a corporate separation. The rules aren't complex individually, but applying them consistently across thousands of objects requires the classification to be codified, not manual.
```python
# Classification rules (simplified)
#   Users      → ENTITY-OWNED if accountEnabled + matching company
#                REVIEW if companyName is blank or unknown
#   SharePoint → ENTITY-OWNED if ownership percentage >= 90%
#                TSA-SHARED if ownership percentage 50-89%
#                COUNTERPARTY if ownership percentage < 50%
#   Groups     → classified by member composition
#   Teams      → classified by owning group

def classify_site(site: dict, ownership_pct: float) -> str:
    if ownership_pct >= 0.90:
        return "ENTITY-OWNED"
    elif ownership_pct >= 0.50:
        return "TSA-SHARED"
    else:
        return "COUNTERPARTY"
```
The output is a CSV that feeds directly into the legal and commercial workstreams. The classification script runs in under a minute against the full inventory because it reads from the coordination database, not from Graph. The expensive API calls happened earlier, in the inventory and membership scripts. By the time classification runs, all the data is local.
AWS: Well-Architected from billing data
A different kind of script: deriving a Well-Architected Framework review from 13 months of billing data, without access to the AWS console or resource-level APIs. This happens when you're assessing an estate you don't administer — you get a cost export and need to produce findings.
The script assigns confidence tiers to each finding:
```python
# Evidence confidence tiers
#   HIGH   — directly observable in billing data
#   MEDIUM — reasonably inferred from billing patterns
#   LOW    — cannot be assessed from billing alone
#
# Example findings from real billing analysis:
#   HIGH:   No savings plans or reserved instances (visible in line items)
#   HIGH:   NAT Gateway data processing > $X/month (specific line item)
#   MEDIUM: No multi-AZ pattern (single-AZ charges only)
#   LOW:    Backup strategy (not visible in billing — needs console access)
```
The EDP cost allocation script is related: it takes the same billing data and splits it across business entities using a configurable allocation model. Shared services get split by headcount ratio. Direct resources get attributed to the entity that owns them. The output is a PDF report with charts — rendered via Quarto — that goes directly to the finance team.
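The headcount split is simple arithmetic, but codifying it is what makes the allocation defensible to finance. A sketch of the shared-services portion (function name and entity labels are illustrative):

```python
def allocate_shared(cost: float, headcount: dict[str, int]) -> dict[str, float]:
    """Split a shared-service cost across entities in proportion to headcount."""
    total = sum(headcount.values())
    return {entity: round(cost * n / total, 2) for entity, n in headcount.items()}
```

Direct resources skip this path entirely — they're attributed to their owning entity before the shared remainder is split.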
Power Platform: the 5-step health assessment
Power Platform is the system most likely to surprise you during an enterprise audit. Business users create flows, apps, and connectors without IT involvement. The health assessment script discovers what exists, who owns it, and whether it's governed.
The credit spike audit is a companion script that monitors Power Platform consumption credits. A sudden spike usually means someone deployed a flow that runs every minute against a premium connector. The audit catches it before the monthly bill does.
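The spike detection itself doesn't need to be clever. A sketch of one workable rule — flag any day whose consumption exceeds a multiple of the trailing-week average (the window and factor here are illustrative defaults, not the audit's actual tuning):

```python
def find_spikes(daily_credits: list[float], window: int = 7, factor: float = 3.0) -> list[int]:
    """Indices of days whose consumption exceeds factor × the trailing-window average."""
    spikes = []
    for i in range(window, len(daily_credits)):
        baseline = sum(daily_credits[i - window:i]) / window
        if baseline > 0 and daily_credits[i] > factor * baseline:
            spikes.append(i)
    return spikes
```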
ITSM and Atlassian
Freshservice and Jira serve different roles in the toolchain. Freshservice is the ITSM platform — the scripts pull agent rosters and group memberships into the coordination database so that separation reports can show who handles what. Jira is the programme board — the scripts classify tickets by separation bucket and reconcile board state against the decision database.
Confluence is the publication target. Several scripts generate markdown reports locally, then push them to Confluence pages via the REST API. The local markdown is the source of truth; Confluence is the distribution channel. If Confluence goes down, the reports still exist as files.
# Auto-classify Jira tickets not yet in the decision database
# Strategy: parent epic → bucket mapping + keyword fallback
# All auto-classified rows flagged with decision_notes = "auto:[parent]"
# Override any with: db.py set-ticket XX-123 day1 "reviewed"
def classify_ticket(issue: dict, epic_map: dict) -> str:
parent = issue.get("parent_key", "")
if parent in epic_map:
return epic_map[parent]
summary = issue.get("summary", "").lower()
if any(kw in summary for kw in ("credential", "secret", "rotate", "expire")):
return "must"
if any(kw in summary for kw in ("licence", "license", "subscription")):
return "day1"
return "defer"
The data intelligence layer
Sitting above the individual scripts is a data intelligence module that crawls enterprise data sources — SharePoint, SQL databases, file shares, Salesforce — and builds a catalogue of every dataset, its schema, its PII profile, its classification, and its age. The catalogue lives in DynamoDB with a single-table design, and the crawl results export to S3 as partitioned Parquet for Athena queries.
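Single-table design means every entity type shares one table and is distinguished by key shape. A hypothetical sketch of what the key builder might look like — these PK/SK shapes are illustrative, not the real schema:

```python
def dataset_keys(source: str, dataset_id: str) -> dict[str, str]:
    """One partition per source system; datasets sort within it.

    Key shapes are assumptions for illustration — the actual catalogue
    schema may differ.
    """
    return {"PK": f"SOURCE#{source}", "SK": f"DATASET#{dataset_id}"}
```

The payoff of the pattern is that "every dataset from SharePoint" is a single-partition query rather than a scan.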
The valid source types tell the story of what the system connects to:
VALID_SOURCES = {"erp", "salesforce", "sis", "sharepoint", "onedrive", "sql", "files"}
VALID_CLASSIFICATIONS = {
"public", "internal", "confidential", "restricted", "highly_restricted"
}
VALID_ACTION_TYPES = {"copy", "move", "purge", "archive"}
VALID_ACTION_STATUSES = {
"pending", "approved", "rejected", "executing", "completed", "failed"
}
The action workflow is the governance layer: a data operation (move this dataset, purge these records, archive this SharePoint site) goes through a request → approval → execution → completion lifecycle. Nothing above `internal` classification moves without explicit approval. The audit trail is immutable — every state change is recorded with actor, timestamp, before-state, and after-state.
Reports that non-technical people can use
The scripts that matter most aren't the clever ones. They're the ones that produce output a non-technical stakeholder can read and act on. The separation report generator reads from the coordination database and produces three documents: a contract register, an IT cost model, and an asset register. Each is a markdown file that renders cleanly in Confluence or as a PDF.
The AWS EDP cost allocation report goes to the finance team as a PDF with charts. The M365 classification report goes to legal as a CSV they can filter in Excel. The Power Platform health assessment goes to the CTO as a markdown document with findings, recommendations, and risk ratings.
The pattern: scripts produce structured data. Report generators consume structured data and produce human-readable documents. The two are separate scripts, not one script that does both. This means you can re-run the report without re-running the audit, and you can change the report format without touching the data collection.
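The report-generation half of that split is mostly table rendering. A minimal sketch of the kind of helper the generators share (the helper itself is illustrative):

```python
def markdown_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render structured rows as a markdown table that Confluence and PDF both accept."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)
```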
What I'd do differently
SQLite was the right choice for a single-operator toolchain. If more than one person needed to run scripts simultaneously, it would need to be PostgreSQL or DynamoDB. The schema would be the same; the coordination pattern would be the same; the concurrency model would be different.
I'd also invest earlier in a shared config module. Too many scripts had their own `.env` loading, their own token caching, their own output directory logic. A shared `config.py` that handles all of that — database path, API tokens, output directories, tenant IDs — would have saved time on every new script.
The agentic layer — using an LLM to help write these scripts — was genuinely useful for the boilerplate: argument parsing, checkpoint logic, rate-limit handling, Graph API pagination. The domain-specific logic — what constitutes a stale record, which classification rules apply, how to split costs across entities — still needed a human to define. The script is the instrument. The judgment about what to measure is still yours.
If the articles or tools have been useful, a coffee helps keep things running.