Data governance patterns

Data governance in a small serverless application is often treated as either non-applicable ("we're too small to worry about GDPR") or as a compliance theatre exercise ("we added a cookie banner, we're done"). Both approaches are wrong, and both carry real risk.

The practical reality is that GDPR compliance for a simple serverless app is not complicated. It requires knowing what personal data you hold, where it lives, how long it's retained, and how to delete it on request. That's it. This article covers the patterns that make these four requirements easy to fulfil.

Data classification: the foundation

Before you can protect data, you need to know what you have. A simple classification taxonomy for serverless apps:

Class	Examples	Requirements
Personal	Email addresses, IP addresses, user IDs	GDPR Article 6 lawful basis; deletion on request; retention limits
Sensitive personal	Health data, biometrics, political opinions	Explicit consent; separate storage; enhanced audit logging
Internal	Scan results, configuration, logs	Access control; retention policy; no external sharing
Public	Published articles, scan summaries	No special handling required

For ticketyboo.dev, we hold two categories of personal data: email addresses (newsletter subscribers) and IP addresses (rate limiting records). Both are documented, both have retention limits, and both have deletion mechanisms.
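
One way to keep that documentation honest is to hold the inventory as data and check it in code. The following is a minimal sketch, not the ticketyboo.dev implementation: `DataClass`, `InventoryEntry`, `INVENTORY` and `undocumented` are hypothetical names, and the two entries simply mirror the categories described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataClass(Enum):
    PERSONAL = "personal"
    SENSITIVE_PERSONAL = "sensitive_personal"
    INTERNAL = "internal"
    PUBLIC = "public"


@dataclass(frozen=True)
class InventoryEntry:
    name: str
    data_class: DataClass
    retention_days: Optional[int]  # None = kept until the user asks for deletion
    deletion: str                  # how the record is removed


# Hypothetical inventory mirroring the two personal-data categories above.
INVENTORY = [
    InventoryEntry("newsletter email", DataClass.PERSONAL, None, "hard delete on unsubscribe"),
    InventoryEntry("rate-limit IP", DataClass.PERSONAL, 1, "DynamoDB TTL"),
]


def undocumented(entries):
    """Return names of personal-data entries with no deletion mechanism recorded."""
    personal = {DataClass.PERSONAL, DataClass.SENSITIVE_PERSONAL}
    return [e.name for e in entries if e.data_class in personal and not e.deletion.strip()]
```

Running `undocumented(INVENTORY)` in a test or CI step would flag any personal-data entry that was added without a deletion mechanism.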

Lawful basis: know why you hold what you hold

Under GDPR Article 6, every piece of personal data must have a documented lawful basis. The most relevant bases for a developer tool:

Consent (6(1)(a)): Newsletter subscriptions. The subscriber explicitly opted in. Withdrawal of consent = hard delete.
Legitimate interests (6(1)(f)): IP addresses for rate limiting. Preventing abuse is a legitimate interest. TTL of 24 hours is proportionate.
Contract (6(1)(b)): Data necessary to perform a service the user requested (e.g. scan results tied to a scan_id).

The hard delete rule: When a user unsubscribes, delete their record entirely. Do not soft-delete, do not mark as "unsubscribed", do not retain for "analytics". A soft delete that keeps the email address does not satisfy the right to erasure (Article 17). The DynamoDB pattern is a single call: delete_item(Key={"PK": f"SUB#{email}", "SK": "META"}). No archive table, no audit log of the email itself.

The subscription model and hard-delete in actual production code (api/models.py, api/newsletter.py):

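# api/models.py: subscription record; no TTL (retained until explicit delete)
from dataclasses import dataclass


@dataclass
class Subscription:
    email:         str
    subscribed_at: str   # ISO 8601
    status:        str = "active"  # only "active" records are stored


# api/newsletter.py: unsubscribe = hard delete, no archive
import logging
from datetime import datetime, timezone

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)  # assumed standard module-level logger
# _table is the module-level boto3 DynamoDB Table resource, created elsewhere in this module.


def unsubscribe(email: str) -> bool:
    """Hard-delete subscriber record.

    Returns True if deleted, False if record not found.
    Does NOT log the email address, only the event timestamp.
    """
    try:
        # The condition makes the delete fail if no such record exists,
        # so a missing subscriber can be reported as False.
        _table.delete_item(
            Key={"PK": f"SUB#{email}", "SK": "META"},
            ConditionExpression="attribute_exists(PK)",
        )
        logger.info("Unsubscribed at %s", datetime.now(timezone.utc).isoformat())
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False
        raise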

Data residency in serverless

In a traditional multi-region architecture, data residency (keeping EU personal data in the EU) requires careful Lambda → database routing. In a single-region serverless architecture, it's simple: choose an EU region for all personal data storage, and document that choice.

ticketyboo.dev uses eu-north-1 (Stockholm) for all data storage. The ACM certificate is in us-east-1 (required for CloudFront), but a certificate holds no personal data, only public key material. CloudFront caches static assets at edge locations globally, but personal data only transits edge nodes; it is stored only in eu-north-1.

Document this in your privacy notice. Users deserve to know where their data lives.
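
The documented choice can also be enforced at start-up so that a misconfigured deployment fails loudly. A minimal sketch, assuming a single approved region: `ALLOWED_REGIONS` and `require_allowed_region` are hypothetical names. It uses an explicit allow-list rather than an `eu-` prefix check, because not every `eu-` region is inside the EU (eu-west-2 is London).

```python
# Regions approved for personal data storage (assumption: only the one documented above).
ALLOWED_REGIONS = frozenset({"eu-north-1"})


def require_allowed_region(region_name: str) -> str:
    """Return the region unchanged, or raise if it is not on the allow-list."""
    if region_name not in ALLOWED_REGIONS:
        raise ValueError(
            f"personal data must be stored in one of {sorted(ALLOWED_REGIONS)}, got {region_name!r}"
        )
    return region_name
```

The return value can be passed straight through as the `region_name` argument when the DynamoDB client or resource is created.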

TTL-based retention

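DynamoDB's TTL feature (automatic item deletion based on a Unix timestamp attribute) is the cleanest way to enforce retention limits in a serverless architecture. No cron jobs, no batch delete scripts, no forgetting to run the cleanup. One caveat: TTL deletion runs in the background and is not instantaneous (AWS documents it as typically within a few days of expiry), so an item can briefly outlive its timestamp. Filter out expired items at read time if the exact cut-off matters.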

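import time
from datetime import datetime, timezone

def ttl_days(days: int) -> int:
    """Return Unix timestamp N days from now, for DynamoDB TTL."""
    return int(time.time()) + (days * 86400)

# Placeholder values so the snippet runs on its own (illustrative only)
scan_id = "example-scan-id"
ip = "192.0.2.1"
email = "user@example.com"
timestamp = datetime.now(timezone.utc).isoformat()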

# Scan records: retained for 90 days
item = {
    'PK': f'SCAN#{scan_id}',
    'SK': 'META',
    'ttl': ttl_days(90),
    # ...
}

# Rate limit records: retained for 24 hours
rate_item = {
    'PK': f'RATELIMIT#{ip}',
    'SK': f'REQ#{timestamp}',
    'ttl': ttl_days(1),
    # ...
}

# Newsletter subscriptions: NO TTL (retained until unsubscribe)
subscription = {
    'PK': f'SUB#{email}',
    'SK': 'META',
    # no 'ttl' attribute
}
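
Because DynamoDB removes expired items in the background rather than at the exact TTL timestamp, an expired item can still come back from a read for a while. A minimal sketch of a read-side filter, assuming items are plain dicts with the `ttl` attribute used above; `is_expired` and `live_items` are hypothetical helper names.

```python
import time
from typing import Optional


def is_expired(item: dict, now: Optional[int] = None) -> bool:
    """True if the item's 'ttl' is in the past; items without 'ttl' never expire."""
    ttl = item.get("ttl")
    if ttl is None:
        return False
    current = int(time.time()) if now is None else now
    return int(ttl) <= current


def live_items(items, now: Optional[int] = None):
    """Drop items whose TTL has passed but which DynamoDB has not yet deleted."""
    return [item for item in items if not is_expired(item, now)]
```

The `now` parameter exists only to make the check deterministic in tests; production callers would leave it unset.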

IP address handling

IP addresses are personal data under GDPR (they can identify a natural person, especially combined with a timestamp). Rate limiting requires tracking IPs, but the retention period should be minimised.

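Our approach: store IP addresses with a 24-hour TTL in DynamoDB. We don't hash them: hashing an IPv4 address is not anonymisation, because the input space is only 2^32 addresses and any unsalted hash can be reversed by brute force. We store the IP in plaintext with a short TTL and no logs beyond CloudWatch default retention. Note that CloudWatch log groups never expire by default, so set an explicit retention period if IP addresses can reach your logs.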
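
The point about hashing can be shown directly. IPv4 has roughly 4.3 billion possible addresses, so an unsalted hash can be reversed by trying each candidate. The sketch below uses SHA-256 and a /24 documentation range to keep it fast; `hash_ip` and `recover_ip` are illustrative names, not part of the ticketyboo.dev codebase.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """Unsalted SHA-256 of the dotted-quad string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def recover_ip(target_hash: str, network: str) -> Optional[str]:
    """Brute-force the address behind a hash by trying every address in the network."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate) == target_hash:
            return candidate
    return None
```

Searching a /24 takes 256 hashes; the full IPv4 space is the same loop run over a larger range, which is well within reach of commodity hardware.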

Scan results are stored with the requester IP for rate limiting correlation, but the IP is redacted from any externally-visible scan report.
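
Redaction for external reports can be as simple as dropping internal-only fields before serialising. A minimal sketch, assuming the scan record is a dict and the IP lives under a `requester_ip` key; both the field name and `public_report` are assumptions made for illustration.

```python
# Fields that must never appear in an externally visible report (assumed names).
PRIVATE_FIELDS = frozenset({"requester_ip"})


def public_report(scan_record: dict) -> dict:
    """Return a copy of the scan record without internal-only fields."""
    return {key: value for key, value in scan_record.items() if key not in PRIVATE_FIELDS}
```

Building a new dict leaves the stored record untouched, so rate limiting can still use the IP internally.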

Privacy notice: what to include

A GDPR-compliant privacy notice for a simple serverless app doesn't need to be long. It needs to answer five questions:

What personal data do you collect? (email, IP)
Why? (newsletter consent, rate limiting as legitimate interest)
Where is it stored? (AWS DynamoDB, eu-north-1 / Stockholm)
How long is it retained? (email: until unsubscribe; IP: 24 hours; scan results: 90 days)
How can they delete it? (unsubscribe link = hard delete; scan results expire automatically)

The ticketyboo.dev footer includes all five answers in under 100 words. That's all a simple data controller needs.
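
One way to keep the notice complete is to build it from the five answers and fail if any is missing. This is a minimal sketch: the wording in `EXAMPLE_ANSWERS` is illustrative and is not the actual ticketyboo.dev footer text, and `REQUIRED_QUESTIONS` and `render_notice` are hypothetical names.

```python
REQUIRED_QUESTIONS = ("what", "why", "where", "how_long", "how_to_delete")

# Illustrative wording only, based on the answers listed above.
EXAMPLE_ANSWERS = {
    "what": "We collect your email address if you subscribe to the newsletter, "
            "and your IP address when you use the site.",
    "why": "Email is held with your consent; IP addresses are held for rate "
           "limiting under legitimate interests.",
    "where": "Data is stored in AWS DynamoDB in eu-north-1 (Stockholm).",
    "how_long": "Email is kept until you unsubscribe, IP addresses for 24 hours, "
                "scan results for 90 days.",
    "how_to_delete": "The unsubscribe link deletes your record permanently; "
                     "scan results expire automatically.",
}


def render_notice(answers: dict) -> str:
    """Join the five answers into one notice; raise if any answer is missing or blank."""
    missing = [q for q in REQUIRED_QUESTIONS if not answers.get(q, "").strip()]
    if missing:
        raise ValueError("privacy notice is missing answers: " + ", ".join(missing))
    return " ".join(answers[q].strip() for q in REQUIRED_QUESTIONS)
```

A word-count check on the rendered text (for example `len(render_notice(EXAMPLE_ANSWERS).split())`) is an easy way to keep the notice short.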

Data governance in the scanner

The scanner checks repositories for common data governance violations:

Personal data in log statements (email addresses, user IDs in debug logs)
Missing privacy notice in web-facing projects (no privacy policy link)
Database schemas with no retention/TTL mechanism for user data
Cookie usage without consent (detectable via document.cookie patterns in JS)
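
Two of these checks lend themselves to simple line-based heuristics. The sketch below is not the scanner's actual implementation; the regular expressions, the finding labels and `scan_source` are assumptions chosen to illustrate the idea, and a real scanner would need to handle far more logging styles and reduce false positives.

```python
import re

# A call on a logger-like object, e.g. logger.info(...) or logging.debug(...)
LOG_CALL_RE = re.compile(
    r"\b(?:logger|logging|log)\.(?:debug|info|warning|error|exception|critical)\s*\("
)
# Identifier names that usually carry personal data (assumed list)
PII_NAME_RE = re.compile(r"\b(?:email|user_id|ip_address)\b", re.IGNORECASE)
# Direct cookie access in JavaScript
COOKIE_RE = re.compile(r"\bdocument\.cookie\b")


def scan_source(text: str) -> list:
    """Return (line_number, finding) tuples for the two heuristic checks."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if LOG_CALL_RE.search(line) and PII_NAME_RE.search(line):
            findings.append((lineno, "personal-data-in-log"))
        if COOKIE_RE.search(line):
            findings.append((lineno, "cookie-access"))
    return findings
```

A `cookie-access` finding only shows that cookies are used; whether consent is obtained first still needs a human or a more specific check.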
