Data governance in a small serverless application is often treated as either non-applicable ("we're too small to worry about GDPR") or as a compliance theatre exercise ("we added a cookie banner, we're done"). Both approaches are wrong, and both carry real risk.
The practical reality is that GDPR compliance for a simple serverless app is not complicated. It requires knowing what personal data you hold, where it lives, how long it's retained, and how to delete it on request. That's it. This article covers the patterns that make these four requirements easy to fulfil.
Data classification: the foundation
Before you can protect data, you need to know what you have. A simple classification taxonomy for serverless apps:
| Class | Examples | Requirements |
|---|---|---|
| Personal | Email addresses, IP addresses, user IDs | GDPR Article 6 lawful basis; deletion on request; retention limits |
| Sensitive personal | Health data, biometrics, political opinions | Explicit consent; separate storage; enhanced audit logging |
| Internal | Scan results, configuration, logs | Access control; retention policy; no external sharing |
| Public | Published articles, scan summaries | No special handling required |
For ticketyboo.dev, we hold two categories of personal data: email addresses (newsletter subscribers) and IP addresses (rate limiting records). Both are documented, both have retention limits, and both have deletion mechanisms.
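One way to keep this documentation honest is to hold the inventory as data next to the code. The sketch below is an illustration only: the `DataClass` enum, the `InventoryEntry` fields, and the `undocumented` helper are hypothetical names, not part of the ticketyboo.dev codebase. The two entries mirror the categories described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataClass(Enum):
    PERSONAL = "personal"
    SENSITIVE_PERSONAL = "sensitive personal"
    INTERNAL = "internal"
    PUBLIC = "public"


@dataclass(frozen=True)
class InventoryEntry:
    name: str
    data_class: DataClass
    purpose: str
    retention_days: Optional[int]  # None means "kept until the user asks for deletion"
    deletion: str


# Entries mirror the two personal-data categories described in the text.
INVENTORY = [
    InventoryEntry("email address", DataClass.PERSONAL, "newsletter", None,
                   "hard delete on unsubscribe"),
    InventoryEntry("IP address", DataClass.PERSONAL, "rate limiting", 1,
                   "DynamoDB TTL"),
]


def undocumented(entries):
    """Return names of personal-data entries missing a purpose or deletion mechanism."""
    personal = {DataClass.PERSONAL, DataClass.SENSITIVE_PERSONAL}
    return [e.name for e in entries
            if e.data_class in personal and not (e.purpose and e.deletion)]
```

Running `undocumented(INVENTORY)` in a test or CI step gives a cheap check that every personal-data field still has a stated purpose and a deletion path.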
Lawful basis: know why you hold what you hold
Under GDPR Article 6, every piece of personal data must have a documented lawful basis. The most relevant bases for a developer tool:
- Consent (6(1)(a)): Newsletter subscriptions. The subscriber explicitly opted in. Withdrawal of consent = hard delete.
- Legitimate interests (6(1)(f)): IP addresses for rate limiting. Preventing abuse is a legitimate interest. TTL of 24 hours is proportionate.
- Contract (6(1)(b)): Data necessary to perform a service the user requested (e.g. scan results tied to a scan_id).
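Withdrawal of consent is implemented as a hard delete. When a subscriber unsubscribes, the whole operation is a single DynamoDB call: delete_item(PK="SUB#{email}", SK="META").
Done. No archive table, no audit log of the email itself.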
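As a minimal sketch, the hard delete could look like this with boto3. The function name `unsubscribe` is hypothetical, and `table` is assumed to be a boto3 DynamoDB `Table` resource for whichever table holds the `SUB#` items; only the key shape comes from the text above.

```python
def unsubscribe(table, email: str) -> None:
    """Hard-delete a newsletter subscription item.

    `table` is assumed to be a boto3 DynamoDB Table resource, for example
    boto3.resource("dynamodb", region_name="eu-north-1").Table(<table name>).
    """
    # One call, addressed by the full primary key; nothing is archived first.
    table.delete_item(Key={"PK": f"SUB#{email}", "SK": "META"})
```

Passing the table in as an argument keeps the function testable with a stub and avoids hard-coding a table name.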
Data residency in serverless
In a traditional multi-region architecture, data residency (keeping EU personal data in the EU) requires careful Lambda → database routing. In a single-region serverless architecture, it's simple: choose an EU region for all personal data storage, and document that choice.
ticketyboo.dev uses eu-north-1 (Stockholm) for all data storage. The ACM certificate is in us-east-1 (required for CloudFront), but the certificate stores no personal data: it's public key material. CloudFront caches static assets at edge locations globally, but personal data only transits through edge nodes; it is stored only in eu-north-1.
Document this in your privacy notice. Users deserve to know where their data lives.
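The region choice can also be enforced in code so that a misconfigured deployment fails loudly. This is a sketch under assumptions: `EU_REGIONS` and `require_eu_region` are hypothetical names, and the allowlist covers only AWS regions located in EU member states.

```python
# AWS regions located in EU member states (allowlist assumed for this sketch).
EU_REGIONS = frozenset({
    "eu-west-1",     # Ireland
    "eu-west-3",     # Paris
    "eu-central-1",  # Frankfurt
    "eu-north-1",    # Stockholm
    "eu-south-1",    # Milan
    "eu-south-2",    # Spain
})


def require_eu_region(region: str) -> str:
    """Return the region unchanged, or raise if it is outside the allowlist."""
    if region not in EU_REGIONS:
        raise ValueError(f"personal data must stay in an EU region, got {region!r}")
    return region
```

Note that an `eu-` prefix is not a sufficient test: eu-west-2 (London) and eu-central-2 (Zurich) are deliberately left out because the UK and Switzerland are not EU member states.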
TTL-based retention
DynamoDB's TTL feature (automatic item deletion based on a Unix timestamp attribute) is the cleanest way to enforce retention limits in a serverless architecture. No cron jobs, no batch delete scripts, no forgetting to run the cleanup.
import time
from datetime import datetime, timezone

def ttl_days(days: int) -> int:
    """Return Unix timestamp N days from now, for DynamoDB TTL."""
    return int(time.time()) + (days * 86400)

# Placeholder values so the snippet runs on its own
scan_id = 'example-scan-id'
ip = '203.0.113.7'
timestamp = datetime.now(timezone.utc).isoformat()

# Scan records: retained for 90 days
item = {
    'PK': f'SCAN#{scan_id}',
    'SK': 'META',
    'ttl': ttl_days(90),
    # ...
}

# Rate limit records: retained for 24 hours
rate_item = {
    'PK': f'RATELIMIT#{ip}',
    'SK': f'REQ#{timestamp}',
    'ttl': ttl_days(1),
    # ...
}

# Newsletter subscriptions: NO TTL (retained until unsubscribe)
subscription = {
    'PK': f'SUB#{email}',
    'SK': 'META',
    # no 'ttl' attribute
} if (email := 'subscriber@example.com') else {}
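One caveat worth handling in code: DynamoDB removes expired items in the background, and AWS documents that deletion typically happens within a few days of the TTL timestamp rather than at that exact moment. If the retention limit must hold at read time, expired items can be filtered out when they are read. The helpers below are a hypothetical sketch that assumes the TTL attribute is named `ttl`, as in the items above.

```python
import time
from typing import Optional


def is_live(item: dict, now: Optional[int] = None) -> bool:
    """True if the item has no 'ttl' attribute or its TTL is still in the future."""
    now = int(time.time()) if now is None else now
    ttl = item.get("ttl")
    # boto3 returns numbers as Decimal; int() handles both int and Decimal.
    return ttl is None or int(ttl) > now


def live_items(items, now: Optional[int] = None):
    """Drop items whose TTL has passed but which DynamoDB has not yet removed."""
    return [item for item in items if is_live(item, now)]
```

The same condition can be pushed into a query as a filter expression on the `ttl` attribute; filtering in application code is simply the easiest form to show.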
IP address handling
IP addresses are personal data under GDPR (they can identify a natural person, especially combined with a timestamp). Rate limiting requires tracking IPs, but the retention period should be minimised.
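Our approach: store IP addresses with a 24-hour TTL in DynamoDB. We don't hash them: hashing an IPv4 address is not anonymisation, because there are only 2^32 (about 4.3 billion) possible inputs and every one of them can be hashed and compared in a short time. We store the IP in plaintext with a short TTL and keep no logs of it beyond what CloudWatch retains. CloudWatch log groups never expire by default, so set an explicit retention period on any log group that can capture an IP.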
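To make the hashing point concrete, here is a small demonstration that an unsalted hash of an IPv4 address can be reversed by enumeration. The function names are hypothetical, and the search is limited to a /24 documentation range so it runs instantly; the same loop over the full IPv4 space is only a matter of compute time.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """Unsalted SHA-256 of the dotted-quad string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def recover_ip(digest: str, network: str) -> Optional[str]:
    """Brute-force every address in `network` until one hashes to `digest`."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate) == digest:
            return candidate
    return None


digest = hash_ip("203.0.113.42")             # documentation-range address
print(recover_ip(digest, "203.0.113.0/24"))  # searches only 256 candidates
```

A secret salt or keyed hash raises the cost of this attack, but the result is still pseudonymised rather than anonymous data, which is why a short TTL is the simpler control here.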
Scan results are stored with the requester IP for rate-limiting correlation, but the IP is redacted from any externally visible scan report.
Privacy notice: what to include
A GDPR-compliant privacy notice for a simple serverless app doesn't need to be long. It needs to answer five questions:
- What personal data do you collect? (email, IP)
- Why? (newsletter consent, rate limiting as legitimate interest)
- Where is it stored? (AWS DynamoDB, eu-north-1 / Stockholm)
- How long is it retained? (email: until unsubscribe; IP: 24 hours; scan results: 90 days)
- How can they delete it? (unsubscribe link = hard delete; scan results expire automatically)
The ticketyboo.dev footer includes all five answers in under 100 words. That's all a simple data controller needs.
Data governance in the scanner
The scanner checks repositories for common data governance violations:
- Personal data in log statements (email addresses, user IDs in debug logs)
- Missing privacy notice in web-facing projects (no privacy policy link)
- Database schemas with no retention/TTL mechanism for user data
- Cookie usage without consent (detectable via document.cookie patterns in JS)
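Two of these checks lend themselves to simple pattern matching. The sketch below is not the scanner's actual implementation: the regular expressions, rule names, and `scan_lines` function are hypothetical, and a real scanner would need language-aware parsing to keep false positives down.

```python
import re

# Hypothetical patterns for illustration only.
LOG_CALL = re.compile(r"\b(?:print|console\.log|log(?:ger|ging)?\.\w+)\s*\(")
PII_HINT = re.compile(r"e_?mail|user_?id|[\w.+-]+@[\w-]+\.[\w.-]+", re.IGNORECASE)
COOKIE_USE = re.compile(r"\bdocument\.cookie\b")


def scan_lines(lines):
    """Return (line_number, rule) pairs for lines that match a rule."""
    findings = []
    for number, line in enumerate(lines, start=1):
        # A logging call that mentions an email or user ID, by name or literal.
        if LOG_CALL.search(line) and PII_HINT.search(line):
            findings.append((number, "personal-data-in-log"))
        # Any direct cookie access; whether consent exists needs manual review.
        if COOKIE_USE.search(line):
            findings.append((number, "cookie-access"))
    return findings
```

A pattern match can only flag that cookies are touched, not whether consent was obtained, so the second rule is best treated as a prompt for review rather than a verdict.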