Skip to content

Data Stack

Overview

The Data Stack provides the persistence layer for the HealthFlow NDP platform, including relational databases, NoSQL databases, caching, and message queues.

Stack Architecture

Components

1. PostgreSQL

Purpose: Primary relational database for core NDP data

Version: PostgreSQL 15.x

Usage:

  • Prescription records
  • Patient registry
  • Healthcare Provider Registry (HPR)
  • Dispense records
  • Transactional data

Key Features:

  • ACID compliance
  • Advanced indexing
  • Full-text search
  • JSON support (JSONB)
  • Row-level security
  • Streaming replication for HA

2. MySQL

Purpose: Legacy database for pharmacy and medicine data

Version: MySQL 8.0.x

Usage:

  • Pharmacy registry
  • Medicine directory (national drug database)
  • EDA integration data
  • Legacy system compatibility

Key Features:

  • InnoDB storage engine
  • ACID transactions
  • Master-slave replication
  • Point-in-time recovery

3. MongoDB

Purpose: Document database for audit and compliance logs

Version: MongoDB 6.x

Usage:

  • Audit trail storage
  • Compliance logging
  • Event history
  • Analytics data
  • Unstructured data storage

Key Features:

  • Flexible schema
  • High write throughput
  • Horizontal scalability
  • Aggregation pipeline
  • Time-series collections

4. Redis

Purpose: In-memory cache and session store

Version: Redis 7.x

Usage:

  • API response caching
  • Session management
  • Rate limiting counters
  • Real-time data
  • Pub/sub messaging
  • Distributed locks

Key Features:

  • Sub-millisecond latency
  • Multiple data structures
  • Persistence options (RDB/AOF)
  • Redis Sentinel for HA
  • Cluster mode support

5. Apache Kafka

Purpose: Distributed event streaming platform

Version: Kafka 3.x

Usage:

  • Event-driven architecture
  • Service-to-service messaging
  • Audit log streaming
  • Real-time data pipelines
  • Analytics event collection

Key Features:

  • High throughput
  • Fault tolerance
  • Durable storage
  • Stream processing
  • Multiple consumers

Kubernetes Manifests

Namespace

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: data-stack
  labels:
    name: data-stack
    stack: infrastructure

PostgreSQL StatefulSet

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql
  namespace: data-stack
spec:
  serviceName: postgresql
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
      - name: postgresql
        image: postgres:15-alpine
        ports:
        - containerPort: 5432
          name: postgres
        env:
        - name: POSTGRES_DB
          value: "ndp"
        - name: POSTGRES_USER
          value: "ndp_user"
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgresql-secret
              key: password
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: postgresql-storage
          mountPath: /var/lib/postgresql/data
        resources:
          limits:
            cpu: "4"
            memory: 8Gi
          requests:
            cpu: "2"
            memory: 4Gi
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - pg_isready -U ndp_user
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - pg_isready -U ndp_user
          initialDelaySeconds: 5
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: postgresql-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3
      resources:
        requests:
          storage: 500Gi

PostgreSQL Service

yaml
apiVersion: v1
kind: Service
metadata:
  name: postgresql
  namespace: data-stack
spec:
  type: ClusterIP
  selector:
    app: postgresql
  ports:
  - port: 5432
    targetPort: 5432
    protocol: TCP

MySQL StatefulSet

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: data-stack
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
          name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        - name: MYSQL_DATABASE
          value: "pharmacy"
        - name: MYSQL_USER
          value: "pharmacy_user"
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 2Gi
        livenessProbe:
          exec:
            command:
            - mysqladmin
            - ping
            - -h
            - localhost
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - mysql
            - -h
            - localhost
            - -u
            - root
            - -p${MYSQL_ROOT_PASSWORD}
            - -e
            - SELECT 1
          initialDelaySeconds: 5
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: mysql-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3
      resources:
        requests:
          storage: 100Gi

Redis StatefulSet

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: data-stack
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command:
        - redis-server
        - --requirepass
        - $(REDIS_PASSWORD)
        - --appendonly
        - "yes"
        ports:
        - containerPort: 6379
          name: redis
        env:
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: redis-secret
              key: password
        volumeMounts:
        - name: redis-storage
          mountPath: /data
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
          requests:
            cpu: "500m"
            memory: 2Gi
        livenessProbe:
          exec:
            command:
            - redis-cli
            - -a
            - $(REDIS_PASSWORD)
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - redis-cli
            - -a
            - $(REDIS_PASSWORD)
            - ping
          initialDelaySeconds: 5
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: redis-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3
      resources:
        requests:
          storage: 20Gi

MongoDB StatefulSet

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
  namespace: data-stack
spec:
  serviceName: mongodb
  replicas: 1
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongodb
        image: mongo:6
        ports:
        - containerPort: 27017
          name: mongodb
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          value: "admin"
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: root-password
        - name: MONGO_INITDB_DATABASE
          value: "audit"
        volumeMounts:
        - name: mongodb-storage
          mountPath: /data/db
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 2Gi
        livenessProbe:
          exec:
            command:
            - mongo
            - --eval
            - db.adminCommand('ping')
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - mongo
            - --eval
            - db.adminCommand('ping')
          initialDelaySeconds: 5
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: mongodb-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3
      resources:
        requests:
          storage: 100Gi

Service Dependencies

Environment Variables

PostgreSQL

VariableDescriptionRequired
POSTGRES_DBDatabase nameYes
POSTGRES_USERDatabase userYes
POSTGRES_PASSWORDUser passwordYes
PGDATAData directory pathYes

MySQL

VariableDescriptionRequired
MYSQL_ROOT_PASSWORDRoot passwordYes
MYSQL_DATABASEInitial databaseNo
MYSQL_USERApplication userNo
MYSQL_PASSWORDUser passwordNo

MongoDB

VariableDescriptionRequired
MONGO_INITDB_ROOT_USERNAMERoot usernameYes
MONGO_INITDB_ROOT_PASSWORDRoot passwordYes
MONGO_INITDB_DATABASEInitial databaseNo

Redis

VariableDescriptionRequired
REDIS_PASSWORDAuthentication passwordYes

Deployment Instructions

1. Create Namespace and Secrets

bash
# Create namespace
kubectl create namespace data-stack

# Create PostgreSQL secret
kubectl create secret generic postgresql-secret \
  --from-literal=password=$(openssl rand -base64 32) \
  --namespace data-stack

# Create MySQL secret
kubectl create secret generic mysql-secret \
  --from-literal=root-password=$(openssl rand -base64 32) \
  --from-literal=password=$(openssl rand -base64 32) \
  --namespace data-stack

# Create MongoDB secret
kubectl create secret generic mongodb-secret \
  --from-literal=root-password=$(openssl rand -base64 32) \
  --namespace data-stack

# Create Redis secret
kubectl create secret generic redis-secret \
  --from-literal=password=$(openssl rand -base64 32) \
  --namespace data-stack

2. Deploy Databases

bash
# Deploy PostgreSQL
kubectl apply -f postgresql/statefulset.yaml
kubectl apply -f postgresql/service.yaml

# Deploy MySQL
kubectl apply -f mysql/statefulset.yaml
kubectl apply -f mysql/service.yaml

# Deploy MongoDB
kubectl apply -f mongodb/statefulset.yaml
kubectl apply -f mongodb/service.yaml

# Deploy Redis
kubectl apply -f redis/statefulset.yaml
kubectl apply -f redis/service.yaml

3. Verify Deployment

bash
# Check pod status
kubectl get pods -n data-stack

# Check persistent volumes
kubectl get pvc -n data-stack

# Check services
kubectl get svc -n data-stack

# Test PostgreSQL connection
kubectl exec -it -n data-stack postgresql-0 -- psql -U ndp_user -d ndp

# Test MySQL connection
kubectl exec -it -n data-stack mysql-0 -- mysql -u root -p

# Test MongoDB connection
kubectl exec -it -n data-stack mongodb-0 -- mongosh -u admin -p

# Test Redis connection
kubectl exec -it -n data-stack redis-0 -- redis-cli -a <password> ping

Database Initialization

Schema Management

Database schemas are managed by the application services themselves and will evolve during the development cycle.

  • PostgreSQL schemas are managed by the application microservices using migration tools (e.g., Flyway, Liquibase)
  • MySQL schemas are versioned with the application code
  • MongoDB collections are created dynamically by the application

Refer to individual service documentation for current schema definitions.

Backup Strategy

Automated Backups

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgresql-backup
  namespace: data-stack
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:15-alpine
            command:
            - /bin/sh
            - -c
            - |
              pg_dump -h postgresql -U ndp_user ndp | \
              gzip > /backup/ndp-$(date +%Y%m%d-%H%M%S).sql.gz
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgresql-secret
                  key: password
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          restartPolicy: OnFailure
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc

Monitoring & Alerts

Key Metrics to Monitor

PostgreSQL:

  • Connection count
  • Transaction rate
  • Query duration
  • Cache hit ratio
  • Replication lag

MySQL:

  • Connection count
  • Slow query log
  • InnoDB buffer pool usage
  • Replication status

MongoDB:

  • Operation latency
  • Document count
  • Write concern
  • Replication lag

Redis:

  • Memory usage
  • Hit rate
  • Evicted keys
  • Connected clients

Kafka:

  • Message rate
  • Consumer lag
  • Partition count
  • Disk usage

Resource Requirements

Estimates

These are rough estimates. Monitor actual usage and adjust accordingly.

ServiceCPUMemoryStorageIOPS
PostgreSQL4 cores8 GB500 GB3000
MySQL2 cores4 GB100 GB1000
MongoDB2 cores4 GB100 GB1000
Redis2 cores4 GB20 GB500
Kafka2 cores4 GB50 GB1000

Troubleshooting

PostgreSQL Won't Start

bash
# Check logs
kubectl logs -n data-stack postgresql-0

# Check permissions
kubectl exec -n data-stack postgresql-0 -- ls -la /var/lib/postgresql/data

# Reset data directory (CAUTION: destroys data)
kubectl delete pvc -n data-stack postgresql-storage-postgresql-0

Redis Out of Memory

bash
# Check memory usage
kubectl exec -n data-stack redis-0 -- redis-cli -a <password> INFO memory

# Clear all keys (CAUTION)
kubectl exec -n data-stack redis-0 -- redis-cli -a <password> FLUSHALL

Next Steps