Choosing a Database Schema for Polymorphic Data (2024)
Modeling polymorphic data in relational databases presents fundamental challenges that reveal the limitations of traditional SQL approaches. The classic example of insured versus uninsured patients demonstrates how seemingly simple business requirements can create complex schema design problems that resist elegant solutions within relational constraints.
Traditional Inheritance Patterns and Their Limitations
The article explores established patterns from Martin Fowler’s enterprise architecture work: single table inheritance, class table inheritance, and concrete table inheritance. Each approach involves significant trade-offs between data integrity, query complexity, and schema maintainability.
Single table inheritance stores all variants in one table with nullable columns for type-specific fields, leading to sparse data and potential integrity issues. Class table inheritance splits data across multiple tables with foreign key relationships, creating complex joins and potential orphaned records. Concrete table inheritance duplicates common fields across separate tables, making cross-type queries difficult.
These patterns highlight a fundamental mismatch between object-oriented design concepts and relational database constraints. What seems natural in code—having different payment types with distinct properties—becomes awkward when forced into SQL’s rigid table structure.
The JSON Blob Compromise
JSON columns offer a pragmatic middle ground that can work surprisingly well for certain use cases. In SQLite with fast serializers, JSON blobs under a megabyte perform adequately even on hot paths, and updates to complex entities can actually be faster despite increased I/O overhead.
However, JSON approaches break down when you need to join across properties within the JSON data. The moment you find yourself using built-in JSON query functionality extensively, you’ve entered “NoSQL Narnia”—a sign that your data model has outgrown the relational paradigm.
This approach represents a compromise that trades query flexibility for schema simplicity, working best when the polymorphic data remains largely self-contained and doesn’t require complex relational operations.
The Constraint Enforcement Challenge
The core difficulty lies in enforcing business rules like “a record must be either type A or type B but not both” across multiple tables. SQL lacks elegant mechanisms for expressing these constraints, leading to complex check constraints or stored procedures that are difficult to maintain and understand.
The fundamental problem is that relational databases excel at enforcing constraints within tables but struggle with cross-table business rules. This limitation forces developers to choose between data integrity at the database level and schema complexity that becomes unmaintainable.
Some suggest that databases should support unique constraints across tables or foreign key exclusions, but these features don’t exist in standard SQL implementations, leaving developers with workarounds that feel like fighting the database rather than working with it.
The Anti-Polymorphism Argument
A compelling counter-argument suggests that polymorphism shouldn’t be used for data modeling at all—only for behavior. From this perspective, insured and uninsured payments represent fundamentally different business processes that should be handled separately rather than forced into an inheritance hierarchy.
This approach advocates for distinct procedural code paths and separate domain models for each payment type, using polymorphism only when extracting aggregate data across types. The argument is that trying to unify different business processes through inheritance creates more complexity than it solves.
The practical reality in healthcare billing supports this view: insured and uninsured processes have completely different workflows, compliance requirements, and data needs. Attempting to model them polymorphically may be solving the wrong problem.
Alternative Database Paradigms
The limitations of relational approaches have sparked interest in alternative database models. Entity-Attribute-Value (EAV) databases like Datomic and XTDB provide more flexible schema evolution by treating individual attributes as first-class entities rather than forcing data into rigid table structures.
These systems excel at handling polymorphic data because they don’t assume a fixed schema. However, they introduce their own complexities and may not fit all use cases, particularly those requiring strong consistency guarantees or complex analytical queries.
More ambitious platforms like Rama attempt to solve the broader distributed systems complexity by handling both data and compute distribution together, using flexible data primitives that eliminate impedance mismatch between application code and storage.
The Scaling Complexity Problem
When polymorphic constraints propagate through a data model, they create exponential complexity growth. If billing can be insured or uninsured, then related tables like billing_customer and billing_customer_details may also need to be split, creating 2^n table variants as more polymorphic dimensions are added.
This “async coloring problem” means that what should be a local schema decision poisons the entire data model. Tables that merely reference polymorphic entities become polymorphic themselves, creating a cascade of complexity that’s difficult to contain.
The ideal solution would allow local constraint enforcement without global schema fragmentation, but current SQL databases don’t provide mechanisms for this kind of selective constraint propagation.
Practical Recommendations
For most real-world scenarios, the best approach depends on specific requirements and constraints. JSON blobs work well for self-contained polymorphic data that doesn’t require complex queries. Traditional inheritance patterns remain viable when query complexity is manageable and the polymorphic variants are stable.
However, the strongest argument may be to avoid polymorphic data modeling entirely when possible. If different types represent genuinely different business processes, separate models and code paths often provide clearer, more maintainable solutions than forced unification through inheritance.
The choice ultimately reflects a broader tension between object-oriented design principles and relational database constraints—a mismatch that has driven much of the innovation in NoSQL and NewSQL database systems over the past decade.