How to Choose the Right Database for Your Architecture
Database selection has an outsized effect on the rest of your system design. The choice you make in the early stages of a project shapes your data model, your query capabilities, your scaling strategy, and your operational complexity for years to come. Yet many teams default to whatever they used last (PostgreSQL for everything, or MongoDB because "it's flexible") without rigorously evaluating whether the technology matches their actual requirements.
This database selector tool walks you through the key decision criteria (data model, access patterns, consistency requirements, and scale) and recommends database categories and specific technologies that fit. Whether you are designing a new system, preparing for a system design interview, or evaluating a migration, the tool helps you reason from first principles rather than habit.
SQL, NoSQL, and NewSQL: Understanding the Trade-offs
Relational databases (PostgreSQL, MySQL, SQL Server) store data in tables with a fixed schema and support SQL for complex queries, joins, and transactions. They are the right default for most applications where data relationships are important and consistency is non-negotiable. PostgreSQL in particular has evolved to handle JSON documents, full-text search, and geospatial data, making it surprisingly versatile.
NoSQL databases trade schema rigidity and join support for flexibility and horizontal scalability. Document stores (MongoDB, Couchbase) excel when your data is naturally hierarchical. Key-value stores (Redis, DynamoDB) deliver sub-millisecond latency for simple lookups. Wide-column stores (Cassandra, ScyllaDB) handle massive write throughput across distributed clusters. Graph databases (Neo4j, Amazon Neptune) make relationship traversal efficient for social networks, recommendation engines, and fraud detection.
NewSQL databases (CockroachDB, TiDB, Google Spanner) aim to combine the scalability of NoSQL with the transactional guarantees of SQL. They are well-suited for globally distributed applications that need strong consistency without sacrificing horizontal scale, though they come with higher operational complexity and cost.
The CAP Theorem in Practice
The CAP theorem (Consistency, Availability, Partition Tolerance) is a foundational concept for distributed database selection. In practice, every distributed system must tolerate network partitions, so the real question is whether you prioritise consistency (every read sees the latest write) or availability (every request gets a response) when a partition occurs. Modern databases offer tuneable consistency: DynamoDB lets you choose between eventually consistent and strongly consistent reads per request, and Cassandra allows you to set consistency levels per query. Understanding these knobs is more practical than memorising CAP categories.
When to Use Multiple Databases
Most production systems at scale use more than one database. A common pattern is to use PostgreSQL as the primary transactional store, Redis as a caching layer for hot data, and Elasticsearch for full-text search. This polyglot persistence approach lets each database do what it does best, but it introduces complexity around data synchronisation and operational overhead. As an engineering manager, you need to weigh the performance benefits against the cost of maintaining multiple data stores - and ensure your team has the expertise to operate them reliably.
Database Selection for Interviews
In system design interviews, how you select a database tells the interviewer a lot about how you think architecturally. They want to see that you reason from requirements (access patterns, scale, consistency) rather than defaulting to a favourite technology. Name specific databases, explain why they fit, and acknowledge the trade-offs. If the system has multiple distinct data access patterns, say transactional writes and analytics queries, propose different stores for each and explain how data flows between them. This kind of requirement-driven reasoning is what interviewers look for at senior and staff level.
Frequently Asked Questions
- How do I choose between SQL and NoSQL?
- The decision depends on your data model, query patterns, and consistency requirements. Choose SQL (PostgreSQL, MySQL) when you need strong consistency, complex joins across multiple tables, ACID transactions, and your schema is well-defined. Choose NoSQL when you need flexible schemas that evolve rapidly (document stores like MongoDB), extremely high write throughput (wide-column stores like Cassandra), real-time access by key (key-value stores like Redis), or deeply connected data traversal (graph databases like Neo4j). Many production systems use both: SQL for transactional data and NoSQL for caching, search, or analytics. Start with the query patterns your application requires and work backward to the data model, not the other way around.
- What is the CAP theorem?
- The CAP theorem states that a distributed data store can provide at most two of three guarantees simultaneously: Consistency (every read receives the most recent write), Availability (every request receives a response), and Partition Tolerance (the system continues operating despite network partitions). Since network partitions are inevitable in distributed systems, the real trade-off is between consistency and availability during a partition. CP systems (e.g. HBase, MongoDB with majority write concern) sacrifice availability to maintain consistency. AP systems (e.g. Cassandra, DynamoDB) sacrifice strict consistency to remain available. In practice, modern databases offer tuneable consistency levels, so the choice is a spectrum rather than a binary decision.
- Which database is best for high-traffic applications?
- There is no single best database. The answer depends on your access patterns. For high-traffic read-heavy workloads with simple key-based lookups, Redis or Memcached as a caching layer in front of a primary database is the standard approach. For high-throughput write-heavy workloads, Cassandra and ScyllaDB excel due to their distributed write paths. For mixed workloads with complex queries, PostgreSQL with read replicas handles millions of requests when properly tuned. For time-series data at scale, TimescaleDB or InfluxDB are purpose-built. The key is to benchmark against your specific access patterns rather than relying on generic throughput claims.
- When should I use a document database vs a relational database?
- Use a document database (MongoDB, Couchbase) when your data is naturally hierarchical or nested, your schema varies across records, you read and write entire documents at once, and you rarely need cross-document joins. Common use cases include content management systems, user profiles, product catalogues, and event logging. Use a relational database when your data has many relationships, you need to query across those relationships frequently, you require strict transactional guarantees, or your schema is stable and well-understood. E-commerce order systems, financial ledgers, and user permission models are classic relational use cases.
- How do I evaluate databases for system design interviews?
- In a system design interview, demonstrate your database selection by walking through a structured framework: first, clarify the data model (what entities exist and how they relate), then identify the primary access patterns (reads vs writes, simple lookups vs complex queries, latency requirements). Next, estimate scale (data volume, queries per second, growth rate) and state your consistency requirements (strong, eventual, or causal). Finally, map these requirements to a database category and name a specific technology. For example: 'We have a social graph with many-to-many relationships and need efficient traversal, so I would use Neo4j for the graph layer and Redis for caching frequently accessed profiles.' This approach shows interviewers you reason from requirements, not preferences.
Practice System Design Scenarios
Put your database knowledge into practice with realistic architecture challenges. The System Design Scenario Generator creates interview-style problems that test your ability to select the right technologies, plan for scale, and make sound trade-offs.
Try the Scenario Generator