Diagram Types
Flowchart ▶
A diagram that maps out a process or workflow using standardised symbols connected by arrows. Rectangles represent actions or steps, diamonds represent decisions with branching Yes/No paths, and ovals mark start and end points. Flowcharts are language-agnostic and understood across disciplines, from software engineering to operations management.
UML — Unified Modeling Language ▶
A standardised visual language for describing software systems. UML 2 defines 14 diagram types, divided into structural diagrams (such as class, component, and deployment) and behavioural diagrams (such as use case, sequence, state machine, and activity). In practice, class diagrams and sequence diagrams are among the most commonly used. The standard is maintained by the OMG (Object Management Group).
Entity-Relationship Diagram (ERD) ▶
Models the data entities in a system, their attributes, and how they relate to each other. Entities become database tables; relationships become foreign key constraints or join tables. Crow's foot notation shows cardinality at each end of a relationship line: a single line means "one", and a three-pronged fork (the crow's foot) means "many". ERDs are created before or alongside database schema design.
Sequence Diagram ▶
A UML diagram that shows how objects interact in a particular scenario over time. Participants appear as vertical "lifelines", and messages are drawn as horizontal arrows between them (in either direction, with replies pointing back to the caller), ordered top-to-bottom chronologically. Useful for documenting API call chains, authentication flows, and microservice interactions.
Architecture Diagram ▶
A high-level diagram showing the major components of a system (services, databases, clients, external APIs) and how they communicate. Less standardised than UML — notation varies by team and tool. Common elements: boxes for components, arrows for data flow or network connections, clouds for external systems, cylinders for databases. This app produces architecture diagrams.
Data Flow Diagram (DFD) ▶
Shows how data moves through a system — inputs, transformations, storage, and outputs. DFDs have four symbols: rectangles (external entities), rounded rectangles or circles (processes), open-ended rectangles (data stores), and arrows (data flows with labels). Level 0 (Context Diagram) shows the whole system as one process; Level 1 breaks it into sub-processes.
Flowchart Symbols
Process — Rectangle ▶
The most common flowchart symbol. Represents a single action, operation, or step in the process — anything that transforms input into output. Examples: "Validate user credentials", "Calculate total price", "Write record to database". Each rectangle should describe one discrete action and have exactly one entry and one exit arrow.
Decision — Diamond ▶
Represents a branching point where a Yes/No or True/False question is asked. Has one input and two (or more) outputs, each labelled with the condition they represent. Nested decisions create complex logic — if you find yourself nesting more than two or three diamonds, it's often a sign the logic should be refactored into a separate subprocess.
Terminal — Oval / Rounded Rectangle ▶
Marks the start or end of a process. By convention a flowchart has exactly one Start terminal (at the top) and one or more End/Stop terminals. When a process can terminate in multiple ways (success, error, timeout), each termination path ends in its own terminal with a descriptive label. Some notations also use a terminal to indicate a call to a separate sub-process flowchart, although the predefined-process symbol (a rectangle with doubled vertical edges) is the more usual choice for that.
Database — Cylinder ▶
Represents a data store — a database, file, or any persistent storage medium. The cylinder shape comes from the physical appearance of old magnetic disk drives. In architecture diagrams, annotate the cylinder with the type of database (SQL, MongoDB, Redis, S3 bucket) and what entity or domain it stores. A single service should ideally own its own data store (database-per-service pattern).
I/O — Parallelogram ▶
Represents an input or output operation — reading from a user, a file, an API, or writing to an output stream. The slanted sides visually suggest data moving in or out. In process flows, it marks the boundary where the system interacts with the outside world. In data flow diagrams, this boundary would instead be represented by an external entity rectangle.
Arrow — Control Flow ▶
Arrows show the direction of flow: where control goes next after each step. Every shape except an End terminal must have at least one outgoing arrow, and every shape except the Start terminal must have at least one incoming arrow. Avoid crossing arrows; reroute the layout instead. In data flow diagrams, arrows carry data rather than control, so they are labelled with the name of the data being passed.
System Architecture Concepts
Monolith vs Microservices ▶
A monolith packages all application modules (auth, billing, notifications, etc.) into one deployable unit: one codebase, one process, and typically one database. It is simple to develop and debug initially, but becomes harder to scale and release as the codebase grows. A microservices architecture splits those modules into independently deployable services that communicate over a network, so each can be scaled, deployed, and rewritten independently, at the cost of distributed-systems complexity.
API — Application Programming Interface ▶
A defined contract through which one component exposes its capabilities for another to consume, without exposing internal implementation. In web systems, APIs typically communicate over HTTP using JSON. A REST API treats resources as nouns addressed by URLs (GET /users/42). A GraphQL API lets the client specify exactly which fields to fetch in a single query. APIs are the arrows between boxes in an architecture diagram.
Load Balancer ▶
A component that distributes incoming requests across multiple instances of a service, preventing any single instance from becoming a bottleneck. Strategies include round-robin (hand requests to instances in a fixed rotating order), least-connections (route to the instance with the fewest active connections), and IP hash (route the same client IP to the same instance for session affinity). In diagrams, the load balancer sits in front of a service cluster.
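The three strategies named above can be sketched in a few lines. This is an illustrative sketch, not how any particular load balancer is implemented; the `Instance` shape, the function names, and the simple string hash are all assumptions made for the example.

```typescript
// Hypothetical instance record; a real balancer would track connections itself.
interface Instance {
  id: string;
  activeConnections: number;
}

// Round-robin: hand out instances in a fixed order, wrapping at the end.
function makeRoundRobin(instances: Instance[]): () => Instance {
  let next = 0;
  return () => {
    const chosen = instances[next % instances.length];
    next += 1;
    return chosen;
  };
}

// Least-connections: pick the instance with the fewest active connections.
function leastConnections(instances: Instance[]): Instance {
  return instances.reduce((best, candidate) =>
    candidate.activeConnections < best.activeConnections ? candidate : best
  );
}

// IP hash: the same client IP always maps to the same instance
// (as long as the instance list does not change).
function ipHash(instances: Instance[], clientIp: string): Instance {
  let hash = 0;
  for (let i = 0; i < clientIp.length; i++) {
    hash = (hash * 31 + clientIp.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return instances[hash % instances.length];
}
```

Round-robin needs state (the counter), least-connections needs live metrics, and IP hash needs neither, which is part of why the three suit different situations.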
Message Queue / Event Bus ▶
A component that decouples services by holding messages in transit. A producer pushes a message to the queue; a consumer reads and processes it asynchronously. This decoupling means the producer does not need to wait for the consumer and the system is resilient to consumer downtime. Common implementations: Kafka (high-throughput event streaming), RabbitMQ (task queues), AWS SQS. In diagrams, queues sit between services as intermediary nodes.
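A minimal in-memory sketch of the producer/consumer decoupling described above. The `MessageQueue` class and the `OrderPlaced` message shape are hypothetical; real brokers such as Kafka, RabbitMQ, or SQS add persistence, acknowledgements, and retries that are not modelled here.

```typescript
// Minimal FIFO queue held in memory, for illustration only.
class MessageQueue<T> {
  private messages: T[] = [];

  // Producer side: enqueue and return immediately, without waiting for a consumer.
  publish(message: T): void {
    this.messages.push(message);
  }

  // Consumer side: take the oldest message, or undefined if the queue is empty.
  poll(): T | undefined {
    return this.messages.shift();
  }

  get size(): number {
    return this.messages.length;
  }
}

// Hypothetical message shape.
interface OrderPlaced {
  orderId: number;
}

const queue = new MessageQueue<OrderPlaced>();

// The producer publishes while no consumer is running; messages simply wait.
queue.publish({ orderId: 1 });
queue.publish({ orderId: 2 });

// The consumer comes online later and drains the queue in FIFO order.
const processed: number[] = [];
let message = queue.poll();
while (message !== undefined) {
  processed.push(message.orderId);
  message = queue.poll();
}
```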
CDN — Content Delivery Network ▶
A geographically distributed network of servers that cache and serve static assets (images, CSS, JS, video) from locations close to the end user. Reduces latency (content is served from nearby nodes rather than your origin server) and reduces origin load. In architecture diagrams, the CDN sits in front of the origin server and intercepts cacheable requests.
Pub / Sub Pattern ▶
Publishers emit events to named topics without knowing who will receive them. Subscribers register interest in a topic and receive all events published to it. This many-to-many broadcast pattern enables loose coupling — adding a new subscriber (e.g., an analytics service) requires no change to the publisher. Compare with point-to-point queues, where each message has exactly one consumer.
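The contrast with a point-to-point queue is easiest to see in code: every subscriber of a topic receives each event. The sketch below is an in-process illustration only; `TopicBus`, the topic name, and the two subscriber logs are made up for the example.

```typescript
type Handler<E> = (event: E) => void;

class TopicBus<E> {
  private subscribers = new Map<string, Handler<E>[]>();

  // Register interest in a topic.
  subscribe(topic: string, handler: Handler<E>): void {
    const handlers = this.subscribers.get(topic) ?? [];
    handlers.push(handler);
    this.subscribers.set(topic, handlers);
  }

  // Deliver the event to every subscriber of the topic.
  // Returns the number of deliveries; the publisher never sees who they were.
  publish(topic: string, event: E): number {
    const handlers = this.subscribers.get(topic) ?? [];
    handlers.forEach((handler) => handler(event));
    return handlers.length;
  }
}

const bus = new TopicBus<string>();
const emailLog: string[] = [];
const analyticsLog: string[] = [];

bus.subscribe("order.placed", (event) => {
  emailLog.push(event);
});
// Adding the analytics subscriber requires no change to any publisher.
bus.subscribe("order.placed", (event) => {
  analyticsLog.push(event);
});

const delivered = bus.publish("order.placed", "order-42");
```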
Database & Data Modeling
Entity, Attribute, Relationship ▶
In data modeling, an entity is a real-world object or concept you want to track (User, Order, Product). An attribute is a property of that entity (email, created_at, price). A relationship describes how entities associate — "a User places many Orders" is a one-to-many relationship, mapped in SQL with a foreign key on the Orders table pointing to Users.
Primary Key & Foreign Key ▶
A primary key uniquely identifies each row in a table, and is often an auto-incrementing integer or a UUID. A foreign key is a column in one table that references the primary key of another, enforcing referential integrity (you cannot insert an Order for a User that doesn't exist). In an ERD, the foreign key column sits on the "many" side of a one-to-many relationship.
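To make the two constraints concrete, here is a small in-memory sketch that mimics what a relational database enforces. It is not SQL and not a real database API; the `Tables` class and its method names are assumptions for illustration.

```typescript
interface User {
  id: number; // primary key
  email: string;
}

interface Order {
  id: number; // primary key
  userId: number; // foreign key referencing User.id
  total: number;
}

class Tables {
  private users = new Map<number, User>();
  private orders = new Map<number, Order>();

  insertUser(user: User): void {
    if (this.users.has(user.id)) {
      throw new Error(`duplicate primary key: user ${user.id}`);
    }
    this.users.set(user.id, user);
  }

  insertOrder(order: Order): void {
    if (this.orders.has(order.id)) {
      throw new Error(`duplicate primary key: order ${order.id}`);
    }
    // Referential integrity: the referenced user must already exist.
    if (!this.users.has(order.userId)) {
      throw new Error(`foreign key violation: no user ${order.userId}`);
    }
    this.orders.set(order.id, order);
  }

  // The "many" side of the relationship: all orders for one user.
  ordersFor(userId: number): Order[] {
    return Array.from(this.orders.values()).filter((o) => o.userId === userId);
  }
}
```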
Normalization ▶
The process of structuring a relational database to reduce redundancy and improve data integrity. 1NF: each column holds atomic values, no repeating groups. 2NF: every non-key attribute depends on the whole primary key (matters for composite keys). 3NF: every non-key attribute depends only on the primary key, not on other non-key columns (no transitive dependencies). Most production schemas target 3NF.
ACID Properties ▶
Atomicity: a transaction either fully succeeds or fully rolls back, with no partial writes. Consistency: a transaction takes the database from one valid state to another, honouring all constraints. Isolation: concurrent transactions do not see each other's intermediate state; at the strictest isolation level they behave as if they ran serially. Durability: once committed, a transaction survives crashes. ACID is the guarantee offered by traditional SQL databases; many NoSQL databases relax some guarantees in exchange for availability and partition tolerance (CAP theorem).
SQL vs NoSQL ▶
SQL (relational) databases store data in tables with a fixed schema, enforce relationships, and provide ACID guarantees. Optimal when data relationships are well-understood and consistency is critical. NoSQL covers document stores (MongoDB), key-value stores (Redis), column-family stores (Cassandra), and graph databases (Neo4j). NoSQL trades relational integrity for flexible schemas, horizontal scalability, and high write throughput. Choose based on access patterns, not just popularity.
CRUD ▶
The four fundamental data operations: Create (INSERT), Read (SELECT), Update (UPDATE), Delete (DELETE). Most application features map onto CRUD — a "user profile editor" is Read + Update; a "delete account" button is Delete. REST APIs map CRUD to HTTP verbs: POST = Create, GET = Read, PUT/PATCH = Update, DELETE = Delete.
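The verb-to-operation mapping above can be sketched as a single dispatch function over an in-memory store. This is not a real HTTP server: the `handle` function, the `Profile` shape, and the chosen status codes (201, 200, 204, 400, 404) are assumptions picked to mirror common REST conventions.

```typescript
type Method = "POST" | "GET" | "PUT" | "DELETE";

interface Profile {
  name: string;
}

interface Result {
  status: number;
  id?: number;
  profile?: Profile;
}

const profiles = new Map<number, Profile>();
let nextId = 1;

function handle(method: Method, id?: number, body?: Profile): Result {
  if (method === "POST") {
    // Create: corresponds to INSERT.
    if (body === undefined) return { status: 400 };
    const newId = nextId++;
    profiles.set(newId, body);
    return { status: 201, id: newId, profile: body };
  }

  // Every remaining verb addresses an existing resource by id.
  if (id === undefined) return { status: 400 };
  const existing = profiles.get(id);
  if (existing === undefined) return { status: 404 };

  switch (method) {
    case "GET": // Read: corresponds to SELECT.
      return { status: 200, id, profile: existing };
    case "PUT": // Update: corresponds to UPDATE.
      if (body === undefined) return { status: 400 };
      profiles.set(id, body);
      return { status: 200, id, profile: body };
    case "DELETE": // Delete: corresponds to DELETE.
      profiles.delete(id);
      return { status: 204 };
  }
}
```

A "user profile editor" in these terms is a `GET` followed by a `PUT` on the same id.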
Design Patterns
MVC — Model, View, Controller ▶
A structural pattern that separates concerns into three roles: the Model manages data and business logic; the View renders the UI; the Controller handles user input, orchestrates the Model, and selects which View to render. The pattern originated in Smalltalk and was popularised for the web by frameworks such as Rails and Django (Django calls its variant Model-Template-View). Variants include MVP (Model-View-Presenter) and MVVM (Model-View-ViewModel, associated with data-binding UI frameworks such as WPF and Vue).
Repository Pattern ▶
Abstracts data access behind an interface so that business logic does not directly depend on the database technology. A UserRepository interface might define findById, save, and delete; the concrete implementation handles SQL queries. This makes it much easier to swap databases, add caching, or mock the repository in tests without touching business logic.
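A sketch of the `UserRepository` interface described above, with an in-memory implementation standing in for the SQL-backed one. The `User` shape, the `InMemoryUserRepository` class, and the `changeEmail` function are hypothetical and exist only to show that the business logic touches the interface, never the storage.

```typescript
interface User {
  id: number;
  email: string;
}

// The contract the business logic depends on.
interface UserRepository {
  findById(id: number): User | undefined;
  save(user: User): void;
  delete(id: number): boolean;
}

// Stand-in implementation; a production one would issue SQL queries instead.
class InMemoryUserRepository implements UserRepository {
  private rows = new Map<number, User>();

  findById(id: number): User | undefined {
    return this.rows.get(id);
  }

  save(user: User): void {
    this.rows.set(user.id, user);
  }

  delete(id: number): boolean {
    return this.rows.delete(id);
  }
}

// Business logic: knows nothing about how or where users are stored.
function changeEmail(repo: UserRepository, id: number, email: string): boolean {
  const user = repo.findById(id);
  if (user === undefined) return false;
  repo.save({ id: user.id, email });
  return true;
}
```

In a test, `changeEmail` can be exercised against `InMemoryUserRepository` with no database running.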
Singleton Pattern ▶
Ensures a class has only one instance and provides a global access point to it. Common uses: database connection pools, configuration registries, logging services. In modern dependency injection frameworks the pattern is largely replaced by registering a class as a singleton in the DI container rather than coding it explicitly — which avoids the testability problems of global mutable state.
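For reference, the hand-coded form of the pattern looks roughly like this. `ConfigRegistry` is a hypothetical example class; the sketch shows the explicit version that a DI container's singleton registration would typically replace.

```typescript
class ConfigRegistry {
  // The single shared instance, created lazily on first access.
  private static instance: ConfigRegistry | undefined;

  private values = new Map<string, string>();

  // A private constructor prevents `new ConfigRegistry()` from outside the class.
  private constructor() {}

  // The global access point.
  static getInstance(): ConfigRegistry {
    if (ConfigRegistry.instance === undefined) {
      ConfigRegistry.instance = new ConfigRegistry();
    }
    return ConfigRegistry.instance;
  }

  set(key: string, value: string): void {
    this.values.set(key, value);
  }

  get(key: string): string | undefined {
    return this.values.get(key);
  }
}
```

Because every caller shares the same mutable `values` map, state written in one test leaks into the next, which is the testability problem mentioned above.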
Observer / Event-Driven Pattern ▶
An object (the subject or emitter) maintains a list of dependents (observers) and notifies them automatically when its state changes. The DOM event system is the canonical example: addEventListener registers an observer; dispatching a click event notifies all registered listeners. At system scale, this pattern becomes the Pub/Sub or Event Bus architecture.
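The same idea outside the DOM can be sketched as a subject that holds state and a list of observer callbacks. The `Subject` class and its method names are illustrative assumptions, not a standard API.

```typescript
type Observer<S> = (state: S) => void;

class Subject<S> {
  private state: S;
  private observers: Observer<S>[] = [];

  constructor(initial: S) {
    this.state = initial;
  }

  getState(): S {
    return this.state;
  }

  // Register an observer; the returned function removes it again.
  subscribe(observer: Observer<S>): () => void {
    this.observers.push(observer);
    return () => {
      this.observers = this.observers.filter((o) => o !== observer);
    };
  }

  // Changing state notifies every registered observer automatically.
  setState(next: S): void {
    this.state = next;
    this.observers.forEach((observer) => observer(next));
  }
}

const temperature = new Subject<number>(20);
const readings: number[] = [];
const unsubscribe = temperature.subscribe((value) => {
  readings.push(value);
});

temperature.setState(21); // observer is notified
unsubscribe();
temperature.setState(22); // no longer notified
```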
Dependency Injection (DI) ▶
A technique where a component receives its dependencies from the outside rather than constructing them itself. Instead of const db = new PostgresClient() inside a service, the service declares constructor(db: Database) and the DI container provides the right implementation at runtime. This decoupling enables easy testing (inject a mock) and configuration (inject a different database for staging vs production).
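A sketch of constructor injection without any container, following the `constructor(db: Database)` shape mentioned above. The `Database` interface's single method, the `UserService` class, and the `StubDatabase` test double are assumptions for illustration; a DI container would do the final wiring step automatically.

```typescript
// The abstraction the service depends on (method name is hypothetical).
interface Database {
  findEmail(userId: number): string | undefined;
}

class UserService {
  private readonly db: Database;

  // The dependency is handed in from outside, not constructed here.
  constructor(db: Database) {
    this.db = db;
  }

  greeting(userId: number): string {
    const email = this.db.findEmail(userId);
    return email !== undefined ? `Hello, ${email}` : "Unknown user";
  }
}

// A test double; in production a real database client would be injected instead.
class StubDatabase implements Database {
  findEmail(userId: number): string | undefined {
    return userId === 1 ? "ada@example.com" : undefined;
  }
}

// Manual wiring: the caller chooses the implementation.
const service = new UserService(new StubDatabase());
```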
Factory Pattern ▶
Delegates object creation to a dedicated factory function or class rather than using new directly. The factory decides which concrete class to instantiate based on runtime parameters. Useful when the exact type to create depends on configuration or user input, or when construction is complex. A NotificationFactory.create('email' | 'sms' | 'push') returns the appropriate notifier without the caller needing to know the class names.
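The `NotificationFactory.create` call described above might look like the following sketch. The `Notifier` interface and the three concrete classes are hypothetical, and `send` just returns a string so the example stays self-contained.

```typescript
type Channel = "email" | "sms" | "push";

interface Notifier {
  send(message: string): string;
}

class EmailNotifier implements Notifier {
  send(message: string): string {
    return `email: ${message}`;
  }
}

class SmsNotifier implements Notifier {
  send(message: string): string {
    return `sms: ${message}`;
  }
}

class PushNotifier implements Notifier {
  send(message: string): string {
    return `push: ${message}`;
  }
}

class NotificationFactory {
  // The factory alone knows which concrete class backs each channel.
  static create(channel: Channel): Notifier {
    switch (channel) {
      case "email":
        return new EmailNotifier();
      case "sms":
        return new SmsNotifier();
      case "push":
        return new PushNotifier();
    }
  }
}
```

Callers only ever name a channel, so adding a fourth channel means changing the `Channel` type and the factory, not every call site.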
Graph & Network Theory
Node / Vertex and Edge ▶
The two fundamental building blocks of a graph. A node (or vertex) represents an entity — a service, a person, a router, a webpage. An edge represents a relationship or connection between two nodes. In a network diagram: nodes are servers and edges are network links. In a social graph: nodes are users and edges are friendships.
Directed vs Undirected Graph ▶
In an undirected graph, edges have no direction: if A is connected to B, B is also connected to A (friendships, physical cables). In a directed graph (digraph), edges are arrows, so A → B does not imply B → A. API call graphs, dependency trees, and data pipelines are directed graphs because the relationship (calls, depends on, feeds into) runs in one direction.
DAG — Directed Acyclic Graph ▶
A directed graph with no cycles — you cannot follow edges and return to a node you've already visited. DAGs model dependencies naturally: task scheduling (you can't start task B before A finishes), build systems (Makefile targets), and version control history. Topological sort — ordering nodes so all edges point forward — is only possible on a DAG.
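One common way to compute a topological order is Kahn's algorithm, sketched below. The function name and the build-step example are assumptions; the sketch also assumes every node named in an edge appears in the `nodes` list. It returns `null` when the graph contains a cycle, since no valid order exists then.

```typescript
// An edge [from, to] means "from must come before to".
function topologicalSort(
  nodes: string[],
  edges: [string, string][]
): string[] | null {
  const inDegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();
  for (const node of nodes) {
    inDegree.set(node, 0);
    outgoing.set(node, []);
  }
  for (const [from, to] of edges) {
    outgoing.get(from)!.push(to);
    inDegree.set(to, inDegree.get(to)! + 1);
  }

  // Start with every node that has no prerequisites.
  const ready = nodes.filter((node) => inDegree.get(node) === 0);
  const order: string[] = [];

  while (ready.length > 0) {
    const node = ready.shift()!;
    order.push(node);
    // "Remove" the node: each successor has one fewer unmet prerequisite.
    for (const next of outgoing.get(node)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // If some nodes were never released, they sit on a cycle.
  return order.length === nodes.length ? order : null;
}

const buildOrder = topologicalSort(
  ["compile", "link", "test", "package"],
  [
    ["compile", "link"],
    ["link", "test"],
    ["link", "package"],
  ]
);
```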
Weighted Graph ▶
A graph where each edge carries a numeric weight representing cost, distance, latency, bandwidth, or any other metric. Shortest-path algorithms like Dijkstra and A* operate on weighted graphs (both assume non-negative edge weights). In network diagrams, edge weights might represent latency (ms), bandwidth (Gbps), or financial cost. An unweighted graph can be treated as a weighted graph where every edge has weight 1.
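A compact sketch of Dijkstra's algorithm over a weighted adjacency map, with weights read as latency in milliseconds. The `Graph` type, the function name, and the four-node example are assumptions. The sketch uses a linear scan to pick the next node for brevity (a priority queue is the usual optimisation), assumes non-negative weights, and assumes every neighbour also appears as a key in the graph.

```typescript
// graph[a][b] is the weight of the directed edge a -> b.
type Graph = Record<string, Record<string, number>>;

function shortestDistances(graph: Graph, source: string): Record<string, number> {
  const dist: Record<string, number> = {};
  const unvisited: string[] = Object.keys(graph);
  for (const node of unvisited) dist[node] = Infinity;
  dist[source] = 0;

  while (unvisited.length > 0) {
    // Pick the unvisited node with the smallest known distance.
    let bestIndex = 0;
    for (let i = 1; i < unvisited.length; i++) {
      if (dist[unvisited[i]] < dist[unvisited[bestIndex]]) bestIndex = i;
    }
    const current = unvisited.splice(bestIndex, 1)[0];
    if (dist[current] === Infinity) break; // everything left is unreachable

    // Relax each outgoing edge of the chosen node.
    for (const neighbour of Object.keys(graph[current])) {
      const candidate = dist[current] + graph[current][neighbour];
      if (candidate < dist[neighbour]) dist[neighbour] = candidate;
    }
  }
  return dist;
}

// Hypothetical network: edge weights are latencies in ms.
const latency: Graph = {
  A: { B: 4, C: 1 },
  B: { D: 1 },
  C: { B: 2, D: 5 },
  D: {},
};
const fromA = shortestDistances(latency, "A");
```

Here the cheapest route from A to D goes A → C → B → D for a total of 4 ms, beating the direct-looking A → B → D route at 5 ms.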
Degree, Path, and Cycle ▶
The degree of a node is the number of edges connected to it (split into in-degree and out-degree for directed graphs). A path is a sequence of nodes connected by edges with no repeated nodes. A cycle is a sequence of edges that starts and ends at the same node, with no other node repeated. Detecting cycles is important for dependency resolution: a circular dependency (A needs B needs A) is a cycle and prevents a valid initialisation order.
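Cycle detection in a directed dependency graph is commonly done with a depth-first search that tracks which nodes are on the current path. The sketch below is one such approach; the `Dependencies` type and the function name are assumptions, and a key maps to the list of things it depends on.

```typescript
// deps[a] lists the nodes that a depends on (directed edges a -> b).
type Dependencies = Record<string, string[]>;

// Returns the nodes of one cycle (first node repeated at the end), or null.
function findCycle(deps: Dependencies): string[] | null {
  const state: Record<string, "visiting" | "done"> = {};
  const path: string[] = [];

  function visit(node: string): string[] | null {
    if (state[node] === "done") return null; // already fully explored
    if (state[node] === "visiting") {
      // The node is already on the current path, so the path loops back to it.
      return path.slice(path.indexOf(node)).concat(node);
    }
    state[node] = "visiting";
    path.push(node);
    for (const next of deps[node] ?? []) {
      const cycle = visit(next);
      if (cycle !== null) return cycle;
    }
    path.pop();
    state[node] = "done";
    return null;
  }

  for (const node of Object.keys(deps)) {
    const cycle = visit(node);
    if (cycle !== null) return cycle;
  }
  return null;
}

const circular = findCycle({ A: ["B"], B: ["C"], C: ["A"] });
const clean = findCycle({ A: ["B"], B: ["C"], C: [] });
```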