
APIs From the Ground Up: Design, Protocols, and Security for Production Systems


I've been designing APIs for over a decade, and the thing that strikes me most looking back is how many decisions feel obvious in hindsight but weren't at the time. This post is the reference I wish I'd had earlier -- not a tutorial, but a working mental model for how APIs fit together from the transport layer up through authentication and security.

What an API actually is (and isn't)

An API is a contract. The client says "I'll send you this shape of data at this URL," and the server says "I'll give you back this shape of response." Everything else -- the database, the business logic, the infrastructure -- is hidden behind that contract.

Two things make this useful in practice. First, it lets you change the internals without breaking clients. Swap Postgres for DynamoDB, rewrite a service in Go, move to a different cloud -- as long as the contract holds, nobody notices. Second, it defines where one team's responsibility ends and another's begins. In a microservices world, the API boundary is the team boundary.


The contract itself has three parts: endpoints (where), methods (what operation), and response shapes (what comes back). Get those right and most of the downstream problems go away. Get them wrong and you'll be writing migration guides for the next two years.

Picking the right API style

This is the decision that gets the most debate and matters less than people think -- until it matters a lot. Here's how I think about it.


REST is the default for public APIs, and for good reason. It maps cleanly onto HTTP (GET a resource, POST to create one, DELETE to remove it). Caching works out of the box because HTTP already has cache headers. Every developer on Earth knows how to call a REST endpoint. The downside is over-fetching -- you get the whole user object when you only needed the name, or you make three requests when one would do.

GraphQL solves the over-fetching problem by letting clients ask for exactly the fields they want. If your mobile app needs a user's name and their three most recent orders, that's one query. The trade-off is complexity: you lose HTTP caching (everything is POST), you need query depth limits to prevent abuse, and error handling is weird -- GraphQL returns 200 OK even when things fail, so you have to dig into the response body for errors.

gRPC is what you use between microservices when you need speed. It serializes data with protocol buffers (binary, much smaller than JSON), runs on HTTP/2 (multiplexing, streaming), and generates client libraries from a schema definition. I use it for service-to-service communication where both sides are under my control. Don't try to use it from a browser -- the support isn't there.

The network layer: TCP vs UDP

This doesn't come up often in API design discussions, but it should, because your reliability guarantees start here.


TCP does the handshake, tracks every packet, resends lost ones, and reorders anything that arrives out of sequence. It's mandatory for anything where losing data matters: payments, auth, database writes, API calls.

UDP skips all of that. No handshake, no tracking, no guarantees. A packet either arrives or it doesn't. This sounds reckless until you realize that video calls and online games don't need perfect delivery -- they need the most recent frame. A packet from 200ms ago is useless; the current one is all that matters.

For API work, you're almost always on TCP. The exception is if you're building real-time features (live cursors, presence indicators) where WebSockets or WebRTC might use UDP under the hood.

Application layer protocols

Once you've picked TCP or UDP, the next layer up determines how data is formatted and exchanged.

HTTP/HTTPS is the base for REST and GraphQL. The methods (GET, POST, PUT, PATCH, DELETE) map to CRUD operations. Status codes tell clients what happened: 2xx means success, 4xx means the client messed up, 5xx means the server messed up. HTTPS is non-negotiable in production -- TLS encrypts everything in transit.
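The 2xx/4xx/5xx split determines how a client should react, so it's worth encoding explicitly. A minimal sketch (the function name `status_class` is my own, not from any library):

```python
def status_class(code: int) -> str:
    """Classify an HTTP status code by how a client should react to it."""
    if 200 <= code < 300:
        return "success"       # request worked
    if 400 <= code < 500:
        return "client_error"  # fix the request before retrying
    if 500 <= code < 600:
        return "server_error"  # safe to retry with backoff
    return "other"             # 1xx informational, 3xx redirects
```

The practical payoff: retry logic keys off this split. Retrying a 4xx wastes requests; retrying a 5xx with backoff is usually the right move.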

WebSockets start as an HTTP request that gets "upgraded" to a persistent two-way connection. Instead of the client polling the server every few seconds ("any updates? how about now?"), the server pushes data the moment something changes. I use WebSockets for chat, notifications, and live dashboards.


AMQP (message queuing) is for async processing. A producer drops a message on a queue, and a consumer picks it up whenever it's ready. The three routing patterns matter: direct (one-to-one), fanout (broadcast to all), and topic (pattern matching). I reach for this when I need to decouple services or handle bursty workloads -- the queue absorbs the spike and the consumer processes at its own pace.
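Topic routing is the pattern that deserves a closer look: routing keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more. Here's a sketch of that matching logic, assuming AMQP semantics as RabbitMQ implements them (the function name is mine):

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic matching: '*' matches exactly one word,
    '#' matches zero or more words. Words are dot-separated."""
    def match(p, k):
        if not p:
            return not k            # pattern exhausted: key must be too
        if p[0] == "#":
            # '#' can swallow any number of words, including none
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False            # words left in pattern, key is empty
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))
```

A consumer bound to `order.#` sees every order event; one bound to `*.created` sees creation events across all resources. That's the decoupling in practice: producers publish keys, consumers choose patterns, and neither knows about the other.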

gRPC on HTTP/2 gives you multiplexed streams over a single connection. Multiple requests and responses can fly back and forth simultaneously without head-of-line blocking at the HTTP layer. The catch: browsers don't expose the low-level HTTP/2 framing gRPC relies on, which is why gRPC stays behind the API gateway.

Design principles that actually hold up

After years of building APIs, I've landed on four things that consistently matter:

Consistency over cleverness. Pick a naming convention (camelCase, snake_case, whatever) and stick to it everywhere. Use the same error format across every endpoint. When a developer learns one corner of your API, every other corner should feel familiar.

Simplicity as a feature. If someone needs to read docs to understand your endpoint, the endpoint is too complicated. The best APIs I've used felt like I already knew how they worked before I read anything.

Security from day one. Bolting security on later never works well. Auth, rate limiting, input validation -- these go in the first version, not "v2 when we have time."

Performance by default. Paginate list endpoints. Compress responses. Set cache headers. These aren't optimizations; they're baseline expectations.

The design process itself usually follows one of three patterns. Top-down (start from user workflows), bottom-up (start from the data model), or contract-first (define the request/response shapes before writing code). I prefer contract-first because it forces alignment before anyone writes a line of implementation.

REST resource modeling

The single most common REST mistake is putting verbs in URLs. The HTTP method already is the verb.

Bad:   GET  /getProducts
       POST /createOrder
       POST /deleteUser/123

Good:  GET    /products
       POST   /orders
       DELETE /users/123

Your URLs should read like a file system of nouns. /users/123/orders means "orders belonging to user 123." The method tells you what you're doing with them.

For listing endpoints, query parameters handle filtering and pagination: /products?category=tech&sort=price_asc&page=2&limit=10. PUT replaces the entire resource. PATCH updates specific fields. Pick one and be consistent.
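The `page`/`limit` math is simple but easy to get subtly wrong at the boundaries, so here's a minimal sketch of a pagination helper (the response shape with `has_next` is one common convention, not a standard):

```python
def paginate(items, page: int = 1, limit: int = 10):
    """Return one page of results plus the metadata clients need
    to walk the collection."""
    total = len(items)
    start = (page - 1) * limit
    return {
        "data": items[start:start + limit],
        "page": page,
        "limit": limit,
        "total": total,
        "has_next": start + limit < total,
    }
```

In a real endpoint the slicing happens in the database (`LIMIT`/`OFFSET` or, better, keyset pagination), but the contract with the client looks the same.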

Versioning deserves a mention because you will need it eventually. I use URI prefixes (/v1/products, /v2/products). Some people use headers. The prefix approach is more visible and easier to debug, and I've never had a reason to regret it.

GraphQL: the schema is the contract

Where REST spreads its contract across many endpoints, GraphQL concentrates it in a typed schema. Queries read data, mutations write data, and subscriptions push real-time updates.


The piece that trips people up is error handling. A GraphQL response always comes back as HTTP 200. If a field fails, that field returns null and the error goes in an errors array alongside the partial data. Your client code needs to handle both the data and the errors in every response.
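Concretely, client code has to treat data and errors as independent, because a response can legitimately contain both. A sketch of what that unpacking looks like (the helper name and the sample response are illustrative):

```python
def unpack_graphql(response: dict):
    """Split a GraphQL response into (data, errors). Both can be
    present at once: GraphQL returns partial results with the
    failed fields nulled out."""
    return response.get("data"), response.get("errors") or []

# A partial-failure response: the user resolved, the orders field didn't.
resp = {
    "data": {"user": {"name": "Ada", "orders": None}},
    "errors": [{"message": "orders service timed out",
                "path": ["user", "orders"]}],
}
```

The `path` field in each error tells you which part of the query failed, which is what lets you render the user's name while showing an error state for just the orders panel.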

The security concern specific to GraphQL is query depth. A malicious client can send a deeply nested query (user -> friends -> friends -> friends -> posts -> comments -> ...) that explodes your server's memory. Set a depth limit. Most GraphQL servers have a plugin for this.
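To make the check concrete: a production server would walk the parsed query AST, but counting brace nesting on the raw query string is enough to illustrate the idea (function names are mine):

```python
def query_depth(query: str) -> int:
    """Measure nesting depth of a GraphQL query by tracking braces.
    Illustrative only -- a real server inspects the parsed AST."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth

def enforce_depth_limit(query: str, limit: int = 5) -> None:
    """Reject queries nested deeper than the limit before executing them."""
    d = query_depth(query)
    if d > limit:
        raise ValueError(f"query depth {d} exceeds limit {limit}")
```

The key point is that the check runs before any resolver executes, so a hostile query is rejected for pennies instead of taking down the server.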

Authentication: proving who you are


The progression from worst to best, roughly:

Basic auth sends username:password as base64 in every request. It's only acceptable over HTTPS, and even then, only for internal tools or machine-to-machine calls.
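What "as base64" means in practice (and why it's not protection on its own -- base64 is encoding, not encryption; TLS is what keeps the credentials private). A sketch of building the header:

```python
import base64

def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic auth.
    base64 here is just encoding -- anyone who sees the header
    can decode it, which is why HTTPS is mandatory."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}
```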

API keys are strings you pass in a header. Simple, but dangerous if leaked -- there's no built-in expiration, no scoping, no rotation. Fine for read-only public APIs. Not for anything with write access.

Session-based auth creates a session on the server after login and sends a session ID as a cookie. The server stores session state, usually in Redis because of its speed and built-in TTL (time-to-live) for automatic expiration. The downside is statefulness -- every request needs to hit the session store.

JWTs are the current standard: a signed JSON object containing the user's identity and claims. The server doesn't store anything -- it just verifies the signature. Access tokens are short-lived (around 15 minutes); refresh tokens are long-lived (days to weeks). Store refresh tokens in HTTP-only cookies, not localStorage: scripts can't read HTTP-only cookies, so an XSS attack can't exfiltrate the token.
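The "signed, not stored" property is easier to see in code. Here's a minimal HS256 JWT sketch using only the standard library -- in production use a vetted library (PyJWT, jose, etc.), since real-world JWT handling has sharp edges this sketch ignores (algorithm confusion, clock skew, key rotation):

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    """base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: str, ttl_seconds: int = 900) -> str:
    """Build header.payload.signature, each part base64url-encoded."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(
        {**claims, "exp": int(time.time()) + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret.encode(), signing_input,
                           hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str, secret: str) -> dict:
    """Check the signature and expiry -- no server-side state needed."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret.encode(), signing_input,
                                hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    pad = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(pad))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

Notice that `verify_jwt` touches no database and no session store -- the signature plus the expiry claim is the whole check. That's what makes JWTs scale horizontally, and also why revoking one before expiry is hard.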

Authorization: deciding what you can do

Authentication tells you who someone is. Authorization tells you what they're allowed to do.

RBAC (role-based) assigns permissions to roles. Admin can do everything, Editor can create and update, Viewer can only read. Simple, covers 80% of use cases.
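The Admin/Editor/Viewer example above fits in a few lines, which is most of RBAC's appeal (the role table and function name here are illustrative, not any framework's API):

```python
# Permissions attach to roles; users get roles, never raw permissions.
ROLES = {
    "admin":  {"create", "read", "update", "delete"},
    "editor": {"create", "read", "update"},
    "viewer": {"read"},
}

def can(role: str, action: str) -> bool:
    """RBAC check: unknown roles get no permissions at all."""
    return action in ROLES.get(role, set())
```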

ABAC (attribute-based) makes decisions dynamically based on attributes: department, location, time of day, risk score. More flexible, much more complex.

ACL (access control lists) attaches permissions to specific resources. Google Drive does this -- each file has its own list of who can view, edit, or share it.


OAuth2 is an authorization framework, not an authentication protocol -- a distinction people constantly get wrong. It lets a user grant a third-party app limited access to their resources. "Allow Vercel to read my GitHub repos" is OAuth2. It gives you an access token for permission, but it doesn't tell you who the user is.

OpenID Connect (OIDC) is the authentication layer on top of OAuth2. It adds an ID token (a JWT with the user's identity). If OAuth2 is a hotel key card (access to the room), OIDC is the key card plus the guest register (who's in the room).

SSO (single sign-on) is a UX pattern, not a protocol. It means one login works across multiple services. Usually built with OIDC or SAML under the hood.

Security: the layers that keep things running

Security isn't a feature you add. It's a quality of the entire system. Here's what a production API needs:

Rate limiting caps how many requests a client can make per time window. Without it, a single client can overwhelm your server, whether intentionally (DDoS) or accidentally (a bug in a retry loop). I usually set limits per API key and per IP.
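The algorithm I reach for most often is a token bucket: each client holds up to `capacity` tokens, tokens refill at a steady rate, and each request spends one. A sketch (timestamps are passed in explicitly so the logic is deterministic; a real deployment would keep these buckets in Redis, keyed per API key or IP):

```python
class TokenBucket:
    """Token-bucket rate limiter: allows short bursts up to
    `capacity`, sustained traffic at `refill_rate` requests/second."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The burst-then-refill behavior is the reason to prefer this over a plain fixed window: a legitimate client can make a quick flurry of calls, but a runaway retry loop gets throttled to the sustained rate.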

CORS (Cross-Origin Resource Sharing) controls which origins can call your API from browser JavaScript. By default, browsers block cross-origin reads. Your API needs to explicitly allow the origins that should have access. Set this too loose (Access-Control-Allow-Origin: *) and scripts on any website can read responses from your API.

Injection prevention is about never trusting user input. Use parameterized queries or an ORM -- never concatenate user input into a SQL string. This applies to NoSQL too; MongoDB has its own injection patterns.
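The difference between concatenation and parameterization is worth seeing side by side. A sketch using Python's built-in sqlite3 (the schema is made up for illustration; the same placeholder idea applies to every SQL driver, though the placeholder syntax varies):

```python
import sqlite3

def find_user(conn, username: str):
    """Parameterized query: the driver binds the value, so input like
    "x' OR '1'='1" is treated as data, never executed as SQL."""
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),  # bound parameter -- never string-concatenated
    )
    return cur.fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))
```

The dangerous version would be `f"... WHERE username = '{username}'"`, where the classic `x' OR '1'='1` payload rewrites the query itself. With the placeholder, that payload is just a username nobody has.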

CSRF tokens prevent a malicious site from making requests on behalf of a logged-in user. The server generates a unique token for each form/session, and the request must include it. Since the attacker's site can't read the token, it can't forge the request.
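Two details make CSRF tokens work: the token must be unguessable, and the comparison must not leak timing information. Both come straight from the standard library (function names are mine):

```python
import hmac
import secrets

def issue_csrf_token() -> str:
    """Generate an unguessable per-session token (~256 bits of entropy)."""
    return secrets.token_urlsafe(32)

def csrf_valid(session_token: str, submitted: str) -> bool:
    """Constant-time comparison, so an attacker can't recover the
    token byte-by-byte from response timing."""
    return hmac.compare_digest(session_token, submitted)
```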

Input validation blocks XSS (cross-site scripting) by sanitizing user-submitted content. Any field that accepts text and renders it back to other users (comments, profile bios, forum posts) is an attack surface. Strip or escape HTML and JavaScript before storing or rendering.
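The escaping step is one function call in most languages; the hard part is remembering to apply it on every text field that gets rendered. A sketch with Python's stdlib (`sanitize_comment` is an illustrative name):

```python
import html

def sanitize_comment(text: str) -> str:
    """Escape HTML metacharacters so user input renders as text,
    not markup. A <script> tag becomes visible text instead of code."""
    return html.escape(text)
```

If you need to allow some markup (bold, links in comments), escaping isn't enough -- you need an allowlist-based sanitizer, since blocklisting dangerous tags is a losing game.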

WAFs (web application firewalls) sit at the edge and filter traffic based on patterns. They catch common attack signatures before the request reaches your application.

VPNs keep internal admin tools off the public internet entirely. Your admin dashboard, your database management UI, your monitoring tools -- these should only be accessible through a VPN.


None of these layers alone is sufficient. The point is depth -- if one layer fails or gets bypassed, the next one catches it. The attacker has to get through all of them, and that's a much harder problem than getting through any single one.

The uncomfortable truth

Most API security breaches don't happen because of sophisticated attacks. They happen because someone forgot to validate input on one endpoint, or left an API key in a GitHub commit, or set CORS to allow everything during development and never tightened it. The boring, repetitive work of applying these patterns consistently across every endpoint is what actually keeps systems secure. There's no shortcut for that.