Guides
Blackbox Atlas is a technical how-to and infrastructure guide library. Each guide has prerequisites, steps, verification, and troubleshooting. Use it when you need to install, configure, secure, or fix systems—databases, containers, monitoring, cloud, networking, and servers.
271 guides across 9 topics. Difficulty: 184 easy, 81 medium, 6 hard.
By topic
Accounts access (31)
- How to create access for applications without IAM users
Grant applications access to AWS without IAM user access keys: use IAM roles for EC2, Lambda, ECS, and other services so workloads assume a role and get temporary credentials. Use this for all new and existing apps to avoid long-lived keys and meet least privilege.
- How to connect AWS to an external identity provider
Connect AWS IAM Identity Center to an external identity provider (IdP) such as Active Directory, Okta, or Azure AD: configure SAML 2.0 or OIDC, set attribute mapping for users and groups, and set the external IdP as the identity source in Identity Center. Use this so users sign in with corporate credentials and access AWS via SSO.
- How to perform emergency break-glass access safely
Execute controlled emergency access to the AWS root account when IAM or IAM Identity Center is unavailable. Covers when to use break-glass, how to sign in as root with MFA, and how to restore normal access and audit the event.
- How to enable AWS IAM Identity Center (SSO)
Enable AWS IAM Identity Center (SSO) in your organization so users sign in once and access assigned AWS accounts and applications. Configure the identity source, create permission sets, and assign users or groups to accounts. Use this for centralized access without creating IAM users per account.
- How to enable and test MFA on the AWS root account
Enable multi-factor authentication on the AWS root user, verify the MFA device works, and confirm sign-in requires the second factor. Use this after securing the root account and before any break-glass procedure.
- How to create and rotate IAM user access keys
Create IAM user access keys for CLI and API use, rotate them on a schedule, and deactivate or delete old keys. Use this for human or script access that cannot use IAM roles; prefer roles for applications.
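The role-based guides above depend on a trust policy that names which service may assume the role. The following is a minimal sketch of that document built in Python. The helper name `service_trust_policy` is hypothetical, and `ec2.amazonaws.com` is used only as one example of a service principal.

```python
import json


def service_trust_policy(service_principal):
    """Return a trust policy that lets one AWS service assume the role.

    `service_principal` is a service name such as "ec2.amazonaws.com".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Example: a role meant to be assumed by EC2 instances.
ec2_policy = service_trust_policy("ec2.amazonaws.com")
policy_json = json.dumps(ec2_policy, indent=2)
print(policy_json)
```

The printed JSON is the kind of document you would supply as the role's trust policy. Permissions are attached separately and should stay least privilege.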
Backups recovery (5)
- Backup automation basics
Automate backup jobs with cron, systemd timers, or cloud schedulers so backups run on a schedule. Use scripts or managed services; alert on failure; verify restores periodically. Use this when moving from manual backups to reliable automated runs.
- Disaster recovery basics
Define RTO and RPO; choose a DR strategy (backup and restore, pilot light, warm standby, or multi-site). Use backups and runbooks to recover from total loss of a system or site. Use this when planning DR or when explaining options to stakeholders.
- Ransomware response (backup and restore)
When ransomware encrypts or destroys data, isolate affected systems, determine scope, and restore from a backup that is known to be clean and immutable. Do not pay the ransom without a legal and executive decision; focus on recovery from backups. Use this when building a ransomware response plan or during an incident.
- Backups vs snapshots (when to use which)
Snapshots are point-in-time copies of a volume or disk, often in the same system or cloud; backups are copies stored separately, often with retention and restore verification. Use both for different recovery scenarios. Use this when designing backup strategy or explaining the difference to stakeholders.
- How to verify backups with restore tests
A backup is only useful if restore works. Run periodic restore tests: restore to a test environment, verify data and application integrity, and document the process. Use this when setting up a backup schedule or when improving recovery confidence.
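One way to make the RPO from the disaster recovery guide above concrete is to compare it with the largest gap between consecutive backups. This is a rough sketch under simple assumptions: the function names and sample timestamps are invented for illustration, and it ignores the time elapsed since the most recent backup.

```python
from datetime import datetime, timedelta


def worst_gap(backup_times):
    """Return the longest interval between consecutive backups."""
    ordered = sorted(backup_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, default=timedelta(0))


def meets_rpo(backup_times, rpo):
    """True if no gap between backups exceeds the recovery point objective."""
    return worst_gap(backup_times) <= rpo


# Illustrative backup history: one nightly run was late.
backups = [
    datetime(2024, 1, 1, 2, 0),
    datetime(2024, 1, 2, 2, 0),
    datetime(2024, 1, 3, 14, 0),
]

gap = worst_gap(backups)
ok_24h = meets_rpo(backups, timedelta(hours=24))
ok_48h = meets_rpo(backups, timedelta(hours=48))
print(gap, ok_24h, ok_48h)
```

A check like this only shows that backups ran often enough. It says nothing about whether they restore, which is what the restore tests are for.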
Cloud AWS core (26)
- Cost and blast radius control in AWS
Limit cost overruns and blast radius with billing alerts, quotas, and organizational boundaries. Use billing alarms, service quotas, and separate accounts or OUs for prod vs non-prod. Use this when designing a multi-account setup or when preventing runaway cost or impact.
- EBS basics (volumes, types, attach to EC2)
EBS provides block storage for EC2 instances. Create a volume in an AZ, attach it to an instance in the same AZ, and mount it inside the OS. Use this when you need persistent disk for an instance or when sizing or changing the root or data volume.
- How to connect to EC2 via SSH
Connect to a Linux EC2 instance using the key pair you chose at launch. Set permissions on the private key, use the correct user name for the AMI, and fix security group or network if connection fails. Use this when you cannot SSH to a new or existing instance.
- EC2 instance types and when to use them
EC2 instance types (t3, m5, c5, r5, etc.) offer different CPU, memory, and storage profiles. Choose by workload: general purpose, compute-optimized, memory-optimized, or storage-optimized. Use this when sizing a new instance or right-sizing for cost.
- How to launch an EC2 instance
Launch an Amazon EC2 instance from the console or CLI: choose AMI, instance type, key pair, and network. Use this when you need a new Linux or Windows server in AWS and want to get it running with the right size and access (SSH or RDP key).
- How to block S3 public access
Keep S3 buckets private by enabling Block Public Access at the account and bucket level. Prevents accidental public read or write from bucket policy or ACLs. Use this when creating or auditing S3 buckets so data is not exposed to the internet.
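For the S3 Block Public Access guide above, an audit can be as simple as checking that all four settings are turned on. The sketch below assumes the configuration has already been fetched into a plain dictionary. The function name and the sample values are hypothetical. The four keys are the setting names S3 uses in its public access block configuration.

```python
REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def disabled_settings(config):
    """Return the Block Public Access settings that are missing or false."""
    return [name for name in REQUIRED_SETTINGS if not config.get(name, False)]


# Illustrative configuration with one setting off and one absent.
example = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": False,
}

missing = disabled_settings(example)
print(missing)
```

An empty list means every setting is enabled for that account or bucket.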
Containers core (30)
- Docker in CI (build and push images)
In CI, build Docker images with docker build, tag with registry and version, and push with docker push. Use a registry (Docker Hub, ECR, GCR) and authenticate with a token or role. Use this when automating image builds in a pipeline.
- Docker Compose basics (multi-container stack)
Define a multi-container stack in a compose file (docker-compose.yml): services, networks, volumes. Run with docker compose up -d; manage with docker compose down and docker compose logs. Use this when running an app with a database, cache, or multiple services on one host.
- How to debug a Docker container
Inspect a running or exited container with docker logs, docker exec, and docker inspect. Check exit code, environment, and resource usage. Use this when a container fails to start, exits unexpectedly, or when you need to see what is running inside.
- Dockerfile basics (build an image)
Write a Dockerfile with FROM, RUN, COPY, and CMD to build a container image. Use multi-stage builds to keep the final image small. Use this when creating a custom image for your application or when optimizing build time and image size.
- Docker image and container cleanup
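Remove unused images, containers, volumes, and networks with docker system prune or the per-resource prune commands (docker image prune, docker container prune, docker volume prune, docker network prune). Free disk space and avoid accumulation of dangling images and stopped containers. Use this when the Docker disk usage is high or when you want to keep the host clean.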
- How to install Docker on Linux
Install Docker Engine on Debian, Ubuntu, or RHEL using the official Docker repository. Add your user to the docker group so you can run containers without root. Use this when setting up a host for containers or when you need a specific Docker version.
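The CI guide above tags an image with its registry and version before pushing. A small sketch of how a pipeline script might assemble those commands is shown here. It only builds the argument lists and does not run Docker. The registry host, repository name, and version are placeholders.

```python
def image_ref(registry, repository, version):
    """Compose a full image reference such as registry/repo:version."""
    return f"{registry}/{repository}:{version}"


def ci_commands(registry, repository, version, context="."):
    """Return the docker build and docker push commands as argument lists."""
    ref = image_ref(registry, repository, version)
    return [
        ["docker", "build", "-t", ref, context],
        ["docker", "push", ref],
    ]


# Placeholder values; a real pipeline would take these from its environment.
commands = ci_commands("registry.example.com", "team/app", "1.4.2")
for argv in commands:
    print(" ".join(argv))
```

Keeping the commands as lists makes them safe to pass to a process runner without shell quoting, once registry authentication has been handled separately.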
Databases core (30)
- Database backup verification checklist
Use this checklist to verify database backups are configured, running, and restorable. Covers backup schedule, retention, restore test, and access control. Run periodically and before major changes or go-live.
- Database connection pooling basics
Use a connection pool (PgBouncer, ProxySQL, or application-level) so many application threads share a smaller number of database connections. Reduces connection churn and stays under max_connections. Use this when you have many app instances or high concurrency and hit connection limits.
- Database disaster recovery basics
Define RPO and RTO for databases; use backups and optionally replication to meet them. Restore from backup or fail over to a replica; document and test the procedure. Use this when planning or executing database recovery after a failure or data loss.
- Database migrations basics (schema changes safely)
Apply schema and data changes in versioned, reversible steps using migration scripts or tools (e.g. Flyway, Liquibase, or custom SQL). Test on staging first; backup before production migration. Use this when introducing or evolving schema in a way that is auditable and rollback-safe.
- MySQL and MariaDB backup basics (mysqldump and physical)
Back up MySQL or MariaDB with mysqldump for logical backups or use filesystem snapshots with FLUSH TABLES WITH READ LOCK for consistent physical backup. Use this when setting up backup jobs or when you need to restore a database.
- How to create a MySQL or MariaDB user and grant permissions
Create a MySQL user with CREATE USER and grant privileges with GRANT on databases, tables, or global. Restrict by host (e.g. 'app'@'10.0.0.%'). Use this when onboarding an application or implementing least privilege access to MySQL or MariaDB.
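The connection pooling guide above describes many callers sharing a small, fixed set of connections. The sketch below illustrates that idea at the application level. It is not how PgBouncer or ProxySQL work internally, the class name is hypothetical, and an in-memory SQLite database stands in for a real server.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hand out a fixed number of connections and take them back after use."""

    def __init__(self, size, factory):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)


def make_conn():
    # SQLite is only a stand-in for a real database driver here.
    return sqlite3.connect(":memory:", check_same_thread=False)


pool = ConnectionPool(2, make_conn)
seen = set()
results = []
for _ in range(10):
    with pool.connection() as conn:
        seen.add(id(conn))
        results.append(conn.execute("SELECT 1").fetchone()[0])

print(len(seen), "connections served", len(results), "queries")
```

Ten queries run, but the database only ever sees two connections, which is how a pool keeps an application under max_connections.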
Monitoring basics (27)
- Capacity planning basics
Use historical metrics and growth trends to plan for future capacity: when will disk, CPU, or memory be exhausted? Use this when sizing new systems or when deciding when to scale or upgrade to avoid running out of resources.
- How to set up disk, CPU, and memory alerts
Define alert rules for disk space, CPU usage, and memory (or swap) so you are notified before outages. Use thresholds and hysteresis to avoid flapping. Use this when configuring a monitoring system (e.g. Prometheus and Alertmanager, or cloud monitoring).
- Incident triage (when an alert fires)
When an alert fires, triage quickly: confirm the alert is real, identify scope and impact, and start the right runbook or escalation. Use this as the standard process for handling monitoring alerts and reducing MTTR.
- Logs and journald for monitoring
Use journald (journalctl) to query and forward logs; use log aggregation to centralize logs from multiple hosts for search and alerting. Use this when setting up log-based monitoring or when correlating events across services.
- Monitoring checklist (before go-live)
Use this checklist before putting a system into production: metrics collected, key alerts defined, logs centralized, health checks in place, runbooks written, and on-call knows how to respond. Ensures you can detect and respond to incidents.
- System metrics basics (CPU, memory, disk)
Collect and interpret basic system metrics: CPU usage, memory (used, available, swap), and disk usage. Use top, free, df, and similar tools or an agent (e.g. Node Exporter) for monitoring. Use this when setting up monitoring or when diagnosing resource-related issues.
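The alerting guide above recommends hysteresis to avoid flapping. As a rough illustration, the sketch below fires an alert at one threshold and clears it only at a lower one. The threshold values and the function name are assumptions chosen for the example, not settings from any particular monitoring system.

```python
def alert_states(samples, fire_at=90.0, clear_at=80.0):
    """Return the alert state after each sample, using two thresholds.

    The alert starts firing at or above `fire_at` and stops only once the
    value drops to or below `clear_at`.
    """
    firing = False
    states = []
    for value in samples:
        if not firing and value >= fire_at:
            firing = True
        elif firing and value <= clear_at:
            firing = False
        states.append(firing)
    return states


# Disk usage percentages hovering around the upper threshold.
usage = [85, 91, 89, 92, 79, 85]
states = alert_states(usage)
print(states)
```

With a single threshold at 90, the same samples would toggle the alert several times. The gap between the two thresholds keeps it steady until usage has clearly recovered.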
Networking basics (46)
- How to allow a port safely
Allow inbound traffic to a specific port (e.g. 80, 443) without locking yourself out: allow your admin port (SSH) first, then allow the new port, and verify from a second session or after a short test. Use UFW, nftables, or iptables; document the rule so it can be audited. Use this when opening a service to the network.
- Client vs server networking
Clients initiate connections to a server's IP and port; servers listen on a port and accept connections. Learn the roles so you can configure listen addresses, firewall rules, and NAT correctly and debug 'connection refused' vs 'no route to host.' Use this when deploying or troubleshooting any service.
- Common firewall mistakes
Avoid locking yourself out, allowing too much, or misordering rules: allow SSH before enabling the firewall or setting default deny; do not allow 0.0.0.0/0 to all ports; put allow before deny for the same traffic; allow established/related traffic so replies to outbound connections are not dropped. Use this as a checklist so you do not repeat these errors when configuring host or network firewalls.
- How to deny traffic safely
Deny specific traffic (by port, source, or protocol) without breaking admin access or established connections. Add deny rules after allow rules for required traffic; use default-deny for inbound when possible. Use this when you need to block a port or a hostile source while keeping the host manageable.
- DNS checklist
Use this checklist when configuring or troubleshooting DNS on a host: confirm nameservers in resolv.conf or the managing source, ensure resolvers are reachable, test with getent and dig, and allow DNS in the firewall if needed. Ensures resolution works for the system and applications. References DNS concept, test, and fix guides.
- How to change a system IP address
Change the IP address of a Linux host by updating netplan, NetworkManager, or /etc/network/interfaces, then applying the config and optionally restarting networking. Use a temporary change with ip addr to test before making it persistent. Have console access if changing the address you are connected from. Use this when renumbering or moving a host to another subnet.
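One of the firewall mistakes listed above is allowing 0.0.0.0/0 to all ports. The sketch below shows how an audit script might flag that pattern. The rule format, a dictionary with `source` and `port` keys, is invented for this example and does not match the output of UFW, nftables, or iptables.

```python
import ipaddress


def overly_broad(rules):
    """Return allow rules that admit any source address on any port."""
    flagged = []
    for rule in rules:
        source = ipaddress.ip_network(rule["source"])
        # A prefix length of 0 means the rule matches every address.
        if source.prefixlen == 0 and rule["port"] == "any":
            flagged.append(rule)
    return flagged


# Hypothetical allow rules for illustration.
rules = [
    {"source": "0.0.0.0/0", "port": 443},
    {"source": "0.0.0.0/0", "port": "any"},
    {"source": "10.0.0.0/8", "port": "any"},
    {"source": "203.0.113.7/32", "port": 22},
]

flagged = overly_broad(rules)
print(flagged)
```

Opening a single public port such as 443 to everyone is normal for a web server. The check targets only rules that combine an unrestricted source with unrestricted ports.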
Security basics (23)
- Backup security considerations
Backups contain the same sensitive data as production; protect them with access control, encryption, and integrity checks. Ensure backups are not writable by the same threat that could corrupt production. Use this when designing or auditing backup and restore.
- Encryption at rest vs in transit
Data in transit is encrypted between client and server (e.g. TLS); data at rest is encrypted on disk or in storage. Both are needed for full protection. Use this when designing or auditing where encryption is required.
- Encryption key management basics
Encryption keys must be stored and used securely: separate from the data, access-controlled, and rotated per policy. Use a KMS or vault; avoid storing keys in config or code. Use this when enabling encryption or designing key lifecycle.
- File permissions explained (Linux)
Linux file permissions are read, write, execute for owner, group, and others. Use chmod and chown to set them; restrict sensitive files (keys, config) to owner-only read. Use this when fixing permission denied or securing config and keys.
- How to audit who did what
Use logs and audit trails to see who accessed what and when. Enable audit logging for auth, admin actions, and data access; centralize and protect logs so they survive and are usable after an incident. Use this when investigating an incident or building auditability.
- How to check for suspicious logins
Review auth logs for failed logins, logins from unexpected IPs or times, and new device or location. Use centralized logs and alerts to detect takeover or abuse. Use this when investigating a reported incident or building detection.
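The file permissions guide above suggests restricting sensitive files to owner-only read. A minimal sketch of that step in Python follows. It assumes a POSIX system, and a temporary file stands in for a real key or config file.

```python
import os
import stat
import tempfile


def restrict_to_owner_read(path):
    """Set mode 0400 on `path` and return the resulting permission bits."""
    os.chmod(path, stat.S_IRUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# A throwaway file plays the role of a private key.
fd, key_path = tempfile.mkstemp()
os.close(fd)
try:
    mode = restrict_to_owner_read(key_path)
    print(oct(mode))
finally:
    os.remove(key_path)
```

This is the same effect as running chmod 400 on the file. Ownership is a separate concern and would be changed with chown.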
Servers Linux (53)
- How to use cron and scheduled tasks
Schedule recurring commands with cron: edit the schedule with crontab -e, use the five fields (minute hour day month weekday), and log output to a file or to the journal. Use systemd timers as an alternative for services. Verify the job runs and check logs when it fails.
- Disk full: how to recover
Recover when the root or critical partition is full: free space quickly by clearing logs and caches, find and remove or move large files, then fix rotation and monitoring so it does not recur. Use this when the system is read-only or services fail with no space.
- Linux filesystem hierarchy and essential paths
Learn the standard Linux directory layout: /etc for config, /var for variable data, /home for users, /tmp for temporary files. Find where config, logs, and binaries live so you can back up, restore, and troubleshoot without guessing.
- How to back up files with tar and rsync
Create full or incremental backups with tar (archives) and rsync (mirror or incremental), store them off the host, and verify with listing or checksum. Use this for config and data backup before changes or for scheduled retention.
- Corrupt package database or package: how to recover
Fix broken apt or dnf state: repair dpkg with apt --fix-broken install or dpkg --configure -a; clear lock files; reinstall the package or restore the package list from backup. Use this when apt or dnf fails with dependency or status errors.
- How to check disk usage and clean up space
Find what is using disk space with du and df, locate large files and dirs, and free space by removing logs, caches, and old packages. Use this when the disk is full or you want to keep it from filling up.
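The cron guide above refers to five schedule fields followed by a command. The sketch below splits one crontab line into those parts. It handles only plain five-field entries, not comments, environment lines, or shortcuts such as @reboot, and the script path in the example is hypothetical.

```python
FIELDS = ("minute", "hour", "day", "month", "weekday")


def parse_crontab_line(line):
    """Split a crontab entry into its five schedule fields and the command."""
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields and a command")
    entry = dict(zip(FIELDS, parts[:5]))
    entry["command"] = parts[5]
    return entry


# Run a (hypothetical) backup script at 02:30 every Monday and log its output.
entry = parse_crontab_line(
    "30 2 * * 1 /usr/local/bin/backup.sh --full >> /var/log/backup.log 2>&1"
)
for name, value in entry.items():
    print(f"{name}: {value}")
```

Limiting the split to five separators keeps the command intact, including its own spaces and redirections.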
All guides (newest first)
- How to create access for applications without IAM users
Grant applications access to AWS without IAM user access keys: use IAM roles for EC2, Lambda, ECS, and other services so workloads assume a role and get temporary credentials. Use this for all new and existing apps to avoid long-lived keys and meet least privilege.
- How to connect AWS to an external identity provider
Connect AWS IAM Identity Center to an external identity provider (IdP) such as Active Directory, Okta, or Azure AD: configure SAML 2.0 or OIDC, set attribute mapping for users and groups, and set the external IdP as the identity source in Identity Center. Use this so users sign in with corporate credentials and access AWS via SSO.
- How to perform emergency break-glass access safely
Execute controlled emergency access to the AWS root account when IAM or IAM Identity Center is unavailable. Covers when to use break-glass, how to sign in as root with MFA, and how to restore normal access and audit the event.
- How to enable AWS IAM Identity Center (SSO)
Enable AWS IAM Identity Center (SSO) in your organization so users sign in once and access assigned AWS accounts and applications. Configure the identity source, create permission sets, and assign users or groups to accounts. Use this for centralized access without creating IAM users per account.
- How to enable and test MFA on the AWS root account
Enable multi-factor authentication on the AWS root user, verify the MFA device works, and confirm sign-in requires the second factor. Use this after securing the root account and before any break-glass procedure.
- How to create and rotate IAM user access keys
Create IAM user access keys for CLI and API use, rotate them on a schedule, and deactivate or delete old keys. Use this for human or script access that cannot use IAM roles; prefer roles for applications.
- How to attach managed policies to an IAM user
Attach AWS managed or customer-managed policies to an IAM user via console or CLI, and verify effective permissions. Use this to grant or change permissions without editing inline policies; prefer groups for multiple users with the same role.
- How to attach policies to an IAM role
Attach managed or inline policies to an IAM role so the role has the permissions needed when assumed by a service or principal. Use the console or CLI to attach and verify; prefer managed policies and least privilege.
- How to audit IAM role trust policies
Review and tighten IAM role trust policies: who can assume the role, under what conditions, and whether trust is least privilege. Use get-role and inspect AssumeRolePolicyDocument; remove overly broad principals and add conditions (e.g. MFA, source ARN) where appropriate.
- How to assume an IAM role using AWS CLI
Assume an IAM role from the AWS CLI to get temporary credentials: use assume-role (or assume-role-with-saml/web-identity), set the returned credentials in the environment or profile, and run commands as the role. Use this for cross-account or delegated access without long-lived keys for the role.
- How to audit IAM user permissions
Audit effective permissions for an IAM user: list attached and group policies, simulate actions with the IAM policy simulator, and use last-used for access keys. Use this to verify least privilege and before revoking or changing access.
- How to enforce MFA for IAM users
Require multi-factor authentication for IAM users signing in to the console or calling sensitive APIs. Use an IAM policy condition that allows actions only when MFA is present, and assign MFA devices to every human user.
- How to find leaked or compromised AWS credentials
Detect AWS access keys or credentials that may be leaked or compromised: search code and public repos, check CloudTrail for anomalous use, use AWS credentials report and last-used, and revoke keys immediately when found. Use this when you suspect a key was exposed or for periodic audits.
- How to find and remove unused IAM users
Identify IAM users that have not signed in or used access keys recently using last-used timestamps and CloudTrail, then safely remove or deactivate them. Use this to reduce attack surface and meet compliance; avoid removing users that own critical resources.
- How to create an IAM role for EC2
Create an IAM role that EC2 instances can assume via an instance profile: set the trust policy to ec2.amazonaws.com, attach least-privilege policies, and attach the instance profile to the instance. Use this so applications on EC2 access AWS APIs without access keys.
- How to create an IAM role for AWS services
Create an IAM role that an AWS service can assume to perform actions on your behalf: set the trust policy to the service principal, attach least-privilege permissions, and use the role in the service configuration. Use this for service-to-service access without long-lived keys.
- How to create an IAM role for Lambda
Create an IAM role for AWS Lambda so the function can call AWS APIs: trust policy for lambda.amazonaws.com, attach execution and resource policies, and set the role as the function's execution role. Use this so Lambda runs without access keys.
- How to create an IAM user with least privilege
Create an IAM user with only the permissions needed for their role: no full admin unless required, use groups and managed policies, and enable MFA. Use this for human operators who need console or CLI access without using root.
- How to lock down long-lived AWS credentials
Reduce risk from IAM user access keys and long-lived credentials: enforce MFA, restrict with conditions, scope policies to specific resources, and plan migration to IAM roles. Use this when you must keep some long-lived keys but want to minimize blast radius and misuse.
- How to migrate from access keys to IAM roles
Migrate applications from IAM user access keys to IAM roles so workloads use temporary credentials and keys can be removed. Use instance profiles for EC2, execution roles for Lambda, and OIDC or assume-role for external/on-prem; then rotate off and delete the old keys.
- How to remove or decommission an IAM role
Safely remove or decommission an IAM role: detach all policies, delete inline policies, remove the role from instance profiles, then delete the role. Use this when a workload is retired or the role is consolidated; ensure no resources still assume or reference the role.
- How to revoke federated access immediately
Revoke a user's access to AWS when they use IAM Identity Center (SSO) or another federated identity: remove the user from IdP groups or disable the user in the IdP, remove Identity Center assignments, and invalidate existing sessions. Use this when someone leaves or when federated access must be cut off immediately.
- How to revoke an IAM user immediately
Revoke all access for an IAM user without deleting the user: remove the console password, delete all access keys, and deactivate MFA devices. Use this when someone leaves or credentials are compromised; optionally delete the user after revoking.
- How to rotate access keys used by applications
Safely rotate IAM access keys used by applications and automation: create a second key, update all consumers, verify they use the new key, then deactivate and delete the old key. Use this on a schedule or when a key may be compromised; prefer migrating to IAM roles to avoid long-lived keys.
- How to secure the AWS root account
Lock down the AWS root account: enable MFA, remove access keys, create an IAM admin user for daily use, and apply a root-usage alert. Use this guide before using root for anything except account-level tasks and break-glass.
- How to use cron and scheduled tasks
Schedule recurring commands with cron: edit the schedule with crontab -e, use the five fields (minute hour day month weekday), and log output to a file or to the journal. Use systemd timers as an alternative for services. Verify the job runs and check logs when it fails.
- Disk full: how to recover
Recover when the root or critical partition is full: free space quickly by clearing logs and caches, find and remove or move large files, then fix rotation and monitoring so it does not recur. Use this when the system is read-only or services fail with no space.
- Linux filesystem hierarchy and essential paths
Learn the standard Linux directory layout: /etc for config, /var for variable data, /home for users, /tmp for temporary files. Find where config, logs, and binaries live so you can back up, restore, and troubleshoot without guessing.
- How to back up files with tar and rsync
Create full or incremental backups with tar (archives) and rsync (mirror or incremental), store them off the host, and verify with listing or checksum. Use this for config and data backup before changes or for scheduled retention.
- Corrupt package database or package: how to recover
Fix broken apt or dnf state: repair dpkg with apt --fix-broken install or dpkg --configure -a; clear lock files; reinstall the package or restore the package list from backup. Use this when apt or dnf fails with dependency or status errors.
- How to check disk usage and clean up space
Find what is using disk space with du and df, locate large files and dirs, and free space by removing logs, caches, and old packages. Use this when the disk is full or you want to keep it from filling up.
- How to configure a firewall with ufw or nftables
Enable and configure a host firewall with ufw (Ubuntu/Debian) or nftables (RHEL/modern): allow SSH first, then HTTP/HTTPS or app ports; deny by default. Verify with ufw status or nft list ruleset so the server is protected and still reachable.
- How to add and remove Linux users
Create and delete user accounts with useradd and userdel (or adduser on Debian), set passwords and SSH keys, and assign users to groups. Use this to grant shell or SFTP access without sharing root and to clean up when someone leaves.
- How to find and diagnose processes on Linux
List and filter processes with ps, pgrep, and top; find what is using CPU or memory; send signals (kill, killall) and interpret state (D, Z, R). Use this to debug high load, stuck processes, or what is bound to a port.
- How to configure networking: static and DHCP
Set a static IP or use DHCP on Linux using netplan (Ubuntu), nmcli (NetworkManager), or /etc/network/interfaces. Verify with ip addr and ping; ensure DNS and default route work so the server is reachable and can reach the internet.
- High CPU: how to diagnose and fix
Find which process is using CPU with top, ps, or pidstat; identify the thread or code path; fix the cause (loop, leak, or load) or throttle and scale. Use this when the system is slow or load average is high and you need to pinpoint the consumer.
- How to install a package with apt or dnf
Install a single package on Debian/Ubuntu (apt install) or RHEL/Fedora (dnf install), resolve dependencies, and verify the binary and config location. Use this when you need a specific tool or service and the distro provides it.
- Incident response checklist for Linux servers
When something is wrong: preserve logs and state, identify scope (one host or many), restore service or isolate, then root-cause and fix. Use this so incidents are handled consistently and evidence is kept for post-mortem.
- Log rotation and journald configuration
Prevent logs from filling the disk: configure logrotate for app logs and limit journald size (SystemMaxUse, RuntimeMaxUse). Verify rotation runs and disk usage stays bounded so the system does not crash from a full disk.
- How to manage services with systemd
Start, stop, restart, and enable systemd units; check status and read logs with journalctl. Use this to bring services up after boot, restart after config changes, and diagnose why a unit failed.
- How to mount disks and configure fstab
Mount a disk or partition manually with mount, add an entry to /etc/fstab for boot mount, and verify with mount and df. Use this when adding data disks or moving data to a new volume so the system mounts it correctly after reboot.
- How to kill and send signals to processes
Send SIGTERM or SIGKILL to a process with kill or killall; use signals to stop, reload, or force-stop. Understand what each signal does so you can stop runaway or stuck processes safely.
- Network connectivity failure: how to diagnose
When the server cannot reach the network or others cannot reach it: check interface, IP, route, DNS, and firewall. Use ping, ip, ss, and traceroute to isolate whether the problem is local, routing, or remote.
- LVM basics: create and extend volume groups
Create physical volumes, volume groups, and logical volumes with pvcreate, vgcreate, lvcreate; extend an LV when you add disk or need more space. Use this when you want flexible storage that can grow or span disks.
- Out of memory (OOM): how to diagnose and fix
Diagnose OOM: check dmesg and journalctl for oom-killer, identify the process killed and what was using memory; fix by adding RAM, limiting process memory, or fixing leaks. Use this when the system kills processes or becomes unresponsive and logs show out-of-memory.
- Package dependency conflicts: how to resolve
When apt or dnf reports dependency conflicts or broken packages: identify the conflicting packages, decide whether to remove or replace them, and use --fix-broken or manual dependency resolution. Use this when install or upgrade fails due to dependency loops or version conflicts.
- Linux packages and system updates
Install, upgrade, and remove packages on Debian/Ubuntu (apt) and RHEL/Fedora (dnf/yum). Run updates safely: check changelogs, take backups, and verify services after reboot. Use this to keep servers patched without breaking running workloads.
- How to partition and format a disk
Create a partition table and partitions with fdisk or parted, format with mkfs.ext4 or mkfs.xfs, then mount and add to fstab. Use this when adding a new data disk so the system can use it safely after reboot.
- Permission denied: how to fix it on Linux
Diagnose and fix permission denied on files, directories, and SSH. Check owner, group, and bits with ls; fix with chmod and chown; for SSH check authorized_keys and server-side permissions. Never use chmod 777 as a fix.
- Pre-deployment checklist for Linux servers
Before going live: verify hostname, time, network, firewall, SSH, updates, backups, monitoring, and app config. Use this so nothing is left in default or broken state when the server is put in production.
- Read-only filesystem: how to recover
When a filesystem is mounted read-only due to errors or full disk: free space if full, run fsck to repair errors, then remount read-write. Use this when writes fail with 'read-only file system' or after an unclean shutdown.
- How to resolve DNS and use /etc/hosts
Configure and test DNS resolution: set nameservers in resolv.conf or netplan, use getent and dig to verify; add static entries in /etc/hosts when you need a fixed name without DNS. Use this when the server cannot resolve hostnames or you need to override a name.
- How to restore files from a backup
Restore from a tar archive or rsync backup to the original or a new path; verify permissions and service config; bring services back up and confirm the app works. Use this after data loss or to roll back a bad change.
- How to secure an SSH server
Harden sshd: disable password auth and root login, allow only key-based auth, use a non-default port if desired, and restrict users with AllowUsers. Verify with a new session before closing the current one so you do not lock yourself out.
- Linux server hardening checklist
Checklist for securing a new or existing Linux server: SSH key-only and no root login, firewall default deny, updates, non-root service user, minimal packages, and logging. Use this before putting a server in production or during a security review.
- Servers Linux topic checklist
Convergence checklist for the servers-linux topic: ensure you have covered fundamentals, operations, failure recovery, and hardening. Use this to confirm you have the right guides for your role and to find the next step when you are stuck.
- Service won't start: how to fix it
Diagnose why a systemd unit fails to start: read journalctl -u UNIT, check config and paths, fix permissions and dependencies (sockets, mounts), then restart. Use this when systemctl start fails or the unit is in failed state.
- Linux shell and essential commands
Use the Linux shell to navigate, inspect, and change the system: cd, ls, cat, grep, find, ps, systemctl, and redirection. Verify commands with exit codes and output so you can operate and debug servers without a GUI.
- Slow performance: how to diagnose
When the server is slow: check load average, CPU, memory, disk I/O, and network; use top, iostat, and vmstat to find the bottleneck. Use this to decide whether to scale, optimize, or fix a bug.
- Sudo and privilege escalation on Linux
Grant and revoke sudo access: add users to the sudo or wheel group, or add rules in /etc/sudoers and /etc/sudoers.d. Use visudo to avoid syntax errors. Restrict commands or require a password so only authorized users run as root.
- What is a Linux server and when to use one
Define a Linux server as a headless machine or VM that runs services and accepts network requests. Learn when to choose a server over a workstation, and what access (SSH, console) and skills you need before provisioning or installing Linux for services.
- Understanding Linux users, groups, and permissions
Understand how Linux file and process permissions work: owner, group, and others; read, write, execute; numeric modes and chmod/chown. Use this to fix permission denied errors and to grant least privilege to services and users without using chmod 777.
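As a small illustration of the numeric modes mentioned above, the sketch below converts an octal mode into the rwx string that ls shows. The function name mode_to_rwx is hypothetical, and the sketch covers only the nine owner, group, and others bits; it ignores setuid, setgid, and the sticky bit.

```python
def mode_to_rwx(mode: int) -> str:
    """Render the owner/group/others permission bits of a numeric mode as rwx text."""
    parts = []
    # Each octal digit holds three bits: read (4), write (2), execute (1).
    for shift in (6, 3, 0):  # owner, group, others
        bits = (mode >> shift) & 0o7
        parts.append(
            ("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-")
        )
    return "".join(parts)


print(mode_to_rwx(0o640))  # rw-r-----
print(mode_to_rwx(0o755))  # rwxr-xr-x
```

Reading a mode this way makes it easier to see why 777 grants write access to every user on the system.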
- Why can't I SSH into the server
Diagnose SSH connection failures: check network (ping, port open), sshd running, firewall allows port 22, and key or password accepted. Use this when SSH connection is refused, times out, or permission denied so you can fix or use console access.
- How to allow a port safely
Allow inbound traffic to a specific port (e.g. 80, 443) without locking yourself out: allow your admin port (SSH) first, then allow the new port, and verify from a second session or after a short test. Use UFW, nftables, or iptables; document the rule so it can be audited. Use this when opening a service to the network.
- Client vs server networking
Clients initiate connections to a server's IP and port; servers listen on a port and accept connections. Learn the roles so you can configure listen addresses, firewall rules, and NAT correctly and debug 'connection refused' vs 'no route to host.' Use this when deploying or troubleshooting any service.
- Common firewall mistakes
Avoid locking yourself out, allowing too much, or misordering rules: allow SSH before enabling the firewall or setting default deny; do not allow 0.0.0.0/0 to all ports; put allow before deny for the same traffic; allow established/related traffic so replies to outbound connections are accepted. Use this as a checklist so you do not repeat these errors when configuring host or network firewalls.
- How to deny traffic safely
Deny specific traffic (by port, source, or protocol) without breaking admin access or established connections. Add deny rules after allow rules for required traffic; use default-deny for inbound when possible. Use this when you need to block a port or a hostile source while keeping the host manageable.
- DNS checklist
Use this checklist when configuring or troubleshooting DNS on a host: confirm nameservers in resolv.conf or the managing source, ensure resolvers are reachable, test with getent and dig, and allow DNS in the firewall if needed. Ensures resolution works for the system and applications. References DNS concept, test, and fix guides.
- How to change a system IP address
Change the IP address of a Linux host by updating netplan, NetworkManager, or /etc/network/interfaces, then applying the config and optionally restarting networking. Use a temporary change with ip addr to test before making it persistent. Have console access if changing the address you are connected from. Use this when renumbering or moving a host to another subnet.
- How to check listening ports on Linux
List which ports are listening and which process owns them using ss or netstat. Use this to confirm a service is bound to the expected address and port, to find what is using a port, or to verify before opening a firewall. Use ss -tlnp for TCP and ss -ulnp for UDP.
- How to diagnose no internet access
When a host cannot reach the internet: check default route, DNS resolution, and connectivity to the gateway and a public IP. Use ping, ip route, getent, and traceroute to isolate whether the failure is routing, DNS, or local firewall. Use this before changing config so you fix the right layer.
- How to enable or disable IPv6
Enable or disable IPv6 on Linux via sysctl (net.ipv6.conf.all.disable_ipv6 and per-interface), netplan, or NetworkManager. Use this when you need to turn off IPv6 for compatibility or security, or to turn it back on after it was disabled. Test with ping6 and ip -6 addr.
- How to configure a static IP
Set a static IPv4 address on Linux using netplan, NetworkManager, or /etc/network/interfaces. Include address, prefix length, default gateway, and nameservers so the host has stable addressing and can reach other networks and resolve names. Use this when deploying a server or when DHCP is not desired.
- How to fix broken DNS on Linux
When resolution fails on Linux: fix /etc/resolv.conf or the source that manages it (netplan, NetworkManager, systemd-resolved). Ensure nameservers are reachable and that firewall allows outbound DNS. Use this when getent or dig fails and you need the system to resolve names again.
- Firewall checklist
Use this checklist when enabling or changing a host firewall: allow SSH (or admin port) first, set default deny inbound, allow only required ports and established/related, verify rule order, test in a second session, and document rules. Avoid lockout and over-permissive rules by following the steps in order. References UFW, allow/deny, verify, and lock-down guides.
- Fix no route to host
No route to host means the host has no route to the destination (or the destination is down/unreachable). Check the routing table, default route, and that the destination or gateway is reachable. Use this when you get 'No route to host' or 'Network is unreachable' so you can fix routing or connectivity.
- Fix connection refused
Connection refused means the destination host received the connection attempt but no process is listening on that port (or the kernel rejected it). Find what should be listening, start the service or fix the port, and ensure the firewall is not rejecting the connection before it reaches the listener. Use this when you get 'Connection refused' for SSH, HTTP, or any TCP service.
- How UFW works (conceptual)
UFW (Uncomplicated Firewall) is a front end to iptables or nftables that uses allow/deny rules and a default policy. Learn how rules are ordered, how default deny works, and how to allow a port or subnet so you can use UFW correctly and avoid locking yourself out. Use this before enabling UFW on a server.
- How to inspect the routing table
View the kernel routing table with ip route or route -n so you can verify default route, directly connected networks, and static routes. Use this when debugging 'no route to host,' adding a new network, or confirming what path traffic will take. Interpret the output to see gateway and interface.
- IP addresses and CIDR notation
IPv4 addresses are 32-bit; IPv6 addresses are 128-bit. CIDR (e.g. 192.168.1.0/24) denotes an address and its network prefix length so you can compute the network, broadcast, and host range. Use this when configuring interfaces, routing, or firewall rules that use subnets.
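The computation described above can be sketched with Python's standard ipaddress module. The helper name describe_cidr is hypothetical, and the host arithmetic assumes an IPv4 prefix of /30 or shorter, where the first and last addresses are reserved for network and broadcast.

```python
import ipaddress


def describe_cidr(cidr: str) -> dict:
    """Return the network, broadcast, and usable host range for an IPv4 CIDR block."""
    # strict=False accepts an address with host bits set, e.g. 192.168.1.10/24.
    net = ipaddress.ip_network(cidr, strict=False)
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        # Assumes prefix length <= 30, so network and broadcast are excluded.
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
        "usable_hosts": net.num_addresses - 2,
    }


print(describe_cidr("192.168.1.10/24"))
```

For 192.168.1.10/24 this reports network 192.168.1.0, broadcast 192.168.1.255, and 254 usable hosts from 192.168.1.1 to 192.168.1.254.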
- iptables explained conceptually
iptables is the legacy Linux packet filter: tables (filter, nat), chains (INPUT, OUTPUT, FORWARD), and rules (match criteria and target accept/drop/reject). Learn the model so you can read rules and understand how UFW or other tools map to it. Prefer nftables or UFW for new config. Use this when debugging or migrating from iptables.
- Latency vs bandwidth vs packet loss
Latency is round-trip or one-way delay; bandwidth is throughput capacity; packet loss is the fraction of packets that do not arrive. Learn how each affects applications and how to measure them so you can diagnose slow or unreliable connections. Use this when tuning or debugging network performance.
- Lock down a server to SSH only
Restrict inbound host firewall access to SSH (and optionally a management port) so no other services are reachable from the network. Use default-deny inbound and allow only TCP 22 (or your SSH port); allow established/related traffic so replies to outbound connections are accepted. Use this as a baseline for minimal exposure before adding other services.
- NAT explained
NAT (Network Address Translation) rewrites source or destination IP and port so many hosts can share one public IP or so internal addresses are hidden. Learn how outbound NAT (SNAT/NAPT) and port forwarding (DNAT) work so you can debug connectivity and configure routers. Use this when traffic from or to a private network fails.
- nftables explained conceptually
nftables is the modern Linux packet filter: tables, chains, and rules in a unified syntax. It replaces iptables and can translate iptables rules. Learn the model (table, chain, rule) and how to read nft list ruleset so you can verify rules and migrate. Use this when configuring or debugging nftables or when moving from iptables.
- Network troubleshooting checklist
When connectivity fails, work through this order: link and IP, default route, DNS, service listening and firewall, then application. Use ping, ip route, getent/dig, ss and firewall rules to isolate whether the problem is local, path, or remote. References routing, DNS, ports, firewall, and fix guides so you do not guess at the wrong layer.
- Public vs private networks
Public IPs are globally routable on the internet; private IPs (RFC 1918) are for use inside a network and are not routed on the public internet. Learn the private ranges and why NAT is used so you can configure and debug connectivity between internal and external hosts. Use this when planning addressing or fixing 'no route' across the internet.
- Ports explained (1–65535, well-known vs ephemeral)
Ports are 16-bit numbers that identify which service or application gets traffic on a host. Learn well-known (0–1023), registered, and ephemeral (dynamic) ranges so you can open the right port, debug 'connection refused,' and understand listen vs connect. Use this when configuring firewalls or services.
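A minimal sketch of the three ranges follows. The entry above states only the well-known range (0–1023); the registered (1024–49151) and dynamic (49152–65535) boundaries used here are the IANA assignments and are an assumption about what the guide covers. The function name classify_port is hypothetical, and operating systems may pick ephemeral ports from a different range in practice.

```python
def classify_port(port: int) -> str:
    """Classify a TCP/UDP port number by its IANA range."""
    if not 0 <= port <= 65535:  # ports are 16-bit numbers
        raise ValueError(f"port out of range: {port}")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "ephemeral"


for p in (22, 443, 8080, 50000):
    print(p, classify_port(p))
```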
- How to restart networking safely
Restart networking on Linux without losing SSH by using systemd to restart the networking service or NetworkManager, or by bringing interfaces down and up. Prefer restarting only the affected service; have console or out-of-band access in case the session drops. Use this after changing IP, DNS, or routes so the new config is applied.
- Reverse SSH tunnels explained
A reverse SSH tunnel lets a host behind NAT or a firewall initiate an SSH connection to a reachable server and expose a port on that server that forwards back to the host. Use when the target host cannot accept inbound connections (e.g. no public IP, firewall blocks). Conceptual only; server must allow remote forwards.
- Routing basics
Routing is how a host or router decides where to send a packet: it consults a routing table (destination prefix and next-hop or interface) and forwards the packet. Learn how default route and longest prefix match work so you can fix 'no route to host' and configure gateways. Use this when a host cannot reach certain networks.
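Longest prefix match can be illustrated with a short sketch. The route list, the next-hop addresses, and the function name lookup_next_hop are made up for the example; a real kernel uses more efficient data structures and also considers metrics and interfaces.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next hop).
ROUTES = [
    ("0.0.0.0/0", "192.0.2.1"),    # default route
    ("10.0.0.0/8", "10.0.0.1"),
    ("10.1.2.0/24", "10.1.2.1"),
]


def lookup_next_hop(dest: str, routes=ROUTES):
    """Return the next hop of the most specific route containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, hop))
    if not matches:
        return None  # no matching route: the 'no route to host' case
    # The route with the longest prefix wins.
    return max(matches)[1]


print(lookup_next_hop("10.1.2.5"))  # 10.1.2.1
print(lookup_next_hop("10.9.9.9"))  # 10.0.0.1
print(lookup_next_hop("8.8.8.8"))   # 192.0.2.1
```

The default route 0.0.0.0/0 matches every destination but has the shortest possible prefix, so it is used only when nothing more specific matches.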
- Server networking checklist before go-live
Before putting a server in production, verify IP and DNS, routing, firewall, and SSH: static or DHCP, resolv.conf or resolved, default route, listening ports, firewall allow SSH and required services only, and a second session test. Use this so you do not ship with wrong IP, open ports, or locked-out SSH. References multiple networking-basics guides.
- Site-to-site vs client VPNs
Site-to-site VPNs connect two networks (e.g. office to cloud); client VPNs connect a single device to a network (e.g. laptop to corporate LAN). Choose site-to-site for always-on network links and client for remote users. Use this when designing or choosing VPN topology. Conceptual only; no vendor config.
- SSH tunnels explained
An SSH tunnel forwards a local or remote port over the SSH connection so you can reach a service as if it were on your machine or expose a local service through the SSH server. Use for one-off secure access to a single port (e.g. DB, admin UI) without a full VPN. Conceptual only; assumes you can SSH to the target host.
- Subnets explained simply
A subnet is a logical subdivision of an IP network defined by a prefix (CIDR). Learn how subnetting splits address space, why you use it for organization and routing, and how to derive the range and size from the prefix. Use this when planning or changing network layout.
- How to test DNS resolution
Verify that hostnames resolve to the expected IPs using getent, dig, or nslookup. Use this when debugging 'can't reach host,' after changing resolvers or DNS records, or to confirm split-horizon or override. Test both the name and the reverse (IP to name) if needed.
- TCP vs UDP and when to use each
TCP is connection-oriented and reliable; UDP is connectionless and low-overhead. Learn the trade-offs so you can choose the right transport, debug timeouts vs packet loss, and configure firewalls and services correctly. Use this when designing or troubleshooting protocols and ports.
- How to troubleshoot SSH connectivity
When SSH connection fails: check that the client can reach the host (ping, port open), that sshd is listening on the expected port, and that authentication succeeds (key or password). Use this to isolate network vs service vs auth failures and to fix 'connection refused,' 'no route to host,' or 'permission denied.'
- How to verify firewall rules
Confirm that the firewall is active and that rules match your intent: list rules (ufw status, nft list ruleset, iptables -L), test connectivity from a client (nc, curl, telnet), and compare with what should be allowed or denied. Use this after changing rules or before go-live so you do not ship with wrong or missing rules.
- What DNS actually does
DNS maps hostnames to IP addresses and back. Learn how queries and responses work, what resolvers and authoritative servers do, and when DNS is the cause of 'can't reach host' or slow connections. Use this before debugging connectivity or configuring resolvers.
- What a firewall actually does
A firewall filters packets by rules (source, destination, port, protocol) and allows or drops them. It can be on the host (host firewall) or on a network device. Learn how allow and deny work, stateful vs stateless, and where to place rules so you can configure and debug access. Use this before opening or closing ports.
- What a VPN actually is
A VPN is an encrypted tunnel between endpoints so traffic appears to come from the tunnel exit. It provides confidentiality, integrity, and often a different IP or network path. Use when you need private access to a remote network or to carry traffic over untrusted networks; do not rely on it alone for anonymity. This guide is conceptual only; no vendor config.
- When you do NOT need a VPN
A VPN is not always the answer. Skip it when traffic is already encrypted (e.g. HTTPS to a public API), when the problem is DNS or firewall policy, when you only need a single port (SSH tunnel may suffice), or when you are on a trusted network. Use this to avoid over-engineering and to pick the right tool. Conceptual only.
- WireGuard concepts (no vendor config)
WireGuard is a simple, fast VPN: each peer has a key pair and an allowed-IPs list that acts as both routing and ACL. Tunnels are stateless; crypto is modern. Use this to understand the model (peer, allowed-IPs, no central server required) before deploying. Conceptual only; no step-by-step vendor or distro-specific config.
- How to add an APT or DNF package repository
Add a third-party or vendor repository to APT (sources.list or .list in sources.list.d) or DNF (.repo in /etc/yum.repos.d) so you can install packages from it. Use this when you need software not in the default distro repos, and to keep repo config auditable and reversible.
- How to set environment variables persistently
Set environment variables for a user in ~/.profile, ~/.bashrc, or ~/.bash_profile, and for a systemd service in the unit file or EnvironmentFile. Use this when an app or script needs VAR=value across logins or at service start.
- How to check disk I/O and identify heavy usage
Use iostat, iotop, and /proc/diskstats to see disk throughput and which processes are doing the most I/O. Use this when the system is slow and CPU and memory look fine, or when tuning storage or diagnosing latency.
- How to check memory usage
Use free, /proc/meminfo, and top or smem to see total RAM, used, free, buffers, cache, and swap. Identify processes by RSS or VSZ. Use this when diagnosing high memory use, before adding swap, or when tuning application memory limits.
- How to set and check file descriptor limits
Check and raise the open file limit (nofile) for the shell with ulimit, system-wide with limits.conf, or for a systemd service in the unit file. Use this when you see 'too many open files' or when running high-connection services.
- How to find large files and free space
Use du, find, and ncdu to locate the largest files and directories so you can free disk space or move data. Use this when a filesystem is full or you need to identify what is consuming space before cleanup or resizing.
- How to add swap and size it
Create a swap file or partition, enable it with swapon, and add it to fstab so it survives reboot. Size swap based on workload: often equal to RAM for small servers, or less when you have plenty of RAM. Use this when you see OOM or need to reduce memory pressure.
- How to change the default boot target
Set the default systemd target (multi-user vs graphical) with systemctl set-default so the system boots to the desired run level. Use this when converting a desktop install to headless, or to force graphical or console-only boot.
- How to read and filter journald logs
Use journalctl to view systemd journal logs by unit, time, priority, or boot. Follow logs in real time, filter by service name, and export for debugging. Use this when diagnosing service failures, boot issues, or security events.
- When and how to reboot safely
Schedule a reboot during a maintenance window; notify users, stop or drain services if needed, run reboot or shutdown -r, and verify the system and services after boot. Use this when applying kernel or critical updates, or recovering from a hung state.
- How to check and interpret SELinux status
Check whether SELinux is enabled and in enforcing or permissive mode with getenforce and sestatus; read denials in the audit log directly or with ausearch. Use this when access is denied and permissions look correct, or when hardening or debugging a RHEL/CentOS/Fedora system.
- How to set and change the hostname
Set the system hostname persistently with hostnamectl or by editing /etc/hostname and /etc/hosts. Use this when provisioning a server, cloning a VM, or fixing duplicate hostnames so the machine has a unique, predictable name for logs and SSH.
- How to set timezone and sync time with NTP
Set the system timezone with timedatectl or by symlinking /etc/localtime to a zone file under /usr/share/zoneinfo, and enable NTP (systemd-timesyncd or chrony) so the clock stays correct. Use this when deploying a server, fixing certificate or log ordering issues, or after restoring a snapshot.
- How to use SSH config and key agent
Configure SSH client with ~/.ssh/config for hosts, keys, and options; use ssh-agent to hold keys so you do not type passphrases repeatedly. Use this to simplify SSH and SCP to servers and to avoid exposing keys to every command.
- How to troubleshoot boot failure
When Linux fails to boot, use rescue or recovery mode from the installer or a live image to mount the root filesystem, fix fstab or initramfs, and repair broken config. Use this when the system hangs at boot, drops to emergency mode, or fails to mount root.
- Cost and blast radius control in AWS
Limit cost overruns and blast radius with billing alerts, quotas, and organizational boundaries. Use billing alarms, service quotas, and separate accounts or OUs for prod vs non-prod. Use this when designing multi-account or when preventing runaway cost or impact.
- EBS basics (volumes, types, attach to EC2)
EBS provides block storage for EC2 instances. Create a volume in an AZ, attach it to an instance in the same AZ, and mount it inside the OS. Use this when you need persistent disk for an instance or when sizing or changing the root or data volume.
- How to connect to EC2 via SSH
Connect to a Linux EC2 instance using the key pair you chose at launch. Set permissions on the private key, use the correct user name for the AMI, and fix security group or network if connection fails. Use this when you cannot SSH to a new or existing instance.
- EC2 instance types and when to use them
EC2 instance types (t3, m5, c5, r5, etc.) offer different CPU, memory, and storage profiles. Choose by workload: general purpose, compute-optimized, memory-optimized, or storage-optimized. Use this when sizing a new instance or right-sizing for cost.
- How to launch an EC2 instance
Launch an Amazon EC2 instance from the console or CLI: choose AMI, instance type, key pair, and network. Use this when you need a new Linux or Windows server in AWS and want to get it running with the right size and access (SSH or RDP key).
- How to block S3 public access
Keep S3 buckets private by enabling Block Public Access at the account and bucket level. Prevents accidental public read or write from bucket policy or ACLs. Use this when creating or auditing S3 buckets so data is not exposed to the internet.
- S3 bucket basics (create, configure, access)
Create an S3 bucket in a region; set bucket policy and block public access; upload and download objects. Use this when you need object storage for backups, static assets, or data lake and want to do it securely with the right permissions.
- S3 encryption (server-side and keys)
Enable server-side encryption (SSE) for S3 so objects are encrypted at rest. Use SSE-S3 (AWS-managed keys) or SSE-KMS (customer or AWS KMS key). Use this when storing sensitive data in S3 and when you need to meet encryption compliance.
- Security groups basics (EC2 and VPC)
Security groups are stateful firewalls for EC2 instances and other VPC resources. Rules allow inbound and outbound by port, protocol, and source/destination. Use this when you cannot reach an instance or when locking down access to a service.
- VPC basics (what it is and why it matters)
A VPC is your isolated network in AWS: you control IP ranges, subnets, route tables, and gateways. Use it to place instances in public or private subnets and to control inbound and outbound traffic. Use this when designing or troubleshooting EC2 networking.
- How to set up public and private subnets in a VPC
Create subnets with the right route tables so some are public (route to Internet Gateway) and others private (route to NAT for outbound only). Place load balancers and bastions in public subnets; app and DB in private. Use this when building a layered network in AWS.
- Backup automation basics
Automate backup jobs with cron, systemd timers, or cloud schedulers so backups run on a schedule. Use scripts or managed services; alert on failure; verify restores periodically. Use this when moving from manual backups to reliable automated runs.
- Disaster recovery basics
Define RTO and RPO; choose a DR strategy (backup and restore, pilot light, warm standby, or multi-site). Use backups and runbooks to recover from total loss of a system or site. Use this when planning DR or when explaining options to stakeholders.
- Ransomware response (backup and restore)
When ransomware encrypts or destroys data, isolate affected systems, determine scope, and restore from a backup that is known to be clean and immutable. Do not pay the ransom without legal and executive decision; focus on recovery from backups. Use this when building a ransomware response plan or during an incident.
- Backups vs snapshots (when to use which)
Snapshots are point-in-time copies of a volume or disk, often in the same system or cloud; backups are copies stored separately, often with retention and restore verification. Use both for different recovery scenarios. Use this when designing backup strategy or explaining the difference to stakeholders.
- How to verify backups with restore tests
A backup is only useful if restore works. Run periodic restore tests: restore to a test environment, verify data and application integrity, and document the process. Use this when setting up a backup schedule or when improving recovery confidence.
- Network debugging methodology
When connectivity fails, work through layers in order: link and IP, routing, DNS, then service and firewall. Use ping, ip route, getent/dig, and ss/firewall rules to isolate the failure so you fix the right layer instead of guessing. Use this as the standard order for any network troubleshooting.
- DNS debugging methodology
When resolution fails, isolate the problem: check local resolver config, then query a specific server, then check network path to the server. Use dig, getent, and nslookup in a consistent order so you know whether the issue is config, server, or network. Use this when name resolution fails and you need to find the cause quickly.
- MTU and fragmentation explained
MTU is the maximum size of a packet on a link; larger packets may be fragmented or dropped. Use this when you see connectivity that works for small packets but fails for large (e.g. large uploads or specific sites) or when tuning performance and path MTU.
- Packet flow basics (how traffic moves through a host)
Understand how packets are processed: interface, routing, firewall (input/output/forward), and application. Use this when debugging why traffic is dropped or a service is not reachable so you check the right layer.
- traceroute explained (how it works and how to use it)
traceroute shows the path packets take to a destination by sending ICMP or UDP probes with increasing TTL and recording the router where each probe expires. Use it to see where a path fails or which hop adds latency. Use this when debugging connectivity or path issues and when you need to interpret traceroute output.
- Backup security considerations
Backups contain the same sensitive data as production; protect them with access control, encryption, and integrity checks. Ensure backups are not writable by the same threat that could corrupt production. Use this when designing or auditing backup and restore.
- Encryption at rest vs in transit
Data in transit is encrypted between client and server (e.g. TLS); data at rest is encrypted on disk or in storage. Both are needed for full protection. Use this when designing or auditing where encryption is required.
- Encryption key management basics
Encryption keys must be stored and used securely: separate from the data, access-controlled, and rotated per policy. Use a KMS or vault; avoid storing keys in config or code. Use this when enabling encryption or designing key lifecycle.
- File permissions explained (Linux)
Linux file permissions are read, write, execute for owner, group, and others. Use chmod and chown to set them; restrict sensitive files (keys, config) to owner-only read. Use this when fixing permission denied or securing config and keys.
- How to audit who did what
Use logs and audit trails to see who accessed what and when. Enable audit logging for auth, admin actions, and data access; centralize and protect logs so they survive and are usable after an incident. Use this when investigating an incident or building auditability.
- How to check for suspicious logins
Review auth logs for failed logins, logins from unexpected IPs or times, and new device or location. Use centralized logs and alerts to detect takeover or abuse. Use this when investigating a reported incident or building detection.
- How to revoke access quickly
When someone leaves or an account is compromised, revoke all access immediately: disable or delete the account, revoke tokens and keys, remove from groups and roles, and invalidate sessions. Use this during offboarding or incident response.
- How to rotate secrets safely
Rotate passwords, API keys, and certificates on a schedule or after a suspected leak without causing outages. Add the new secret first, update consumers, then revoke the old one. Use this when implementing rotation or responding to a compromise.
- How to secure API keys
Store API keys in a secret manager or env; never in code or public config. Use short-lived tokens where the API supports it; scope keys to the minimum permissions; rotate and revoke when leaked or when no longer needed. Use this when adding or hardening API access.
- Incident response basics
When a security incident occurs, contain impact, preserve evidence, eradicate the cause, and recover. Have a plan and roles defined in advance; use runbooks for common scenarios. Use this when building or executing an incident response process.
- Least privilege explained
Least privilege means granting only the minimum permissions needed for a task. Apply it to users, services, and roles to limit blast radius when an account is compromised. Use this when designing roles or reviewing who has access to what.
- Logging for security
Log auth events, admin actions, and access to sensitive data so you can detect and investigate incidents. Send logs to a central, protected store; retain per policy and alert on high-risk patterns. Use this when designing or improving security visibility.
- Password policy basics
Set minimum length, complexity, and expiry for passwords where they are still used; prefer MFA and passwordless where possible. Use this when configuring IdP or application password rules so users cannot choose weak or reused passwords.
- Passwords vs keys vs tokens (when to use which)
Choose the right credential type for each use case: passwords for human login, keys for SSH and automation, tokens for APIs and short-lived access. Use this when designing auth for a service or when replacing passwords with stronger methods.
- Secure defaults checklist
Use this checklist when deploying a new system or reviewing an existing one: strong auth, least privilege, encryption, logging, and no unnecessary exposure. Covers auth, secrets, permissions, network, and backup security in one pass.
- Security incident checklist
When a security incident is declared, use this checklist: contain impact, preserve evidence, notify, eradicate, recover, and document. Ensures nothing is missed during a high-stress response. Use this in parallel with incident response basics.
- SSH key security
Protect SSH private keys with passphrases and correct file permissions; use one key per purpose or environment; rotate and revoke keys when people leave or keys are compromised. Use this when hardening SSH access or responding to a key leak.
- Understanding security headers (HTTP)
HTTP security headers tell browsers how to behave: HSTS, CSP, X-Frame-Options, and others reduce clickjacking, XSS, and protocol downgrade. Use this when hardening a web application or API so you know which headers to set and what they do.
- What are secrets and where to store them
Secrets are credentials (passwords, keys, tokens) that must stay confidential. Store them in a dedicated secret manager or vault, not in code or config files in repo. Use this when adding a new service or moving secrets out of config.
- What is authentication and why it matters
Authentication is verifying identity (who you are) before granting access. Learn the difference between authentication and authorization, and why strong auth (passwords, keys, MFA) is the first line of defense. Use this when designing access or explaining auth to others.
- What is authorization and how it works
Authorization is deciding what an authenticated identity can do (read, write, delete, admin). Learn role-based and attribute-based models, least privilege, and how to apply authorization after authentication. Use this when designing permissions or debugging access denied.
- What is TLS and when to use it
TLS (Transport Layer Security) encrypts and authenticates traffic between client and server. Use it for all HTTP, APIs, mail, and database connections that carry sensitive data. Use this when enabling HTTPS or securing any network service.
- When to use MFA (multi-factor authentication)
Use MFA wherever a single stolen password or key could cause serious harm: admin accounts, production access, and sensitive data. Learn what counts as a second factor and how to enforce it. Use this when deciding where to require MFA.
- EC2 Auto Scaling basics
Use an Auto Scaling group (ASG) to maintain a desired number of instances; scale on demand or on a schedule. Attach to a load balancer target group for traffic distribution. Use this when you need high availability or when scaling instance count based on load or schedule.
- Backups in AWS (EBS snapshots and AMIs)
Back up EBS volumes with snapshots (incremental, stored in S3 by AWS). Create snapshots manually or with Data Lifecycle Manager (DLM). AMIs include root volume snapshots. Use this when implementing backup strategy for EC2 or when automating snapshot creation and retention.
- EC2 AMI basics
An AMI is a template for an EC2 instance. Use a public AMI (Amazon Linux, Ubuntu) or create your own from an instance for consistent deployments. AMI IDs are region-specific. Use this when launching instances or building a custom AMI.
- Load balancer basics (ALB and NLB)
Use an Application Load Balancer (ALB) or Network Load Balancer (NLB) to distribute traffic to EC2 or other targets. ALB is layer 7 (HTTP/HTTPS); NLB is layer 4 (TCP/UDP). Use this when exposing a multi-instance service or when you need TLS termination or path-based routing.
- Security groups vs NACLs (when to use which)
Security groups are stateful and apply to instances; NACLs are stateless and apply at the subnet level. Use security groups for most rules; add NACLs for subnet-level allow/deny or when you need rule numbers for order. Use this when designing VPC network security.
- Database backup verification checklist
Use this checklist to verify database backups are configured, running, and restorable. Covers backup schedule, retention, restore test, and access control. Run periodically and before major changes or go-live.
- Database connection pooling basics
Use a connection pool (PgBouncer, ProxySQL, or application-level) so many application threads share a smaller number of database connections. Reduces connection churn and stays under max_connections. Use this when you have many app instances or high concurrency and hit connection limits.
- Database disaster recovery basics
Define RPO and RTO for databases; use backups and optionally replication to meet them. Restore from backup or fail over to a replica; document and test the procedure. Use this when planning or executing database recovery after a failure or data loss.
- Database migrations basics (schema changes safely)
Apply schema and data changes in versioned, reversible steps using migration scripts or tools (e.g. Flyway, Liquibase, or custom SQL). Test on staging first; back up before the production migration. Use this when introducing or evolving schema in a way that is auditable and rollback-safe.
- MySQL and MariaDB backup basics (mysqldump and physical)
Back up MySQL or MariaDB with mysqldump for logical backups or use filesystem snapshots with FLUSH TABLES WITH READ LOCK for consistent physical backup. Use this when setting up backup jobs or when you need to restore a database.
- How to create a MySQL or MariaDB user and grant permissions
Create a MySQL user with CREATE USER and grant privileges with GRANT at the global, database, or table level. Restrict by host (e.g. 'app'@'10.0.0.%'). Use this when onboarding an application or implementing least privilege access to MySQL or MariaDB.
- How to install MySQL or MariaDB on Linux
Install MySQL 8 or MariaDB on Debian, Ubuntu, or RHEL using the official or distro repository. Secure the installation with mysql_secure_installation; create a database and user. Use this when setting up a new database server or when you need MySQL-compatible storage.
- How to restore MySQL or MariaDB from backup
Restore a MySQL or MariaDB database from a mysqldump file using the mysql client. Create the database first if needed. Use this when recovering from a failure or when cloning a database from a logical backup.
- PostgreSQL backup basics (pg_dump and pg_basebackup)
Back up PostgreSQL with pg_dump (single database) or pg_dumpall (full cluster, including roles) for logical backups, or with pg_basebackup for physical backups. Use WAL archiving for point-in-time recovery. Use this when setting up backup jobs or when you need to restore a database or cluster.
- PostgreSQL config basics (postgresql.conf and pg_hba.conf)
Configure PostgreSQL via postgresql.conf (memory, connections, WAL) and pg_hba.conf (who can connect and how). Reload or restart after changes. Use this when tuning performance, enabling remote access, or locking down authentication.
- How to create a PostgreSQL user and grant permissions
Create a PostgreSQL role (user) with CREATE USER or CREATE ROLE; grant privileges with GRANT on databases, schemas, and tables. Use LOGIN for any role that connects (people and service accounts) and NOLOGIN for group roles that only hold privileges. Use this when onboarding a new user or service account or when implementing least privilege.
- How to install PostgreSQL on Linux
Install PostgreSQL on Debian, Ubuntu, or RHEL using the official or distro package repository. Configure the data directory, init the cluster, and start the server so you can create databases and users. Use this when setting up a new database server or when you need a specific PostgreSQL version.
- PostgreSQL replication basics (streaming replica)
Set up a streaming replica for high availability or read scaling. Configure the primary for replication (wal_level, pg_hba); use pg_basebackup to clone; set primary_conninfo and create standby.signal on the replica (PostgreSQL 12 and later). Use this when you need a standby or read replica.
- How to restore a PostgreSQL database
Restore from a pg_dump custom-format file with pg_restore, or from a plain SQL dump with psql. Drop or create the target database first if replacing; restore globals with psql if you have pg_dumpall output. Use this when recovering from a failure or when cloning a database.
- PostgreSQL slow query basics
Find and fix slow PostgreSQL queries using log_min_duration_statement and pg_stat_statements. Use EXPLAIN ANALYZE to see the plan and add indexes or rewrite queries. Use this when the database is slow or when optimizing after enabling query logging.
- PostgreSQL tuning basics (memory, connections, checkpoints)
Tune PostgreSQL for your workload by setting shared_buffers, work_mem, effective_cache_size, and checkpoint-related parameters. Use EXPLAIN ANALYZE to find slow queries; add indexes as needed. Use this when the database is slow or when sizing a new server.
- Pre-production database checklist
Use this checklist before putting a database into production: backups, users, security, config, monitoring, and restore test. Ensures nothing is missed and the database is ready for production load and recovery.
- Database security basics
Harden database access: least-privilege users, network restriction, encryption in transit and at rest, and audit logging. Use this when deploying a new database or when reviewing security for an existing PostgreSQL or MySQL instance.
- Docker in CI (build and push images)
In CI, build Docker images with docker build, tag with registry and version, and push with docker push. Use a registry (Docker Hub, ECR, GCR) and authenticate with a token or role. Use this when automating image builds in a pipeline.
- Docker Compose basics (multi-container stack)
Define a multi-container stack in a compose file (docker-compose.yml): services, networks, volumes. Run with docker compose up -d; manage with docker compose down and docker compose logs. Use this when running an app with a database, cache, or multiple services on one host.
- How to debug a Docker container
Inspect a running or exited container with docker logs, docker exec, and docker inspect. Check exit code, environment, and resource usage. Use this when a container fails to start, exits unexpectedly, or when you need to see what is running inside.
- Dockerfile basics (build an image)
Write a Dockerfile with FROM, RUN, COPY, and CMD to build a container image. Use multi-stage builds to keep the final image small. Use this when creating a custom image for your application or when optimizing build time and image size.
- Docker image and container cleanup
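Remove unused images, containers, volumes, and networks with docker system prune or the per-resource commands (docker image prune, docker container prune, docker volume prune, docker network prune). Free disk space and avoid accumulation of dangling images and stopped containers. Use this when the Docker disk usage is high or when you want to keep the host clean.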
- How to install Docker on Linux
Install Docker Engine on Debian, Ubuntu, or RHEL using the official Docker repository. Add your user to the docker group so you can run containers without sudo; note that membership in this group is effectively root-equivalent on the host. Use this when setting up a host for containers or when you need a specific Docker version.
- Docker networking basics
Containers can use the default bridge, a user-defined bridge, or the host network. Use user-defined bridge networks so containers resolve each other by name (the default bridge does not provide name resolution); publish ports with -p to expose services to the host. Use this when connecting containers or when debugging connectivity between containers and host.
- Docker pre-production checklist
Use this checklist before running containers in production: image source and scan, non-root and read-only, resource limits, secrets handling, logging, and health checks. Ensures containers are built and run in a production-ready way.
- How to run a Docker container
Run a container with docker run: specify image, command, ports, volumes, and env. Use -d for detached, -p to publish ports, -v for volumes, -e for env vars. Use this when starting a single container or when testing an image before composing a stack.
- Docker security basics
Run containers as non-root when possible; use read-only root filesystem and drop capabilities; scan images for vulnerabilities; keep the host and Docker updated. Use this when hardening containerized workloads or when reviewing container security.
- Docker volumes and bind mounts
Persist container data with volumes (managed by Docker) or bind mounts (host path). Use named volumes for database data; use bind mounts for config or source code in dev. Use this when you need data to survive container removal or when mounting host files into a container.
- Capacity planning basics
Use historical metrics and growth trends to plan for future capacity: when will disk, CPU, or memory be exhausted? Use this when sizing new systems or when deciding when to scale or upgrade to avoid running out of resources.
- How to set up disk, CPU, and memory alerts
Define alert rules for disk space, CPU usage, and memory (or swap) so you are notified before outages. Use thresholds and hysteresis to avoid flapping. Use this when configuring a monitoring system (e.g. Prometheus and Alertmanager, or cloud monitoring).
- Incident triage (when an alert fires)
When an alert fires, triage quickly: confirm the alert is real, identify scope and impact, and start the right runbook or escalation. Use this as the standard process for handling monitoring alerts and reducing MTTR.
- Logs and journald for monitoring
Use journald (journalctl) to query and forward logs; use log aggregation to centralize logs from multiple hosts for search and alerting. Use this when setting up log-based monitoring or when correlating events across services.
- Monitoring checklist (before go-live)
Use this checklist before putting a system into production: metrics collected, key alerts defined, logs centralized, health checks in place, runbooks written, and on-call knows how to respond. Ensures you can detect and respond to incidents.
- System metrics basics (CPU, memory, disk)
Collect and interpret basic system metrics: CPU usage, memory (used, available, swap), and disk usage. Use top, free, df, and similar tools or an agent (e.g. Node Exporter) for monitoring. Use this when setting up monitoring or when diagnosing resource-related issues.
- Uptime and health checks
Monitor service availability with HTTP, TCP, or script-based checks from one or more locations. Use this when you need to know when a service is down or degraded and to measure uptime and response time.
- Amazon CloudWatch basics
CloudWatch provides metrics, logs, and alarms in AWS. EC2 and many services send metrics automatically. Use for monitoring and alerting on AWS resources.
- AWS cost controls basics
Control AWS spend with budgets, alerts, and tags. Set a budget and get alerts at thresholds. Use tags for allocation and cleanup. Use when you want to avoid surprise bills or allocate cost.
- EC2 instance lifecycle basics
EC2 instances move through the states pending, running, stopping, stopped, shutting-down, and terminated. Use stop to save cost without losing EBS volumes; use terminate to delete the instance. Use this when managing instance state and cost.
- EC2 placement groups
Placement groups control how instances are placed: cluster for low latency, spread for isolation, partition for large distributed apps. Use when you need low latency or fault isolation.
- EC2 stop and start
Stop an EC2 instance to save compute cost; its EBS volumes are kept and still billed. When you start it again, the public IPv4 address changes unless you use an Elastic IP. Use when saving cost or pausing workloads.
- Elastic IP basics
An Elastic IP is a static public IPv4 address you attach to an EC2 instance. It survives stop and start. Allocated addresses incur charges, so release any you no longer need. Use when you need a fixed public IP.
- IAM hardening follow-up
After basic IAM setup, reduce risk with permission boundaries, SCPs, and regular audit. Use when you want to tighten IAM beyond least privilege and MFA.
- Amazon RDS basics
RDS is a managed relational database service (PostgreSQL, MySQL, MariaDB, etc). Create a DB instance in a VPC and connect using its endpoint. Use when you want a managed database without handling backups and patches yourself.
- S3 lifecycle rules basics
Use S3 lifecycle rules to transition objects to cheaper storage classes or expire them. Reduces cost for old or temporary data. Use when you have buckets with objects that age or are temporary.
- VPC Flow Logs
Enable VPC Flow Logs to capture accepted and rejected traffic at the VPC, subnet, or ENI level. Send to CloudWatch Logs or S3. Use for security and network troubleshooting.
- Database backup retention policy
Define how long to keep database backups based on RTO, RPO, and compliance. Use full plus incremental or differential; test restore regularly. Use this when setting or reviewing backup retention.
- Secure database connection strings
Store database connection strings in environment variables or a secrets manager; never commit them to source control. Use least-privilege users and SSL. Use this when deploying apps that connect to a database.
- MySQL or MariaDB config basics
MySQL and MariaDB use my.cnf or my.ini for configuration. Key settings include datadir, port, bind-address, max_connections, and buffer pool. Use this when tuning or securing the server.
- MySQL or MariaDB replication basics
Set up MySQL replication with binary logging on the primary and CHANGE REPLICATION SOURCE TO on the replica (CHANGE MASTER TO on MariaDB and on MySQL before 8.0.23). Use for read scaling or HA. Use this when you need a replica for reads or failover.
- MySQL or MariaDB slow query basics
Enable slow query log and use EXPLAIN to fix slow MySQL queries. Add indexes on WHERE and JOIN columns. Use this when the database is slow.
- MySQL or MariaDB SSL/TLS basics
Enable SSL for MySQL or MariaDB with ssl_cert, ssl_key, and ssl_ca in my.cnf. Require SSL for users with REQUIRE SSL. Use this for encrypted connections and compliance.
- MySQL or MariaDB tuning basics
Tune MySQL or MariaDB with innodb_buffer_pool_size, query cache (if available), connection limits, and slow query log. Use this when improving performance or after measuring bottlenecks.
- PostgreSQL logging basics
Configure PostgreSQL logging with log_destination, log_directory, and log_filename. Use log_min_duration_statement for slow queries and log_connections for audit. Use this when debugging or meeting audit requirements.
- PostgreSQL SSL/TLS basics
Enable SSL for PostgreSQL with ssl = on and server cert and key in postgresql.conf. Clients use sslmode=require or verify-full. Use this for encrypted connections and compliance.
- PostgreSQL upgrade basics
Upgrade PostgreSQL by using pg_dump and pg_restore (logical) or pg_upgrade (in-place). Both work across major versions: logical is safer and leaves the old cluster untouched, while pg_upgrade is faster but needs the old and new binaries installed on the same host. Use this when moving to a new major version.
- PostgreSQL VACUUM basics
VACUUM reclaims space from dead tuples and updates the visibility map; ANALYZE refreshes the statistics the planner uses. Run VACUUM ANALYZE after bulk changes. Rely on autovacuum; tune if needed. Use this when you see bloat or stale stats.
- When to use read replicas
Use read replicas when read load exceeds primary capacity or you need geographic distribution. Replicas add lag and eventual consistency. Use this when deciding whether to add replication for scaling or HA.
- Backup Docker volumes
Back up a volume with a temporary container that mounts the volume and writes a tar archive to the backup location. Restore by mounting the volume and extracting the archive. Use when you need to preserve volume data.
- Docker build args
Use ARG in a Dockerfile to pass build-time variables; set them with --build-arg. Use for version pins or build variants. Do not use ARG for secrets, because build-argument values remain visible in the image history. Use this when you need parameterized builds.
- Docker build cache and layers
Docker caches layers; change one line and everything after rebuilds. Put rarely changed steps first and frequently changed steps last. Use this when optimizing build speed.
- Docker Compose networks
Compose creates a default network so services resolve by name. Define custom networks for isolation. Use when you need service discovery or isolation.
- Docker Compose for production
Use Compose in production with limits, restarts, health checks, and secrets. Prefer orchestrators at scale. Use when running a small set of services on one or few hosts.
- Docker Compose scaling
Scale a Compose service with docker compose up -d --scale app=3. Use for dev or load testing. For production use an orchestrator. Use when you need multiple replicas.
- Docker .dockerignore basics
Add a .dockerignore file in the root of the build context to exclude files from it. Speeds up builds and avoids leaking secrets. Use when the build context is large or you want to exclude git or local files.
- Docker env vars and secrets
Pass config with -e or env_file in compose; use Docker secrets or a secrets manager for sensitive data. Never bake secrets into images. Use this when configuring containers or handling secrets.
- docker exec vs attach
docker exec runs a new command in a running container. docker attach attaches to the main process stdin/stdout. Use exec for debugging or one-off commands; avoid attach for long-running or interactive processes. Use this when you need to run a command inside a container.
- Docker HEALTHCHECK
Add HEALTHCHECK to Dockerfile so Docker reports container health. Use a command that exits 0 when healthy. Use when orchestration or load balancers need health status.
- Docker logging drivers
Configure how container stdout/stderr are handled with --log-driver. Default is json-file. Use json-file with max-size and max-file to limit disk. Use this when managing container log growth or forwarding logs.
- Docker multi-stage builds
Use multiple FROM stages to build in one stage and copy artifacts into a smaller final image. Reduces size and keeps build tools out of production. Use when build needs compilers not needed at runtime.
- Inspect Docker networks
List and inspect Docker networks with docker network ls and docker network inspect. See which containers are on a network and their IPs. Use when debugging connectivity.
- Private Docker registry
Run a private Docker registry with the official registry image. Push and pull with docker tag and docker push. Use when you need to store images privately or in CI.
- Docker resource limits
Limit container CPU and memory with --cpus and --memory. Prevents one container from starving others. Set in docker run or compose. Use when running multiple containers on one host.
- Docker restart policies
Set container restart policy with --restart (no, on-failure, always, unless-stopped). Use always or unless-stopped for long-running services so they come back after reboot or crash. Use this when you want containers to restart automatically.
- Scan Docker images for vulnerabilities
Use docker scout (which replaced the retired docker scan command) or a registry scanner to find known vulnerabilities in image layers. Fix by updating the base image and dependencies. Use before deploying to production.
- Docker image tagging and versioning
Tag images with meaningful versions such as myapp:1.0.0 in addition to myapp:latest. Use semantic versions for releases; avoid relying only on latest in production. Use this when publishing or deploying images.
- Troubleshoot Docker build failures
When docker build fails, check the failing step, cache, and context. Use --no-cache to rule out a stale cache. Check Dockerfile syntax and paths. Use when a build fails or is slow.
- Alerting basics
Define alerts on metrics or log patterns; route to on-call or ticketing. Use clear thresholds and runbooks. Use when you need to be notified of failures or anomalies.
- APM and tracing basics
Application Performance Monitoring and distributed tracing show request flow and latency across services. Use when you need to find slow or failed requests across a distributed system.
- Baseline and anomaly detection
Detect anomalies by comparing current metrics to baseline or using ML. Alert on unusual behavior. Use when threshold-based alerts miss subtle issues.
- Monitoring cost optimization
Reduce monitoring cost by trimming cardinality, retention, and sampling. Keep what you need for alerts and debugging. Use when monitoring cost is high.
- Monitoring dashboards basics
Build dashboards with key metrics per service or host. Use for ops and incident response. Keep panels focused and avoid clutter. Use when you need a single view of health and metrics.
- Error budget and burn rate
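Error budget is 1 minus the SLO (for example, 0.1% for a 99.9% SLO). Burn rate is how fast you consume it relative to the rate that would use it up exactly at the end of the SLO window. Alert on high burn rate to prevent exhausting the budget. Use when you have SLOs and want to alert before breach.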
- Grafana basics
Grafana connects to Prometheus and other data sources. Build dashboards and panels. Use for visualization and exploration of metrics and logs.
- Incident response flow
When an alert fires: acknowledge, assess impact, mitigate or fix, communicate, and write a postmortem. Use when defining how to respond to incidents.
- Log aggregation basics
Collect logs from many hosts or containers into one system. Search and alert on patterns. Use when you need central search and retention for logs.
- logrotate configuration
Configure logrotate to rotate application or system logs by size or date. Prevents disk full. Use when logs grow without bound.
- Metrics retention and storage
Set retention for metrics based on storage and query needs. Long retention uses more storage; downsample or archive for cost. Use when configuring or scaling a metrics system.
- On-call basics
Set up on-call rotation and escalation. Route alerts to primary and secondary. Use when you need someone to respond to incidents 24/7 or during business hours.
- Notification and alert routing
Route alerts to the right people or channels by service, severity, or time. Use routing rules and escalation. Use when you have multiple teams or services.
- Postmortem basics
Write a postmortem after significant incidents. Include timeline, root cause, impact, and actions, and keep it blameless. Use when you need to learn from outages and prevent recurrence.
- Pre-incident monitoring checklist
Checklist before going live: metrics, alerts, runbooks, on-call, and dashboards. Use when preparing a new service or before a launch.
- Prometheus basics
Prometheus scrapes metrics from targets on an interval. Store time series; query with PromQL. Use for metrics and alerting in many environments.
- RED and USE metrics
RED for services: Rate, Errors, Duration. USE for resources: Utilization, Saturation, Errors. Use these to choose what to measure and alert on.
- Runbook basics
Write runbooks for alerts and common operations. Include steps, commands, and escalation. Keep them updated. Use when you need consistent response to incidents.
- SLO basics
Define Service Level Objectives as target availability or latency. Use for alerting and capacity. Example: 99.9 percent uptime or p99 under 500ms. Use when you need to formalize reliability targets.
- Uptime and availability monitoring
Monitor endpoint availability from external or synthetic checks. Use HTTP or TCP checks from multiple regions. Use when you need to know if users can reach your service.
- How to revoke access when someone leaves
Systematically revoke a departing user’s access: remove or rotate their SSH keys, revoke API tokens and sessions, remove them from organizations and team apps, and rotate shared credentials. Includes verification steps and a short audit so you can confirm access is gone.
- How to add your SSH key to a server or GitHub
Install your existing SSH public key on a remote server (via ssh-copy-id or authorized_keys) or in GitHub so you can authenticate without a password. Includes verification and common pitfalls: permissions, wrong key, and ssh-agent.
- How to set up two-factor authentication (2FA)
Enable 2FA on an account using an authenticator app (TOTP), save recovery or backup codes in a safe place, and verify that the next login requires the second factor. Includes what to do before turning 2FA on and how to recover if locked out or if the app clock is wrong.
- Access denied: how to fix permission errors
Diagnose and fix 'permission denied' and 'access denied' errors on Unix-like systems: SSH publickey and file permission issues. Learn how to read error messages, run minimal checks, and apply safe chmod/chown without using chmod 777 or weakening security.
- Accounts and access checklist before going live
Use this checklist before going live or handing off a system: confirm SSH keys and 2FA are in place, passwords and API tokens are managed safely, access is least-privilege, and you have a way to recover and revoke access. Reduces lockout and security risk at launch.
- How to create and secure an SSH key pair
Create an Ed25519 SSH key pair, set correct permissions and optional passphrase, and verify passwordless login to a remote server. Use this guide before adding keys to servers or GitHub.