Handbook 02 — 11 Chapters
Technical SEO
Handbook
The complete engineering layer of SEO — from how Googlebot crawls your site to Core Web Vitals, structured data, and resolving every indexation issue in Search Console.
Chapter 01
How Google Crawls and Indexes the Web
Before any SEO tactic works, you need to understand the three-stage pipeline that governs every URL on the internet: Crawl → Render → Index. A failure at any stage means zero rankings, regardless of how good your content is.
Crawl
Googlebot discovers URLs through sitemaps and by following links, adds them to a crawl queue, and downloads the HTML of each page. Pages blocked by robots.txt are never downloaded.
Render
Google renders the downloaded HTML using a headless Chrome instance to execute JavaScript. This is why JavaScript-heavy SPAs can be problematic — rendering is resource-intensive and may be delayed by days or weeks.
Index
After rendering, Google decides whether to index the page. Thin content, duplicate content, noindex directives, or low-quality signals can all cause a page to be crawled but never indexed.
Rank
Indexed pages compete in Google's ranking algorithm across hundreds of signals. Ranking is only possible after successful crawl, render, and indexation — yet most SEO work starts here and ignores the prior three steps.
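Note how the first and third stages interact: robots.txt prevents crawling, but a blocked URL can still be indexed from anchor text alone if other pages link to it. To keep a page out of the index, it must remain crawlable and carry a noindex directive. A minimal sketch:

```html
<!-- In the <head>: page stays crawlable but is excluded from the index;
     its links are still followed -->
<meta name="robots" content="noindex, follow">
<!-- For non-HTML files (PDFs etc.), the equivalent HTTP response header is:
     X-Robots-Tag: noindex -->
```

This is why "block it in robots.txt AND add noindex" is a mistake: if Googlebot cannot crawl the page, it never sees the noindex.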
Chapter 02
Crawl Budget — What It Is and Why It Matters
Crawl budget is the number of pages Googlebot will crawl on your site within a given time frame. It is determined by two factors: crawl rate limit (how fast Googlebot crawls without overloading your server) and crawl demand (how much Google wants to crawl your site based on its perceived value).
For most small sites under 1,000 pages, crawl budget is not a concern. For large e-commerce, news, or enterprise sites with hundreds of thousands of URLs, wasted crawl budget directly costs rankings.
Common Crawl Budget Leaks
Infinite scroll parameters, faceted navigation, session IDs in URLs, deep pagination, and staging environments accidentally exposed to Googlebot.
How to Protect Budget
Block low-value URLs in robots.txt, consolidate faceted URLs with canonicals, use noindex on thin pages, and reduce redirect chains that waste crawl steps.
Track in Search Console
Search Console's Crawl Stats report shows daily Googlebot requests, average response time, and breakdown by file type — your primary crawl budget diagnostic tool.
| URL Type | Crawl Budget Impact | Fix |
|---|---|---|
| Faceted navigation URLs | Severe — can generate millions of thin pages | noindex or robots.txt block |
| URL parameters (session IDs) | High — creates duplicate content at scale | Canonical to the parameter-free URL (Search Console's URL Parameters tool was retired in 2022) |
| Redirect chains (3+ hops) | Medium — wastes crawl steps, dilutes link equity | Update to point directly to final destination |
| Soft 404 pages | Medium — Google wastes budget on dead ends | Return proper 404/410 HTTP status |
| Pagination beyond page 5 | Low–Medium — diminishing indexed value | noindex deep pagination pages |
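The robots.txt fixes in the table above can be sketched as follows. The parameter names here are examples only — substitute the parameters your faceted navigation actually generates:

```text
User-agent: *
# Session IDs and faceted-navigation parameters (names are illustrative)
Disallow: /*sessionid=
Disallow: /*color=
Disallow: /*size=
# Internal search result pages
Disallow: /search/
```

Remember that robots.txt blocks crawling, not indexing — use it for URL spaces that generate pages at scale, and use noindex for thin pages you still want Googlebot to be able to fetch.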
Chapter 03
Site Architecture and Internal Linking
Site architecture determines how link equity flows through your site and how easily Googlebot and users can navigate it. The gold standard is the flat architecture: every important page reachable within 3 clicks of the homepage.
Silo Structure
Group related pages into topic clusters. A central pillar page links to supporting cluster pages; clusters link back to the pillar. This signals topical depth to Google and concentrates authority.
XML Sitemaps
Submit a sitemap containing only indexable, canonical URLs. Keep individual sitemaps under 50,000 URLs and 50 MB. Use a sitemap index file for large sites. Resubmit after major structural changes.
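For sites that exceed the 50,000-URL or 50 MB per-file limits, the sitemap index file mentioned above looks like this (filenames are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each child sitemap must itself stay under the per-file limits -->
  <sitemap><loc>https://example.com/sitemap-products-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-articles.xml</loc></sitemap>
</sitemapindex>
```

Submit only the index file in Search Console; Google discovers the child sitemaps from it.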
Breadcrumbs
Breadcrumb navigation reinforces URL hierarchy, provides internal links, and can appear as rich results in Google SERPs. Implement them with BreadcrumbList schema markup for maximum benefit.
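A minimal BreadcrumbList in JSON-LD for a hypothetical three-level path might look like this (per Google's guidelines, the item URL can be omitted on the final crumb, since it is the current page):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "name": "Home",
     "item": "https://example.com/"},
    {"@type": "ListItem", "position": 2, "name": "Guides",
     "item": "https://example.com/guides/"},
    {"@type": "ListItem", "position": 3, "name": "Technical SEO"}
  ]
}
</script>
```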
Orphan Pages
Pages with zero internal links pointing to them are effectively invisible to Googlebot. Run a regular crawl audit (Screaming Frog) to identify and link orphan pages back into your architecture.
Chapter 04
Core Web Vitals — Google's Page Experience Signals
Core Web Vitals are Google's standardised metrics for measuring real-world user experience. They became a confirmed ranking factor in 2021 and are measured in the field (real user data via Chrome UX Report) and in the lab (Lighthouse, PageSpeed Insights).
| Metric | What It Measures | Good Threshold | Key Fix |
|---|---|---|---|
| LCP — Largest Contentful Paint | Loading performance — how fast the main content appears | ≤ 2.5 seconds | Optimise hero images, reduce server response time, use CDN |
| INP — Interaction to Next Paint | Responsiveness — how fast the page responds to input | ≤ 200 ms | Reduce JavaScript execution time, break up long tasks |
| CLS — Cumulative Layout Shift | Visual stability — do elements move as the page loads? | ≤ 0.1 | Add explicit width/height to images and embeds, avoid late-loading ads above the fold |
PageSpeed Insights
Shows both field data (real users, 28-day window) and lab data (controlled Lighthouse test). Field data is what Google uses for ranking. Lab data diagnoses specific issues.
Search Console CWV Report
Shows which URLs have Poor, Needs Improvement, or Good status across your entire site. Group URLs by template type to fix issues at scale rather than one page at a time.
Image Optimisation
Convert images to WebP or AVIF format. Add fetchpriority="high" to your LCP image element. Use responsive images with srcset. Lazy-load below-the-fold images only.
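Putting those recommendations together, a hero (LCP) image and a lazy-loaded below-the-fold image might be marked up as follows (filenames, dimensions, and breakpoints are placeholders):

```html
<!-- LCP image: explicit dimensions prevent layout shift,
     fetchpriority="high" tells the browser to load it first -->
<img src="hero-800.webp"
     srcset="hero-480.webp 480w, hero-800.webp 800w, hero-1200.webp 1200w"
     sizes="(max-width: 600px) 480px, 800px"
     width="1200" height="600"
     fetchpriority="high"
     alt="Product dashboard overview">

<!-- Below the fold only: lazy-loading the LCP image would hurt LCP -->
<img src="footer-banner.webp" width="800" height="200"
     loading="lazy" alt="Newsletter signup banner">
```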
Chapter 05
Mobile-First Indexing
Since completing the rollout in 2023, Google exclusively uses the mobile version of your content for indexing and ranking. If your desktop site has more content than your mobile site, you are being ranked on the stripped-down version — optimising the desktop version alone no longer helps.
- Ensure mobile and desktop show identical primary content and structured data
- Use responsive design (single URL, CSS adapts) rather than separate m. subdomain
- Never hide important text or headings behind "Read more" toggles on mobile only
- Verify Google can render your mobile pages using the URL Inspection tool
- Tap targets (buttons, links) must be at least 48×48 CSS pixels with 8px spacing
- Font size must be legible without zooming — minimum 16px for body text
- Avoid interstitials and intrusive pop-ups on mobile that obscure main content
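The responsive-design, font-size, and tap-target points in the checklist above start with a correct viewport declaration; without it, mobile browsers render the page at desktop width. A minimal sketch (selectors are illustrative):

```html
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
  body { font-size: 16px; }   /* legible without zooming */
  nav a {
    display: inline-block;    /* allow minimum sizes on inline links */
    min-width: 48px;          /* tap-target minimum */
    min-height: 48px;
  }
</style>
```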
Chapter 06
Structured Data and Schema Markup
Structured data is code you add to your HTML (using JSON-LD, the recommended format) that explicitly tells Google what your content means — not just what it says. It powers rich results: star ratings, FAQs, recipes, events, and more in the SERP.
| Schema Type | Rich Result Unlocked | Best For |
|---|---|---|
| Product | Price, availability, star ratings in SERP | E-commerce product pages |
| Article / BlogPosting | Article rich results, Top Stories carousel | Blog, news, editorial content |
| FAQPage | Expandable FAQ accordion in SERP (since 2023, shown mainly for authoritative government and health sites) | Support pages, guides with Q&A |
| HowTo | Step-by-step rich result (deprecated by Google in 2023; the markup can still aid content understanding) | Tutorial and instructional content |
| LocalBusiness | Knowledge panel, maps integration | Location-based businesses |
| BreadcrumbList | Breadcrumb trail in URL shown in SERP | Any site with hierarchical content |
| Event | Event cards with date/location | Event pages and listings |
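As an example of the recommended JSON-LD format, a Product page from the table above might carry this markup (product details are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "image": "https://example.com/widget.jpg",
  "offers": {
    "@type": "Offer",
    "price": "49.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "182"
  }
}
</script>
```

Validate any markup with Google's Rich Results Test before deploying; markup that misrepresents page content risks a manual action.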
Chapter 07
JavaScript SEO
JavaScript is the single most common source of technical SEO problems for modern websites. When content is rendered by JavaScript rather than served in the initial HTML response, Googlebot may not see it — or see it only after a significant delay.
Client-Side Rendering
The server returns a near-empty HTML shell. The browser downloads and executes JavaScript to build the page. Googlebot has to do the same — and may defer rendering by hours or days.
Server-Side Rendering (SSR)
The server renders the full HTML before sending it to the browser. Googlebot receives fully formed content immediately. Best for SEO-critical pages with dynamic content (Next.js, Nuxt.js).
Static Generation (SSG)
Pages are pre-rendered at build time as static HTML. Zero rendering delay for Googlebot. Ideal for content that doesn't change frequently. Fastest for Core Web Vitals too.
To diagnose JavaScript SEO issues: use the URL Inspection tool in Search Console and compare the "View Crawled Page" HTML against the live rendered page. Any content that appears in the live page but not the crawled HTML is invisible to Google at crawl time.
Chapter 08
Canonicalisation — Solving Duplicate Content
Duplicate content occurs when multiple URLs serve the same or very similar content. Google must choose one version to index — and often chooses incorrectly. The canonical tag tells Google explicitly which URL is the "master" version.
<link rel="canonical" href="https://example.com/page/"> — Place in the <head> of every page, including the canonical page itself.
Self-Referencing Canonicals
Every indexable page should carry a self-referencing canonical tag — even if no duplicate exists. This future-proofs against URL parameter injection and syndication.
HTTP vs HTTPS / www vs non-www
Choose one canonical version of your domain and redirect all others. Mixed signals between HTTP and HTTPS, or www and non-www, split your link equity and confuse indexation.
Trailing Slash Consistency
Pick a convention (with or without trailing slash) and canonicalise the other. /page/ and /page are treated as different URLs. Consistency across all internal links is just as important as the canonical tag.
Syndicated Content
If your content appears on third-party sites, ask them to point a canonical back to your original URL. This ensures you receive indexation credit rather than the syndication partner.
Chapter 09
Redirects and Hreflang
Redirects control how link equity and crawl budget flow when URLs change. Hreflang tells Google which language and region variant of a page to serve to which audience. Both are frequently misconfigured and silently damage rankings.
| Redirect Type | Use Case | Link Equity Passed |
|---|---|---|
| 301 Permanent | Page has moved permanently, old URL retired | Full (Google confirmed in 2016 that 3xx redirects no longer lose PageRank) |
| 302 Temporary | Temporary redirect, old URL will return | Passes equity; a long-standing 302 is eventually treated as a 301 |
| 307 Temporary | Same as 302 but preserves the HTTP method | Same as 302 |
| 410 Gone | Page permanently deleted, should be removed from index | None — removes from index faster than 404 |
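As a sketch, the 301 and 410 rows above map to server configuration like this, assuming an nginx server (paths are hypothetical):

```nginx
# Permanent move: point directly at the final destination, no chains
location = /old-page/ {
    return 301 https://example.com/new-page/;
}

# Permanently deleted content: 410 signals removal faster than 404
location = /retired-page/ {
    return 410;
}
```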
Hreflang is required when you publish content in multiple languages or for multiple regional audiences. It must be implemented as a complete reciprocal set: every page must reference all its variants, including itself. If a variant omits the return link, Google ignores the annotations for that pair.
<link rel="alternate" hreflang="en-us" href="https://example.com/en-us/page/">
<link rel="alternate" hreflang="x-default" href="https://example.com/page/">
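To illustrate the reciprocal requirement, a complete set for the en-us page above plus an assumed German variant would look like this, and the identical block must appear on both language versions:

```html
<!-- Same three tags on https://example.com/en-us/page/
     AND on https://example.com/de-de/page/ -->
<link rel="alternate" hreflang="en-us" href="https://example.com/en-us/page/">
<link rel="alternate" hreflang="de-de" href="https://example.com/de-de/page/">
<link rel="alternate" hreflang="x-default" href="https://example.com/page/">
```

The x-default entry names the fallback page for users who match none of the listed locales.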
Chapter 10
Google Search Console Mastery
Search Console is the most authoritative SEO tool available — because it reports Google's own view of your site: indexation status, crawl activity, query-level performance, and enhancement errors. Most practitioners use only a small fraction of its capabilities.
Chapter 11
The Technical SEO Audit — Full Checklist
Run this audit quarterly for established sites and after any major site migration, redesign, or CMS change. Use Screaming Frog or Sitebulb for crawl data, Search Console for field data, and PageSpeed Insights for performance.
Access & Indexation
robots.txt accessible · XML sitemap submitted · No important pages blocked · No noindex on pages that should rank · URL inspection passes for key pages
Structure & Links
All key pages reachable in ≤3 clicks · No orphan pages · Breadcrumbs implemented · Internal links use descriptive anchor text · No broken internal links
Core Web Vitals
LCP ≤ 2.5s · INP ≤ 200ms · CLS ≤ 0.1 · Images in WebP/AVIF · Render-blocking resources eliminated · CDN in use
On-Page Technical
Unique title tags on all pages · Unique meta descriptions · One H1 per page · Canonical tags self-referencing · hreflang correct if multilingual · Schema validated