Handbook 02 — 11 Chapters
Technical SEO
Handbook
The complete engineering layer of SEO — from how Googlebot crawls your site to Core Web Vitals, structured data, and resolving every indexation issue in Search Console.
Chapter 01
How Google Crawls and Indexes the Web
Before any SEO tactic works, you need to understand the three-stage pipeline that governs every URL on the internet: Crawl → Render → Index. A failure at any stage means zero rankings, regardless of how good your content is.
Crawl
Googlebot discovers URLs through sitemaps and by following links, adds them to a crawl queue, and downloads the HTML of each page. Pages blocked by robots.txt are never downloaded.
Render
Google renders the downloaded HTML using a headless Chrome instance to execute JavaScript. This is why JavaScript-heavy SPAs can be problematic — rendering is resource-intensive and may be delayed by days or weeks.
Index
After rendering, Google decides whether to index the page. Thin content, duplicate content, noindex directives, or low-quality signals can all cause a page to be crawled but never indexed.
Rank
Indexed pages compete in Google's ranking algorithm across hundreds of signals. Ranking is only possible after successful crawl, render, and indexation — yet most SEO work starts here and ignores the prior three steps.
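Note how the first and third stages interact: robots.txt prevents crawling, but a blocked URL can still be indexed from anchor text alone if other pages link to it. To keep a page out of the index, it must remain crawlable and carry a noindex directive. A minimal sketch:

```html
<!-- In the <head>: page stays crawlable but is excluded from the index;
     its links are still followed -->
<meta name="robots" content="noindex, follow">
<!-- For non-HTML files (PDFs etc.), the equivalent HTTP response header is:
     X-Robots-Tag: noindex -->
```

This is why "block it in robots.txt AND add noindex" is a mistake: if Googlebot cannot crawl the page, it never sees the noindex.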
Chapter 02
Crawl Budget — What It Is and Why It Matters
Crawl budget is the number of pages Googlebot will crawl on your site within a given time frame. It is determined by two factors: crawl rate limit (how fast Googlebot crawls without overloading your server) and crawl demand (how much Google wants to crawl your site based on its perceived value).
For most small sites under 1,000 pages, crawl budget is not a concern. For large e-commerce, news, or enterprise sites with hundreds of thousands of URLs, wasted crawl budget directly costs rankings.
Common Crawl Budget Leaks
Infinite scroll parameters, faceted navigation, session IDs in URLs, deep pagination, and staging environments accidentally exposed to Googlebot.
How to Protect Budget
Block low-value URLs in robots.txt, consolidate faceted URLs with canonicals, use noindex on thin pages, and reduce redirect chains that waste crawl steps.
Track in Search Console
Search Console's Crawl Stats report shows daily Googlebot requests, average response time, and breakdown by file type — your primary crawl budget diagnostic tool.
| URL Type | Crawl Budget Impact | Fix |
|---|---|---|
| Faceted navigation URLs | Severe — can generate millions of thin pages | noindex or robots.txt block |
| URL parameters (session IDs) | High — creates duplicate content at scale | Canonical to the parameter-free URL (Search Console's URL Parameters tool was retired in 2022) |
| Redirect chains (3+ hops) | Medium — wastes crawl steps, dilutes link equity | Update to point directly to final destination |
| Soft 404 pages | Medium — Google wastes budget on dead ends | Return proper 404/410 HTTP status |
| Pagination beyond page 5 | Low–Medium — diminishing indexed value | noindex deep pagination pages |
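The robots.txt fixes in the table above can be sketched as follows. The parameter names here are examples only — substitute the parameters your faceted navigation actually generates:

```text
User-agent: *
# Session IDs and faceted-navigation parameters (names are illustrative)
Disallow: /*sessionid=
Disallow: /*color=
Disallow: /*size=
# Internal search result pages
Disallow: /search/
```

Remember that robots.txt blocks crawling, not indexing — use it for URL spaces that generate pages at scale, and use noindex for thin pages you still want Googlebot to be able to fetch.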
Chapter 03
Site Architecture and Internal Linking
Site architecture determines how link equity flows through your site and how easily Googlebot and users can navigate it. The gold standard is the flat architecture: every important page reachable within 3 clicks of the homepage.
Silo Structure
Group related pages into topic clusters. A central pillar page links to supporting cluster pages; clusters link back to the pillar. This signals topical depth to Google and concentrates authority.
XML Sitemaps
Submit a sitemap containing only indexable, canonical URLs. Keep individual sitemaps under 50,000 URLs and 50 MB. Use a sitemap index file for large sites. Resubmit after major structural changes.
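For sites that exceed the 50,000-URL or 50 MB per-file limits, the sitemap index file mentioned above looks like this (filenames are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each child sitemap must itself stay under the per-file limits -->
  <sitemap><loc>https://example.com/sitemap-products-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-articles.xml</loc></sitemap>
</sitemapindex>
```

Submit only the index file in Search Console; Google discovers the child sitemaps from it.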
Breadcrumbs
Breadcrumb navigation reinforces URL hierarchy, provides internal links, and can appear as rich results in Google SERPs. Implement them with BreadcrumbList schema markup for maximum benefit.
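A minimal BreadcrumbList in JSON-LD for a hypothetical three-level path might look like this (per Google's guidelines, the item URL can be omitted on the final crumb, since it is the current page):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "name": "Home",
     "item": "https://example.com/"},
    {"@type": "ListItem", "position": 2, "name": "Guides",
     "item": "https://example.com/guides/"},
    {"@type": "ListItem", "position": 3, "name": "Technical SEO"}
  ]
}
</script>
```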
Orphan Pages
Pages with zero internal links pointing to them are effectively invisible to Googlebot. Run a regular crawl audit (Screaming Frog) to identify and link orphan pages back into your architecture.
Chapter 04
Core Web Vitals — Google's Page Experience Signals
Core Web Vitals are Google's standardised metrics for measuring real-world user experience. They became a confirmed ranking factor in 2021 and are measured in the field (real user data via Chrome UX Report) and in the lab (Lighthouse, PageSpeed Insights).
| Metric | What It Measures | Good Threshold | Key Fix |
|---|---|---|---|
| LCP — Largest Contentful Paint | Loading performance — how fast the main content appears | ≤ 2.5 seconds | Optimise hero images, reduce server response time, use CDN |
| INP — Interaction to Next Paint | Responsiveness — how fast the page responds to input | ≤ 200 ms | Reduce JavaScript execution time, break up long tasks |
| CLS — Cumulative Layout Shift | Visual stability — do elements move as the page loads? | ≤ 0.1 | Add explicit width/height to images and embeds, avoid late-loading ads above the fold |
PageSpeed Insights
Shows both field data (real users, 28-day window) and lab data (controlled Lighthouse test). Field data is what Google uses for ranking. Lab data diagnoses specific issues.
Search Console CWV Report
Shows which URLs have Poor, Needs Improvement, or Good status across your entire site. Group URLs by template type to fix issues at scale rather than one page at a time.
Image Optimisation
Convert images to WebP or AVIF format. Add fetchpriority="high" to your LCP image element. Use responsive images with srcset. Lazy-load below-the-fold images only.
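Putting those recommendations together, a hero (LCP) image and a lazy-loaded below-the-fold image might be marked up as follows (filenames, dimensions, and breakpoints are placeholders):

```html
<!-- LCP image: explicit dimensions prevent layout shift,
     fetchpriority="high" tells the browser to load it first -->
<img src="hero-800.webp"
     srcset="hero-480.webp 480w, hero-800.webp 800w, hero-1200.webp 1200w"
     sizes="(max-width: 600px) 480px, 800px"
     width="1200" height="600"
     fetchpriority="high"
     alt="Product dashboard overview">

<!-- Below the fold only: lazy-loading the LCP image would hurt LCP -->
<img src="footer-banner.webp" width="800" height="200"
     loading="lazy" alt="Newsletter signup banner">
```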
Chapter 05
Mobile-First Indexing
Since completing the rollout in 2023, Google exclusively uses the mobile version of your content for indexing and ranking. If your desktop site has more content than your mobile site, you are being ranked on the stripped-down version — optimising the desktop version alone no longer helps.
- Ensure mobile and desktop show identical primary content and structured data
- Use responsive design (single URL, CSS adapts) rather than separate m. subdomain
- Never hide important text or headings behind "Read more" toggles on mobile only
- Verify Google can render your mobile pages using the URL Inspection tool
- Tap targets (buttons, links) must be at least 48×48 CSS pixels with 8px spacing
- Font size must be legible without zooming — minimum 16px for body text
- Avoid interstitials and intrusive pop-ups on mobile that obscure main content
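The responsive-design, font-size, and tap-target points in the checklist above start with a correct viewport declaration; without it, mobile browsers render the page at desktop width. A minimal sketch (selectors are illustrative):

```html
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
  body { font-size: 16px; }   /* legible without zooming */
  nav a {
    display: inline-block;    /* allow minimum sizes on inline links */
    min-width: 48px;          /* tap-target minimum */
    min-height: 48px;
  }
</style>
```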
Chapter 06
Structured Data and Schema Markup
Structured data is code you add to your HTML (using JSON-LD, the recommended format) that explicitly tells Google what your content means — not just what it says. It powers rich results: star ratings, FAQs, recipes, events, and more in the SERP.
| Schema Type | Rich Result Unlocked | Best For |
|---|---|---|
| Product | Price, availability, star ratings in SERP | E-commerce product pages |
| Article / BlogPosting | Article rich results, Top Stories carousel | Blog, news, editorial content |
| FAQPage | Expandable FAQ accordion in SERP (since 2023, shown mainly for authoritative government and health sites) | Support pages, guides with Q&A |
| HowTo | Step-by-step rich result (deprecated by Google in 2023; the markup can still aid content understanding) | Tutorial and instructional content |
| LocalBusiness | Knowledge panel, maps integration | Location-based businesses |
| BreadcrumbList | Breadcrumb trail in URL shown in SERP | Any site with hierarchical content |
| Event | Event cards with date/location | Event pages and listings |
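As an example of the recommended JSON-LD format, a Product page from the table above might carry this markup (product details are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "image": "https://example.com/widget.jpg",
  "offers": {
    "@type": "Offer",
    "price": "49.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "182"
  }
}
</script>
```

Validate any markup with Google's Rich Results Test before deploying; markup that misrepresents page content risks a manual action.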
Chapter 07
JavaScript SEO
JavaScript is the single most common source of technical SEO problems for modern websites. When content is rendered by JavaScript rather than served in the initial HTML response, Googlebot may not see it — or see it only after a significant delay.
Client-Side Rendering
The server returns a near-empty HTML shell. The browser downloads and executes JavaScript to build the page. Googlebot has to do the same — and may defer rendering by hours or days.
Server-Side Rendering (SSR)
The server renders the full HTML before sending it to the browser. Googlebot receives fully formed content immediately. Best for SEO-critical pages with dynamic content (Next.js, Nuxt.js).
Static Generation (SSG)
Pages are pre-rendered at build time as static HTML. Zero rendering delay for Googlebot. Ideal for content that doesn't change frequently. Fastest for Core Web Vitals too.
To diagnose JavaScript SEO issues: use the URL Inspection tool in Search Console and compare the "View Crawled Page" HTML against the live rendered page. Any content that appears in the live page but not the crawled HTML is invisible to Google at crawl time.
Chapter 08
Canonicalisation — Solving Duplicate Content
Duplicate content occurs when multiple URLs serve the same or very similar content. Google must choose one version to index — and often chooses incorrectly. The canonical tag tells Google explicitly which URL is the "master" version.
<link rel="canonical" href="https://example.com/page/"> — Place in the <head> of every page, including the canonical page itself.
Self-Referencing Canonicals
Every indexable page should carry a self-referencing canonical tag — even if no duplicate exists. This future-proofs against URL parameter injection and syndication.
HTTP vs HTTPS / www vs non-www
Choose one canonical version of your domain and redirect all others. Mixed signals between HTTP and HTTPS, or www and non-www, split your link equity and confuse indexation.
Trailing Slash Consistency
Pick a convention (with or without trailing slash) and canonicalise the other. /page/ and /page are treated as different URLs. Consistency across all internal links is just as important as the canonical tag.
Syndicated Content
If your content appears on third-party sites, ask them to point a canonical back to your original URL. This ensures you receive indexation credit rather than the syndication partner.
Chapter 09
Redirects and Hreflang
Redirects control how link equity and crawl budget flow when URLs change. Hreflang tells Google which language and region variant of a page to serve to which audience. Both are frequently misconfigured and silently damage rankings.
| Redirect Type | Use Case | Link Equity Passed |
|---|---|---|
| 301 Permanent | Page has moved permanently, old URL retired | Full (Google confirmed in 2016 that 3xx redirects no longer lose PageRank) |
| 302 Temporary | Temporary redirect, old URL will return | Passes equity; a long-standing 302 is eventually treated as a 301 |
| 307 Temporary | Same as 302 but preserves the HTTP method | Same as 302 |
| 410 Gone | Page permanently deleted, should be removed from index | None — removes from index faster than 404 |
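As a sketch, the 301 and 410 rows above map to server configuration like this, assuming an nginx server (paths are hypothetical):

```nginx
# Permanent move: point directly at the final destination, no chains
location = /old-page/ {
    return 301 https://example.com/new-page/;
}

# Permanently deleted content: 410 signals removal faster than 404
location = /retired-page/ {
    return 410;
}
```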
Hreflang is required when you publish content in multiple languages or for multiple regional audiences. It must be implemented as a complete reciprocal set: every page must reference all its variants, including itself. If a variant omits the return link, Google ignores the annotations for that pair.
<link rel="alternate" hreflang="en-us" href="https://example.com/en-us/page/">
<link rel="alternate" hreflang="x-default" href="https://example.com/page/">
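To illustrate the reciprocal requirement, a complete set for the en-us page above plus an assumed German variant would look like this, and the identical block must appear on both language versions:

```html
<!-- Same three tags on https://example.com/en-us/page/
     AND on https://example.com/de-de/page/ -->
<link rel="alternate" hreflang="en-us" href="https://example.com/en-us/page/">
<link rel="alternate" hreflang="de-de" href="https://example.com/de-de/page/">
<link rel="alternate" hreflang="x-default" href="https://example.com/page/">
```

The x-default entry names the fallback page for users who match none of the listed locales.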
Chapter 10
Google Search Console Mastery
Search Console is the most authoritative SEO tool available — because it reports Google's own view of your site: indexation status, crawl activity, query-level performance, and enhancement errors. Most practitioners use only a small fraction of its capabilities.
Chapter 11
The Technical SEO Audit — Full Checklist
Run this audit quarterly for established sites and after any major site migration, redesign, or CMS change. Use Screaming Frog or Sitebulb for crawl data, Search Console for field data, and PageSpeed Insights for performance.
Access & Indexation
robots.txt accessible · XML sitemap submitted · No important pages blocked · No noindex on pages that should rank · URL inspection passes for key pages
Structure & Links
All key pages reachable in ≤3 clicks · No orphan pages · Breadcrumbs implemented · Internal links use descriptive anchor text · No broken internal links
Core Web Vitals
LCP ≤ 2.5s · INP ≤ 200ms · CLS ≤ 0.1 · Images in WebP/AVIF · Render-blocking resources eliminated · CDN in use
On-Page Technical
Unique title tags on all pages · Unique meta descriptions · One H1 per page · Canonical tags self-referencing · hreflang correct if multilingual · Schema validated