Handbook 02 — 11 Chapters

Technical SEO
Handbook

The complete engineering layer of SEO — from how Googlebot crawls your site to Core Web Vitals, structured data, and resolving every indexation issue in Search Console.

Chapter 01

How Google Crawls and Indexes the Web

Before any SEO tactic works, you need to understand the three-stage pipeline that governs every URL on the web: Crawl → Render → Index. Ranking comes only after all three succeed; a failure at any stage means zero rankings, regardless of how good your content is.

01

Crawl

Googlebot discovers URLs through sitemaps and by following links, adds them to a crawl queue, and downloads the HTML of each page in turn. Pages blocked by robots.txt are never downloaded.

02

Render

Google renders the downloaded HTML using a headless Chrome instance to execute JavaScript. This is why JavaScript-heavy SPAs can be problematic — rendering is resource-intensive and may be delayed by days or weeks.

03

Index

After rendering, Google decides whether to index the page. Thin content, duplicate content, noindex directives, or low-quality signals can all cause a page to be crawled but never indexed.

04

Rank

Indexed pages compete in Google's ranking algorithm across hundreds of signals. Ranking is only possible after successful crawl, render, and indexation — yet most SEO work starts here and ignores the prior three steps.

"If Googlebot can't crawl your page, no amount of link building or content optimization will move rankings. Fix the foundation first."

Chapter 02

Crawl Budget — What It Is and Why It Matters

Crawl budget is the number of pages Googlebot will crawl on your site within a given time frame. It is determined by two factors: crawl rate limit (how fast Googlebot crawls without overloading your server) and crawl demand (how much Google wants to crawl your site based on its perceived value).

For most small sites under 1,000 pages, crawl budget is not a concern. For large e-commerce, news, or enterprise sites with hundreds of thousands of URLs, wasted crawl budget directly costs rankings.

Wasting Budget

Common Crawl Budget Leaks

Infinite scroll parameters, faceted navigation, session IDs in URLs, paginated pages beyond page 3, and staging environments accidentally exposed to Googlebot.

Saving Budget

How to Protect Budget

Block low-value URLs in robots.txt, consolidate faceted URLs with canonicals, use noindex on thin pages, and reduce redirect chains that waste crawl steps.
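One way to sanity-check a blocking rule before deploying it is Python's built-in robots.txt parser. The paths below are hypothetical, and note a caveat: `urllib.robotparser` does plain prefix matching per the original robots.txt draft, without Google's `*` wildcard extensions, so it can only approximate Googlebot's behaviour.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an e-commerce site: prefix rules that keep
# crawlers out of internal search and checkout flows.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /checkout/
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/shoes/",
            "https://example.com/search?q=boots",
            "https://example.com/checkout/step-1"):
    print(url, "->", parser.can_fetch("Googlebot", url))
```

Running a check like this against a staging robots.txt catches accidental blocks of important sections before they cost you crawl coverage.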

Monitoring

Track in Search Console

Search Console's Crawl Stats report shows daily Googlebot requests, average response time, and breakdown by file type — your primary crawl budget diagnostic tool.

URL Type | Crawl Budget Impact | Fix
Faceted navigation URLs | Severe — can generate millions of thin pages | noindex or robots.txt block
URL parameters (session IDs) | High — creates duplicate content at scale | Canonical tags (Search Console's URL Parameters tool has been retired)
Redirect chains (3+ hops) | Medium — wastes crawl steps, dilutes link equity | Update links to point directly to the final destination
Soft 404 pages | Medium — Google wastes budget on dead ends | Return a proper 404/410 HTTP status
Pagination beyond page 5 | Low–Medium — diminishing indexed value | noindex deep pagination pages

Chapter 03

Site Architecture and Internal Linking

Site architecture determines how link equity flows through your site and how easily Googlebot and users can navigate it. The gold standard is the flat architecture: every important page reachable within 3 clicks of the homepage.

"Your internal linking structure is a vote cast for which pages matter most. Every link is a signal — use them with intent."
01

Silo Structure

Group related pages into topic clusters. A central pillar page links to supporting cluster pages; clusters link back to the pillar. This signals topical depth to Google and concentrates authority.

02

XML Sitemaps

Submit a sitemap containing only indexable, canonical URLs. Keep individual sitemaps under 50,000 URLs and 50 MB. Use a sitemap index file for large sites. Resubmit after major structural changes.
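As a sketch of the 50,000-URL rule, here is a minimal sitemap builder using Python's standard library; the URLs are illustrative, and a real generator would also filter out non-canonical and noindexed pages before serialising.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit; beyond this, split and use a sitemap index

def build_sitemap(urls):
    """Serialise indexable, canonical URLs into one sitemap XML document."""
    if len(urls) > MAX_URLS:
        raise ValueError("Too many URLs for one file: use a sitemap index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap(["https://example.com/", "https://example.com/products/"]))
```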

03

Breadcrumbs

Breadcrumb navigation reinforces URL hierarchy, provides internal links, and can appear as rich results in Google SERPs. Implement them with BreadcrumbList schema markup for maximum benefit.

04

Orphan Pages

Pages with zero internal links pointing to them are effectively invisible to Googlebot. Run a regular crawl audit (Screaming Frog) to identify and link orphan pages back into your architecture.
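Both the 3-click rule and orphan detection reduce to a breadth-first search over the internal link graph. A minimal sketch — the page paths are made up, and a real audit would feed in a crawler's export rather than a hand-written dict:

```python
from collections import deque

def audit_architecture(links, homepage):
    """links: {page: [pages it links to]}. Returns (click_depth, orphans):
    BFS depth from the homepage for each reachable page, and the set of
    pages that no internal link reaches at all."""
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    orphans = set(links) - set(depth)
    return depth, orphans

site = {
    "/": ["/shop/", "/blog/"],
    "/shop/": ["/shop/shoes/"],
    "/blog/": ["/"],
    "/shop/shoes/": [],
    "/old-landing/": [],  # nothing links here: an orphan
}
depth, orphans = audit_architecture(site, "/")
print(depth["/shop/shoes/"], orphans)  # 2 {'/old-landing/'}
```

Any page whose depth exceeds 3, or that lands in the orphan set, is a candidate for new internal links.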

Chapter 04

Core Web Vitals — Google's Page Experience Signals

Core Web Vitals are Google's standardised metrics for measuring real-world user experience. They became a confirmed ranking factor in 2021 and are measured in the field (real user data via Chrome UX Report) and in the lab (Lighthouse, PageSpeed Insights).

Metric | What It Measures | Good Threshold | Key Fix
LCP — Largest Contentful Paint | Loading performance — how fast the main content appears | ≤ 2.5 seconds | Optimise hero images, reduce server response time, use a CDN
INP — Interaction to Next Paint | Responsiveness — how fast the page responds to input | ≤ 200 ms | Reduce JavaScript execution time, break up long tasks
CLS — Cumulative Layout Shift | Visual stability — do elements move as the page loads? | ≤ 0.1 | Add explicit width/height to images and embeds; avoid late-loading ads above the fold

Diagnosis

PageSpeed Insights

Shows both field data (real users, 28-day window) and lab data (controlled Lighthouse test). Field data is what Google uses for ranking. Lab data diagnoses specific issues.

Bulk Analysis

Search Console CWV Report

Shows which URLs have Poor, Needs Improvement, or Good status across your entire site. Group URLs by template type to fix issues at scale rather than one page at a time.

LCP Quick Win

Image Optimisation

Convert images to WebP or AVIF format. Add fetchpriority="high" to your LCP image element. Use responsive images with srcset. Lazy-load below-the-fold images only.
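Put together, those fixes look roughly like this in markup. The file names and dimensions are placeholders; the attributes are the point.

```html
<!-- Hypothetical LCP hero image: modern format, explicit dimensions
     (prevents layout shift), high fetch priority, responsive sizes. -->
<img src="/img/hero-800.avif"
     srcset="/img/hero-800.avif 800w, /img/hero-1600.avif 1600w"
     sizes="100vw"
     width="1600" height="900"
     fetchpriority="high"
     alt="Product hero">

<!-- Below-the-fold images only: defer with native lazy loading. -->
<img src="/img/footer-banner.webp" width="1200" height="300"
     loading="lazy" alt="Footer banner">
```

Never lazy-load the LCP image itself — that delays the very paint the metric measures.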

Technical SEO — Performance Impact

Sites failing Core Web Vitals that rank in the top 3 | 18%
Bounce rate increase per 1-second delay in load time | +32%
Organic traffic gain after passing Core Web Vitals | +12%
Large sites with critical crawl errors at any time | 67%

Chapter 05

Mobile-First Indexing

Since 2023, Google exclusively uses the mobile version of your content for indexing and ranking. If your desktop site has more content than your mobile site, you are being ranked on the stripped-down version. Desktop SEO is effectively dead.

  • Ensure mobile and desktop show identical primary content and structured data
  • Use responsive design (single URL, CSS adapts) rather than separate m. subdomain
  • Never hide important text or headings behind "Read more" toggles on mobile only
  • Verify Google can render your mobile pages using the URL Inspection tool
  • Tap targets (buttons, links) must be at least 48×48 CSS pixels with 8px spacing
  • Font size must be legible without zooming — minimum 16px for body text
  • Avoid interstitials and intrusive pop-ups on mobile that obscure main content
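The viewport, font-size, and tap-target points from the checklist can be sketched in a few lines of markup; the class name is illustrative.

```html
<!-- Responsive viewport: the page scales to the device width (single URL). -->
<meta name="viewport" content="width=device-width, initial-scale=1">

<style>
  body { font-size: 16px; }               /* legible without zooming */
  .button {
    display: inline-block;
    min-width: 48px; min-height: 48px;    /* minimum tap target size */
    margin: 8px;                          /* spacing between targets */
  }
</style>
```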

Chapter 06

Structured Data and Schema Markup

Structured data is code you add to your HTML (using JSON-LD, the recommended format) that explicitly tells Google what your content means — not just what it says. It powers rich results: star ratings, FAQs, recipes, events, and more in the SERP.

"Structured data doesn't directly improve rankings — but rich results dramatically improve CTR, which is a ranking signal in itself."
Schema Type | Rich Result Unlocked | Best For
Product | Price, availability, star ratings in SERP | E-commerce product pages
Article / BlogPosting | Article rich results, Top Stories carousel | Blog, news, editorial content
FAQPage | Expandable FAQ accordion in SERP | Support pages, guides with Q&A
HowTo | Step-by-step rich result with images | Tutorial and instructional content
LocalBusiness | Knowledge panel, maps integration | Location-based businesses
BreadcrumbList | Breadcrumb trail shown in place of the URL in the SERP | Any site with hierarchical content
Event | Event cards with date/location | Event pages and listings

Always use JSON-LD. It sits in a separate script block, is easier to maintain, and is Google's recommended format. Microdata is embedded inline in HTML, making templates messy. RDFa is rarely used and should be avoided.
Use Google's Rich Results Test (search.google.com/test/rich-results) to check if your markup is valid and eligible for rich results. The Schema Markup Validator (validator.schema.org) checks for broader schema.org conformance. After deployment, monitor the Enhancements section in Search Console for errors.
Three common violations to avoid: marking up content that isn't visible on the page (Google calls this "misleading markup"); using review schema to add self-written reviews to your own products; and omitting required properties (e.g., Product schema without name or image). All of these breach Google's guidelines and can result in manual actions.
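For illustration, here is a minimal Product snippet with the required name and image properties present — all values are invented. It would ship inside a `<script type="application/ld+json">` block in the page's `<head>` or `<body>`.

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Trail Running Shoe",
  "image": "https://example.com/img/shoe.jpg",
  "description": "Lightweight trail running shoe with a reinforced toe cap.",
  "offers": {
    "@type": "Offer",
    "price": "89.99",
    "priceCurrency": "GBP",
    "availability": "https://schema.org/InStock"
  }
}
```

Run any snippet like this through the Rich Results Test before deploying; missing required properties fail silently otherwise.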

Chapter 07

JavaScript SEO

JavaScript is the single most common source of technical SEO problems for modern websites. When content is rendered by JavaScript rather than served in the initial HTML response, Googlebot may not see it — or see it only after a significant delay.

Problem

Client-Side Rendering

The server returns a near-empty HTML shell. The browser downloads and executes JavaScript to build the page. Googlebot has to do the same — and may defer rendering by hours or days.

Better

Server-Side Rendering (SSR)

The server renders the full HTML before sending it to the browser. Googlebot receives fully formed content immediately. Best for SEO-critical pages with dynamic content (Next.js, Nuxt.js).

Best

Static Generation (SSG)

Pages are pre-rendered at build time as static HTML. Zero rendering delay for Googlebot. Ideal for content that doesn't change frequently. Fastest for Core Web Vitals too.

To diagnose JavaScript SEO issues: use the URL Inspection tool in Search Console and compare the "View Crawled Page" HTML against the live rendered page. Any content that appears in the live page but not the crawled HTML is invisible to Google at crawl time.
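The same crawled-vs-rendered comparison can be approximated offline: parse the raw server response and check whether a key phrase appears as visible text, or only inside a script. A sketch using Python's standard library, with made-up HTML for the two rendering strategies:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text from an HTML document, skipping script/style."""
    def __init__(self):
        super().__init__()
        self.skip = 0
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

def visible_in_initial_html(raw_html, phrase):
    """True if `phrase` is present as visible text in the server's raw HTML,
    i.e. it does not depend on JavaScript execution to appear."""
    parser = TextExtractor()
    parser.feed(raw_html)
    return phrase.lower() in " ".join(parser.chunks).lower()

# CSR shell: the headline only exists inside a JS string, not the DOM.
csr_shell = '<html><body><div id="app"></div><script>render("Widget Reviews")</script></body></html>'
# SSR page: the same headline is in the HTML the server returns.
ssr_page = '<html><body><h1>Widget Reviews</h1></body></html>'

print(visible_in_initial_html(csr_shell, "Widget Reviews"))  # False
print(visible_in_initial_html(ssr_page, "Widget Reviews"))   # True
```

Anything that returns False here is content Googlebot only sees after the (possibly deferred) rendering stage.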

Chapter 08

Canonicalisation — Solving Duplicate Content

Duplicate content occurs when multiple URLs serve the same or very similar content. Google must choose one version to index — and often chooses incorrectly. The canonical tag tells Google explicitly which URL is the "master" version.

<link rel="canonical" href="https://example.com/the-correct-url/">
— Place in the <head> of every page, including the canonical page itself.
01

Self-Referencing Canonicals

Every indexable page should carry a self-referencing canonical tag — even if no duplicate exists. This future-proofs against URL parameter injection and syndication.

02

HTTP vs HTTPS / www vs non-www

Choose one canonical version of your domain and redirect all others. Mixed signals between HTTP and HTTPS, or www and non-www, split your link equity and confuse indexation.

03

Trailing Slash Consistency

Pick a convention (with or without trailing slash) and canonicalise the other. /page/ and /page are treated as different URLs. Consistency across all internal links is just as important as the canonical tag.

04

Syndicated Content

If your content appears on third-party sites, ask them to point a canonical back to your original URL. This ensures you receive indexation credit rather than the syndication partner.
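Most of the rules above can be enforced in one normalisation step when generating canonical tags and internal links. A sketch — the chosen host and trailing-slash policy are arbitrary examples, and a real site must preserve meaningful query parameters rather than stripping them all:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalise(url, host="www.example.com", trailing_slash=True):
    """Collapse a URL to one canonical form: HTTPS, a single chosen host,
    a consistent trailing slash, and no query string or fragment."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if trailing_slash and not path.endswith("/"):
        path += "/"
    elif not trailing_slash and len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    return urlunsplit(("https", host, path, "", ""))

# Every duplicate variant collapses to the same canonical URL.
for variant in ("http://example.com/page",
                "https://www.example.com/page/",
                "https://example.com/page?utm_source=x"):
    print(canonicalise(variant))  # https://www.example.com/page/ each time
```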

Chapter 09

Redirects and Hreflang

Redirects control how link equity and crawl budget flow when URLs change. Hreflang tells Google which language and region variant of a page to serve to which audience. Both are frequently misconfigured and silently damage rankings.

Redirect Type | Use Case | Link Equity Passed
301 Permanent | Page has moved permanently, old URL retired | ~99% (full equity)
302 Temporary | Temporary redirect, old URL will return | Minimal — Google may stop crawling the source
307 Temporary | Same as 302 but method-preserving | Minimal
410 Gone | Page permanently deleted, should be removed from index | None — removes from index faster than 404

Hreflang is required when you publish content in multiple languages or for multiple regional audiences. It must be implemented as a complete reciprocal set: every page must reference all of its variants, including itself. If any page omits the return tag, Google may ignore the annotations for that pair of pages.

<link rel="alternate" hreflang="en-gb" href="https://example.com/en-gb/page/">
<link rel="alternate" hreflang="en-us" href="https://example.com/en-us/page/">
<link rel="alternate" hreflang="x-default" href="https://example.com/page/">
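Because a single missing return tag silently breaks annotations, it is worth validating the reciprocal set programmatically. A sketch — the URLs mirror the example above, with a deliberately broken en-us page:

```python
def hreflang_errors(pages):
    """pages: {url: {lang_code: target_url}} — the hreflang set each page
    declares. Returns human-readable problems with the reciprocal set."""
    errors = []
    for url, annotations in pages.items():
        if url not in annotations.values():
            errors.append(f"{url} does not reference itself")
        for lang, target in annotations.items():
            back = pages.get(target)
            if back is None:
                errors.append(f"{url} -> {target} ({lang}): target declares no hreflang")
            elif url not in back.values():
                errors.append(f"{target} has no return tag for {url}")
    return errors

pages = {
    "https://example.com/en-gb/page/": {
        "en-gb": "https://example.com/en-gb/page/",
        "en-us": "https://example.com/en-us/page/",
    },
    # Broken on purpose: the en-us page omits the return tag to en-gb.
    "https://example.com/en-us/page/": {
        "en-us": "https://example.com/en-us/page/",
    },
}
for problem in hreflang_errors(pages):
    print(problem)
```

Run this in CI whenever templates that emit hreflang change, so regressions never reach production.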

Chapter 10

Google Search Console Mastery

Search Console is the most authoritative SEO tool available — because it's direct communication from Google about how they see your site. Most practitioners use 20% of its capabilities.

Query Analysis

Performance Report

Filter by page, then sort by impressions descending. Find pages with high impressions but low CTR (under 2%) — these rank but have weak titles or meta descriptions. Rewrite them. Then find pages with good CTR but average position 8–15 — these are ranking just off page one. Improve content depth or add internal links to push them toward positions 1–3.

Indexation

Page Indexing Report

Error URLs need immediate fixing: server errors, redirect errors, submitted-but-blocked URLs. Excluded URLs need judgement: "Crawled, currently not indexed" is the most important bucket — Google crawled but chose not to index these pages, usually due to thin content or low quality. "Discovered, currently not indexed" means Google hasn't even crawled them yet, often indicating crawl budget issues.

Debugging

URL Inspection Tool

Paste any URL to see: whether it's indexed, the last crawl date, the canonical Google chose, what the rendered HTML looked like to Googlebot, and any structured data detected. Use "Request indexing" after publishing new content or making significant updates — this prioritises the URL in the crawl queue.

Field Data

Core Web Vitals Report

The CWV report groups URLs by similar template type — fixing one instance of a template often fixes thousands of URLs simultaneously. Focus on the "Poor" URLs first; "Needs Improvement" has lower ranking impact. Once fixed, field data takes up to 28 days to refresh, so plan improvements well in advance of deadlines.
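The impressions-and-CTR triage described for the performance report is easy to script against an exported report. A sketch with invented figures:

```python
# Rows shaped like a Search Console performance export (illustrative data).
rows = [
    {"page": "/guide/",   "impressions": 40_000, "clicks": 300, "position": 4.2},
    {"page": "/pricing/", "impressions": 12_000, "clicks": 900, "position": 11.3},
    {"page": "/blog/a/",  "impressions": 500,    "clicks": 40,  "position": 2.1},
]

def triage(rows, min_impressions=1_000):
    """Bucket pages per the workflow above: weak titles (high impressions,
    CTR under 2%) vs striking distance (decent CTR, position 8-15)."""
    weak_titles, striking_distance = [], []
    for r in rows:
        if r["impressions"] < min_impressions:
            continue  # too little data to act on
        ctr = r["clicks"] / r["impressions"]
        if ctr < 0.02:
            weak_titles.append(r["page"])
        elif 8 <= r["position"] <= 15:
            striking_distance.append(r["page"])
    return weak_titles, striking_distance

print(triage(rows))  # (['/guide/'], ['/pricing/'])
```

Weak-title pages get new titles and meta descriptions; striking-distance pages get content depth and internal links.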

Chapter 11

The Technical SEO Audit — Full Checklist

Run this audit quarterly for established sites and after any major site migration, redesign, or CMS change. Use Screaming Frog or Sitebulb for crawl data, Search Console for field data, and PageSpeed Insights for performance.

Crawlability

Access & Indexation

robots.txt accessible · XML sitemap submitted · No important pages blocked · No noindex on pages that should rank · URL inspection passes for key pages

Architecture

Structure & Links

All key pages reachable in ≤3 clicks · No orphan pages · Breadcrumbs implemented · Internal links use descriptive anchor text · No broken internal links

Performance

Core Web Vitals

LCP ≤ 2.5s · INP ≤ 200ms · CLS ≤ 0.1 · Images in WebP/AVIF · Render-blocking resources eliminated · CDN in use

Content Signals

On-Page Technical

Unique title tags on all pages · Unique meta descriptions · One H1 per page · Canonical tags self-referencing · hreflang correct if multilingual · Schema validated

"A technical audit isn't a one-time event. It's a quarterly habit that compounds — each fix you make now is one fewer problem compounding against you in six months."