How to Avoid IP Bans When Web Scraping

IP bans are the most common obstacle in web scraping. Whether you are collecting product prices, monitoring competitors, or gathering training data, getting blocked means lost time and incomplete datasets. This guide covers the practical techniques that professional scrapers use to stay under the radar.

Why Websites Ban IPs

Before diving into solutions, it helps to understand what triggers bans. Websites rely on several detection signals: request rate per IP, missing or inconsistent headers, TLS and browser fingerprints, and unnatural navigation patterns such as perfectly regular timing or strictly sequential page access. Bans usually result from a combination of these signals rather than any single factor, which is why the techniques below work best in combination.

Technique 1: Rotate Your IPs

The single most effective technique is IP rotation. Instead of sending all requests from one address, distribute them across many IPs so that no single address accumulates enough requests to trigger detection.

Residential proxies are ideal for this because each IP belongs to a real ISP and has a clean reputation. With a rotating gateway, every request automatically exits through a different IP — no pool management required on your end.

import requests

proxy_url = "http://USER:PASS@p.proxyshare.io:8080"
proxies = {"http": proxy_url, "https": proxy_url}

# Each request exits through a different residential IP
for page in range(1, 101):
    resp = requests.get(
        f"https://example.com/products?page={page}",
        proxies=proxies,
        timeout=30,
    )
    print(f"Page {page}: {resp.status_code}")

Technique 2: Set Realistic Headers

A real Chrome browser sends a specific set of headers with every request. Your scraper should mimic this. At minimum, set User-Agent, Accept, Accept-Language, and Accept-Encoding.

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/122.0.0.0 Safari/537.36"
    ),
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;"
        "q=0.9,image/avif,image/webp,*/*;q=0.8"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
}

session = requests.Session()
session.headers.update(headers)

Rotate your User-Agent strings periodically. Maintain a list of 10-20 current browser User-Agents and pick one randomly per session or per request.
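A minimal sketch of that rotation, assuming a `requests`-based setup like the one above. The strings in the pool are examples only and should be refreshed as browser versions advance:

```python
import random

import requests

# Example pool; in practice, maintain 10-20 current strings
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:123.0) Gecko/20100101 Firefox/123.0",
]


def new_session() -> requests.Session:
    """Return a session with a randomly chosen User-Agent."""
    session = requests.Session()
    session.headers["User-Agent"] = random.choice(USER_AGENTS)
    return session
```

Rotating per session (rather than per request) keeps the User-Agent consistent with any cookies the session accumulates, which is how a real browser behaves.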

Technique 3: Randomize Request Timing

Humans do not browse at perfectly regular intervals. Adding random delays between requests makes your traffic pattern look more natural.

import time
import random

for url in urls:
    response = session.get(url, proxies=proxies, timeout=30)
    process(response)
    # Random delay between 0.5 and 3 seconds
    time.sleep(random.uniform(0.5, 3.0))

The delay range depends on the target site. For aggressive anti-bot systems, use 2-5 second delays. For lighter protection, 0.5-1.5 seconds is usually enough. With rotating residential IPs, you can afford shorter delays because each request appears to come from a different user.

Technique 4: Respect robots.txt and Rate Limits

The robots.txt file specifies crawl guidelines. While not legally binding in most jurisdictions, respecting it demonstrates good faith and reduces your chance of getting actively blocked. Many sites specify a Crawl-delay directive — honor it.
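Python's standard library can parse robots.txt for you. A sketch using `urllib.robotparser`; the sample policy below is invented for illustration (in practice you would fetch the file with `RobotFileParser.read()`):

```python
import urllib.robotparser


def build_policy(robots_txt: str, user_agent: str = "*"):
    """Parse a robots.txt body; return the parser and any Crawl-delay."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    delay = rp.crawl_delay(user_agent)  # None when no Crawl-delay is set
    return rp, delay


# Hypothetical policy: block /private/, ask for a 2-second crawl delay
sample = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp, delay = build_policy(sample)
print(rp.can_fetch("*", "https://example.com/products"))   # True
print(rp.can_fetch("*", "https://example.com/private/x"))  # False
print(delay)                                               # 2
```

Check `can_fetch()` before each request and sleep for at least the returned delay between requests to the same host.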

When you receive a 429 (Too Many Requests) response, back off immediately. Implement exponential backoff: wait 2 seconds, then 4, then 8. Continuing to hammer a site after receiving 429s is the fastest way to get permanently banned.

def fetch_with_backoff(session, url, max_retries=3):
    """Fetch URL with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        response = session.get(url, proxies=proxies, timeout=30)
        if response.status_code == 429:
            wait = 2 ** (attempt + 1)
            print(f"Rate limited, waiting {wait}s...")
            time.sleep(wait)
            continue
        return response
    raise Exception(f"Failed after {max_retries} retries: {url}")

Technique 5: Use a Headless Browser for JS-Heavy Sites

Some websites require JavaScript execution to render content or to pass anti-bot checks. For these targets, tools like Puppeteer or Playwright are necessary. They run a real browser engine, which produces authentic TLS fingerprints and can execute JavaScript challenges.

import { chromium } from "playwright";

const browser = await chromium.launch({
  proxy: {
    server: "http://p.proxyshare.io:8080",
    username: "USER",
    password: "PASS",
  },
});

const page = await browser.newPage();
await page.goto("https://example.com", {
  waitUntil: "networkidle",
  timeout: 60000,
});

// Now the page has fully rendered, including JS content
const data = await page.evaluate(() => {
  return document.querySelector(".product-price")?.textContent;
});
console.log(data);

await browser.close();

Headless browsers consume more bandwidth and are slower than raw HTTP requests. Use them only for sites that genuinely require JavaScript rendering. For most targets, a well-configured requests session with residential proxies is sufficient.

Technique 6: Vary Your Scraping Patterns

Do not scrape pages in sequential order. Shuffle your URL list so that requests appear random rather than systematic. Vary the pages you visit — occasionally access the homepage, category pages, and other non-target pages to make your browsing pattern look organic.

import random

# Shuffle URLs to avoid sequential patterns
urls = [f"https://example.com/product/{i}" for i in range(1, 500)]
random.shuffle(urls)

for url in urls:
    response = session.get(url, proxies=proxies, timeout=30)
    process(response)
    time.sleep(random.uniform(1.0, 3.0))

Putting It All Together

No single technique is a silver bullet. The most reliable scraping setups combine multiple approaches: rotating residential IPs provide the foundation, realistic headers and timing add authenticity, proper error handling ensures resilience, and headless browsers handle the toughest targets.
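As a rough sketch of how these pieces fit together. The proxy URL and User-Agent pool are placeholders, and `fetch` reuses the backoff logic from Technique 4:

```python
import random
import time

import requests

PROXY = "http://USER:PASS@p.proxyshare.io:8080"  # placeholder gateway
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
]


def make_session() -> requests.Session:
    """Session with realistic headers and a rotating proxy gateway."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    })
    session.proxies = {"http": PROXY, "https": PROXY}
    return session


def fetch(session, url, max_retries=3):
    """GET with exponential backoff on 429 responses."""
    for attempt in range(max_retries):
        response = session.get(url, timeout=30)
        if response.status_code == 429:
            time.sleep(2 ** (attempt + 1))
            continue
        return response
    raise RuntimeError(f"Rate limited after {max_retries} retries: {url}")


# Usage sketch (requires network access and valid credentials):
# session = make_session()
# urls = [...]
# random.shuffle(urls)
# for url in urls:
#     process(fetch(session, url))
#     time.sleep(random.uniform(1.0, 3.0))
```

This skeleton covers Techniques 1-4 and 6; swap in a headless browser only for the targets that need Technique 5.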

Start simple and escalate only when needed. Many websites can be scraped with basic HTTP requests and rotating proxies. Save the complexity of headless browsers for sites that genuinely require them. Focus on resilience over speed — a slower scraper that runs reliably collects more data than a fast one that gets banned after 10 minutes.

Stop fighting IP bans

ProxyShare rotates residential IPs automatically on every request. Focus on your data, not your infrastructure.

View Plans