Is web scraping legal?

Collecting publicly available data is often allowed, but check the site's robots.txt and terms of use, and avoid hammering servers with excessive requests.
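The robots.txt check can be automated with the standard library's urllib.robotparser. The sketch below parses a made-up robots.txt string so it runs offline; the rules, the "my-scraper" agent name, and the URLs are assumptions for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-scraper" is an assumed agent name; use whatever your User-Agent declares.
allowed_public = parser.can_fetch("my-scraper", "https://example.com/articles/1")
allowed_private = parser.can_fetch("my-scraper", "https://example.com/private/data")
delay = parser.crawl_delay("my-scraper")

print(allowed_public, allowed_private, delay)
```

Against a live site you would typically call parser.set_url('https://example.com/robots.txt') followed by parser.read() instead of parsing a string, then skip any URL for which can_fetch returns False.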

BeautifulSoup vs Selenium?

BeautifulSoup fits static HTML; Selenium fits pages that rely heavily on JavaScript.

Why set a User-Agent?

To reduce the chance of being blocked as an unidentified bot.

[2026] Python Web Scraping | BeautifulSoup and Selenium Explained

March 28, 2026 · 20 min read · Updated March 28, 2026 · Intermediate Tutorial

Key points of this article

A Python web scraping tutorial covering requests, BeautifulSoup for static HTML, Selenium for dynamic pages, scraping ethics (robots.txt, rate limits), and CSV export.

Introduction

“Collect data from the web”

Web scraping is the technique of automatically extracting data from websites.

1. requests basics

Fetching HTML

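The code below is a Python example. It imports the required module, sends a GET request, and then repeats the request with a custom User-Agent header. Read through it with the role of each part in mind.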

import requests
# GET request
response = requests.get('https://example.com')
print(response.status_code)  # 200
print(response.text)  # HTML body
print(response.headers)  # Response headers
# Custom User-Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
response = requests.get('https://example.com', headers=headers)
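A browser-like User-Agent is one option; another is a descriptive one that names the scraper and tells site owners how to reach you. The helper below is a minimal sketch of that idea. The function name build_headers, the bot name, and the contact URL are all hypothetical.

```python
def build_headers(bot_name: str, version: str, contact_url: str) -> dict:
    """Build request headers that identify the scraper and how to reach its owner."""
    return {"User-Agent": f"{bot_name}/{version} (+{contact_url})"}


# Hypothetical bot name and contact page, for illustration only.
headers = build_headers("price-watch-bot", "1.0", "https://example.com/bot-info")
print(headers["User-Agent"])
```

The resulting dict can be passed as requests.get(url, headers=headers), exactly like the hard-coded headers above.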

2. BeautifulSoup

Parsing HTML

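The following Python code parses a fetched page. It imports the required modules, then looks up a single tag, every matching tag, and elements matched by a CSS selector. Read through it with the role of each part in mind.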

from bs4 import BeautifulSoup
import requests
url = 'https://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Single tag
title = soup.find('title')
print(title.text)
# Multiple tags
links = soup.find_all('a')
for link in links:
    print(link.get('href'))
# CSS selectors
articles = soup.select('.article-title')
for article in articles:
    print(article.text)

Example: news headlines

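The following Python code imports the required modules and wraps the logic in a function that collects each headline's title, link, and date into a pandas DataFrame. The class names (.news-item, .title, .date) depend on the target site. Read through it with the role of each part in mind.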

import requests
from bs4 import BeautifulSoup
import pandas as pd
def scrape_news(url):
    """Collect news titles and links."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
    }
    
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # Stop early on 4xx/5xx responses
    soup = BeautifulSoup(response.text, 'html.parser')
    
    articles = []
    
    for item in soup.select('.news-item'):
        title = item.select_one('.title').text.strip()
        link = item.select_one('a')['href']
        date = item.select_one('.date').text.strip()
        
        articles.append({
            'title': title,
            'link': link,
            'date': date
        })
    
    return pd.DataFrame(articles)
# Usage
df = scrape_news('https://news.example.com')
df.to_csv('news.csv', index=False, encoding='utf-8-sig')
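Scraped href values are often relative (for example /article/123), which makes the saved link column hard to use on its own. A small sketch using the standard library's urljoin shows how they could be resolved against the page URL; the helper name absolutize and the sample paths are assumptions.

```python
from urllib.parse import urljoin


def absolutize(base_url: str, href: str) -> str:
    """Resolve a possibly relative href against the URL of the page it came from."""
    return urljoin(base_url, href)


# Hypothetical page URL and href values, for illustration only.
base = "https://news.example.com/world/"
root_relative = absolutize(base, "/article/123")
page_relative = absolutize(base, "story.html")
already_absolute = absolutize(base, "https://other.example.com/x")

print(root_relative)
print(page_relative)
print(already_absolute)
```

Inside scrape_news, this would mean storing absolutize(url, link) instead of the raw href.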

3. Selenium (dynamic pages)

Install

pip install selenium

Basic usage

The following Python code imports the required modules and uses try/finally so the browser is always closed, even when a step fails. Read through it with the role of each part in mind.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
try:
    driver.get('https://example.com')
    
    # Wait up to 10 seconds for the content container to appear
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'content'))
    )
    
    # Read text from an element
    title = driver.find_element(By.TAG_NAME, 'h1')
    print(title.text)
    
    # Click a button
    button = driver.find_element(By.ID, 'load-more')
    button.click()
    
    # Scroll to the bottom of the page
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    
finally:
    driver.quit()

4. Real-world example

Price monitoring

import requests
from bs4 import BeautifulSoup
import time
from datetime import datetime
def check_price(url, target_price):
    """Read product price from a page (selectors vary by site)."""
    headers = {
        'User-Agent': 'Mozilla/5.0'
    }
    
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    price_text = soup.select_one('.price').text
    # Strip thousands separators and the 원 (won) suffix before converting
    price = int(price_text.replace(',', '').replace('원', '').strip())
    
    print(f"[{datetime.now()}] Current price: {price:,} KRW")
    
    if price <= target_price:
        print(f"🎉 Target reached! (≤ {target_price:,} KRW)")
        return True
    
    return False
# Check every hour
url = 'https://shopping.example.com/product/123'
target = 50000
while True:
    if check_price(url, target):
        break
    time.sleep(3600)
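check_price assumes the .price element always exists and holds a clean number. A more forgiving parser might look like the sketch below, which assumes whole-number prices such as KRW (a decimal price like 12.50 would need different handling). The name parse_price is hypothetical.

```python
import re
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[int]:
    """Extract an integer price from text such as '49,900원'.

    Returns None when the text is missing or contains no digits,
    so the caller can skip this check instead of crashing.
    """
    if not text:
        return None
    digits = re.sub(r"[^0-9]", "", text)  # Drop commas, currency marks, spaces
    return int(digits) if digits else None


print(parse_price("49,900원"))
print(parse_price(" 1,234,000 원 "))
print(parse_price("Sold out"))
```

In check_price, the element lookup could then become something like: element = soup.select_one('.price'), followed by price = parse_price(element.text if element else None), returning False when price is None.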

5. Saving data

CSV export

The code below is a Python example. It imports the required module and wraps the scrape-then-save logic in a function. Try running it yourself to see how it behaves.

import pandas as pd
def scrape_and_save(url, output_file):
    """Scrape and write CSV."""
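    # scrape_data is not defined in this article; it is assumed to be your own
    # function that returns a list of dicts (one per row)
    data = scrape_data(url)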
    df = pd.DataFrame(data)
    df.to_csv(output_file, index=False, encoding='utf-8-sig')
    print(f"Saved: {output_file}")
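If pandas is not available, the same export can be approximated with the standard library's csv module. This is a sketch of that alternative, not the approach used above; save_rows and the sample rows are made up for illustration, and the file is written to a temporary directory so the example cleans up after itself.

```python
import csv
import os
import tempfile


def save_rows(rows, output_file):
    """Write a list of dicts to CSV and return the number of rows written.

    utf-8-sig adds a byte order mark so spreadsheet apps detect UTF-8.
    """
    if not rows:
        return 0
    fieldnames = list(rows[0].keys())
    with open(output_file, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Hypothetical scraped rows, for illustration only.
rows = [
    {"title": "Example headline", "link": "https://news.example.com/a/1", "date": "2026-03-28"},
    {"title": "Café reopens downtown", "link": "https://news.example.com/a/2", "date": "2026-03-28"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "news_sample.csv")
    written = save_rows(rows, path)
    # Read the file back to confirm the round trip preserves the data.
    with open(path, newline="", encoding="utf-8-sig") as f:
        loaded = list(csv.DictReader(f))

print(written, loaded[1]["title"])
```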

Practical tips

Scraping etiquette

# ✅ Check robots.txt
# https://example.com/robots.txt
# ✅ Space out requests
import time
time.sleep(1)
# ✅ Set a descriptive User-Agent
headers = {'User-Agent': '...'}
# ✅ Handle errors (url is the page you are fetching)
import requests
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
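Pacing and error handling can be combined by retrying a failed request after a growing pause. The following is a minimal sketch of exponential backoff, assuming the caller supplies the fetch function; fetch_with_retry, its parameters, and the flaky_fetch stand-in are hypothetical names, and the sleep function is injectable so the demo runs instantly.

```python
import time


def fetch_with_retry(fetch, url, retries=3, base_delay=1.0,
                     retry_on=(Exception,), sleep=time.sleep):
    """Call fetch(url), waiting base_delay * 2**attempt between failed attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fetch(url)
        except retry_on:
            if attempt == retries - 1:
                raise  # Out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in fetch that fails twice, then succeeds.
calls = []
waits = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


result = fetch_with_retry(flaky_fetch, "https://example.com", sleep=waits.append)
print(result, waits)
```

With requests, fetch could be a small function that calls requests.get(url, timeout=10) and raise_for_status(), and retry_on could be narrowed to (requests.exceptions.RequestException,).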

Summary

Key takeaways

requests: HTTP calls
BeautifulSoup: HTML parsing
Selenium: JavaScript-heavy pages
Etiquette: robots.txt, pacing
Storage: CSV, JSON, databases
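For the database option, the standard library's sqlite3 is often enough for a small scraper. The sketch below assumes rows shaped like the news example (title, link, date) and uses the link as a primary key so repeated runs do not create duplicates; the table name, the save_articles helper, and the sample rows are hypothetical.

```python
import sqlite3


def save_articles(conn, articles):
    """Insert article dicts, skipping any link that is already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "link TEXT PRIMARY KEY, title TEXT NOT NULL, date TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO articles (link, title, date) "
        "VALUES (:link, :title, :date)",
        articles,
    )
    conn.commit()


# In-memory database and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
articles = [
    {"title": "Example headline", "link": "https://news.example.com/a/1", "date": "2026-03-28"},
    {"title": "Second headline", "link": "https://news.example.com/a/2", "date": "2026-03-28"},
]

save_articles(conn, articles)
save_articles(conn, articles)  # Second run adds nothing because links repeat

row_count = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
conn.close()
print(row_count)
```

Replacing ':memory:' with a file path such as 'news.db' would persist the data between runs.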

Next steps

Python environment setup | Install Python on Windows and Mac