Hello. I'm No Tuggeun, and besides this blog, I run various other sites. I host everything—my company homepage, personal blog, and even WordPress sites I've built for clients—on a single AWS Lightsail instance.
Running everything on a single instance keeps costs low, but there's a downside: whether a site is dynamic or static, when everything shares one server, a traffic spike can take the whole setup down. (If one site goes down, the company homepage and all the client sites I host go down with it.)

That's why I regularly check server traffic and spend time blocking "malicious bots."
For WordPress, combining two basic measures, Wordfence plugin configuration and malicious-bot blocking in robots.txt, keeps server traffic under control and operation relatively stable.
This article covers robots.txt and shares the optimized robots.txt file I've developed through experience.
What is robots.txt? (Concept Overview)
robots.txt is a file for communicating with robots, meaning search engine and AI crawlers. Crawlers (robots) are programs, like those run by Google, Naver, and GPT, that scan websites.
The robots.txt file uses simple directives to separate the information we want shared from the information we don't, telling crawlers what to crawl and what to skip.
Example:
- To have your homepage appear in Google search results 👉 Robots must be able to read your content
- But what if they crawl login pages or admin screens? ❌ That's risky, so we must tell them not to crawl those pages.
So we use a file called robots.txt to tell them, "You can crawl this / Don't crawl this."
❌ What if there's no robots.txt?
- Wasted site traffic + security risk
- Most bots crawl every page by default
- Malicious bots can scrape and scan admin pages too
Where should the robots.txt file be located?
The robots.txt file must always sit in the domain's root directory.
https://내사이트주소.com/robots.txt
Accessing the above address allows both bots and humans to view the robots.txt file.
robots.txt Basic Syntax Reference Table
| Syntax | Meaning | Example | Description |
|---|---|---|---|
| User-agent: | Specifies target robots | User-agent: * | Applies to all robots (crawlers): Googlebot, Bingbot, etc. |
| Disallow: | Sets paths to block | Disallow: /private/ | Prevents robots (crawlers) from crawling the specified path |
| Allow: | Sets paths to allow access | Allow: /public/ | Allows robots (crawlers) to crawl these paths |
| Sitemap: | Specifies sitemap location | Sitemap: https://example.com/sitemap.xml | Describes the site structure to aid search engine optimization |
- User-agent: specifies who the instruction is for. Example: * = all robots, Googlebot = Google only
- Disallow: "Don't look here." Example: /private/
- Allow: "You may crawl this." Example: /wp-admin/admin-ajax.php
- Sitemap: "The site structure is here." Used to tell search engines where the sitemap is
"User-agent:"
This syntax tells you "who it's talking to." For example, User-agent: *If you write it like this, it applies to all bots, whether Googlebot or Naverbot. If you want to
tell only a specific bot, User-agent: Googlebot Write it like this:
"Disallow:"
This is a command prohibiting access: "Do not look at this path!" For example, Disallow: /private/ If you write `/index.html`, the robot will not read the content below. example.com/private/ will not read the content below.
"Allow:"
Conversely, this is permission saying "You can crawl here!" It's mainly used Disallow:" when you block everything and only open exceptions within it.
"Sitemap:" It tells
search engines, "Here's the map of our house!" Having a sitemap file helps search engines understand your site better and expose it more.
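Putting the four directives together, a minimal robots.txt might look like the sketch below (the /private/ path, the allowed page, and the sitemap URL are placeholders for illustration, not recommendations):
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
Sitemap: https://example.com/sitemap.xml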
Frequently used robots.txt patterns
1. Allow access to entire site: All robots can crawl everything!
User-agent: *
Disallow:
2. Block entire site: Absolutely no access. Not visible to search engines either.
User-agent: *
Disallow: /
3. Block specific crawlers (e.g., AhrefsBot): Block backlink scanning bots like Ahrefs that generate traffic
User-agent: AhrefsBot
Disallow: /
4. Block a specific folder: access to everything under /private/ is prohibited
User-agent: *
Disallow: /private/
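5. Block everything, then allow an exception: a sketch of the Allow-as-exception pattern described above (/public/ here is just a placeholder path)
User-agent: *
Disallow: /
Allow: /public/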
You cannot "block humans" with robots.txt
robots.txt only applies to robots. Humans accessing directly via browser will see everything.
To block humans, you can:
- Redirect them to a login page
- Implement a member authentication system
- Use server-side User-Agent rules to block access to the site or redirect visitors to the login page
WordPress Default robots.txt
The robots.txt code below is the default robots.txt file automatically generated upon WordPress installation.
# WordPress default settings
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Allow: /wp-admin/admin-ajax.php
Sitemap: https://사이트주소.com/sitemap_index.xml
📌 If you leave the WordPress default robots.txt as-is, you may experience server downtime from increased traffic. (WordPress downtime can stem from many causes, for example low-cost hosting, robots.txt issues, server crashes, or plugin conflicts.)
Nowadays, beyond simple search engine crawlers, AI crawlers are becoming increasingly common.
GPTBot, ClaudeBot, Applebot, Perplexity… While some AI bots are welcome, others are malicious bots that only generate traffic and scrape your content.
Bots that can be put to good use (everything except the malicious ones) are organized in my robots.txt file so that they can still crawl properly.
AI Crawler Control + Malicious Bot Blocking Version (2025.05.23)
The robots.txt file I created follows these principles:
| Item | Setting Method | Purpose |
|---|---|---|
| WordPress Default Security | Disallow Setting | Block Login Page |
| AI Crawlers | Crawl-delay | Allow positive exposure but control speed |
| Malicious Bots | Disallow: / | Block traffic/information scraping |
| For search engines | Allow + Sitemap | Maintain SEO optimization |
1. Allow AI crawlers but throttle their speed
- GPTBot, Gemini, Applebot, etc. are given Crawl-delay: 30, i.e. "scrape our content, but come slowly" (note that not every crawler honors Crawl-delay)
2. Block malicious bots outright
- Backlink analysis bots such as Ahrefs, Semrush, and MJ12: completely blocked
- DataForSeoBot, barkrowler, and other unidentified data-scraping bots: out
3. Block suspicious crawlers based in Russia/China
- Yandex, PetalBot, MauiBot, etc. are handled with Disallow: /
You can apply this robots.txt in either of two ways: download the file and upload it directly to the root folder, or copy and paste the code below.
robots.txt file distribution methods
🔹 Method 1: Directly download the robots.txt file and upload it to the root
🔹 Method 2: Copy + Paste the code below
**WordPress robots.txt optimization code (AI bot + malicious bot blocking)**
# == WordPress ==
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Allow: /wp-admin/admin-ajax.php
# ==============================================
# 🤖 AI & SEO 크롤러 제어 설정 - by 노퇴근
# GPTBot, Ahrefs, Baidu 등 트래픽 유발 크롤러 관리
# robots.txt v2025.05.23
# ==============================================
# 🧠 Domestic (Korean) AI crawlers
# ====================================
# Naver's CLOVA AI crawler
User-agent: CLOVA
Crawl-delay: 30
# Kakao's crawler for AI and search
User-agent: KakaoBot
Crawl-delay: 30
# ====================================
# 🌎 Global AI crawlers - allowed, with only a delay set
# ====================================
# OpenAI's crawler for ChatGPT (official)
User-agent: GPTBot
Crawl-delay: 30
# Google's Gemini (Bard) AI-related crawler (presumed)
User-agent: Gemini
Crawl-delay: 30
# Microsoft's Copilot (integrates with VS Code, etc.)
User-agent: Copilot
Crawl-delay: 30
# Generic User-agent for Anthropic's Claude AI (no separate official confirmation)
User-agent: Claude
Crawl-delay: 30
# Perplexity AI's search-style LLM bot
User-agent: Perplexity
Crawl-delay: 30
# General user requests connected to ChatGPT (when an unofficial User-agent is used)
User-agent: ChatGPT-User
Crawl-delay: 30
# ====================================
# 🍏 Apple & Microsoft AI 크롤러 - 허용하되 딜레이만 설정
# ====================================
# 🍏 Apple's crawler for Siri/Spotlight
User-agent: Applebot
Crawl-delay: 30
# Apple's extended crawler for AI training
User-agent: Applebot-Extended
Crawl-delay: 30
# Bing AI-based bot (tied to Copilot)
User-agent: Bing AI
Crawl-delay: 30
# ====================================
# 🌐 Global translation/search/conversational AI
# ====================================
# Crawler tied to the DeepL translation service
User-agent: DeepL
Crawl-delay: 30
# Character-based conversational AI service (Character.AI)
User-agent: Character.AI
Crawl-delay: 30
# Quora-based Poe AI or related crawler
User-agent: Quora
Crawl-delay: 30
# Microsoft's experimental conversational model DialoGPT (presumed User-agent)
User-agent: DialoGPT
Crawl-delay: 30
# Otter.ai meeting transcription and voice analysis service
User-agent: Otter
Crawl-delay: 30
# Socratic, a study Q&A AI app for students (owned by Google)
User-agent: Socratic
Crawl-delay: 30
# ====================================
# ✍️ AI content auto-generation tools
# ====================================
# Writesonic (AI copywriter/editor, a ChatGPT-level alternative)
User-agent: Writesonic
Crawl-delay: 30
# CopyAI (copywriting AI aimed at startups)
User-agent: CopyAI
Crawl-delay: 30
# Jasper (professional marketing/blog AI)
User-agent: Jasper
Crawl-delay: 30
# ELSA, an English speaking coaching AI
User-agent: ELSA
Crawl-delay: 30
# Codium (code automation AI), Git-integrated
User-agent: Codium
Crawl-delay: 30
# TabNine (VS Code-based coding AI)
User-agent: TabNine
Crawl-delay: 30
# Vaiv (Korean AI startup, NLP services)
User-agent: Vaiv
Crawl-delay: 30
# Bagoodex (origin unknown, presumed data-collection crawler)
User-agent: Bagoodex
Crawl-delay: 30
# You.com's YouChat AI bot
User-agent: YouChat
Crawl-delay: 30
# China-based iAsk AI search/QA bot
User-agent: iAsk
Crawl-delay: 30
# Komo.ai, a privacy-focused AI search
User-agent: Komo
Crawl-delay: 30
# Hix AI, an AI specialized in content generation
User-agent: Hix
Crawl-delay: 30
# ThinkAny, a ChatGPT-based AI platform
User-agent: ThinkAny
Crawl-delay: 30
# AI summary/search based on the Brave search engine
User-agent: Brave
Crawl-delay: 30
# Lilys, presumed AI recommendation engine/chatbot
User-agent: Lilys
Crawl-delay: 30
# Sidetrade Indexer Bot, a crawler from an AI sales CRM
User-agent: Sidetrade Indexer Bot
Crawl-delay: 30
# Common Crawl-based AI training bot
User-agent: CCBot
Crawl-delay: 30
# Placeholder for registering additional custom AI crawlers later
User-agent: AI-Bot-Name
Crawl-delay: 30
# ====================================
# 🧠 Other major AI/web crawlers (including ones added earlier)
# ====================================
# Anthropic's official Claude crawler
User-agent: ClaudeBot
Crawl-delay: 30
# Claude's web-only crawler
User-agent: Claude-Web
Crawl-delay: 30
# Google's crawler for AI training
User-agent: Google-Extended
Crawl-delay: 30
# Google's other crawlers
User-agent: GoogleOther
Crawl-delay: 30
# Google Search Console inspection tool crawler
User-agent: Google-InspectionTool
Crawl-delay: 30
# Google Cloud Vertex AI crawler
User-agent: Google-CloudVertexBot
Crawl-delay: 30
# DuckDuckGo's AI summary (DuckAssist) bot
User-agent: DuckAssistBot
Crawl-delay: 30
# Diffbot, which turns web pages into structured data
User-agent: Diffbot
Crawl-delay: 30
# Advanced AI summary crawler for the Kagi search engine
User-agent: Teclis
Crawl-delay: 30
# ====================================
# 🔍 Other unnecessary crawlers - delay only
# ====================================
# Chinese search engine Baidu - unnecessary for Korean sites
User-agent: Baiduspider
Crawl-delay: 300
# 📊 Marketing analytics/advertising bots - can generate excessive traffic
User-agent: BomboraBot
Crawl-delay: 300
User-agent: Buck
Crawl-delay: 300
User-agent: startmebot
Crawl-delay: 300
# ==============================
# ❌ Crawlers that need complete blocking
# ==============================
# 🦾 Backlink analysis tools - scrape every page
User-agent: MJ12bot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
# 🛑 Block traffic & data-analysis bots (China/Russia/advertising, etc.)
User-agent: PetalBot
Disallow: /
User-agent: MediaMathbot
Disallow: /
User-agent: Bidswitchbot
Disallow: /
User-agent: barkrowler
Disallow: /
User-agent: DataForSeoBot
Disallow: /
User-agent: DotBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CensysInspect
Disallow: /
User-agent: rss2tg bot
Disallow: /
User-agent: proximic
Disallow: /
User-agent: Yandex
Disallow: /
User-agent: MauiBot
Disallow: /
User-agent: AspiegelBot
Disallow: /
Sitemap: https://사이트주소.com/sitemap_index.xml
robots.txt Management Tips
- Utilize the robots.txt inspection feature in Google Search Console
- When server traffic spikes, check the crawl logs and immediately register any new bots in robots.txt (see the snippet after this list)
- Even static pages can crash your server if bots scrape them… Always monitor them
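When a new bot shows up in your access logs, registering it is as simple as appending a block like the one below to robots.txt (NewBadBot is a hypothetical name; substitute the User-agent string you actually see in the logs):
User-agent: NewBadBot
Disallow: /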
