AI SEO Optimizer Team
SEO & GEO Specialist
Expert in AI platforms, GEO optimization, and digital marketing. Shares content strategies and technical SEO practices for AI search engines on the blog.
AI crawlers have reached 20% of site traffic and GPTBot has grown 305%. How do you keep AI bots under control with robots.txt allow/block rules, crawl-delay optimization, rate limiting, and server load management?
In 2025, AI crawlers reached 20% of Googlebot's traffic volume.
GPTBot (OpenAI): 305% growth from May 2024 to May 2025 (share: 5% → 30%).
The challenge: allow AI bots to earn citation opportunities while keeping server load under control.
For platform-specific optimization strategies to pair with AI bot management, see the technical SEO and AI platforms guide.
In this guide, we take a deep dive into best practices for managing AI bots: robots.txt configuration, rate limiting strategies, and server resource management.
The AI bot landscape (2025):
Purpose: Model training + knowledge base.
User-agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
Crawl frequency: 2-4x/week (for active sites).
Impact: ChatGPT citations, knowledge base updates.
For ChatGPT-specific citation optimization, see the ChatGPT citation optimization guide.
Purpose: Real-time web search (SearchGPT feature).
User-agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
Crawl frequency: Real-time (query-driven).
Impact: SearchGPT citation potential.
Purpose: Browser/API-based user interactions.
User-agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 ChatGPT-User
Crawl frequency: On-demand (user query triggers).
Purpose: Real-time answer engine indexing.
User-agent:
Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/bot)
Crawl frequency: 2-7x/week (more frequent for fresh content).
Impact: Perplexity citation (real-time search results).
For Perplexity-specific citation strategies, see the Perplexity AI citation strategies guide.
Note: Perplexity controversy: the company was reported to use stealth crawlers to bypass robots.txt. The behavior subsided after community backlash, but rate limiting still matters.
Purpose: Claude model training + knowledge base.
User-agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claude.ai/bot)
Crawl frequency: 1-2x/week (less aggressive).
Impact: Claude citations, especially for technical/academic queries.
For Claude-specific citation strategies and technical content optimization, see the Claude AI citation optimization guide.
Characteristic: prioritizes peer-reviewed content and .edu domains.
Purpose: Gemini + Bard model training (separate from Google Search indexing).
User-agent:
Google-Extended
Crawl frequency: Weekly (independent of Googlebot).
Impact: Gemini citations, Google AI Overviews.
Critical: blocking Google-Extended does not affect Google Search rankings; it is a robots.txt control token handled separately from Googlebot's search indexing.
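A minimal robots.txt sketch of that separation (layout is illustrative): Googlebot keeps crawling for Search while Google-Extended is blocked from using the content for Gemini and AI features.

```txt
# Googlebot continues to crawl and index for Google Search
User-agent: Googlebot
Allow: /

# Google-Extended only controls use of content for Gemini / AI features
User-agent: Google-Extended
Disallow: /
```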
Purpose: Apple Intelligence training.
User-agent:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 Applebot-Extended/1.0
Crawl frequency: Monthly (less aggressive).
Impact: Apple Intelligence (Siri, iOS AI features).
Purpose: ByteDance AI products (TikTok, Doubao LLM).
User-agent:
Mozilla/5.0 (compatible; Bytespider; [email protected])
Crawl frequency: Daily (very aggressive).
Impact: TikTok content recommendations, Doubao LLM (China).
Caution: Bytespider is one of the most aggressive crawlers; rate limiting is critical.
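If you prefer to slow Bytespider down rather than block it outright (the same "allow but slow" option shown in Strategy 2 below), a minimal robots.txt sketch; since Bytespider reportedly does not always honor robots.txt, pair this with server-level rate limiting.

```txt
# Allow Bytespider but throttle it hard
User-agent: Bytespider
Crawl-delay: 5
```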
Use case: you want to maximize AI citations and server capacity is sufficient.
robots.txt:
```txt
# OpenAI (ChatGPT + SearchGPT)
User-agent: GPTBot
Allow: /
Crawl-delay: 1

User-agent: OAI-SearchBot
Allow: /
Crawl-delay: 1

User-agent: ChatGPT-User
Allow: /

# Perplexity AI
User-agent: PerplexityBot
Allow: /
Crawl-delay: 1

# Anthropic (Claude)
User-agent: ClaudeBot
Allow: /
Crawl-delay: 2

# Google (Gemini)
User-agent: Google-Extended
Allow: /
Crawl-delay: 1

# Apple Intelligence
User-agent: Applebot-Extended
Allow: /
Crawl-delay: 3

# ByteDance (if targeting China/TikTok)
User-agent: Bytespider
Allow: /
Crawl-delay: 2
```
Advantages:
Disadvantages:
Use case: allow the important AI platforms, block the less relevant ones.
robots.txt:
```txt
# Priority AI Bots (Allow)
User-agent: GPTBot
Allow: /
Crawl-delay: 1

User-agent: OAI-SearchBot
Allow: /
Crawl-delay: 1

User-agent: PerplexityBot
Allow: /
Crawl-delay: 1

User-agent: ClaudeBot
Allow: /
Crawl-delay: 2

User-agent: Google-Extended
Allow: /
Crawl-delay: 1

# Lower Priority (Block or Aggressive Rate Limit)
User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /
# Or: Crawl-delay: 5 (allow but slow)
```
Advantages:
Disadvantages:
Use case: Proprietary content, paywall, AI training opt-out.
robots.txt:
```txt
# Block all AI bots
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /
```
Use cases:
Note: blocking completely eliminates AI citations.
Use case: allow public content, block private/premium content.
robots.txt:
```txt
User-agent: GPTBot
Allow: /blog/
Allow: /public/
Disallow: /premium/
Disallow: /members-only/
Crawl-delay: 1

User-agent: PerplexityBot
Allow: /blog/
Allow: /docs/
Disallow: /customer-portal/
Crawl-delay: 1
```
Use cases:
Definition: the minimum number of seconds between successive requests from the same bot.
Example:
```txt
User-agent: GPTBot
Crawl-delay: 2
```
Meaning: GPTBot will wait at least 2 seconds between requests (at most 1,800 requests/hour).
| Bot | Recommended Crawl-Delay | Reasoning |
|---|---|---|
| GPTBot | 1-2 | Moderate frequency, high value |
| OAI-SearchBot | 1 | Real-time search, needs fast responses |
| PerplexityBot | 1-2 | Frequent crawls, balance needed |
| ClaudeBot | 2-3 | Less frequent, can be slower |
| Google-Extended | 1 | Google infrastructure, can handle fast |
| Bytespider | 3-5 | Very aggressive, slow down |
Test data (1000-page site):
| Crawl-Delay | Pages/Hour | Server Load | Impact on Citations |
|---|---|---|---|
| 0 (no delay) | Unthrottled (>3600) | Very High | Fastest indexing |
| 1 second | 3600 | Moderate | Fast (recommended) |
| 2 seconds | 1800 | Low | Acceptable |
| 5 seconds | 720 | Very Low | Slow (risky) |
Recommendation: 1-2 seconds = sweet spot (server protection + reasonable crawl speed).
Tactic: adjust the crawl delay based on current server load.
Implementation (nginx + Lua):
```nginx
location / {
    access_by_lua_block {
        local user_agent = ngx.var.http_user_agent
        local server_load = get_server_load()  -- custom helper, see sketch below

        if user_agent:match("GPTBot") then
            if server_load > 80 then
                ngx.sleep(5)   -- high load: 5s delay
            elseif server_load > 50 then
                ngx.sleep(2)   -- medium load: 2s delay
            else
                ngx.sleep(1)   -- low load: 1s delay
            end
        end
    }
}
```
Advantage: adaptive (low traffic = fast crawl, high traffic = slow crawl).
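The get_server_load() call above is a site-specific helper, not an nginx built-in. A minimal sketch of one possible implementation, assuming a Linux host and using the 1-minute load average from /proc/loadavg normalized by CPU count:

```lua
-- Hypothetical helper: rough server load as a percentage (1-min load / cores * 100).
-- Note: uses blocking file I/O; in production, cache the value instead of reading per request.
local function get_server_load()
    local f = io.open("/proc/loadavg", "r")
    if not f then
        return 0  -- fail open: apply no extra delay if load cannot be read
    end
    local load1 = f:read("*n")  -- first field = 1-minute load average
    f:close()

    local cores = 1
    local nproc = io.popen("nproc")
    if nproc then
        cores = tonumber(nproc:read("*l")) or 1
        nproc:close()
    end

    return (load1 / cores) * 100
end
```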
Robots.txt limitations:
Rate limiting benefits:
Rule example (for GPTBot):
CloudFlare Dashboard → Security → WAF → Rate Limiting Rules:
Rule name: GPTBot Rate Limit
If incoming requests match:
- User Agent contains "GPTBot"
Then:
- Allow 60 requests per minute
- Block for 10 minutes if exceeded
Advanced rule (multiple bots):
Rule: AI Bots Aggregate Rate Limit
If incoming requests match:
- User Agent matches regex: (GPTBot|PerplexityBot|ClaudeBot|Google-Extended)
Then:
- Allow 120 requests per minute (total)
- Challenge (CAPTCHA) if exceeded
nginx.conf:
```nginx
# Key requests by client IP only when the user agent is an AI bot;
# non-bot requests get an empty key and are not rate limited.
# (limit_req does not accept a variable zone name, so a map is used instead of if/set.)
map $http_user_agent $ai_bot_key {
    default           "";
    "~*GPTBot"        $binary_remote_addr;
    "~*PerplexityBot" $binary_remote_addr;
}

# 60 requests/minute per bot IP
limit_req_zone $ai_bot_key zone=aibots:10m rate=60r/m;

server {
    location / {
        limit_req zone=aibots burst=10 nodelay;
        # ... rest of config
    }
}
```
Parameters: rate=60r/m allows a sustained 60 requests per minute (about one per second) per bot IP; burst=10 lets short spikes of up to 10 extra requests queue; nodelay serves that burst immediately instead of spacing it out.
.htaccess (mod_ratelimit):
```apache
<IfModule mod_ratelimit.c>
    <If "%{HTTP_USER_AGENT} =~ /GPTBot/">
        SetOutputFilter RATE_LIMIT
        # Limit response bandwidth to 100 KB/s for GPTBot
        SetEnv rate-limit 100
    </If>
</IfModule>
```
Or use mod_evasive (request-based):
```apache
<IfModule mod_evasive20.c>
    DOSHashTableSize  3097
    DOSPageCount      10
    DOSSiteCount      100
    DOSPageInterval   1
    DOSSiteInterval   1
    DOSBlockingPeriod 600
</IfModule>
```
Monitor these for AI bot impact:
| Metric | Tool | Alert Threshold |
|---|---|---|
| CPU usage | htop, CloudWatch | > 80% |
| Memory usage | free, CloudWatch | > 85% |
| Bandwidth | vnstat, GA4 | > daily budget |
| Request rate | nginx logs, CloudFlare | > 1000 req/min |
| TTFB | Pingdom, GTmetrix | > 500ms |
Nginx access log parsing:
```bash
# Count requests by bot (assumes the default "combined" log format,
# where the user agent is the 6th double-quoted field)
awk -F'"' '{print $6}' /var/log/nginx/access.log \
  | grep -oE "(GPTBot|PerplexityBot|ClaudeBot)[^ ;)]*" \
  | sort | uniq -c | sort -rn

# Output example:
#   1247 GPTBot/1.2
#    892 PerplexityBot/1.0
#    234 ClaudeBot/1.0

# Check bot request distribution over time (per hour)
grep "GPTBot" /var/log/nginx/access.log | awk '{print $4}' | cut -d: -f1-2 | uniq -c

# Identify most crawled pages
grep "GPTBot" /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20
```
CloudFlare Dashboard → Analytics → Traffic:
Filters:
Metrics:
Bot traffic does not show up in GA4 (bot filtering), but you can see its impact:
Segment:
Correlation: an increase in bot crawls → an increase in referral traffic 2-4 weeks later.
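One way to check that correlation yourself is to line up daily bot hits (from server logs) against daily AI referral sessions (from a GA4 export) and shift one series by a few weeks. A minimal pandas sketch, assuming a hypothetical CSV with columns date, gptbot_hits and ai_referral_sessions:

```python
import pandas as pd

# Hypothetical export: one row per day
df = pd.read_csv("bot_vs_referrals.csv", parse_dates=["date"]).set_index("date")

# Correlate bot crawl volume with referral traffic 2-4 weeks later
for lag_weeks in (2, 3, 4):
    shifted = df["ai_referral_sessions"].shift(-7 * lag_weeks)
    print(f"lag {lag_weeks} weeks: correlation = {df['gptbot_hits'].corr(shifted):.2f}")
```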
Allow AI bots if:
Expected ROI:
Block AI bots if:
Use cases:
Block specific paths:
robots.txt:
```txt
User-agent: GPTBot
Allow: /blog/
Allow: /resources/
Disallow: /customer-data/
Disallow: /api-docs/
```
Use cases:
Problem: some crawlers can spoof their user agent (e.g., the Perplexity controversy).
Server-side detection (nginx + Lua):
```lua
-- Detect suspicious patterns
local user_agent = ngx.var.http_user_agent
local remote_addr = ngx.var.remote_addr

-- Check if the user agent claims to be GPTBot
if user_agent:match("GPTBot") then
    -- Verify the IP is within the operator's published ranges
    local openai_ips = {"66.249.64.0/19", "66.249.64.0/20"}  -- example ranges only
    if not ip_in_range(remote_addr, openai_ips) then          -- custom helper, see sketch below
        ngx.log(ngx.WARN, "Spoofed GPTBot detected: " .. remote_addr)
        ngx.exit(403)  -- block the request
    end
end
```
Official IP ranges:
Note: IP ranges can change. Update them regularly.
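The Lua snippet above assumes an ip_in_range() helper. For reference, a small Python sketch of the same CIDR check using the standard-library ipaddress module (the range shown is a placeholder; pull current ranges from each provider's published list):

```python
import ipaddress

def ip_in_range(ip: str, cidr_ranges: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in cidr_ranges)

# Placeholder range for illustration only
OPENAI_RANGES = ["66.249.64.0/19"]
print(ip_in_range("66.249.66.1", OPENAI_RANGES))  # True for this example range
```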
Verify bot legitimacy:
```bash
# Check a GPTBot claim: reverse-DNS lookup of the crawler IP (IP is illustrative)
host 66.249.66.1
# Should return a hostname in the bot operator's domain

# Forward-confirm: resolve the returned hostname back to an IP
host <returned-hostname>
# The forward lookup should match the original crawler IP
```
Automated check (Python):
```python
import socket

def verify_bot(ip, expected_domain):
    try:
        # Reverse DNS: IP -> hostname
        hostname = socket.gethostbyaddr(ip)[0]
        # Forward-confirm: the hostname must resolve back to the same IP
        forward_ips = socket.gethostbyname_ex(hostname)[2]
        return expected_domain in hostname and ip in forward_ips
    except (socket.herror, socket.gaierror):
        return False

# Usage (IP is illustrative)
is_legit = verify_bot("66.249.66.1", "openai.com")
```
Example scenario:
Calculation:
Total data/week = 10,000 pages × 500 KB × 3 bots = 15 GB/week
Annual bandwidth = 15 GB × 52 weeks = 780 GB/year
Cost (AWS CloudFront pricing):
Insight: AI bot bandwidth is manageable (for a typical blog).
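To sanity-check that figure, a quick back-of-the-envelope estimate in Python; the $0.085/GB rate is an assumption (roughly the first CloudFront pricing tier), so substitute your own CDN's egress price:

```python
pages_per_week = 10_000   # pages crawled per bot per week (scenario above)
avg_page_kb = 500         # average page weight in KB
bots = 3                  # number of AI bots crawling at this volume

weekly_gb = pages_per_week * avg_page_kb * bots / 1_000_000  # KB -> GB
annual_gb = weekly_gb * 52
cost_per_gb = 0.085       # assumed CDN egress price, USD/GB

print(f"{weekly_gb:.0f} GB/week, {annual_gb:.0f} GB/year, ~${annual_gb * cost_per_gb:.0f}/year")
# -> 15 GB/week, 780 GB/year, ~$66/year
```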
Tactics:
1. Compression (gzip/brotli):
```nginx
gzip on;
gzip_types text/html text/css application/javascript;
gzip_comp_level 6;
```
Impact: 60-70% bandwidth reduction.
2. Conditional requests (ETag):
```nginx
etag on;
```
Benefit: on bot re-crawls, unchanged content returns 304 Not Modified (no data transfer).
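A quick way to confirm conditional requests are working is to replay a request with the ETag you received. A short sketch using Python's requests library (the URL is a placeholder):

```python
import requests

url = "https://example.com/blog/some-post/"  # placeholder URL

first = requests.get(url)
etag = first.headers.get("ETag")

# Replay with If-None-Match: an unchanged page should come back as 304
second = requests.get(url, headers={"If-None-Match": etag} if etag else {})
print(first.status_code, second.status_code)  # expect 200, then 304
```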
3. CDN caching:
```nginx
location ~* \.(jpg|png|css|js)$ {
    expires 30d;
    add_header Cache-Control "public, immutable";
}
```
Benefit: static assets are served from the CDN (reducing origin server load).
For most cases: allow (with rate limiting).
Advantages of allowing:
Block scenarios (rare):
Recommendation: Allow + Crawl-delay: 1-2 + server-level rate limiting.
1-2 seconds is optimal.
Test data:
Exception: 3-5 seconds for Bytespider (very aggressive).
Yes, allow both.
GPTBot:
OAI-SearchBot:
Note: they are two separate crawlers; configure each one individually.
It has decreased, but rate limiting still matters.
History:
Best practice: robots.txt allow + server-level rate limit (60-120 req/min).
CloudFlare WAF + Rate Limiting Rules:
Setup:
Monitoring:
Advantage: CloudFlare can verify crawler IPs automatically (it detects spoofed bots).
No (bot filtering).
GA4 filters out bot traffic (Settings → Data Filters → Bot Filtering).
But: you can still see the impact of AI bots:
Indirect metrics:
Server logs: analyze your nginx/Apache logs to see AI bot activity.
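A small sketch of that analysis in Python, assuming the default nginx "combined" log format where the user agent is the last double-quoted field (the log path is a placeholder):

```python
import re
from collections import Counter

BOTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Bytespider")
# Capture the date (e.g. 10/Oct/2025) and the final quoted field (user agent)
LINE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):.*"([^"]*)"$')

counts = Counter()
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for raw in log:
        m = LINE.search(raw.rstrip())
        if not m:
            continue
        day, user_agent = m.groups()
        for bot in BOTS:
            if bot in user_agent:
                counts[(day, bot)] += 1

for (day, bot), n in sorted(counts.items()):
    print(f"{day}  {bot:15} {n}")
```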
Test data (average B2B SaaS blog):
| Bot | % of AI Bot Traffic | Aggressiveness |
|---|---|---|
| Bytespider | 40-50% | Very High |
| GPTBot | 25-30% | Moderate |
| PerplexityBot | 15-20% | Moderate-High |
| ClaudeBot | 5-10% | Low |
| Google-Extended | 5-10% | Low-Moderate |
Insight: Bytespider is a bandwidth hog (especially for Asia-Pacific sites).
Recommendation: aggressive rate limiting or a full block for Bytespider (unless you target the TikTok/China market).
AI bots are a critical opportunity in 2025, but they need the right management:
Allow (with limits): robots.txt allow + rate limiting (60-120 req/min)
Crawl-delay of 1-2 seconds: server protection + reasonable crawl speed
Selective blocking: allow public content, block proprietary content
Monitor server impact: CPU, memory, bandwidth, TTFB
Platform priority: GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot > Bytespider
Bandwidth is manageable: <$100/year for a typical blog
Week 1:
Week 2:
Week 3:
Week 4:
Next step: optimize your robots.txt, set up rate limiting, and monitor AI bot impact.
In addition to optimizing AI bot access, see the schema markup for AI visibility guide to help crawlers understand your content better.