July 10, 2008

A bookmarklet for quick access to the robots.txt of the site you're viewing

They differ quite a bit, don't they.

Even now, though,
http://mdn.mainichi.jp/robots.txt
doesn't seem to be having any effect.
User-agent: *

Disallow: /
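A robots.txt like the one above is the kind of thing you'd set up when shutting a site down and leaving only the top page on display. It's a way of saying, "We're closing, so goodbye, crawlers."
So maybe this is them getting ready to close the site.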
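
To see what that pair of rules means to a crawler that honors robots.txt, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot` and the article path are made up for illustration, and the rules are fed in as text rather than fetched from the live site.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, supplied as text (nothing is fetched over the network).
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# "ExampleBot" and the article path are hypothetical, for illustration only.
top_page_allowed = parser.can_fetch("ExampleBot", "http://mdn.mainichi.jp/")
article_allowed = parser.can_fetch("ExampleBot", "http://mdn.mainichi.jp/some/article.html")

print(top_page_allowed, article_allowed)
```

Both checks come out `False`: the rule covers every path, the top page included. Keep in mind that robots.txt is only a request to crawlers and blocks nothing by itself.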

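By the way, I threw together a rough bookmarklet that makes it easy to open the robots.txt of whatever site you're currently viewing. (It's been a while since I last made a bookmarklet.)

Show the robots.txt of the current site (right-click to add it to your favorites)


Run it on the page whose robots.txt you want to see.
If the site has a robots.txt, great; if it doesn't, you'll just get a 404.
The Web Robots Pages might make for interesting reading, too.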
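
The bookmarklet's code isn't reproduced in this post, but the idea boils down to keeping the scheme and host of the current page and swapping everything else for `/robots.txt`. A rough Python sketch of that URL derivation follows; the function name `robots_url` and the sample page URL are my own, not taken from the bookmarklet.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (plus port, if any); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The search URL below is a made-up example page.
print(robots_url("http://www.google.co.jp/search?q=test"))
```

Whether the resulting URL returns a file or a 404 depends on the site, as noted above.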

For a closed social network like mixi, turning up in search results would be unwelcome, so they seem to make a point of writing a robots.txt.
For reference, mixi's robots.txt looks like this.
User-agent: *
Disallow: /add_diary.pl
Disallow: /show_calendar.pl
Disallow: /confirm.pl
Disallow: /confirm_email.pl
Disallow: /invite.pl
Disallow: /join.pl
Disallow: /list_bbs.pl
Disallow: /list_community.pl
Disallow: /list_diary.pl
Disallow: /list_event_member.pl
Disallow: /list_friend.pl
Disallow: /list_member.pl
Disallow: /list_request.pl
Disallow: /logout.pl
Disallow: /manage_friend.pl
Disallow: /mikly.pl
Disallow: /search_diary.pl
Disallow: /regist.pl
Disallow: /reset_password.pl
Disallow: /search.pl
Disallow: /search_album.pl
Disallow: /search_community.pl
Disallow: /search_event.pl
Disallow: /search_review.pl
Disallow: /show_friend.pl
Disallow: /show_intro.pl
Disallow: /show_profile.pl
Disallow: /view_album.pl
Disallow: /view_bbs.pl
Disallow: /view_community.pl
Disallow: /view_diary.pl
Disallow: /view_enquete.pl
Disallow: /view_event.pl
Disallow: /view_item.pl
Disallow: /view_message.pl
Disallow: /banner.pl
Disallow: /list_message.pl
Disallow: /list_review.pl
Disallow: /new_friend_diary.pl
Disallow: /set_cookie.pl
Disallow: /img/
I looked at a few others, too.
http://www.microsoft.com/robots.txt
# Robots.txt file for http://www.microsoft.com
#

User-agent: *
Disallow: /canada/Library/mnp/2/aspx/
Disallow: /communities/bin.aspx
Disallow: /communities/eventdetails.mspx
Disallow: /communities/blogs/PortalResults.mspx
Disallow: /communities/rss.aspx
Disallow: /downloads/Browse.aspx
Disallow: /downloads/info.aspx
Disallow: /downloads/thankyou.aspx
Disallow: /france/formation/centres/planning.asp
Disallow: /france/mnp_utility.mspx
Disallow: /germany/library/images/mnp/
Disallow: /germany/mnp_utility.mspx
Disallow: /ie/ie40/
Disallow: /info/customerror.htm
Disallow: /info/smart404.asp
Disallow: /intlkb/
Disallow: /isapi/
Disallow: /japan/enable/textview.asp
Disallow: /japan/mnp_utility.mspx
Disallow: /japan/products/library/search.asp
Disallow: /japan/showcase/print/default.aspx
Disallow: /japan/terminology/query.asp
Disallow: /library/errorpages/smarterror.aspx
Disallow: /library/toolbar/3.0/
Disallow: /mac/help.mspx
Disallow: /mnp_utility.mspx
Disallow: /netherlands/mnp_utility.mspx
Disallow: /resources/casestudies/casestudyimageshow.asp
Disallow: /resources/casestudies/CompanyLogoShow.asp
Disallow: /resources/casestudies/ddi/companylogoshow.asp
Disallow: /resources/casestudies/ddi/showfile.asp
Disallow: /resources/casestudies/FindCaseStudyResults.aspx
Disallow: /resources/casestudies/showfile.asp
Disallow: /uk/mnp_utility.mspx
Disallow: /windowsmobile/catalog/
Disallow: /windowsmobile/components/referafriend/getcallbackresult.aspx
Disallow: /france/ie/default.asp?*
Disallow: /mac/help.mspx?*


Sitemap: http://www.microsoft.com/germany/kleinunternehmen/gsitemap.aspx
Sitemap: http://www.microsoft.com/business/success/sitemap.xml
Sitemap: http://www.microsoft.com/downloads/sitemap.asp
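
The `Sitemap:` lines at the end of that file aren't access rules; they tell crawlers where to find a list of URLs. As a small sketch, Python's standard `urllib.robotparser` can pull them out (via `site_maps()`, available from Python 3.8). The excerpt below is trimmed from the listing above and passed in as text.

```python
from urllib.robotparser import RobotFileParser

# A trimmed excerpt of the listing above, supplied as text.
excerpt = """\
User-agent: *
Disallow: /isapi/

Sitemap: http://www.microsoft.com/business/success/sitemap.xml
Sitemap: http://www.microsoft.com/downloads/sitemap.asp
"""

parser = RobotFileParser()
parser.parse(excerpt.splitlines())

# site_maps() returns the Sitemap URLs in file order, or None if there are none.
sitemaps = parser.site_maps()
for url in sitemaps:
    print(url)
```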

http://www.google.co.jp/robots.txt
User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Disallow: /nwshp
Allow: /news?btcid=
Disallow: /news?btcid=*&
Allow: /news?btaid=
Disallow: /news?btaid=*&
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /advanced_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/?
Disallow: /m/lcb
Disallow: /m/search?
Disallow: /wml?
Disallow: /wml/?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /froogle?
Disallow: /products?
Disallow: /froogle_
Disallow: /product_
Disallow: /products_
Disallow: /print
Disallow: /books
Disallow: /patents?
Disallow: /scholar?
Disallow: /complete
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Disallow: /maps?
Disallow: /mapstt?
Disallow: /mapslt?
Disallow: /maps/stk/
Disallow: /mapabcpoi?
Disallow: /translate?
Disallow: /ie?
Disallow: /sms/demo?
Disallow: /katrina?
Disallow: /blogsearch?
Disallow: /blogsearch/
Disallow: /blogsearch_feeds
Disallow: /advanced_blog_search
Disallow: /reader/
Disallow: /uds/
Disallow: /chart?
Disallow: /transit?
Disallow: /mbd?
Disallow: /extern_js/
Disallow: /calendar/feeds/
Disallow: /calendar/ical/
Disallow: /cl2/feeds/
Disallow: /cl2/ical/
Disallow: /coop/directory
Disallow: /coop/manage
Disallow: /trends?
Disallow: /trends/music?
Disallow: /notebook/search?
Disallow: /musica
Disallow: /musicad
Disallow: /musicas
Disallow: /musicl
Disallow: /musics
Disallow: /musicsearch
Disallow: /musicsp
Disallow: /musiclp
Disallow: /browsersync
Disallow: /call
Disallow: /archivesearch?
Disallow: /archivesearch/url
Disallow: /archivesearch/advanced_search
Disallow: /base/search?
Disallow: /base/reportbadoffer
Disallow: /base/s2
Disallow: /urchin_test/
Disallow: /movies?
Disallow: /codesearch?
Disallow: /codesearch/feeds/search?
Disallow: /wapsearch?
Disallow: /safebrowsing
Disallow: /reviews/search?
Disallow: /orkut/albums
Disallow: /jsapi
Disallow: /views?
Disallow: /c/
Disallow: /cbk
Disallow: /recharge/dashboard/car
Disallow: /recharge/dashboard/static/
Disallow: /translate_c?
Disallow: /s2/profiles/me
Allow: /s2/profiles
Disallow: /s2
Disallow: /transconsole/portal/
Disallow: /gcc/
Disallow: /aclk
Disallow: /cse?
Disallow: /tbproxy/
Disallow: /MerchantSearchBeta/
Disallow: /ime/
Disallow: /websites?
Disallow: /shenghuo/search?
Disallow: /support/forum/search?
http://del.icio.us/robots.txt (everyone is shut out by default, while Slurp, Googlebot, Teoma, and msnbot are let in by name, minus a few paths)
User-agent: *
Disallow: /

User-agent: delicious-thumbnails
Allow: /



User-agent: Slurp
Allow: /
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss

User-agent: Googlebot
Allow: /
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss

User-agent: Teoma
Allow: /
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss

User-agent: msnbot
Allow: /
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
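
A crawler uses the group that names it and only falls back to `User-agent: *` when no group does. Here is a minimal sketch of that selection with Python's standard `urllib.robotparser`, using a shortened copy of the del.icio.us file. `ExampleBot` and the `/popular` path are made up for illustration. How `Allow` and `Disallow` lines inside a single group are weighed against each other varies between parsers, so the sketch only checks which group gets picked.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the del.icio.us listing above (only one named bot kept).
excerpt = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
Disallow: /inbox
"""

parser = RobotFileParser()
parser.parse(excerpt.splitlines())

# Googlebot has its own group, so the catch-all "Disallow: /" does not apply to it.
googlebot_ok = parser.can_fetch("Googlebot", "http://del.icio.us/popular")
# "ExampleBot" is a made-up crawler with no group of its own; it falls back to "*".
other_ok = parser.can_fetch("ExampleBot", "http://del.icio.us/popular")

print(googlebot_ok, other_ok)
```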
If you come across an interesting robots.txt, let me know. Here's how things look at a few more sites.

You can learn all sorts of things from a robots.txt. Fun stuff.

User-agent: Megalodon
Disallow: /
Sitemap: http://sankei.jp.msn.com/sitemap.xml

User-Agent: *
Disallow: /_REGIST
Disallow: /_TEST
Disallow: /css
Disallow: /images
Disallow: /js
Disallow: /parts
Disallow: /personnel
Disallow: /obituary
#Google Search Engine Robot
User-agent: Googlebot
# Crawl-delay: 10 -- Googlebot ignores crawl-delay ftl
Disallow: /*?
Disallow: /*/with_friends

#Yahoo! Search Engine Robot
User-Agent: Slurp
Crawl-delay: 10
Disallow: /*?
Disallow: /*/with_friends

#Microsoft Search Engine Robot
User-Agent: msnbot
Crawl-delay: 10
Disallow: /*?
Disallow: /*/with_friends

# Every bot that might possibly read and respect this file.
User-agent: *
Disallow: /*?
Disallow: /*/with_friends
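
The file above sets `Crawl-delay: 10` for Slurp and msnbot but leaves it commented out for Googlebot. As a rough sketch, Python's standard `urllib.robotparser` exposes that value per crawler through `crawl_delay()` (Python 3.6 and later). The excerpt is trimmed from the listing and passed in as text.

```python
from urllib.robotparser import RobotFileParser

# Two groups trimmed from the listing above, supplied as text.
excerpt = """\
User-agent: Googlebot
# Crawl-delay: 10 -- Googlebot ignores crawl-delay ftl
Disallow: /*?

User-Agent: Slurp
Crawl-delay: 10
Disallow: /*?
"""

parser = RobotFileParser()
parser.parse(excerpt.splitlines())

# Set explicitly in the Slurp group.
slurp_delay = parser.crawl_delay("Slurp")
# Commented out in the Googlebot group, so no delay is recorded.
googlebot_delay = parser.crawl_delay("Googlebot")

print(slurp_delay, googlebot_delay)
```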
posted by りょーち | Web-related technology