Crawl experiment: Asia-NoCJK web crawling
ccTLD domains :
ae
af
az
bd
bh
bn
bt
cy
id
il
in
iq
ir
jo
kg
kh
kw
kz
la
lb
lk
mm
mn
mv
my
np
om
pg
ph
pk
ps
qa
sa
sg
sy
th
tj
tm
tp
tr
uz
vn
ye
System locks : 22-40
Max depth : 8
Max URLs per host : 1,000
URL delay : 10,000 ms
Crawler name : UbiCrawler/v0.4beta (http://gii.nagaokaut.ac.jp/~ubi/)
Contact e-mail : s077003@ics . n a g a o k a u t . a c . j p
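The settings above can be read as a simple fetch policy. The following is a minimal sketch in Python, not UbiCrawler's actual code: the class `CrawlPolicy` and its methods are hypothetical names, and it assumes that depth is counted from 0 at the seed page with 8 as an inclusive limit, and that both the URL cap and the delay apply per host. The "System locks" setting is not modelled.

```python
from urllib.parse import urlsplit

# Values copied from the settings listed above.
CCTLDS = frozenset("""
ae af az bd bh bn bt cy id il in iq ir jo kg kh kw kz la lb lk mm mn mv my
np om pg ph pk ps qa sa sg sy th tj tm tp tr uz vn ye
""".split())
MAX_DEPTH = 8
MAX_URLS_PER_HOST = 1000
URL_DELAY_MS = 10_000


class CrawlPolicy:
    """Hypothetical helper for illustration; not part of UbiCrawler."""

    def __init__(self):
        self.fetched = {}        # host -> number of URLs fetched so far
        self.last_fetch_ms = {}  # host -> time of the most recent fetch

    @staticmethod
    def host_of(url):
        # urlsplit lower-cases the hostname; it is None when absent.
        return (urlsplit(url).hostname or "").rstrip(".")

    def in_scope(self, url):
        # In scope when the last label of the host is a listed ccTLD.
        host = self.host_of(url)
        return bool(host) and host.rsplit(".", 1)[-1] in CCTLDS

    def may_fetch(self, url, depth, now_ms):
        host = self.host_of(url)
        if not self.in_scope(url) or depth > MAX_DEPTH:
            return False
        if self.fetched.get(host, 0) >= MAX_URLS_PER_HOST:
            return False
        # Assumed: the delay is enforced between fetches to the same host.
        last = self.last_fetch_ms.get(host)
        return last is None or now_ms - last >= URL_DELAY_MS

    def record_fetch(self, url, now_ms):
        host = self.host_of(url)
        self.fetched[host] = self.fetched.get(host, 0) + 1
        self.last_fetch_ms[host] = now_ms
```

Under these assumptions, a host in one of the listed ccTLDs would receive at most one request every 10 seconds and at most 1,000 requests in total.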
For webmasters. To stop LOP's crawling :
UbiCrawler supports the Robots Exclusion Standard. If you want to exclude your site from being crawled by UbiCrawler, see The Web Robots Pages.
Briefly, you can put this robots.txt file at the root of the web server you want to exclude from crawling.
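The linked robots.txt file is not reproduced here. As an illustration only, a typical exclusion for a single crawler looks like the `ROBOTS_TXT` string below; it assumes that UbiCrawler matches on the token `UbiCrawler` taken from its crawler name. The sketch checks the rule with Python's standard `urllib.robotparser`, whose matching may differ in detail from UbiCrawler's own parser. The helper `is_allowed` and the host `www.example.th` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block UbiCrawler from the whole site.
ROBOTS_TXT = """\
User-agent: UbiCrawler
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# UbiCrawler is excluded; crawlers with other names are unaffected.
blocked = is_allowed(ROBOTS_TXT, "UbiCrawler/v0.4beta", "http://www.example.th/index.html")
others = is_allowed(ROBOTS_TXT, "SomeOtherBot/1.0", "http://www.example.th/index.html")
```

Replacing `Disallow: /` with a narrower path such as `Disallow: /private/` would exclude only that part of the site.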
To monitor network traffic : http://gii2.nagaokaut.ac.jp/~ycchew/php/phpViewRrdGraph.php?rrdgraph=netTraffic&duration=day&btnSubmit=Submit
The general status of the Asia-NoCJK crawl can be viewed here.