Language Observatory
2006-11-18
Ethiopic search engine development
Our partner in Ethiopia, Dr. Solomon Atnafu of Addis Ababa University (AAU), is encouraging one of his master's students to develop an Ethiopic search engine. He has been in contact with us, asking us to provide LOP's language identification module to support his team's development work.
During my recent visit to Ethiopia, Dr. Kodama and I had a discussion with his team at his laboratory (see photo).
His team is doing the following:
1) Collect UTF-8 encoded Ethiopic pages using existing search engines
2) Using the above results as seed URLs, collect Ethiopic pages with their own crawler
3) Use our language identification module (LIM) to select the UTF-8 Ethiopic pages from the crawl
4) Index these pages and save them in a database
While they recognize that local proprietary encodings of Ethiopic text exist, they have no plan to collect those pages, because so many different encodings are in use that converting all of them into UTF-8 seems very difficult.
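The screening step (step 3) can be sketched roughly as follows. This is only an illustration and not LOP's LIM: it assumes a page counts as "UTF-8 Ethiopic" when its bytes decode as valid UTF-8 and at least half of its letters fall in the Unicode Ethiopic blocks. The function names, the `threshold` value, and the example URLs are all hypothetical.

```python
from typing import Dict, List

# Unicode blocks treated as Ethiopic in this sketch (an assumption, not LIM's logic).
ETHIOPIC_RANGES = (
    (0x1200, 0x137F),  # Ethiopic
    (0x1380, 0x139F),  # Ethiopic Supplement
    (0x2D80, 0x2DDF),  # Ethiopic Extended
)


def is_ethiopic_char(ch: str) -> bool:
    """Return True if the character lies in one of the Ethiopic blocks above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ETHIOPIC_RANGES)


def is_utf8_ethiopic(raw: bytes, threshold: float = 0.5) -> bool:
    """Accept a page only if it is valid UTF-8 and mostly Ethiopic letters."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, e.g. a page in a proprietary encoding: skip it.
        return False
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    ethiopic = sum(1 for ch in letters if is_ethiopic_char(ch))
    return ethiopic / len(letters) >= threshold


def screen_pages(pages: Dict[str, bytes]) -> List[str]:
    """Return the URLs of crawled pages that pass the screen."""
    return [url for url, raw in pages.items() if is_utf8_ethiopic(raw)]


# Hypothetical crawl result: URL -> raw page bytes.
crawled = {
    "http://example.org/am": "\u1230\u120b\u121d \u1208\u12d3\u1208\u121d".encode("utf-8"),
    "http://example.org/en": b"hello world",
    "http://example.org/bad": b"\xff\xfe\x00",
}
selected = screen_pages(crawled)
```

Pages in proprietary encodings either fail UTF-8 decoding or decode to non-Ethiopic characters, so a screen like this one drops them, which matches the team's decision not to collect them.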
I sincerely hope they succeed.
02:14:51 -
Mikami -
2006-03-27
Photos of LO/ALRN Workshop
Library Tour

Mikami Sensei describing the aftereffects of the Chuetsu Earthquake (2004/10/23), showing pictures of damage to houses, roads, etc.
From Left: Baden Hughes, Valaxay Dalaloy, Timothy Baldwin, Ali Selamat, Chea Sok Huor, Yoshiki Mikami, Chew Yew Choong, Virach Sornlertlamvanich & Canasai Kruengkrai (hardly visible)
Server Room Tour

Newly installed server clusters, added alongside the first cluster.
17:09:00 -
zaidi -
2006-03-26
Non-UDHR Training Texts
Although our language identification module (LIM) uses the text of the Universal Declaration of Human Rights (UDHR) as training data, several language versions of the UDHR are not available, so we have to use other texts in place of the UDHR. Our partners have collected such training data for this purpose. The following list shows this set of texts.
| Language | Script | Encoding | Name of contributor | Bytes | Date |
| Kazakh | Cyrillic | UTF-8 | Ahmed Tarcan | 15,501 | 2006/03/24 |
| Kirgiz | Cyrillic | windows-1251 | Ahmed Tarcan | 11,485 | 2006/03/24 |
| Tatar | Latin | windows-1254 | Ahmed Tarcan | 11,673 | 2006/03/24 |
| Turkmen | Latin | windows-1250 | Ahmed Tarcan | 40,109 | 2006/03/24 |
| Uygurche | Latin | ISO 8859-9 | Ahmed Tarcan | 23,342 | 2006/03/24 |
| Uzbek | Cyrillic | windows-1251 | Ahmed Tarcan | 21,244 | 2006/03/24 |
| Uzbek | Latin | windows-1251 | Ahmed Tarcan | 19,659 | 2006/03/24 |
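The Encoding column matters because the same text yields different byte sequences, and different byte counts, under each encoding. The short sketch below illustrates this with Python's standard codecs. The sample string and the `byte_profile` helper are hypothetical and are not taken from the contributed training texts, and the sketch makes no claim about how LIM itself consumes them.

```python
# Hypothetical three-letter Cyrillic sample; not from the contributed training texts.
SAMPLE = "\u0442\u0456\u043b"


def byte_profile(text, encodings):
    """Encode the same text under each named encoding and return the raw bytes."""
    return {enc: text.encode(enc) for enc in encodings}


profile = byte_profile(SAMPLE, ["utf-8", "windows-1251"])
for enc, raw in profile.items():
    # Print the encoding name, the byte count, and the bytes in hexadecimal.
    print(enc, len(raw), raw.hex())
```

In UTF-8 each Cyrillic letter here takes two bytes, while windows-1251 uses one byte per letter, so the Bytes figures in the table are not directly comparable across rows with different encodings.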
19:38:07 -
Mikami -
2005-09-01
International Advisors
| Name of Advisor | Affiliation | Location |
| Claudio Menezes | UNESCO | Paris, France |
| Paul Hector | UNESCO | Addis Ababa, Ethiopia |
| Adama Samassekou | ACALAN | Bamako, Mali |
| Emmanuel Sagara | ACALAN | Bamako, Mali |
| David Dalby | Linguasphere Observatory | Wales, UK |
| Debbie Garside | Linguasphere Observatory | Wales, UK |
| Om Vikas | TDIL, Ministry of IT | New Delhi, India |
| Ahmad Zaki Abu Bakar | Universiti Teknologi Malaysia (UTM) | Johor Bahru, Malaysia |
| Ali Selamat | UTM | Johor Bahru, Malaysia |
| Virach Sornlertlamvanich | TCL/NICT | Bangkok, Thailand |
| Canasai Kruengkrai | TCL/NICT | Bangkok, Thailand |
| Thatsanee Charoenporn | TCL/NICT | Bangkok, Thailand |
| Timothy Baldwin | The University of Melbourne (UniMelb) | Melbourne, Australia |
| Baden Hughes | UniMelb | Melbourne, Australia |
| Chea Sok Huor | PAN Localization | Phnom Penh, Cambodia |
| Valaxay Dalaloy | Science Technology & Environment Agency | Vientiane, Lao PDR |
| S.T. Nandasara | University of Colombo | Colombo, Sri Lanka |
| Sebastiano Vigna | Milano University | Milano, Italy |
| Paolo Boldi | Milano University | Milano, Italy |
| Massimo Santini | Milano University | Milano, Italy |
| Michael Stack | Internet Archive | San Francisco, USA |
| Gordon Mohr | Internet Archive | San Francisco, USA |
| Hammam Riza | IPTEKnet, BPPT | Jakarta, Indonesia |
| Hajime Ohiwa | Keio University | Fujisawa, Japan |
| Ikechukwu Anyanwu | Placade Japan | Nagaoka, Japan |
| Kozo Ishizaki | NUT | Nagaoka, Japan |
| Ario Ohsato | NUT | Nagaoka, Japan |
| Mitsuru Kano | NUT | Nagaoka, Japan |
| Masahiro Iwahashi | NUT | Nagaoka, Japan |
| Takashi Yukawa | NUT | Nagaoka, Japan |
| Kazuhide Yamamoto | NUT | Nagaoka, Japan |
23:29:00 -
Mikami -
Partners
Ohiwa Laboratory - CreW
Keio University, Shonan-Fujisawa Campus (SFC), Japan
Research Institute for Language and Culture of Asia and Africa
Tokyo University of Foreign Studies (TUFS), Tokyo, Japan
Fakulti Sains Komputer & Sistem Maklumat (FSKSM)
Universiti Teknologi Malaysia (UTM), Johor, Malaysia
University of Miskolc, Miskolc, Hungary
The Laboratory for Web Algorithmics (LAW)
Dipartimento di Scienze dell'Informazione
Università degli Studi di Milano (USM), Milano, Italy
Linguasphere Observatory, Wales, UK
Yamana Laboratory
Department of Engineering, Waseda University, Tokyo, Japan
Thai Computational Linguistics Laboratory (TCL), Bangkok, Thailand
Technology Development of Indian Languages (TDIL)
Department of Information Technology, New Delhi, India
Excel Solutions, Singapore
Placade Japan, Co., Nagaoka, Japan
Numerics Inc., Fujisawa, Japan
CT Solutions, Tokyo, Japan
Center of International Cooperation for Computerization, Tokyo, Japan
Saora Inc., Cupertino, CA, U.S.A. & Yokohama, Japan
Kumari Co. Ltd., Musashino, Tokyo, Japan
Communication & Information, UNESCO, Paris, France
Japan Science and Technology Agency (JST)
01:17:00 -
Mikami -