Language Observatory

2006-11-18

Ethiopic search engine development

Members of Dr. Solomon Atnafu's laboratory, Addis Ababa University
Our partner in Ethiopia, Dr. Solomon Atnafu of Addis Ababa University (AAU), is encouraging one of his master's students to develop an Ethiopic search engine. He has been in contact with us about providing LOP's language identification module to support his team's development work.

During my recent visit to Ethiopia, Dr. Kodama and I had a discussion with his team at his laboratory (see photo).

His team's workflow is as follows:
1) Collect UTF-8 encoded Ethiopic pages using existing search engines
2) Using the results above as seed URLs, collect Ethiopic pages with their own crawler
3) Use our language identification module (LIM) to select the UTF-8 Ethiopic pages
4) Index those pages and save them into a database
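
To make step 3 a little more concrete, here is a minimal sketch of a screening check. It is not how LIM works; it simply assumes that a page counts as "UTF-8 Ethiopic" when its bytes decode as strict UTF-8 and at least half of its letters fall in the Unicode Ethiopic blocks. The function names and the 0.5 threshold are hypothetical, chosen only for illustration.

```python
# Illustrative only: a simple script-based screen, not the LIM algorithm.
# Unicode blocks that hold Ethiopic characters.
ETHIOPIC_RANGES = (
    (0x1200, 0x137F),  # Ethiopic
    (0x1380, 0x139F),  # Ethiopic Supplement
    (0x2D80, 0x2DDF),  # Ethiopic Extended
    (0xAB00, 0xAB2F),  # Ethiopic Extended-A
)


def is_ethiopic_char(ch):
    """Return True if the character lies in one of the Ethiopic blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ETHIOPIC_RANGES)


def ethiopic_ratio(text):
    """Fraction of alphabetic characters that are Ethiopic (0.0 if none)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_ethiopic_char(ch) for ch in letters) / len(letters)


def is_utf8_ethiopic_page(raw, threshold=0.5):
    """Accept a page only if it is valid UTF-8 and mostly Ethiopic letters.

    `threshold` is an assumed cut-off, not a value from the project.
    """
    try:
        text = raw.decode("utf-8")  # strict decoding rejects other encodings
    except UnicodeDecodeError:
        return False
    return ethiopic_ratio(text) >= threshold
```

A crawler could call `is_utf8_ethiopic_page(page_bytes)` on each fetched page before indexing it. A check of this kind cannot tell Amharic from Tigrinya or other languages written in Ethiopic script, which is one reason a trained language identifier is still needed.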

While they recognize the existence of local proprietary encodings of Ethiopic text, they have no plan to collect those pages, because so many different encodings are in use that converting all of them into UTF-8 seems very difficult.

I really hope they succeed.

02:14:51 - Mikami

No comments

2006-03-27

Photos of LO/ALRN Workshop

Library Tour
Mikami Sensei describing the aftereffects of the Chuetsu Earthquake (2004/10/23) by showing pictures of damaged houses, roads, etc.
From left: Baden Hughes, Valaxay Dalaloy, Timothy Baldwin, Ali Selamat, Chea Sok Huor, Yoshiki Mikami, Chew Yew Choong, Virach Sornlertlamvanich & Canasai Kruengkrai (hardly visible)

Server Room Tour
Server Clusters: new server clusters recently installed in addition to the first cluster.
17:09:00 - zaidi

No comments

2006-03-26

Non-UDHR Training Texts

Although our language identification module (LIM) uses UDHR text as training data, the UDHR is not available in several languages, so we have to use other texts in its place. Our partners have collected such training data for this purpose. The following list shows these texts.

Language  Script    Encoding      Name of contributor  Bytes   Date
Kazakh    Cyrillic  UTF-8         Ahmed Tarcan         15,501  2006/03/24
Kirgiz    Cyrillic  windows-1251  Ahmed Tarcan         11,485  2006/03/24
Tatar     Latin     windows-1254  Ahmed Tarcan         11,673  2006/03/24
Turkmen   Latin     windows-1250  Ahmed Tarcan         40,109  2006/03/24
Uygurche  Latin     ISO 8859-9    Ahmed Tarcan         23,342  2006/03/24
Uzbek     Cyrillic  windows-1251  Ahmed Tarcan         21,244  2006/03/24
Uzbek     Latin     windows-1251  Ahmed Tarcan         19,659  2006/03/24
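
Each text above is labelled with a language, a script, and an encoding, so a trained identifier can report all three together. As a rough illustration of how training text can be used this way, the sketch below builds byte trigram profiles and picks the closest one by cosine similarity. This is an assumed, generic technique and is not a description of LIM itself. The helper names are hypothetical, and the sample sentence is a toy Russian string, not one of the contributed texts.

```python
# Illustrative only: generic byte n-gram matching, not the LIM implementation.
import math
from collections import Counter


def ngram_profile(data, n=3):
    """Count every run of n consecutive bytes in the data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(samples):
    """Map each (language, script, encoding) label to a byte profile."""
    return {label: ngram_profile(data) for label, data in samples.items()}


def identify(page, profiles):
    """Return the label whose profile is most similar to the page."""
    page_profile = ngram_profile(page)
    return max(profiles,
               key=lambda label: cosine_similarity(page_profile, profiles[label]))


# Toy training data: the same sentence stored in two encodings.
TOY_TEXT = "все люди рождаются свободными и равными"
profiles = train({
    ("Russian", "Cyrillic", "windows-1251"): TOY_TEXT.encode("cp1251"),
    ("Russian", "Cyrillic", "UTF-8"): TOY_TEXT.encode("utf-8"),
})
guess = identify("люди свободными".encode("cp1251"), profiles)
print(guess)
```

Because the profiles are built from raw bytes, the same sentence in two encodings yields two distinct profiles, which is why each training text needs to be kept in the encoding it was collected in.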

19:38:07 - Mikami

No comments

2005-09-01

International Advisors

Name of Advisor           Affiliation                              Location
Claudio Menezes           UNESCO                                   Paris, France
Paul Hector               UNESCO                                   Addis Ababa, Ethiopia
Adama Samassekou          ACALAN                                   Bamako, Mali
Emmanuel Sagara           ACALAN                                   Bamako, Mali
David Dalby               Linguasphere Observatory                 Wales, UK
Debbie Garside            Linguasphere Observatory                 Wales, UK
Om Vikas                  TDIL, Ministry of IT                     New Delhi, India
Ahmad Zaki Abu Bakar      Universiti Teknologi Malaysia (UTM)      Johor Bahru, Malaysia
Ali Selamat               UTM                                      Johor Bahru, Malaysia
Virach Sornlertlamvanich  TCL/NICT                                 Bangkok, Thailand
Canasai Kruengkrai        TCL/NICT                                 Bangkok, Thailand
Thatsanee Charoenporn     TCL/NICT                                 Bangkok, Thailand
Timothy Baldwin           The University of Melbourne (UniMelb)    Melbourne, Australia
Baden Hughes              UniMelb                                  Melbourne, Australia
Chea Sok Huor             PAN Localization                         Phnom Penh, Cambodia
Valaxay Dalaloy           Science Technology & Environment Agency  Vientiane, Lao PDR
S.T. Nandasara            University of Colombo                    Colombo, Sri Lanka
Sebastiano Vigna          Milano University                        Milano, Italy
Paolo Boldi               Milano University                        Milano, Italy
Massimo Santini           Milano University                        Milano, Italy
Michael Stack             Internet Archive                         San Francisco, USA
Gordon Mohr               Internet Archive                         San Francisco, USA
Hammam Riza               IPTEKnet, BPPT                           Jakarta, Indonesia
Hajime Ohiwa              Keio University                          Fujisawa, Japan
Ikechukwu Anyanwu         Placade Japan                            Nagaoka, Japan
Kozo Ishizaki             NUT                                      Nagaoka, Japan
Ario Ohsato               NUT                                      Nagaoka, Japan
Mitsuru Kano              NUT                                      Nagaoka, Japan
Masahiro Iwahashi         NUT                                      Nagaoka, Japan
Takashi Yukawa            NUT                                      Nagaoka, Japan
Kazuhide Yamamoto         NUT                                      Nagaoka, Japan
23:29:00 - Mikami

No comments

Partners

Ohiwa Laboratory - CreW
Keio University, Shonan-Fujisawa Campus (SFC), Japan

Research Institute for Language and Culture of Asia and Africa
Tokyo University of Foreign Studies (TUFS), Tokyo, Japan

Fakulti Sains Komputer & Sistem Maklumat (FSKSM)
Universiti Teknologi Malaysia (UTM), Johor, Malaysia

University of Miskolc, Miskolc, Hungary

The Laboratory for Web Algorithmics (LAW)
Dipartimento di Scienze dell'Informazione
Università degli Studi di Milano (USM), Milano, Italy

Linguasphere Observatory, Wales, UK

Yamana Laboratory
Department of Engineering, Waseda University, Tokyo, Japan

Thai Computational Linguistics Laboratory (TCL), Bangkok, Thailand

Technology Development of Indian Languages (TDIL)
Department of Information Technology, New Delhi, India

Excel Solutions, Singapore

Placade Japan, Co., Nagaoka, Japan

Numerics Inc., Fujisawa, Japan

CT Solutions, Tokyo, Japan

Center of International Cooperation for Computerization, Tokyo, Japan

Saora Inc., Cupertino, CA, U.S.A. & Yokohama, Japan

Kumari Co. Ltd., Musashino, Tokyo, Japan

Communication & Information, UNESCO, Paris, France

Japan Science and Technology Agency (JST)

01:17:00 - Mikami

No comments