Projects

2005-09-07

Asian Language Resource Network: Mission Statement

Mission

The mission of the ALRN (Asian Language Resource Network) is to create a network of qualified Asian partners to specify and support the development of high-priority Language Resources (LRs) for Asian languages in a systematic, standards-driven, collaborative and learning context. The project will focus on identifying the state of the art of LRs in the region, assessing priority requirements through consultations with language research, industry and communication players, and establishing a protocol and standards for developing an LR Network for the languages spoken in the region.

Background

According to the UNESCO recommendation concerning the “Promotion and Use of Multilingualism and Universal Access to Cyberspace” (adopted by the UNESCO General Conference at its 32nd session, October 2003), the fields of Human Language Technology (HLT) have identified Language Resources (LRs) as the first area to be networked, on the basis of consultations and collaborations involving research institutes, individual researchers, industry, service providers, etc. LRs are in fact today unanimously recognized as a central component of the linguistic infrastructure, necessary for the development of HLT applications and products, and therefore for industrial development. The availability of adequate LRs for as many languages as possible and, in particular, of multilingual LRs, is a necessary condition for the development of a truly multilingual Information Society. The availability of LRs is also a sensitive issue, directly touching on linguistic and cultural identity, and is a crucial precondition for the participation of the speakers of each language in the Information Society.
In response to the UNESCO action agenda on enabling, fostering, facilitating, increasing and enlarging multilingual resources and global access in cyberspace, the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) decided to provide three years of financial support to initiate the “ALRN” (Asian Language Resource Network) project.

Goal

During the three-year project period (July 2005 – March 2008), the expected tasks can be divided into the following categories.

1. Qualify, collaborate with and support leading LR actors in the Asian region to form a network. This would establish strong links with such institutions for the regular exchange of LR information among all stakeholders by disseminating a regular information newsletter, ensuring information feeds to existing networks and information sources, and holding an international workshop on current activities and future orientations in Asian language resource creation and management.

2. Identify and foster possible synergies and co-operation among such institutions to promote the compatibility and interoperability of their results, thus facilitating the successful transfer of technologies and tools among languages and the construction of multilingual Language Resources.

3. Develop and disseminate a ‘map’ of South and South-East Asian stakeholders, national and cross-border projects, and existing language resources and processing tools addressing the existing linguistic diversity and language digital divide in the region.

4. Survey and analyze the key strengths, weaknesses, opportunities and threats to the development of Asian and other language resources in the region, in collaboration with the “Language Observatory Project”.

5. Identify a standard set of minimum resources and tools per language necessary for carrying out all further research and training on language resource development to overcome the language digital divide.

6. Create a website for collecting and disseminating information on Asian and other regional/local language resources, tools and technologies. Information regarding Asian tracks, sessions or sub-events of major language processing and language resource conferences in the fields of Language Digital Divide, Language Diversity, Language Information Processing and Speech and Text Processing will be linked.

Links with International Resource Centres

Identify already established international resource centers of excellence in Asian languages, and build collaborative links between such organizations (including, but not limited to, those listed below) to identify, distribute, collect, validate, improve and standardize the production of LRs in Asian languages.

o ELRA, European Language Resources Association
http://www.elra.info/

o GSK, Gengo Shigen Kyoukai (Language Resource Association of Japan)

o LDC, Linguistic Data Consortium
http://www.ldc.upenn.edu/

o PAN-Localization
http://www.panl10n.net/english/about-pan.htm

o LRC, Localization Research Center
http://www.localisation.ie

o IPA, The International Phonetic Association
http://www.arts.gla.ac.uk/ipa/ipa.html

o BNC, British National Corpus
http://www.natcorp.ox.ac.uk/

o COCOSDA, International Committee for Co-ordination and Standardisation of Speech Databases
http://www.cocosda.org/

o ENABLER, European National Activities for Basic Language Resources
http://www.enabler-network.org/

Key Contributors

- Nagaoka University of Technology (NUT), Nagaoka, Niigata, Japan
- Gengo Shigen Kyoukai, Language Resource Association of Japan (GSK; Chairman: Prof. Hozumi Tanaka), Japan
- Tokyo University of Foreign Studies (TUFS), Japan
- University of Colombo School of Computing (UCSC), Sri Lanka
- University of Technology Malaysia (UTM), University of Malaya (UoM), Malaysia
- NECTEC, NICT, CLR, Thammasat University, Thailand
- Technology Development for Indian Languages (TDIL), India
- Myanmar Computer Federation (MCF), Myanmar

Language Resources (LRs)

The term language resources refers to sets of speech or language data and descriptions in machine-readable form. They are used, for example, for building, improving or evaluating natural language and speech algorithms or systems, or as core resources for the software localization and language services industries, for language studies, electronic publishing, international transactions, subject-area specialists and end users. Examples of language resources are written and spoken corpora, computational lexicons, terminology databases, speech collections, semantic repertories, annotated textual corpora, spoken corpora for speech applications, annotated spoken dialogues, formal repertories of reusable grammatical knowledge, multimodal resources, software frameworks to ease integration of Language Resources, and evaluation methods for Language Resources. Basic software tools are also important for the acquisition, preparation, collection, management, customization and use of language and other resources.

Upcoming Events

• 1st Steering Committee Meeting (September, 2005), Nagaoka University of Technology (NUT), Nagaoka, Japan
• 1st Workshop/Symposium on Asian Language Resources Network, International Joint Conference on Natural Language Processing (IJCNLP) (14th October, 2005), Jeju Island, Korea

September 8, 2005

Yoshiki MIKAMI, Nagaoka University of Technology, Japan
Turrence Nandasara, University of Colombo, Sri Lanka



2005-01-29

ONGOING PROJECTS

Development of Systematic Indicators for ICT Policy Planning & Review

The UN General Assembly adopted the "Millennium Development Goals" in September 2000. They form an integrated system of policy goals and indicators comprising 8 policy goals, 18 targets and 48 indicators. This project aims to develop a similarly systematic and integrated set of goals and indicators for ICT policy planning and review.

SOURCE: Grants-in-aid for Scientific Research (KAKENHI Scientific Research(B)), 2005-2008.


Design of collating weights for Universal Character Sets (UCS)

Sort orders vary from culture to culture. Authoritative dictionaries or national language commissions have defined standard lexicographic orders: the sequence in which words are listed in dictionaries, phone books, etc. The rules are, however, complex and difficult to understand, especially for non-native speakers. In response, generalized descriptions of sort orders have been developed alongside the spread of multilingual applications. The Unicode Standard (1991), Canadian standard CSA Z 243.4.1 (1992) and Japanese standard JIS X 4061 (1996) are a few of those efforts. The project aims to develop:

1. A mathematical approach to describing lexicographic orders, employing the concepts of Partially Ordered Set (POSET) and Complete Ordered Set (COSET).

2. Sophisticated design rules for the allocation of collating weights for Universal Character Sets, based on a Finite State Automaton (FSA) model.
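
As a rough illustration of what "collating weights" mean in practice, the sketch below sorts words with a multi-level key, in the general spirit of the sort-order standards mentioned above. It is not the project's POSET/FSA design. The weight table, the three levels (base letter, accent, case) and the names WEIGHTS, sort_key and collate are all assumptions made up for this example.

```python
# Hypothetical three-level collating weights: (primary, secondary, tertiary).
# Primary separates base letters, secondary separates accents,
# tertiary separates case. The values are illustrative only.
WEIGHTS = {
    "a": (1, 0, 0), "A": (1, 0, 1),
    "á": (1, 1, 0), "Á": (1, 1, 1),
    "b": (2, 0, 0), "B": (2, 0, 1),
    "c": (3, 0, 0), "C": (3, 0, 1),
    "e": (4, 0, 0), "E": (4, 0, 1),
    "é": (4, 1, 0), "É": (4, 1, 1),
}


def sort_key(word):
    """Build a key that compares all primary weights first,
    then all secondary weights, then all tertiary weights."""
    weights = [WEIGHTS[ch] for ch in word]  # KeyError for unlisted characters
    primary = tuple(w[0] for w in weights)
    secondary = tuple(w[1] for w in weights)
    tertiary = tuple(w[2] for w in weights)
    return (primary, secondary, tertiary)


def collate(words):
    """Return the words ordered by their collating weights."""
    return sorted(words, key=sort_key)


if __name__ == "__main__":
    print(collate(["cab", "Abe", "abé", "abe", "ab"]))
```

With this toy table, accent and case differences only break ties between words whose base letters are identical, which plain code-point sorting does not guarantee.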

SOURCE: Grants-in-aid for Scientific Research (KAKENHI Exploratory Research), 2004-2006.


The Language-Observatory Project

The Language-Observatory project is funded by the Japanese government through the RISTEX Program of the Japan Science and Technology Agency (JST) and is supported by UNESCO. The project was officially launched in 2004, in conjunction with that year's International Mother Language Day.
Technically, the project aims to produce language profiles of the Internet, covering items such as language/script usage and character set/encoding scheme. This supports the project's objectives, which are to raise public awareness of the Digital Divide among Languages and to encourage support for the processing of endangered languages in cyberspace. We share UNESCO's concerns on this point in stressing the importance of "the preservation of a balanced use of languages in cyberspace".
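
To give a feel for what a script-usage profile might look like, here is a minimal sketch that tallies the letters in a text by script. It is not the project's actual method. It assumes that the first word of a character's Unicode name (for example LATIN, THAI, SINHALA) is an acceptable rough script label, and the function name script_profile is hypothetical.

```python
import unicodedata
from collections import Counter


def script_profile(text):
    """Count letters per script, using the first word of each
    character's Unicode name as a rough script label (a heuristic)."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace, combining marks
        name = unicodedata.name(ch, "")  # "" when the character has no name
        if name:
            counts[name.split()[0]] += 1
    return counts


if __name__ == "__main__":
    sample = "abc \u0e01\u0e02 123"  # three Latin letters, two Thai letters
    print(script_profile(sample))
```

A real survey would also need language identification and encoding detection on raw page bytes, which this sketch leaves out.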

SOURCE: Research Institute for Science and Technology for Society (RISTEX), Japan Science and Technology Agency (JST), 2003-2006


e-Government in Asia

Development of a web resource that provides up-to-date, analytical descriptions of e-Government policies and practices in the Asian region. The project is a joint effort between the GII laboratory and the Center of International Cooperation for Computerization (CICC).


eSAFE

The project is to develop e-learning courseware offering safety management subjects.



Completed Projects


