Language Observatory

2006-05-26

Russian Search Engine

At WWW2006 in Edinburgh, Language Observatory members presented two posters:

1. Status of the African Web
2. Geographic Locations of Web Servers under African Domains

On the second day of the poster presentations, one of our neighbors was the Russian search engine "Яндекс" (yandex.ru). We had a nice discussion with them, and I learned about their elegant geo-tagging. I also found that the search engine indexes more than three times as many Russian pages as Google.

YANDEX LOGO
For example, if you search Google for the Russian equivalent of "language observatory", "Языковая обсерватория", Google returns 19 pages (as of today), but Яндекс returns 51 pages! Google indexes well, but not everything. We need to seek out such language- or region-specific search engines around the world to verify the language-specific coverage of our crawling.

02:52:57 - Mikami

No comments

2006-05-16

An interesting article on .cat TLD realization

As mentioned by Mikami Sensei here, I have taken a passage from a paper by Peter Gerrand in First Monday entitled "Cultural Diversity in cyberspace: The Catalan campaign to win the new .cat top level domain".

Amadeu Abril was then asked: “But doesn’t .cat create a precedent for thousands of minority languages, all wanting a TLD?”

Abril’s response was:

“You can set thresholds, to meet ICANN’s goal of financial viability for a registry. How many minority languages are spoken in more than one country? Of these, how many are written languages? Of these, how many have at least, say, five million speakers? [Catalan is credited with 10 million speakers as a first and second language.] You might find perhaps 15 languages satisfying those criteria. That is not too many to introduce as language–based sTLDs.”

Some ICANN directors were apparently concerned that UNESCO might fund 5,000 new cultural sTLDs. To which Abril’s response was: “What is the problem if someone pays?”
Image: Amadeu Abril, from www.icannwiki.org

I found it an interesting read; please read it! I also hope to receive first-hand information from Amadeu Abril himself on the realization of .cat. (Note: an email was sent to him today.)
23:44:17 - zaidi

No comments

2006-05-14

Catalan community acquired a new domain

Zaidi reported at LOP's Friday seminar that the Catalan community has acquired a new top-level domain, ".cat". The decision was made last year by ICANN [1]. The sponsor of the domain is puntCAT (http://www.puntcat.org/). It is good news for minority language communities around the world that a specific language community has found a way to get its own domain for its exclusive use.

When ICANN announced a new policy of accepting proposals for such new sponsored domains, ten new TLDs were proposed, but only one of them came from a language-specific sponsor [2].


[1] ICANN Special Meeting of the Board Preliminary Report, 15 September 2005
[2] ICANN Announcement, "Progress in Process for Introducing New Sponsored Top-Level Domains", 19 March 2004.
[3] New sTLD RFP Application by puntCAT, 19 March 2004.

04:11:53 - Mikami

No comments

World banknotes gallery

I came across an interesting site.
Please browse the site below.
http://www.banknotes.com/images.htm

01:05:00 - Mikami

No comments

2006-05-03

JINDAI Script

Jindai script found at Akagi Shrine, Gunma

JINDAI (神代) means "the age of the gods", and it was once claimed that the Japanese people had used this script to write the Japanese language before the arrival of Chinese ideographs. There are many different versions of the script.

The plate above is found at Akagi Shrine in Miyosawa, Gunma Prefecture. The script looks quite similar to Korean Hangul, and its sound correspondences also closely resemble Hangul's system. The plate is annotated to be read vertically, from right to left, as "manahitokorononareruyuweyosi".
YOYURENOKOHIMA
SIWERUNAROTONA
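
The reading order of the annotation can be checked with a small sketch. It assumes that each of the two rows above splits into two-letter syllables (YO, YU, RE, ...), which is my own segmentation and not something stated on the plate; the function names are made up for illustration.

```python
# The two annotation rows as transcribed above.
ROWS = ["YOYURENOKOHIMA", "SIWERUNAROTONA"]


def split_syllables(row, width=2):
    """Cut a row into fixed-width syllables (assumed to be two letters each)."""
    return [row[i:i + width] for i in range(0, len(row), width)]


def read_plate(rows):
    """Read the grid column by column, rightmost column first, top to bottom."""
    grid = [split_syllables(row) for row in rows]
    columns = list(zip(*grid))        # each column is (top syllable, bottom syllable)
    syllables = []
    for column in reversed(columns):  # right to left
        syllables.extend(column)      # top to bottom within the column
    return "".join(syllables).lower()


if __name__ == "__main__":
    print(read_plate(ROWS))
```

Reading the columns this way reproduces the string "manahitokorononareruyuweyosi" quoted above.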

Today the above story is considered a complete fiction: the script is not a creation of the Japanese people but an imitation of Korean Hangul.
REFERENCE
[1] Jindai moji at Wikipedia
[2] 神代文字 at Wikipedia in Japanese
[3] 鈴木敏幸氏作成,神代文字一覧

15:17:57 - Mikami

No comments

2006-03-12

Arabic Calligraphy - Geometrical Kufi

I found a beautiful geometric Kufi text at the Louvre Museum in Paris. Please try to read it.

Geometric Kufi Sample
Source: Page of the Koran (Sura VI, al-Anam, The Herds), Turkey, circa 1540. The work of the famous calligrapher Ahmed Karahisari (1469-1556).
Ahmed Karahisari in Turkish Wikipedia.
15:32:34 - Mikami

No comments

2006-01-22

Arabic Typewriter History

Carla told me a story on this theme. She found a few lines in "OUT OF PLACE", the autobiography of Columbia University professor Edward W. Said, who was born in Jerusalem, went to school in Cairo, and spent summers in Beirut during his younger days. The lines read:

"With my mother's help he (Said's father) developed--"invented" would not be wrong--the Arabic typewriter with Royal, whose aristocratic American owners, the John Barry Ryans, he came to know quite well."

This happened between the fall of 1946 and November 1947, in Cairo, Egypt. The autobiography contains two photos of the typewriter shop in town, but no further information about the history of the Arabic typewriter. My search on this topic continues....

REFERENCE
[1] Ian Frazier, The Typewriter Man, The Atlantic Monthly, November 1997.
[2] Edward W. Said, Out of Place - A Memoir, p.94, Vintage Books, 2000
[3] Royal Typewriter Company's poster with sphinx as background
16:45:21 - Mikami

No comments

Divehi and Thaana: About Maldivian scripts

The Maldives is famous among Japanese as a honeymoon destination or a place for retreat. But not much is known about its history, culture, people, language, etc. The only well-known fact is that it has a beautiful seacoast.

When I wrote a book on the history of character code development in the Asian region, I mentioned the script of the Maldives, which is called "Thaana script". When writing the book, however, I mistakenly took Thaana to be another name for Divehi. Recently I noticed my mistake. In fact, "Divehi script" was used by Maldivian Divehi speakers until early in the 20th century. It is derived from Brahmi-based Indic scripts and is written from left to right. Its shapes look similar to Sinhalese script.

Divehi script sample taken from http://www.maldives-ethnography.com/scripts.html

The phasing out of Divehi script began in the 17th-18th century, as it was gradually replaced by the modern Thaana script. Thaana script looks like Arabic and is written from right to left. In fact, Thaana script was invented and introduced to accommodate Arabic sentences and words in Divehi texts.

Thaana Alphabet
ހށނރބޅކއވމފދތލގޏސޑ
ޒޓޔޕޖޗޘޙޚޛޜޝޞޟޠޡޢޣޤޥ
ަާިީުޫެޭޮޯްޱ

Although Thaana script now has its place in UCS/Unicode, this encoding is not widely used yet. You can find some Thaana script sites on the web. For example, the Department of Immigration and Emigration of the Maldivian Government has a Divehi page, whose text, however, is encoded in a non-UCS/Unicode encoding. Thaana text is also missing from the UDHR translations.
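
As a rough illustration of how a crawler might tell the two kinds of pages apart, here is a minimal sketch. It relies on the fact that the Unicode Thaana block is U+0780 to U+07BF; everything else is an assumption made for the example: the function names are hypothetical, the page is assumed to be served as UTF-8, and the 0.5 threshold is arbitrary.

```python
# Range of the Thaana block in UCS/Unicode.
THAANA_START, THAANA_END = 0x0780, 0x07BF


def thaana_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Thaana block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(THAANA_START <= ord(ch) <= THAANA_END for ch in chars)
    return hits / len(chars)


def looks_like_unicode_thaana(raw: bytes, threshold: float = 0.5) -> bool:
    """Guess whether raw page bytes are UTF-8 text written mostly in Thaana.

    Bytes that do not decode as UTF-8 (for example a font-specific legacy
    encoding) are reported as not being Unicode Thaana.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return thaana_ratio(text) >= threshold


if __name__ == "__main__":
    sample = "ހށނރބޅ"  # first letters of the alphabet shown above
    print(looks_like_unicode_thaana(sample.encode("utf-8")))
    print(looks_like_unicode_thaana(b"plain ASCII text"))
```

A page that uses a legacy encoding with a Latin-based font trick would decode without error but contain no Thaana code points, so it would also be reported as non-Unicode by this check.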

REFERENCE
[1] http://www.maldives-ethnography.com/scripts.html
[2] http://www.maldivesroyalfamily.com
[3] Thaana at wiki
13:24:15 - Mikami

No comments

2006-01-06

Petroglyphs of Bangudae in Korea

Kwon Yeolum introduced me to some interesting stone inscriptions found near his home town, Ulsan: petroglyphs. The petroglyphs of Bangudae were first found by the Korean Buddhist Research Institute of Dongguk University in 1971. Hundreds of animals, figures, tools, and unknown objects are inscribed on the stone wall.
See Petroglyph at wiki. You can visit Bangudae here.
Bangudae
18:13:28 - Mikami

No comments

2006-01-01

ZWJ and ZWNJ: How search engines treat them

To understand how search engines treat the special functional signs ZWNJ: ZERO WIDTH NON-JOINER [U+200C] and ZWJ: ZERO WIDTH JOINER [U+200D], I prepared three consonant-conjunct code strings of the Hindi word "असेम्‍बली", with and without ZWNJ and ZWJ, and compared the search results returned by several search engines.

CASE 1: Consonant + Halant + Consonant
अ + स + ◌े + म + ◌् + ब + ल + ◌ी = असेम्बली

CASE 2: Consonant + Halant + ZWNJ + Consonant
अ + स + ◌े + म + ◌् + ZWNJ + ब + ल + ◌ी = असेम्‌बली

CASE 3: Consonant + Halant + ZWJ + Consonant
अ + स + ◌े + म + ◌् + ZWJ + ब + ल + ◌ी = असेम्‍बली

Below are the search results for these three cases (hit counts are as of 2006/01/01). The treatment differs between search engines. Google treats CASE 3 (ZWJ) the same as CASE 1 and treats CASE 2 (ZWNJ) as an illegal string, while the other search engines behave differently: AlltheWeb, msn and Yahoo treat CASE 3 (ZWJ) differently from CASE 1 but do not distinguish between CASE 2 (ZWNJ) and CASE 3 (ZWJ).

CASE    Consonant-conjunct form   Google     AlltheWeb  msn       Yahoo
CASE 1  cons.+halant+cons.        121 pages  42 pages   17 pages  41 pages
CASE 2  cons.+halant+ZWNJ+cons.   null       1 page     1 page    1 page
CASE 3  cons.+halant+ZWJ+cons.    121 pages  1 page     1 page    1 page

In either case, a "dual-spelling problem" arises: CASE 1 and CASE 3 (ZWJ) look the same but are spelled differently.
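
The three code strings can be written down and compared in a few lines. The sketch below builds them from their code points and shows that they differ as strings but become identical once ZWNJ and ZWJ are removed. Stripping the joiners is only one possible normalization and is an assumption of this sketch, not a description of what any of the search engines above actually does; the helper names are made up for the example.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# The Hindi word split at the conjunct: ... MA + VIRAMA (halant) | BA ...
HEAD = "\u0905\u0938\u0947\u092e\u094d"  # A, SA, vowel sign E, MA, halant
TAIL = "\u092c\u0932\u0940"              # BA, LA, vowel sign II

CASES = {
    "CASE 1": HEAD + TAIL,         # consonant + halant + consonant
    "CASE 2": HEAD + ZWNJ + TAIL,  # consonant + halant + ZWNJ + consonant
    "CASE 3": HEAD + ZWJ + TAIL,   # consonant + halant + ZWJ + consonant
}


def code_points(text: str) -> str:
    """Show a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def strip_joiners(text: str) -> str:
    """Remove ZWNJ and ZWJ so that the spelling variants compare equal."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


if __name__ == "__main__":
    for name, word in CASES.items():
        print(name, len(word), code_points(word))
    # Distinct as raw strings, identical after removing the joiners.
    print(len(set(CASES.values())))
    print(len({strip_joiners(word) for word in CASES.values()}))
```

Note that such stripping would not be safe for every language: in Persian, for example, ZWNJ is part of the normal spelling of many words.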

15:29:48 - Mikami

No comments