Ra Japhala in Bengali (bn) and Unicode 5.0

During the past two days, I have been trying to weed out the remaining Bengali-related bugs in Pango, and came up with three bug reports (427667, 427611, 427584). Of these, bug 427667 affects all Indic scripts and should be of interest to the entire Indic community. The rest are Bengali-specific.

While working on the issues, I noticed to my dismay that the latest version of the Unicode Standard book that is freely available online (as PDF documents) covers only version 4.0. The book covering the current version (5.0) has to be bought for around 50 USD (no idea how much shipping it to Kolkata would cost).

The lack of documentation on 5.0 became very irritating while I was working on issue 427611, which implements correct behaviour in Pango when rendering the sequence U+09B0 ZWNJ U+09CD U+09AF. According to the Indic FAQ on the Unicode site, this sequence should be rendered as Ra-Japhala. However, after some discussion today on IRC with Runa-di and Rahul, it seems that Unicode 5.0 changes the recommended sequence for Ra-Japhala from U+09B0 ZWNJ U+09CD U+09AF to U+09B0 ZWJ U+09CD U+09AF. This seems to be the result of the acceptance of Public Review Issue #37, proposed by Peter Constable of Microsoft in July 2004. In my opinion, the logic put forward by PR-37 is absolutely fine, but the entire experience raises a couple of questions, namely:

  • Why hasn’t the Indic FAQ been updated to reflect the changes?
  • Why isn’t the Unicode 5.0 book freely available for download by developers?

A very good friend of mine has access to a (physical) copy of the book, and she mailed me a snippet from it:

“…Unicode Standard adopts the convention of placing the character U+200D ZWJ immediately after the ra to obtain the ra-yaphaala…”

So there you go. My work would have been much easier if I had had access to that particular book. I spent almost an entire day trying to figure out the right way of representing Ra-Japhala :-( .
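The whole question comes down to a single code point sitting between the ra and the virama. Here is a minimal Python sketch that spells out both encodings and reports which one a piece of text uses. The names `classify_ra_japhala` and `to_unicode_5_form` are hypothetical illustrations, not part of Pango or any other library. The conversion helper also assumes that every legacy ZWNJ sequence was meant as Ra-Japhala, which may not hold for all text.

```python
import unicodedata

RA = "\u09B0"      # BENGALI LETTER RA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasanta)
YA = "\u09AF"      # BENGALI LETTER YA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Unicode 5.0 convention: ZWJ placed immediately after the ra.
RA_JAPHALA_UNICODE_5 = RA + ZWJ + VIRAMA + YA
# Older form described in the Indic FAQ on the Unicode site.
RA_JAPHALA_LEGACY = RA + ZWNJ + VIRAMA + YA


def classify_ra_japhala(text: str) -> str:
    """Report which Ra-Japhala encoding (if any) appears in text."""
    has_new = RA_JAPHALA_UNICODE_5 in text
    has_old = RA_JAPHALA_LEGACY in text
    if has_new and has_old:
        return "mixed"
    if has_new:
        return "unicode-5.0 (ZWJ)"
    if has_old:
        return "legacy (ZWNJ)"
    return "none"


def to_unicode_5_form(text: str) -> str:
    """Hypothetical helper: rewrite every legacy ZWNJ form as the ZWJ form."""
    return text.replace(RA_JAPHALA_LEGACY, RA_JAPHALA_UNICODE_5)


if __name__ == "__main__":
    for label, seq in [("legacy", RA_JAPHALA_LEGACY), ("5.0", RA_JAPHALA_UNICODE_5)]:
        # Print the character names so the invisible joiner becomes visible.
        names = [unicodedata.name(ch) for ch in seq]
        print(label, classify_ra_japhala(seq), names)
```

A renderer that accepts both forms, as the Pango patch does, would treat the two constants identically. A migration tool would apply something like `to_unicode_5_form` to existing text.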

I should also mention, however, that although the recommended way to represent Ra Japhala is U+09B0 ZWJ U+09CD U+09AF, people seem to favour rendering the sequence U+09B0 ZWNJ U+09CD U+09AF as Ra Japhala as well (probably for backward compatibility reasons), so the patch against issue 427611 stands. I asked Jamil-bhai to test the following sequences on a Windows XP machine and a Windows Vista machine (since I don’t run any flavour of Windows at all). The sequences to be tested were:

  • U+09B0 ZWNJ U+09CD U+09AF (as a part of a larger word)
  • U+09B0 ZWNJ U+09CD U+09AF (standalone)
  • U+09B0 ZWJ U+09CD U+09AF (standalone)
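If you want to reproduce this test, you can generate the three sequences in code rather than type them by hand. This sketch assumes that a plain UTF-8 text file is an acceptable test vehicle. The surrounding word (Ra-Japhala followed by aa-kar and ka) is a hypothetical stand-in, because the post does not record which word was actually used. `describe` and `write_test_file` are illustrative names only.

```python
RA, VIRAMA, YA = "\u09B0", "\u09CD", "\u09AF"
ZWNJ, ZWJ = "\u200C", "\u200D"
AA_SIGN, KA = "\u09BE", "\u0995"  # used only to build the illustrative word

# Hypothetical surrounding word: not the word from the original test.
EXAMPLE_WORD = RA + ZWNJ + VIRAMA + YA + AA_SIGN + KA

TEST_CASES = [
    ("ZWNJ form inside a larger word", EXAMPLE_WORD),
    ("ZWNJ form standalone", RA + ZWNJ + VIRAMA + YA),
    ("ZWJ form standalone", RA + ZWJ + VIRAMA + YA),
]


def describe(seq: str) -> str:
    """Spell out a sequence in the notation used in this post."""
    joiner_names = {ZWNJ: "ZWNJ", ZWJ: "ZWJ"}
    return " ".join(joiner_names.get(ch, f"U+{ord(ch):04X}") for ch in seq)


def write_test_file(path: str) -> None:
    """Write each test case with its code-point spelling to a UTF-8 file."""
    with open(path, "w", encoding="utf-8") as f:
        for label, seq in TEST_CASES:
            f.write(f"{label}: {seq}    [{describe(seq)}]\n")


if __name__ == "__main__":
    write_test_file("ra_japhala_test.txt")
```

Open the resulting file in the editor or browser under test. The bracketed code-point spelling on each line tells you which sequence you are looking at, even when a renderer mangles the glyphs.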

Windows XP does not render the third sequence correctly:

Ra Japhala in Windows XP

Windows Vista renders all three correctly:

Ra Japhala in Windows Vista

Pango (SVN trunk) with the patch applied renders all three correctly:

Ra Japhala in Pango

The difference between the two versions of Windows is probably caused by different versions of Uniscribe bundled with the OSs.


UPDATE: After writing this entry, I realize that I might be overreacting a bit – but somehow, I can’t help feeling a bit pissed off :-( .

Commentary


  1. 3 years, 3 months ago

    Hello, could you suggest a good free-as-in-freedom Bengali font? I can’t speak or read Bengali, but I hate the “missing glyph” squares-with-numbers.

    Giacomo
  2. 3 years, 3 months ago

    Cool. Can you please check if Firefox [with Pango] renders Ra-Japhaala?

    Jamil
  3. 3 years, 3 months ago

    Do you know anything about the Nukta problem in Unicode normalization?

    For reference/explanation, see
    http://bugzilla.wikimedia.org/show_bug.cgi?id=5948

    Thanks

    Ragib

  4. 3 years, 3 months ago

    Nukta has no problem. If the font is OK, then everything is OK. I think you use SolaimanLipi; very soon there should be a new version of SolaimanLipi that supports it, so things will be OK then.

    Omi Azad
  5. 3 years, 3 months ago

    By the way, Office 2007 doesn’t support U+09B0 ZWNJ U+09CD U+09AF, and future versions of Unicode-supporting products will not support it either, because when PR-37 was approved, U+09B0 ZWJ U+09CD U+09AF became the default for the future. I like that Pango supports both. But the problem is that people should get used to the universal standard. That means if you send me a mail with U+09B0 ZWNJ U+09CD U+09AF in the body, I may not be able to read it if I’m reading it on Vista.

    Omi Azad
  6. 3 years, 3 months ago

    In a nutshell, can’t we say that the Ra Japhala issue has been fixed? Or not yet? :(

    mak
  7. 3 years, 3 months ago

    Yes, we can. But we have to spread the word that the sequence for typing it is to put a ZWJ after the ra, not a ZWNJ, because the future is ZWJ.

    Omi Azad
  8. 3 years, 2 months ago

    Hi My Name Is ivaaze.

  9. 2 years, 4 months ago

    If you try the Vrinda font in Windows, Word 2007 also shows it.

    Tarak Nath Mondal
