Ra Japhala in Bengali (bn) and Unicode 5.0
Posted by Sayamindu 3 years, 5 months ago
During the past two days, I have been trying to weed out the remaining Bengali-related bugs in Pango, and came up with three bug reports (427667, 427611, 427584). Of these, bug 427667 affects all Indic scripts and should be of interest to the entire Indic community. The other two are Bengali-specific.
While working on the issues, I noticed to my dismay that the last version of the Unicode Standard book which is freely available online (as PDF documents) covers only version 4.0. The book that covers the latest version (5.0) has to be bought for around 50 USD (and I have no idea how much shipping to Kolkata would cost).
The lack of documentation on 5.0 became very irritating while I was working on issue 427611, which is about getting Pango to render the sequence U+09B0 ZWNJ U+09CD U+09AF correctly. According to the Indic FAQ at the Unicode site, this sequence should be rendered as Ra-Japhala. However, after some discussion today on IRC with Runa-di and Rahul, it seems that Unicode 5.0 changes the recommended sequence for Ra-Japhala from U+09B0 ZWNJ U+09CD U+09AF to U+09B0 ZWJ U+09CD U+09AF. This seems to be a result of the acceptance of Public Review Issue #37, proposed by Peter Constable of Microsoft in July 2004. The logic put forward by PR-37 is absolutely fine in my opinion, but the entire experience raises a couple of questions, namely:
- Why hasn’t the Indic FAQ been updated to reflect the changes?
- Why isn’t the Unicode 5.0 book freely available for download by developers?
A very good friend of mine has access to a (physical) copy of the book, and she mailed me a snippet from it:
“…Unicode Standard adopts the convention of placing the character U+200D ZWJ immediately after the ra to obtain the ra-yaphaala…”
So there you go. My work would have been much easier if I had had access to that particular book – I spent almost an entire day trying to figure out the right way of representing Ra-Japhala.
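To make the difference between the two encodings concrete, here is a minimal Python sketch (an illustration only, not part of Pango or of the patch) that builds both sequences and lists their code points. The constant and function names are made up for this example; the character names come from Python's standard `unicodedata` module.

```python
import unicodedata

# Code points discussed in this post
RA = "\u09b0"      # BENGALI LETTER RA
VIRAMA = "\u09cd"  # BENGALI SIGN VIRAMA (hasanta)
YA = "\u09af"      # BENGALI LETTER YA
ZWJ = "\u200d"     # ZERO WIDTH JOINER
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

# Sequence recommended by Unicode 5.0 (after PR-37)
RA_JAPHALA_ZWJ = RA + ZWJ + VIRAMA + YA
# Older sequence, as described in the Indic FAQ
RA_JAPHALA_ZWNJ = RA + ZWNJ + VIRAMA + YA


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, seq in (("ZWJ form", RA_JAPHALA_ZWJ),
                       ("ZWNJ form", RA_JAPHALA_ZWNJ)):
        print(label)
        for line in describe(seq):
            print("   " + line)
```

The two strings differ only in the second code point, which is why a renderer that knows about just one of them silently falls back to some other shape for the other.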
However, I should probably also mention here that though the recommended way is to represent Ra Japhala as U+09B0 ZWJ U+09CD U+09AF, people seem to favour rendering the sequence U+09B0 ZWNJ U+09CD U+09AF as Ra Japhala as well (probably for backward compatibility reasons), so the patch against issue 427611 stands. I asked Jamil-bhai to get the following sequences tested on a Windows XP and a Windows Vista machine (since I don’t run any flavour of Windows at all). The sequences to be tested were:
- U+09B0 ZWNJ U+09CD U+09AF (as a part of a larger word)
- U+09B0 ZWNJ U+09CD U+09AF (standalone)
- U+09B0 ZWJ U+09CD U+09AF (standalone)
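For anyone who wants to repeat the test elsewhere, the three strings can be generated with a short Python sketch like the one below. This is only an illustration: the names are hypothetical, and the carrier word used for the "larger word" case (Ra-Japhala followed by U+09BE and U+09AC) is an assumption, not necessarily the word that was actually tested. The helper `has_ra_japhala` accepts either joiner, mirroring the lenient behaviour described above.

```python
import re

RA, VIRAMA, YA = "\u09b0", "\u09cd", "\u09af"
ZWJ, ZWNJ = "\u200d", "\u200c"

# Assumption for illustration: vowel sign AA + letter BA as the rest
# of a larger word. Any other continuation would do.
WORD_SUFFIX = "\u09be\u09ac"

TEST_CASES = {
    "ZWNJ, part of a larger word": RA + ZWNJ + VIRAMA + YA + WORD_SUFFIX,
    "ZWNJ, standalone": RA + ZWNJ + VIRAMA + YA,
    "ZWJ, standalone": RA + ZWJ + VIRAMA + YA,
}

# Matches Ra-Japhala written with either joiner after the ra
RA_JAPHALA = re.compile(f"{RA}[{ZWJ}{ZWNJ}]{VIRAMA}{YA}")


def has_ra_japhala(text):
    """True if text contains ra + (ZWJ or ZWNJ) + virama + ya."""
    return RA_JAPHALA.search(text) is not None


if __name__ == "__main__":
    for label, text in TEST_CASES.items():
        codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(f"{label}: {text}  [{codepoints}]")
```

Pasting the printed strings into an application on each platform shows whether its shaping engine produces the Ra-Japhala form for that sequence.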
Windows XP does not render the third sequence correctly:

Windows Vista renders all three correctly:

Pango (SVN trunk) with the patch applied renders all three correctly:

The difference between the two versions of Windows is probably caused by different versions of Uniscribe bundled with the OSs.
UPDATE: After writing this entry, I realize that I might be overreacting a bit – but somehow, I can’t help feeling a bit pissed off.
Hello, could you suggest a good free-as-in-freedom Bengali font? I can’t speak or read Bengali, but I hate the “missing glyph” squares-with-numbers.
Cool. Can you please check if Firefox [with Pango] renders Ra-Japhaala?
Do you know anything about the Nukta problem in Unicode normalization?
For reference/explanation, see
http://bugzilla.wikimedia.org/show_bug.cgi?id=5948
Thanks
Ragib
Nukta has no problem. If the font is OK then everything is OK. I think you use SolaimanLipi, very soon there should be another version of SolaimanLipi, which will support that. So things will become OK then.
By the way, Office 2007 doesn’t support U+09B0 ZWNJ U+09CD U+09AF, and future versions of Unicode-supporting products will not support it either, because when PR-37 got approved, U+09B0 ZWJ U+09CD U+09AF became the default for the future. I like that Pango supports both. But the problem is that people should get used to the universal standard. That means if you send me a mail with U+09B0 ZWNJ U+09CD U+09AF in the body, I may not be able to read it if I’m reading it on Vista.
In a nutshell, can’t we say that the Ra Japhala issue has been fixed? Or not yet?
Ya,
We can. But we have to spread the word that the sequence for typing it is to put a ZWJ after the ra, not a ZWNJ. Because the future is ZWJ.
Hi, my name is ivaaze.
If you try the Vrinda font in Windows,
Word 2007 also shows it.