Some Tamil fonts are monolingual and place the Tamil characters in the standard/basic ASCII slots (the first 128 positions). There are also bilingual fonts, which keep the Roman/English characters in the first 128 slots and place the Tamil characters in the upper ASCII range (positions 128-255). There is as yet no official standard for the keymapping employed: some fonts use the same keymapping as the classical Tamil typewriter, while others rely on a phonetic linkage, placing each Tamil character under the Roman letter to which it corresponds.
Stand-alone Tamil fonts
As examples of monolingual Tamil fonts with the typewriter keymap, one can cite the Tamillaser font (of Prof. George Hart), Saraswathi (of Vijayakumar) and Ananku (of Dr. P. Kuppuswamy). Tamil fonts such as Palladam (T. Govindaraj) and Mylai (Dr. K. Kalyanasundaram) use a phonetic linkage, placing each Tamil character under the Roman letter to which it corresponds. For those who have never used a Tamil typewriter, a phonetically linked keymap is easy to master and can be very appealing.
Bilingual ANSI type Tamil fonts
The second type of Tamil fonts are bilingual (Roman and Tamil). They are of the ANSI type: they use the first/basic ASCII slots for the Roman characters and the upper ASCII slots (128 to 255) for the Tamil characters. The ADHAWIN font that comes with the Adhawin package (of Dr. K. Srinivasan), the Inaimathi font that comes with the Anjal Newsreader (of Muthu Nedumaran) and the TamilFix font of Naa. Govindaswamy are examples of this kind.
It is rather tedious to access the upper 128 ASCII characters directly - on PCs they are entered by holding down the 'Alt' key and typing the appropriate key number. In the Adhawin and Anjal packages, the input is therefore romanized/transliterated text: an associated keyboard editor/manager (or a macro) subsequently converts it and displays the equivalent Tamil text on the screen. In the majority of cases, the keyboard managers/editors allow switching/toggling between the Tamil set (upper ASCII) and the Roman (lower ASCII) characters.
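The keyboard-manager step described above can be sketched as a greedy longest-match transliterator. The slot numbers and the tiny mapping below are invented for illustration (real fonts each define their own layout); only the mechanism is the point:

```python
# Hypothetical sketch of what a keyboard manager for a bilingual
# (upper-ASCII) Tamil font does: greedy longest-match replacement of
# romanized input with the byte codes (>127) where the font places its
# Tamil glyphs.  The slots below are invented, not any real font's.
TRANSLIT = {
    "ka": 0x95, "ki": 0x96, "ta": 0xA0,   # invented consonant+vowel slots
    "a": 0x85, "i": 0x87,                 # invented vowel slots
}

def to_font_bytes(roman: str) -> bytes:
    out, i = bytearray(), 0
    while i < len(roman):
        # try the longest key first so "ka" wins over "k" + "a"
        for length in (2, 1):
            chunk = roman[i:i + length]
            if chunk in TRANSLIT:
                out.append(TRANSLIT[chunk])
                i += length
                break
        else:
            out.append(ord(roman[i]))  # pass unknown characters through
            i += 1
    return bytes(out)
```

A real keyboard manager does the same substitution keystroke by keystroke, which is why toggling between the Tamil (upper ASCII) and Roman (lower ASCII) sets is enough to type bilingual text.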
GIST (Graphics and Intelligence based Script Technology) has now
been known for more than a decade and has become synonymous with the standards in Indian script applications on computers and other electronic media. It has provided unique and simple solutions for a multilingual country like India. The applications emerging with the widespread use of computers are very diverse, and here we shall cover some of them.
This allows multilingual text created in LEAP to be used conveniently for creating HTML files incorporating the <FONT FACE> tags that are supported on the IE4 and Netscape 4 browsers.
If you have ISM (ISM-Publisher, ISM-Office or ISM-Soft) you may insert the <FONT FACE=fontname> tags yourself and type directly in a Windows-based HTML editor using the keyboard support of ISM.
If you have other versions of LEAP (LEAP V1.0, LEAP-PP, etc.), you may mark the text block in LEAP and transfer it via the clipboard to the HTML editor. You will need to add the <FONT FACE> tag with the name of the TrueType font you wish to offer for download and browsing.
Previously these diacritic characters were not found in any standard character set, so scholars had to resort to ASCII representations of them (e.g. the Kyoto-Harvard convention), or to the FONT FACE tag (which is not part of the HTML standard) along with ad-hoc conventions such as "Classical Sanskrit" (CS) and "Classical Sanskrit Extended" (CSX) [previously the nearest thing that existed to an agreed encoding standard for romanised Indian text]. These conventions use fonts in which the glyphs normally found at given positions have been replaced by glyphs for the diacritic characters - in other words, fonts with a non-standard glyph encoding.
# CSX+ aims to be downward compatible with CSX, save for moving aacute
# away from the slot (decimal 160) used as non-breaking space on PCs.
# It also seeks to implement the ISO 15919 standard, while retaining
# a useful set of European accented characters and adding dashes and
# directional double quotes.
These fonts implement the character set designed by Professor K. R. Norman of the University of Cambridge for use in printing Indian language material in Roman script.
After consulting with existing users of Professor Norman's fonts and with Professor Norman himself, I have introduced a small number of changes to the character set. (1) "Retroflex t" appears, as before, at character position 160 decimal; however, that position is inaccessible to many Windows users (it is used for non-breaking space), so a second copy of "retroflex t" has been placed at 173 decimal instead of the "notequal" sign that used to be there. If you need "notequal" it can be found in the Symbol font. (2) In an earlier release of versions of Professor Norman's fonts for the PC, a copy of "retroflex t" was placed at character position 202 decimal. This causes problems when the fonts are used on a Macintosh, and the character has been removed from that position. (3) The characters "vocalic R", "long vocalic r", "vocalic r acute" and "long vocalic r acute" (positions 244, 165, 218, 225 decimal) are now defined with a subscript dot rather than a subscript ring.
<center>
<font face="xdvng"><font size="+2">
!
<br>
Â:i l:xm:in:àes:÷h p:rb:ÒÉN:ð n:m:H
<br>
</font></font>
<font size="-1" color="red"><strong>
(If the two lines above and the one below are not displayed
in devanaagarii script, please install
<a href="https://web.archive.org/web/19991128172638/ftp://weed.arch.com.inter.net/pub/" target="_top">
Xdvng</a> fonts.)</strong>
</font></center>
With the availability of Arun Gupta's TrueType devanaagarii font Xdvng for Windows, and Sandeep Sibal's transliteration package Jtrans , this page was added in March 1997.
Itranslator with Sanskrit New font is a more recent utility for preparing Devanaagarii texts. Itranslator was used for preparing several texts in these pages, after August 1997. Documents created with "Sanskrit New font" are so indicated in the list below. It will be necessary to install Sanskrit New font on the computer before you can display these documents in Devanaagarii script.
Jtrans and Itranslator have their relative merits. Individual choice of the type of translator will depend on the type of task and personal preference.
Kruti Dev is one of the most popular Hindi fonts, commonly used in many north Indian states. In Madhya Pradesh, Haryana, Rajasthan and many other states of India, Kruti Dev is the standard Hindi font, and most typing tests in these states are conducted in it. DevLys and Kruti Dev are the Hindi fonts most used for typing.
In the north Indian states, many public service commissions conduct their clerk, stenographer and data-entry-operator typing exams using the Kruti Dev 010, Kruti Dev 016 or DevLys 010 font. The Kruti Dev and DevLys fonts share the same keyboard layout.
So a candidate who has practised with any Kruti Dev font will easily handle the DevLys fonts as well.
The DevLys font series is a popular and widely used family of Hindi fonts, developed by C-DAC.
DevLys and Kruti Dev are the two most famous Hindi font series and are widely used by Hindi typists. For state public service commission exams and other clerk or stenographer exams, DevLys 010 is a familiar name to candidates.
The Kruti Dev and DevLys fonts share the same keyboard layouts.
<title>HindustanDainik.com</title>
<!-- start link to PFR --><link rel="fontdef" src="http://www.hindustantimes.com/news/HTChanakya.pfr"><!-- end link to PFR --><!-- Start Bitstream WebFont Player support -->
<script language="javascript">
<!--
if (navigator.appName == "Microsoft Internet Explorer" &&
navigator.appVersion.indexOf("Windows", 0) != -1 &&
navigator.appVersion.substring(0,1) >= 4)
{
document.writeln("<OBJECT");
document.writeln("classid=\"clsid:0246ECA8-996F-11D1-BE2F-00A0C9037DFE\"");
document.writeln("codebase=\"http://www.bitstream.com/wfplayer/tdserver.cab#version=1,0,0,10\"");
document.writeln("id=\"TDS\" width=0 height=0");
document.writeln(">");
document.writeln("</OBJECT>");
}
// -->
</script>
<!-- End Bitstream WebFont Player support -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>Rasik.com - Search Books, Authors</title>
<link rel="stylesheet" href="/web/20060504194651cs_/http://www.rasik.com/rasik.css" type="text/css">
<!-- Start link to PFR -->
<link rel="fontdef" src="/shivaji01.pfr">
<!-- End link to PFR -->
<li>to type <font size="5" face="Shivaji01">doXapaMDo</font>,
enter "deshapAMDe" (to
make it easy even
"deshpande" will work,
although it is phonetically
incorrect!)</li>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<title>Rasik.com - Hindi Books</title>
<link rel="stylesheet" href="/web/20010303162615cs_/http://www.rasik.com/rasik.css" type="text/css">
<!-- Start link to PFR -->
<link rel="fontdef" src="/shusha.pfr">
<!-- End link to PFR -->
<!-- Start Bitstream WebFont Player support -->
<script src="https://web.archive.org/web/20010303162615js_/http://www.bitstream.com/wfplayer/tdserver.js" type="text/javascript">
</script>
<!-- End Bitstream WebFont Player support -->
<td width="100%"><strong>Rasik.com </strong><big><font face="Shusha">ko
maaQyama sao Aap Gar baOzo ApnaI psaMd kI ihndI pustkoM Kaoja
evaM p`aPt kr sakto hOM.</font><font face="Times New Roman">
</font></big><font face="Shusha"><big>ipClao ek vaYa- ko
daOrana hjaaraoM ]%saahI pazkaoM nao hmaarI [nTrnaoT saa[T kao
doKa AaOr saraha hO.Anaok laaogaaoM nao pustkoM KrId kr hmaara
]%saah vaQa-na ikyaa hO.risakaoM ko [sa p`oma kao doKkr hma yah
saovaa ihndI pazkaoM ko ilayao BaI p`armBa kr rho hOM.
AaXaa hO ik Aap [saI trh hmaara ]%saah vaQa-na krto rhoMgao.
saBaI risakaoM kao hmaarI Aaor sao haid-k Qanyavaad² </big></font>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>Rasik.com - Search Books, Authors</title>
<link rel="stylesheet" href="/web/20010215024037cs_/http://www.rasik.com/rasik.css" type="text/css">
<!-- Start link to PFR -->
<link rel="fontdef" src="/vakil_01.pfr">
<!-- End link to PFR -->
<!-- Start Bitstream WebFont Player support -->
<script src="https://web.archive.org/web/20010215024037js_/http://www.bitstream.com/wfplayer/tdserver.js" type="text/javascript">
</script>
<!-- End Bitstream WebFont Player support -->
<li>to type <font face="Vakil_01">maoQaanaI</font>, enter "medhAnI" (to make it
easy even "medhani" will work, although it is phonetically incorrect!)</li>
I started this initiative way back in 1999, when I was myself looking for a good Marathi font. I did a lot of research but could not find a font that had all the features I wanted and was available free! I must give credit to the Shivaji family of fonts, which was the closest match in my search then, but it had its own limitations in quality, and a keyboard mapping that I did not find user-friendly. CDAC also had good-quality fonts available in their demo software; however, their software was always required in order to use them in any program.
I then gave up searching and took up the challenge of developing a font myself. It is a painstaking process, but ultimately I decided I had to do it! I committed myself to this task and created my first font, Kiran.ttf, which became popular in my circle of friends. I then made an official release of it in late 1999.
2012: The Indian Rupee currency symbol was added to all the fonts. The character is mapped at ASCII 0226 (Alt+0226); its official Unicode code point is U+20B9.
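A converter bringing such legacy text into Unicode needs only a one-slot remap for this character. A minimal sketch, assuming (as the note above says) that the legacy fonts keep the rupee sign at byte 226, and passing all other bytes through as-is purely for illustration:

```python
RUPEE_LEGACY_SLOT = 0xE2      # decimal 226, per the note above
RUPEE_UNICODE = "\u20B9"      # the official rupee sign code point

def legacy_rupee_to_unicode(data: bytes) -> str:
    # Remap the single legacy slot; for this sketch, treat every other
    # byte as if it were Latin-1 (a real converter maps every slot).
    return "".join(
        RUPEE_UNICODE if b == RUPEE_LEGACY_SLOT else chr(b)
        for b in data
    )
```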
Mylai (Included through MTX files for inter-conversion)
Auto-detect encoding
Just select the text and do a "Check Encoding" and Murasu Anjal will tell you what encoding the text is in. Anjal, TSCII, TAB and Unicode are automatically recognised.
[181] The *.mtx files from >>177 are conversion tables for the supported encodings.
TSCII 1.6,
TAB (TamilNet99 Bilingual Encoding),
Mylai,
Murasu Tamil encoding,
Anjal Tamil encoding,
Anjal Roman encoding,
Tamil Unicode Character Set
Auto-detection is a feature in the converter that detects the format of the text automatically. If the text is in any of the commonly used encoding formats, namely TSCII, TAB or Unicode, the converter will automatically set the appropriate encoding parameters for the user. This feature is very useful when the user does not know which particular encoding the document is in.
If the encoding used is not one of those mentioned above, the converter will alert the user that the document is in an unknown encoding. In this situation, the user can set the encoding manually. Most documents on the Internet today use one of the common encoding formats listed above.
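One plausible way such auto-detection can work (this is a toy heuristic, not Murasu Anjal's actual algorithm) is to test for UTF-8 validity first and then fall back to byte-range statistics:

```python
def guess_encoding(data: bytes) -> str:
    """Toy auto-detector in the spirit described above.  Valid UTF-8 is
    reported as Unicode; otherwise a high proportion of upper-ASCII
    bytes suggests one of the 8-bit Tamil encodings, which a real
    detector would then tell apart by per-encoding byte signatures."""
    try:
        data.decode("utf-8", errors="strict")
        return "Unicode (UTF-8)"
    except UnicodeDecodeError:
        pass
    high = sum(b > 0x7F for b in data)
    if high and high / len(data) > 0.3:
        return "8-bit Tamil encoding (TSCII/TAB/...: needs table lookup)"
    return "unknown"
```

When the heuristic returns "unknown", the tool would alert the user to set the encoding manually, exactly as the passage above describes.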
The converter already supports the following encoding formats : TSCII 1.6, TSCII 1.7, TAB, TAM, Anjal, Murasu-6, Murasu-7, Murasu-8, Kanian, Vikatan, Unicode, Romanised Tamil and Mylai. To add a new encoding, all you need is an MTX file for that encoding. (see next question).
An MTX file is a file that contains information about the encoding. To create an MTX file, you may use the MTX Editor that is bundled free with Murasu Anjal-2000 as a plug-in.
This site is set up with Dynamic Fonts for Tamil and Sanskrit texts. The dynamic fonts are supported by Netscape browser 4.06 (and later releases) and Microsoft Internet Explorer 4.*. Netscape users: please make sure the option, Use document specified fonts, including Dynamic Fonts, under Edit->Preferences->Appearance->Fonts is set.
We understand that many second-generation Telugu people who live abroad can speak Telugu, and can read Telugu if it is written in the Roman (English) script. For their convenience, we have provided an option to read the Telugu content in the Roman (English) script.
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">
<meta name="description" content="Welcome to most popular Kannada News Site - Kannadaprabha.com">
The conjuncts in practical use form a large set of more than 800, and each can combine with a vowel, raising the total number of combinations to more than 13,000.
Note: A minimal set comprising the basic consonants and vowels, as well as the symbols representing the vowel extensions (called matras), is in principle adequate to write the text in each script. Manual typewriters for Indian languages work on this principle. The ISCII code and Unicode for Indian languages also follow this concept, where codes are provided primarily for the consonants, vowels and the vowel extensions. This set is easily accommodated within 128 codes. The real problem, however, is that an akshara will then require a varying number of bytes, from one to as many as seven.
The IITM system has taken note of this problem and uses a uniform 2-byte code for each and every one of the 13,000 or more characters.
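The variable length noted above is easy to demonstrate by counting the Unicode code points that make up each akshara; a fixed 2-byte code per akshara, as in the IITM scheme, sidesteps this variability:

```python
# Each akshara below is a single written unit, yet its Unicode
# representation ranges from one code point to six (consonants joined
# by the virama U+094D, plus a vowel sign).
def akshara_length(akshara: str) -> int:
    return len(akshara)   # number of code points, not bytes

samples = {
    "ka":    "\u0915",                                       # 1
    "ki":    "\u0915\u093f",                                 # 2
    "kSa":   "\u0915\u094d\u0937",                           # 3
    "strii": "\u0938\u094d\u0924\u094d\u0930\u0940",         # 6
}
for name, a in samples.items():
    print(name, akshara_length(a))
```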
It has also been observed that Dynamic Fonts, a very interesting and useful concept which permits the fonts used in a web page to be sent along with the page, are rendered properly only if they conform to the ISO-8859-1 Latin encoding. As of now, Dynamic Font generation tools work only with TrueType fonts. Many Indian language fonts in TrueType format are not rendered properly on Unix systems, as these fonts for Indian languages are encoded according to the windows-1252 encoding, which permits several more glyphs to be accommodated in the font compared to ISO-8859-1. In fact, Sanskrit_1.2, a truly high-quality Devanagari font, does not get rendered properly when sent as a Dynamic Font. This is a very useful font, as it supports Vedic symbols as well, but it is usable only under Windows 95. The same applies to a Telugu font called Pothana.
It appears that, as of today, the safest approach to displaying Indian language text on web pages so that it can be viewed from almost all browsers is to restrict the font used to an ISO-8859-1 encoding. If Java-based applications are considered, it is even more important that we stick to this encoding.
Non-Unicode fonts often use a combination of the Thai script and Latin Unicode ranges to resolve the incompatibility problems of Unicode Tai Tham in Microsoft Office. However, these fonts may encounter a display problem when used in web browsers, as the text can be rendered as unintelligible Thai text instead.
Khottabun is a collection of fonts for ancient Lao scripts, including Lao Tham,
Thai Noi (Lao Buhan), as well as contemporary Lao with Pali-Sanskrit extension.
TSCCONVERTER is a Windows based utility that allows you to convert text files that were created in the following fonts to TSCII encoding: Amutham, Baamini, Divya, Elango, Inaimathi, Kalki, Mylai, TBoomi, Shree802, TMNews, and Marx. Several other fonts that are based on Tamil typewriter keyboard may also work.
For Tamil, apart from providing 'PhoneticTransliteration' and UserDefinedPhonetics for typing in Unicode, Azhagi+ supports typing in various other Non-Unicode font encodings and keyboards too. The full list of supported font encodings is as follows:
Unicode (யூனிகோட், ஒருங்குறி)
SaiIndira (சாய்இந்திரா)
TSCII (திஸ்கி)
Bamini (பாமினி), TamilBible (தமிழ் பைபிள்)
TAM (தாம், டேம்)
TAB (தாப், டேப்)
Baamini (பாமினி 2) (not the same as Bamini)
Vanavil (வானவில்)
STMZH (செந்தமிழ்) [same as RGB Tamil fonts]
Shreelipi (ஸ்ரீலிபி)
LT-TM (எல்.டி-டி.எம்) [same as IndoWord Tamil fonts]
Among other exquisite features (such as super-fast plain-text conversion), Azhagi's converter can also convert formatted text directly inside MS Word documents, thereby retaining all the formatting (bold/italics/underline, colour, alignment, tables, etc.) of your Tamil text. The converter can be extended by the user to convert from any Tamil font encoding to any other Tamil font encoding. The 45 Tamil font encodings supported (as of March 2022) are: Unicode, SaiIndira, TSCII, TAB, TAM, Bamini & TamilBible, Vanavil, Shreelipi, STMZH, LT-TM [same as IndoWord], Gee_Tamil, DCI+Tml+Ismail, SunTommy, ELCOT-ANSI, ELCOT-Bilingual, Diamond, Amudham, Ka, Shree, Mylai Plain, TACE, Elango, Periyar, Priya, Chenet Platinum, KrutiTamil, TM-TTValluvar, Roja, MCL Kannamai, Baamini [not the same as Bamini], Needhimathi, Pandian, TBoomiS, APT-Sangam, Dev, TA-Arul, Sashi, Ganesha, Lakshmi, Tamil-Aiswarya, Adhawin, TmlCheran, Avaice Jasmine, KavipPriya, Vikatan.
If you own a font whose name (e.g. Kalaham) does not match with any of the names in the existing 'Font Encodings' list# of Azhagi, it does not mean you cannot effect to-and-fro conversion using that font. It is very much possible that your font is of the same encoding as one of the font encodings already supported by Azhagi. For instance, 'Kalaham' font is of the same encoding as 'Bamini' font.
So, if you wish to effect conversion from 'Kalaham' font to any other font encoding, then just select 'Bamini' in the 'from this font encoding' list of Azhagi's Font Converter, before effecting conversion.
Similarly, if you wish to effect conversion from any other font encoding to 'Kalaham' font, then just select 'Bamini' in the 'to this font encoding' list of Azhagi's Font Converter before effecting conversion.
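In effect, a converter needs only a font-name-to-encoding alias table. A minimal sketch, using the aliases stated in these pages ('Kalaham' and 'TamilBible' resolve to the 'Bamini' encoding, 'SaiIndira' to 'TSCII'); further entries would be filled in the same way:

```python
# Many fonts share one encoding, so the converter resolves a font name
# to its underlying encoding before doing any glyph-slot conversion.
FONT_TO_ENCODING = {
    "Bamini": "Bamini",
    "Kalaham": "Bamini",      # same slots as Bamini, per the text
    "TamilBible": "Bamini",   # same slots as Bamini, per the text
    "SaiIndira": "TSCII",     # actually TSCII-encoded, per the text
}

def resolve_encoding(font_name: str) -> str:
    return FONT_TO_ENCODING.get(font_name, "unknown")
```

This is why selecting 'Bamini' in the converter works for a 'Kalaham' document: the names differ but the encoding, i.e. the slot layout, is identical.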
Please note that even though Unicode has specified only 8 signs/symbols (for day, month, year, etc.), I have given provision for 23 such symbols, so that even if Unicode brings in some more signs/symbols in future, they can be accommodated. Not only that: if the font you are using has some special symbols/signs, you can specify them here after the first 8 characters. You can also specify your own short forms here. For instance, if you specify "இப்படிக்கு" for the 9th character, then when you press 'Mi' you will get 'இப்படிக்கு'. If you specify 'அடியேன்' for the 10th character, then when you press 'Mh' you will get 'அடியேன்', and so on. :)
Please note that 'SaiIndira' font is actually of "Tscii" font encoding and hence, as such, including "Tscii" under 'Font Encoding' list is itself enough. But, I have still included 'SaiIndira' separately for the convenience of people who do not know that SaiIndira is of 'Tscii' encoding.
Similarly, TamilBible font has its Tamil characters in the same slots as Bamini has and hence, as such, including "Bamini" under 'Font Encoding' list is itself enough. But, I have still included 'TamilBible' separately for the convenience of people who do not know that both TamilBible and Bamini fonts hold the Tamil characters in the same slots.
In case you are still using Azhagi+ 10.45 downloaded prior to 9-October-2020, then please download Azhagi+ 10.45 afresh and install. It is necessary that you do the same since Azhagi+ 10.45 got updated on 9-October-2020 (allowing you to effect typing and conversion in 10 more font encodings - Chenet Platinum, Elango, Tace, MylaiPlain, KrutiTamil, MCLKannamai, Periyar, Priya, Roja, TmTtValluvar) and again in May-2021 (allowing you to effect typing and conversion in 14 more font encodings - Avaice Jasmine, Adhawin-Tamil, Baamini [not the same as Bamini], Needhimathi, TmlCheran, Pandian, TBoomiS, APT-Sangam, Dev, TA-Arul, Tamil-Aiswarya, Sashi, Ganesha, Lakshmi). And, the "txts-all.zip" file provided below for download will allow you to effect typing and conversion in 2 more font encodings - Vikatan and KavipPriya (and also many more fonts of this same encoding - Cauvery, Chitram, Ellachelvi, EzhilArasi, Kalaiarasi, Kannaki, Kayalvizhi, Menaka, Meenakshi, Nanthini, Nattiya, Ponni, Sakunthalai, Seethai, Sivakami, Thamarai, Thenmozhi, Ulagamai, etc.).
Software to Convert text in most known Tamil character encoding schemes from one to another.
Supports Unicode, TACE, TSCII, TAB, TAM, Bamini, Shreelipi, Diacritic, Vanavil, Softview.
Definitions in an easy XML Structure that makes NHM Converter extendable to any language, any encoding easily
The Marathi language is written in the Devanagari script, so all the fonts used for Devanagari (i.e. Hindi fonts) are also used for Marathi typing. The most common Devanagari fonts are the Krutidev and DevLys fonts. For Marathi typing, the Devanagari keyboard is used. Many government Marathi typing tests are also conducted in the Krutidev font.
Free download Shivaji font
Free download Kruti Dev Font
Free download Marathi Shree Dev Lipi fonts Whole Series
These programs (csx2tex, dn2tex, tex2csx, tex2dn, iscii2csx, tex2norman, norman2tex) are for use in converting between different encodings used to represent Indian-language text: (1) CSX, (2) the DN encoding used in conjunction with Frans Velthuis's “Devanagari for TeX” package, (3) the ISCII standard used by much Indian software, (4) the encoding popularised by Professor K. R. Norman, and (5) my variation on standard TeX (in which “\.” represents a subscript dot, “\:” a superscript).
Csx2isc and csxp2isc convert respectively from CSX and CSX+ to ISCII. Csxp2ur converts text from CSX+ to accented Unicode Roman. A2c and c2a convert between CSX and Harvard-Kyoto ASCII. Iscii2ud converts from ISCII to Unicode Devanagari, and ud2iscii converts in the opposite direction. Ur2ud converts from Unicode Roman to Unicode Devanagari; it can read and write UTF-8 and other standard Unicode formats; Roman transliteration adheres to the ISO 15919 standard. Ud2ur converts in the opposite direction, but both input and output are restricted UTF-8.
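Converters of the a2c/c2a kind are, at heart, longest-match substitution over a transliteration table. The sketch below shows the idea with a small subset of the Harvard-Kyoto conventions mapped to accented Unicode Roman; it illustrates the mechanism only and is not the actual programs described above:

```python
# Subset of the Harvard-Kyoto ASCII conventions mapped to accented
# Unicode Roman.  Multi-character keys ("RR") must be tried before
# their single-character prefixes ("R").
HK = {"RR": "\u1E5D",                   # long vocalic r
      "A": "\u0101", "I": "\u012B", "U": "\u016B", "R": "\u1E5B",
      "T": "\u1E6D", "D": "\u1E0D", "N": "\u1E47",
      "G": "\u1E45", "J": "\u00F1",
      "z": "\u015B", "S": "\u1E63", "M": "\u1E43", "H": "\u1E25"}

def hk_to_roman(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        for length in (2, 1):           # longest match first
            chunk = text[i:i + length]
            if chunk in HK:
                out.append(HK[chunk])
                i += length
                break
        else:
            out.append(text[i])         # lowercase a, i, k... pass through
            i += 1
    return "".join(out)
```

For example, `hk_to_roman("saMskRta")` yields "saṃskṛta" and `hk_to_roman("rAmAyaNa")` yields "rāmāyaṇa".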
Macros for Microsoft Word enabling the user to convert documents using legacy encodings such as CSX+ or Norman to Unicode
The text of the Mahābhārata is available in three formats: Unicode Devanagari, Unicode Roman (using the conventions defined in ISO 15919), and ASCII (using the Harvard/Kyoto conventions).
There is nothing intrinsically wrong with Tokunaga's seven-bit ASCII system of transcription, but it is difficult to read and therefore prone to errors. I have converted his texts into the eight-bit CSX encoding. I chose this not for its inherent merits (it has few) or because it is well suited to the Unix environment in which I work (it is very badly suited) but because it is the only attempt at a standard eight-bit encoding known to me, and standards are precious things. In converting the texts I have done my best to resolve the ambiguities in Tokunaga's original material, where "m" may be the labial nasal or anusvara, "h" may be the voiced breathing or visarga, and "n" may be the dental, palatal or velar nasal.
The only area where it is likely that errors may remain is the conversion of "n" to velar "n", which has to be largely done on a case-by-case basis. If errors do remain here, they are certainly not numerous.
Tokunaga's texts are entered with vowel sandhi undone and compounds broken up, ostensibly to facilitate word searches. I do not believe that this is desirable, since the texts ought to be usable for other purposes (printing high-quality copy in Nagari, metrical analysis, analysis of diction, etc.), and since there is not in fact any real difficulty in performing word searches on normal Sanskrit texts -- with care, even a "difficult" word like api can be isolated and searched for. I have therefore normalised the sandhi and attempted to rejoin the compounds.
The transcription used is not that adopted by Prof. Tokunaga (in which aa represents long a, T retroflex t, etc.) but instead is the eight-bit CSX encoding.
Prof. Tokunaga presents his Sanskrit texts with vowel sandhi undone, in order to simplify word-searches. Similarly, compound-members are separated. I am unconvinced of the desirability of doing this, and have converted the text to a more conventional representation.
The text of the Rāmāyaṇa is available in three formats: Unicode Devanagari, Unicode Roman (using the conventions defined in ISO 15919), and ASCII (using the Harvard/Kyoto conventions).
Padma is a technology for transforming Telugu text between various public and proprietary formats. This page is currently under development and is known to work on Internet Explorer 6.x, Mozilla Firefox 1.x, and Netscape Navigator 8.x on Windows 2000 and Windows XP; and Mozilla Firefox 1.x on Linux.
This page is based on the technology used by the Mozilla Extension of the same name. The extension (supporting Firefox, Thunderbird and Netscape) can be downloaded from Mozilla Update or Mozilla Developer websites.
//Use the Unicode Private Use Area for Padma's internal symbols, starting at U+EC00.
//Code points used: U+EC00-U+EC0F, U+EC10-U+EC1E, U+EC20-U+EC68, U+EC70-U+EC80, U+ECA1-U+ECE8, U+ED33-U+ED68.
//Code points 32-64, 91-96, 123-127 (from the ASCII range) are not explicitly listed here
//but are part of Padma's internal format and are of type Padma.type_unknown.
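Given the ranges listed in those comments, a converter can spot leftover internal symbols with a simple range check. A minimal sketch (the helper name is mine, not part of Padma, and the check covers the overall span of the quoted ranges rather than each sub-range):

```python
# Padma parks its internal symbols in the Private Use Area between
# U+EC00 and U+ED68 (per the quoted comments).  Any such code point in
# final output means a transformation stage failed to resolve it.
PADMA_PUA_START, PADMA_PUA_END = 0xEC00, 0xED68

def has_internal_symbols(text: str) -> bool:
    return any(PADMA_PUA_START <= ord(ch) <= PADMA_PUA_END for ch in text)
```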
# This program creates a Plain Text, OpenOffice.org Writer (odt), or HTML file
# in Khmer Unicode/Legacy format from Legacy/Unicode input file respectively.
# Currently it supports the legacy font types ABC, ABC-ZWSP, Baidok, Fk, Kaoh Kong, Khek, Limon, Truth and Rasmei Kampuchea...
Today we use, or will soon use, LLMs (Large Language Models) in our daily lives. The problem is that LLMs are trained on Unicode text, yet in many cases we still have to deal with legacy fonts. This package will help you convert text in legacy fonts to Unicode text.
Formerly, a small group of Khmer, Khmer-American and American folks met on the Internet to discuss the possibility of using the Khmer language for email, home pages, etc. From that point on, we looked at the commercial Khmer fonts available in the market. Different font foundries assigned the same Khmer character different codes. The ASCII characters are replaced by the Khmer letters according to their sounds. With this type of arrangement, an extensive sort algorithm would be needed if one wanted to sort Khmer words in the right order, such as the order used in a dictionary. We also looked at the requirements of Netscape, to find a way to use Khmer characters in its documents. We found that the English language was still needed in order to read the title, the article listing and the commands.
After two months, we gave up and tried to convince the Khmer font foundries to contribute. Fortunately, Hann So had a relationship with one of the Khmer font foundries, the Khek brothers. After several emails with Hann as the middle man, the Khek brothers made a generous offer, donating one of their Khmer fonts to be modified with the new code arrangement.
Many different types of Lao fonts have been created for use with Windows applications. Some widely used, older Lao fonts (such as SengChanh and alice_0) simply replaced the English (Latin-alphabet) characters with the Lao characters that use the same typewriter keys, but in many applications that caused serious incompatibilities.
Other Lao fonts (such as Saysettha Lao) included most English characters at the usual code-points, and added Lao characters in the "upper ASCII" range of 8-bit characters. However, unlike Thai, no 8-bit coding standard for Lao was ever adopted or supported by Microsoft or application developers, so Lao text will not always be displayed correctly with these fonts.
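Converting text typed with such an upper-ASCII font into Unicode therefore needs a per-font byte table. The sketch below shows the shape of such a converter; the two slot assignments are invented for illustration and do not reflect any real font's layout:

```python
# Hypothetical upper-ASCII slots mapped to Unicode Lao.  In the absence
# of an 8-bit coding standard for Lao, every legacy font (Saysettha Lao
# included) needs its own table like this one.
LEGACY_TO_LAO = {
    0xA1: "\u0E81",  # invented slot -> LAO LETTER KO
    0xA2: "\u0E82",  # invented slot -> LAO LETTER KHO SUNG
}

def legacy_lao_to_unicode(data: bytes) -> str:
    # ASCII passes through unchanged (these fonts keep English there);
    # unmapped high bytes become U+FFFD so gaps in the table are visible.
    return "".join(
        chr(b) if b < 0x80 else LEGACY_TO_LAO.get(b, "\ufffd")
        for b in data
    )
```

The table-per-font requirement is exactly the cost of the missing standard: with Thai, a single TIS-620 table suffices, while for Lao each legacy font demands its own.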