Institute for Urdu Informatics By Dr. Attash Durrani

محبوب خان · February 22, 2008

Dr. Attash Durrani’s* vision revealed

Institute for Urdu Informatics: A Futuristic Approach for Language Development
PART ONE

The UN has declared 2008 the year of languages, for development and dialogue among the languages of the world. For our national language, Urdu, this dialogue opens in two directions: (1) with international languages (English, French, Arabic, Chinese, etc.) and (2) with local languages such as Hindi, Punjabi, Pashto, Sindhi, Balochi, Brahui, Shina, Khowar, etc. A further vista has opened for language development: IT and computer science. The future of Urdu's development now lies in this field of informatics.
By the end of the 20th century, linguists and computer technologists generally agreed that only those languages that can be used effectively as computer languages would survive in the 21st century.
Computer giants such as Microsoft, realizing that computer usage can no longer be restricted to word processing but must expand to databases for research and development and to internet use, decided to produce their software in one hundred different languages. Urdu is now among the foremost of the languages that have emerged on the digital scene.
The Urdu Informatics Department of the National Language Authority (NLA) has devised many projects and standardized many tables that are now in use across the computer world: by NADRA, Google, Microsoft, Nokia, Motorola, IBM, etc. Changing the computer screen from English to Urdu lets an ordinary Pakistani operate a computer easily and in a local language.
Microsoft has released Urdu Office 2003 and Urdu Windows XP.
This initiative will potentially enable computer access in every street and village, because 92 percent of the population of Pakistan does not speak or use English.

Future NLA projects, such as the Urdu Database/Data Bank, will also support everyone else working in computing, meeting their Urdu computing needs without any extra expenditure on Urdu support.
Before NLA intervened, UNICODE (the international standard for computers) was adopting Urdu characters and standards erroneously from the Arabic code plate and from other unreliable sources, especially from India, presenting Urdu as a subset of Arabic. NLA, a Full Corporate Consortium Member of UNICODE for the last eight years, has been effectively warding off interference in the Urdu language and directing UNICODE's attention towards our needs.
How can the future of Urdu be secured in the era of computer technology, the internet and IT? The answer lies in developing projects like the Center of Excellence for Urdu Informatics (CEUI), which aims to provide short-term and long-term language policy and ways and means of adopting Urdu as the official, judicial and instructional language of Pakistan, and to conduct research and development for Urdu standardization and academic support for national and international stakeholders and the Government of Pakistan.
The advantages of this project to the State are envisaged as follows:

Having everything in Urdu on computers will promote national integrity more than any other cultural tool.
E-Government can take advantage of Urdu Informatics at large by making everything accessible to the masses in Urdu.
Hand-held Urdu devices will help extend state operations to less literate areas where English isn’t known.
It will add up to a strong localized electronic infrastructure equally accessible from all parts of the state.
Automatic Machine Translation will help the state keep an eye on world opinion in its own language (Urdu) by continuously translating from various sources.

The advantages to society may be as follows:

Urdu computing platforms will attract the masses, helping to eliminate the “digital divide” in our society.
Hand-held Urdu devices will transform our society into a more interconnected one, with countless benefits.
Automatic Machine Translation from English to Urdu will enable the masses to access data that, before translation, is foreign to them.
For scientific text, Automatic Machine Translation will help translate science and technology literature from abroad into Urdu, raising the overall intellectual level of society.
An electronic Urdu database will help standardize the language among the masses, boosting research on and refinement of the language, which will have a positive impact on society.

The advantages to the economy once such projects are completed may be envisaged as follows:

Computers in Urdu will encourage less literate businessmen to track their business with digital devices.
As a result, Urdu-enabled platforms will spur the local software industry to produce Urdu business solutions.
Small businesses will start to track their business with Urdu-enabled devices, which in turn will generate both business for the software industry and revenue for the central government.
Urdu platforms will push hi-tech companies to release products aimed specifically at the Urdu consumer market.

This is why the Government of Pakistan, in its Vision 2030 of March 2007, included some lines on making Urdu a language of the internet:

“True sustainability, however, will come when these languages create their synergies with global modern movements and ideas, especially the Internet.”
Urdu is and must remain the first language of Pakistan. It is and must remain the language of our culture and of our day-to-day communication. It is and must remain the first.

The story starts in 1998, when National Identity Cards were proposed to be produced in Urdu but standardization was lacking in Urdu applications and software. An effort was reviewed at a seminar at FAST (NU) Lahore in October 1998, and as a result an Urdu Informatics department was started in the National Language Authority, Cabinet Division, Islamabad, to cope with the ongoing issues of standards.
The Government of Pakistan decided in its cabinet meeting on August 23, 2000 that "the development of standards for the use of Urdu for Computer Applications shall continue [to be] the responsibility of the Cabinet Division." After so many standards had been developed and the computer screen had been changed into Urdu, the Prime Minister of Pakistan, reviewing the matter on 11-02-2006, issued this directive: "The promotion of Urdu language by making it a computer language be put on fast track."
What were these efforts? Let us glance at their history. The first Gazette notification of the Urdu typewriter keyboard was made in 1980. It was published in the Gazette of Pakistan, Extra, December 6, 1980 (Pak.III), stating that:
The public sector companies concerned may undertake the manufacture of Typewriter and Teleprinters with the standardized key boards only.

This keyboard also found a place in Urdu word processing but was not efficient for computing needs. Keyboard Ver. 1.00 was developed on 14th December 1999 by the standards committee of the Urdu Informatics Department of the National Language Authority, Cabinet Division.

This department also developed an ASCII code plate for the use of Urdu on computers.

In July 2007 the Chief Executive of Pakistan, General Pervez Musharraf, approved this ASCII code plate as a standard. It was then revised, and code plate Ver. 2 was developed by the standardization committee of NLA. The main feature of this version was a ghost character set, along with dots, to process all the Pakistani languages.

The newly developed keyboard was also designed to handle this. The consensus of the Pakistani language boards was also obtained in November 2000. This keyboard was adopted by NADRA in Pakistan and by Microsoft in Windows XP, in both the English and Urdu versions. Other companies are now adopting this keyboard as well.

“The National Language Authority of Pakistan is also grateful to Microsoft for its great initiative to collaborate in the development of the Urdu Language for use in Informatics through its Local Language Program. The National Language Authority, in collaboration with Microsoft, is working to bring Computer Technology to Urdu. Providing the interface in Urdu will boost IT development activities in Pakistan, as well as in India and other SAARC countries. This initiative will potentially enable computer access in every street and village of Pakistan and South Asia. Teaching of Urdu will be facilitated and education in school can now be enhanced with the help of computers, because 92 percent of the population of Pakistan does not speak English. In addition, the software development industry will gain a new field of business activities using Office and Windows in Urdu, and we expect that the LLP will also benefit Urdu informatics research activities in Pakistan’s universities.”
This message was also released, and is available on Microsoft's website, as a press release of March 16, 2004.

The President of Pakistan also allowed the National Language Authority to become a full member of UNICODE, the international code for language processing. UNICODE adopted some changes in its Ver. 4.0 from version 2 of NLA's ASCII code plate. The ghost character set was also recognized and adopted by UNICODE.

CEUI has sent UNICODE some further proposals to develop standards for Urdu and other Pakistani languages.

A set of ghost characters and dots (nuqtas) makes it easy to compose any character of any language written in the Arabic script.
The technical committee of UNICODE has accepted this proposal, and at its last meeting, on October 13, 2007, it was of the opinion that an alternate nuqta proposal to add spacing characters be developed. Korean and other CJK languages, following this proposal, made their contributions. Standards for Urdu have also been brought into the ISO work on ECMA-376, the standard for Office Open XML file formats. Two meetings of its sub-committees have been conducted.
Why is this to be done? A basic rationale is given in a UNICODE research paper by Mr. Mark Davis, "GDP by Language", of January 22, 2003. He writes:
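The ghost-character idea described above can be illustrated with a small lookup table. This is only a sketch, not NLA's code plate or a Unicode decomposition: the table `COMPOSED`, the mark names, and the function `compose` are hypothetical names chosen for illustration. It uses the real Unicode character DOTLESS BEH (U+066E) as the "ghost" shape and shows how different nuqta arrangements on the same shape yield different letters, including the Urdu letters PEH and TTEH.

```python
import unicodedata

# The dotless "ghost" shape shared by several letters.
GHOST_BEH = "\u066E"  # ARABIC LETTER DOTLESS BEH

# Hypothetical table for illustration: (ghost shape, mark) -> finished letter.
COMPOSED = {
    (GHOST_BEH, "one dot below"): "\u0628",     # BEH
    (GHOST_BEH, "two dots above"): "\u062A",    # TEH
    (GHOST_BEH, "three dots above"): "\u062B",  # THEH
    (GHOST_BEH, "three dots below"): "\u067E",  # PEH (used in Urdu)
    (GHOST_BEH, "small tah above"): "\u0679",   # TTEH (used in Urdu)
}


def compose(ghost, mark):
    """Return the letter formed by adding a mark to a ghost shape."""
    return COMPOSED[(ghost, mark)]


if __name__ == "__main__":
    for (ghost, mark), letter in COMPOSED.items():
        print(f"{unicodedata.name(ghost)} + {mark} -> {unicodedata.name(letter)}")
```

One shape plus a handful of marks covers five letters here; the same pattern extends to other shapes and to other languages written in the Arabic script.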
"Many people in the software industry don't realize how important it is to localize products for different languages around the world. While English is a major language, it only accounts for around 30% of the world Gross Domestic Product (GDP), and is likely to account for less in the future. Neglecting other languages means ignoring quite significant potential markets.
The most notable feature is the steady rise of Chinese and slow relative decline of Japanese and most European languages. Korean and Indic languages also show growth over that period, though slower than Chinese."

....................................
The article is divided into two parts because of the character limit on posts here at Mehfil Forum.
.....................................

*Project Director, Centre of Excellence for Urdu Informatics (CEUI), National Language Authority, Cabinet Division, Islamabad.

محبوب خان · February 22, 2008

Institute for Urdu Informatics By Dr. Attash Durrani PART TWO

Dr. Attash Durrani’s* vision revealed:

Institute for Urdu Informatics: A Futuristic Approach for Language Development

PART TWO

In general, the data is less reliable for smaller languages, so the order should not be taken as significant.
Tagalog, Afrikaans, Persian, Swedish, Ukrainian, Malay, Telugu, Greek, Marathi, Tamil, Vietnamese, Cantonese, Urdu, Norwegian, Danish, Czech, Hebrew, Catalan, Romanian, Hungarian, Gujarati, Finnish, Turkic, Punjabi, Kannada, Other Indic, Malayalam, Oriya, Slovak, Galician, Bulgarian, Byelorussian, Croatian, Amharic, Sindhi, Sinhalese, Assamese, Nepali, Kurdish, Kazakh, Uzbek, Slovenian, Pashto (Pushto), Luxembourgish, Azerbaijani, Latvian (Lettish), Cambodian, Turkmen, Basque, Estonian, Albanian, Balochi, Malagasy, Lithuanian, Armenian, Kinyarwanda, Swahili, Laothian, Macedonian, Icelandic, Luri, Georgian, Serbian, Tajik, Hindko, Moldavian, Konkani, Sesotho, Mongolian, Manipuri, Kirghiz, Maltese, Brahui, Chichewa, Croatian, Kirundi, Afar, Rhaeto-Romance, Samoan, Tonga,...

Why should Urdu and other Pakistani languages lag behind in these rankings? What is the remedy? The answer is "localization".

Localization (sometimes shortened to "L10n") is the process of adapting a product or service to a particular language, culture, and desired local "look-and-feel." Ideally, a product or service is developed so that localization is relatively easy to achieve - for example, by creating technical illustrations for manuals in which the text can easily be changed to another language and allowing some expansion room for this purpose.

Localizing a product involves more than idiomatic language translation. A successfully localized service or product is one that appears to have been developed within the local culture.

Language translation, which is a large part of localization, can sometimes be facilitated with automatic language translation.
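As a rough illustration of a product built so that localization is easy, the sketch below keeps all user-visible text in a per-language table instead of hard-coding it. The table `CATALOG`, its keys, and the function `translate` are hypothetical and not taken from any product named in this article; the Urdu strings are sample translations only.

```python
# Hypothetical message catalog: one table of user-visible strings per language.
CATALOG = {
    "en": {"file": "File", "save": "Save", "print": "Print"},
    "ur": {"file": "فائل", "save": "محفوظ کریں"},  # "print" not yet translated
}


def translate(key, locale, fallback="en"):
    """Look up a string for the locale, falling back to English if missing."""
    return CATALOG.get(locale, {}).get(key, CATALOG[fallback][key])


if __name__ == "__main__":
    for key in ("file", "save", "print"):
        print(key, "->", translate(key, "ur"))
```

Because the program only ever refers to keys such as `save`, adding another language means adding one more table, with no change to the program logic.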

The Localization Research Centre (LRC) is the information, educational, and research centre for the localization community, established in 1995 at University College Dublin under the Irish Government and European Union funded Technologies Centers programme.
Localization education and research facilities are now spreading:
New Postgraduate Localisation Courses at the University of Limerick.
2005-2006 Localisation Reader available now.
STAR Servicios Lingüísticos launch new Certified Localisation Professional course.
PhD Research opportunity with the Localisation Research.

Mobile phone companies such as NOKIA and MOTOROLA are also localizing their applications. Urdu, Pashto and Sindhi SMS are now in use. A Pakistani company, INKSOFT, made this possible.
NLA's keyboard has also been taken up in the localized Google (Urdu) and Wikipedia (Urdu).

An MSc programme in Software Localization is available at the University of Limerick.

Other international localization institutes are at work as well. Localization is now a global business: total business in 2005 was $8.8 billion, and Urdu is one of the four topmost Microsoft localization languages. Microsoft, NOKIA, Motorola, Panasonic, IBM, etc. are now in this business. This means that Urdu is already in the mainstream of the IT industry for localization.
Automatic language translation is fundamental to the field of localization. It is the use of a computer program to translate input text from one national language to another while maintaining the original document format. Yahoo and some other sites offer what is sometimes called instant translation using such a tool. Since language is heavily dependent on context and connoted as well as denoted meaning, a program needs to have access to such context as well as the ability to use it. Since providing enough context is difficult, automatic language translation thus far seems to be successful only in limited and well-understood situations and as a first time-saving step toward translation (or "post-editing") by a human being.
(a) Machine translation software is required for automatic language translation. NLA is searching for a Common Lingual Code to fulfil these requirements.
(b) The steps in M.T. under the English-Urdu lingual code proceed from complex sentence translation down to simple sentence translation.

Lingual equivalents and properties, diction, and simple sentences are to be found or standardized.

Clause equivalents and rearrangements: a basic formula has been devised for rearranging a complex sentence from English to Urdu:

1/aN+ n+ (n-1) + (n-2) + (n-a) + 1/bV

Here N = noun, V = verb, and n = total number of clauses; 1 = the first clause, which is divided into its noun portion (1/aN) and its verb portion (1/bV).
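One possible reading of this formula, offered only as an interpretation and not as NLA's implementation, is: put the noun portion of the first clause first, then the remaining clauses in reverse order (clause n down to clause 2), and finish with the verb portion of the first clause. The function name `rearrange_clauses` and the English sample clauses are hypothetical; the sketch shows the reordering only, not the translation itself.

```python
def rearrange_clauses(first_noun_part, first_verb_part, other_clauses):
    """Reorder clauses under one reading of the formula above.

    Output order: noun portion of clause 1, then clauses n, n-1, ..., 2,
    then verb portion of clause 1.
    """
    return [first_noun_part, *reversed(other_clauses), first_verb_part]


if __name__ == "__main__":
    # Clause 1 is "The teacher said", split into noun and verb portions.
    ordered = rearrange_clauses(
        "The teacher",
        "said",
        ["that the student knew", "where the book was"],
    )
    print(" | ".join(ordered))
```

Moving the verb of the main clause to the end fits the verb-final order of Urdu sentences; inflection, gender and conjunctions would still need separate handling.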

Induction, gender, plurality, conjunctions, etc. must also be taken into consideration, and additions are required when editing and polishing the translated Urdu sentences.

Another NLA project is to develop Urdu database software for an Urdu databank, like the English banks. It may have the following features:

Search Items (From idea/ Meaning to word/ Phrase).

Filing Arrangements.

Synonyms and Homonyms.

Meaning– Relationship/ Thesauri

Data Development

Testing

Urdu Databank (Corpus) development.
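Two of the features listed above, searching from a meaning to a word and grouping synonyms, can be sketched with a small inverted index. This is a minimal illustration under assumed data: the entries in `ENTRIES`, the English glosses, and the function names are hypothetical and do not describe the actual design of the NLA databank.

```python
from collections import defaultdict

# Hypothetical sample entries: each Urdu word with English glosses as meanings.
ENTRIES = [
    {"word": "خوشی", "meanings": ["happiness", "joy"]},
    {"word": "مسرت", "meanings": ["happiness", "delight"]},
    {"word": "کتاب", "meanings": ["book"]},
]


def build_meaning_index(entries):
    """Map each meaning to the list of words that carry it."""
    index = defaultdict(list)
    for entry in entries:
        for meaning in entry["meanings"]:
            index[meaning].append(entry["word"])
    return index


def synonyms(word, entries, index):
    """Return the other words that share at least one meaning with `word`."""
    found = []
    for entry in entries:
        if entry["word"] == word:
            for meaning in entry["meanings"]:
                for other in index[meaning]:
                    if other != word and other not in found:
                        found.append(other)
    return found


if __name__ == "__main__":
    index = build_meaning_index(ENTRIES)
    print("happiness ->", index["happiness"])
    print("synonyms of خوشی ->", synonyms("خوشی", ENTRIES, index))
```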

Both localization and MT require research in computational linguistics. Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components. These localization and translation challenges have been presented at various Microsoft conferences, e.g. PDC 2006 and 2007. A conference on Language and Technology was arranged by Peshawar University in August 2007. It identified the following areas of research:

Ambiguity resolution

Anaphora Resolution

Character Recognition

Corpus Linguistics

Discourse Analysis

Ellipsis Resolution

Fonts

Information Retrieval

Localization

Machine Translation

Morphology

OCR

Part-of-Speech Tagging

Pattern Recognition

Phonology

Semantics

Speech Recognition

Syntax

Text to speech

The future needs arising from this research may be set out as follows:

R&D: a continuous process for computational linguistics.

Urdu Data Bank: a continuous process, with an interval of 5 years (90% completion is expected in 10 years).

E-Teaching/Learning: teaching courses, certification, training.

E-Publishing: a continuous process.

Laboratory Products: software, CDs, programs, tools, etc.

An Institute for Urdu Informatics of this type is needed for R&D and teaching in this new and emerging field. It started with the project "Center of Excellence for Urdu Informatics". Phase I was for the development of standards; Phase II for the establishment of the Urdu Informatics Center; Phase III for Urdu laboratory products and the Institute.
Its current activities are:

Font Development: the Pak Nastaleeq font has been released.

Urdu Database: data development is being carried out.

Machine Translation for official Urdu is under construction.

OCR: a writing pad is being developed for Urdu writers to use with Word/Office.

E-Learning: prototype training courses are to be launched.

E-Publishing: a website and data warehouse are being established.

Building: pre-qualification and design are under way.

In the Cabinet Committee meeting of February 1, 2007 on the adoption of Urdu as the official language, it was decided that a project for the Urdu Informatics Institute be prepared. HEC may provide funds for the purpose.
“Urdu Informatics” is a new field established in IT. Wikipedia, the free encyclopedia, describes it thus: “Urdu Informatics (اردو اطلاعیات) relates to the cutting-edge research and efforts in bringing the utilities and usage of National language to the modern information and communication technologies in education and businesses.
National Language Authority has been at the forefront in introducing Urdu Informatics as a tool for wider standardization of the language. Apart from development of Urdu keyboard, one of the key steps in this respect has been the establishment of a Centre of Excellence for Urdu Informatics in Islamabad, Pakistan.
Works in Urdu Informatics also relate to the use of Urdu for Internet applications, input controls, dynamic association of keyboard buttons to Urdu alphabets and use of proper character coding schemes.”
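The "association of keyboard buttons to Urdu alphabets" and the "proper character coding schemes" mentioned in this passage can be sketched as follows. The four-key phonetic mapping in `KEYMAP` is a made-up example for illustration and is not the NLA keyboard layout; the code points and the UTF-8 encoding are standard Unicode.

```python
# Hypothetical phonetic key mapping (NOT the NLA standard layout).
KEYMAP = {
    "a": "\u0627",  # ALEF
    "b": "\u0628",  # BEH
    "p": "\u067E",  # PEH
    "k": "\u06A9",  # KEHEH (Urdu kaf)
}


def type_urdu(keys):
    """Convert a sequence of key presses to Urdu text; unmapped keys pass through."""
    return "".join(KEYMAP.get(key, key) for key in keys)


if __name__ == "__main__":
    text = type_urdu("ab")          # alef + beh
    encoded = text.encode("utf-8")  # bytes as stored or sent over the network
    print(text, [f"U+{ord(ch):04X}" for ch in text], encoded.hex())
    # Decoding with the same scheme recovers the original text.
    print(encoded.decode("utf-8") == text)
```

Storing the text as Unicode code points rather than in a font-specific encoding is what lets the same Urdu text be typed with one keyboard layout and read back correctly by any other program.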

The Directorate of Admissions, University of Peshawar, has announced that Master's, M.Phil and Ph.D courses in Urdu are already in progress in the Department, that it intends to start Diploma and Certificate courses for people from outside Pakistan, and that it is also working on degree courses in "Urdu informatics" for the near future.

In the new Urdu curriculum for M.Phil and Ph.D developed by HEC, Urdu Informatics, or the use of Urdu on computers, has been recognized.

An Institute for Urdu Informatics is an important and emerging need of the era, and it will soon be established. It is the need of the hour that a public-sector institute of Urdu Informatics should enhance and boost the localization activities already under way in the business world. Cabinet Secretary Ejaz Raheem (now Health Minister) expressed these views at the display of the “Pak Nastaleeq” font developed by CEUI on August 20, 2006. He was addressing a ceremony before the initial, or beta, release of the highly efficient, Unicode character based “Pak Nastaleeq” font. He said that the Prime Minister had decided to move Urdu quickly into the cyber world. After noting the successful development of Urdu Office and Urdu Windows in collaboration with Microsoft, he admired the development of the Pak Nastaleeq font, which is 90 times faster than other Nastaleeq fonts.

"Institute for Urdu Informatics" will provide a platform to Linguists and IT professionals to initiate more research and efforts in the field of Urdu Informatics.

In a review session in January 2008, the Cabinet Secretary, Mr. Syed Masoos Ahmed Rizvi, who was in the chair, appreciated the remarkable work done so far in developing Urdu as a computer language. The future of this work is R&D and the Institute.

In fact, it is the call of the times that an Institute for Urdu Informatics should be established, so that our graduates may pursue higher education in the field of Urdu Informatics and so that it provides a strong platform for research and efforts to use the national language in information and communication technologies.

It is a futuristic approach to the course of language development for Urdu. Urdu Informatics is the future of Urdu, and the Institute is its guarantor. We have contributed our share; now what are the deliberations of our nation?

..........................
*Project Director, Centre of Excellence for Urdu Informatics (CEUI), National Language Authority, Cabinet Division, Islamabad.

الف نظامی · May 15, 2008

Very good!
Is this article available in Urdu?

mfdarvesh · May 15, 2008

Many thanks, محبوب خان, but all of this is useless, because we are not even prepared to accept Urdu as a language. All our work is done in English; everywhere, only English matters. In these circumstances, doing anything is futile; it will only widen the gap between rich and poor. Urdu is only the language of the poor class. Otherwise, in Pakistan only English culture and the English language matter.

محبوب خان · May 17, 2008

All of that is true, but it certainly does not mean that we should sit idle with our hands folded. As for calling Urdu the language of the common man, sir, that is all correct. But this is the age of technology: if Urdu gets a comprehensive infrastructure, factors such as common versus elite, or rich versus poor, cannot affect it. Remember that it is thanks to this very technology that people from different fields are using Urdu today. In other words, Iqbal is not without hope for this barren field. Our job is to work for it, and so we are working; as for the rest, time nurtures things for years, and events do not happen all at once.

In fact, if one takes stock, Urdu has progressed thanks to technology. Its use is growing day by day, and various institutions, forums, and even individuals are striving for its betterment. These are all bright rays of hope. Feel their light and encourage them.

In the words of Dr. Attash Durrani, speaking about the work on Urdu:
We are working
Let us work

This is really addressed to everyone, since everyone is doing their part.

سیدہ شگفتہ · May 17, 2008

الف نظامی said:
Very good!
Is this article available in Urdu?

Assalamu alaikum

The same question again.

بلال · May 17, 2008

الف نظامی said:
Very good!
Is this article available in Urdu?

The same question again from me as well.

محبوب خان · May 18, 2008

What if one of you takes up the challenge, writes it, and posts it here? What do you think?

بلال · May 18, 2008

محبوب خان said:
What if one of you takes up the challenge, writes it, and posts it here? What do you think?

Your idea is a good one... but some other friend will have to make the sacrifice of doing this work, because for me it is a big deal just to understand English, and translating into Urdu is very difficult, sir...

ظہور احمد سولنگی · June 18, 2008

محبوب خان نے کہا:
dr. Attash Durrani’s* Vision Revealed:

institute For Urdu Informatics: A Futuristic Approach For Language Development

part Two

in General, The Data Is Less Reliable For Smaller Languages, So The Order Should Not Be Taken As Significant.
[*]tagalog, Afrikaans, Persian, Swedish, Ukrainian, Malay, Telugu, Greek, Marathi, Tamil, Vietnamese, Cantonese, Urdu, Norwegian, Danish, Czech, Hebrew, Catalan, Romanian, Hungarian, Gujarati, Finnish, Turkic, Punjabi, Kannada, Other Indic, Malayalam, Oriya, Slovak, Galician, Bulgarian, Byelorussian, Croatian, Amharic, Sindhi, Sinhalese, Assamese, Nepali, Kurdish, Kazakh, Uzbek, Slovenian, Pashto (pushto), Luxembourgish, Azerbaijani, Latvian (lettish), Cambodian, Turkmen, Basque, Estonian, Albanian, Balochi, Malagasy, Lithuanian, Armenian, Kinyarwanda, Swahili, Laothian, Macedonian, Icelandic, Luri, Georgian, Serbian, Tajik, Hindko, Moldavian, Konkani, Sesotho, Mongolian, Manipuri, Kirghiz, Maltese, Brahui, chichewa, Croatian, Kirundi, Afar, Rhaeto-romance, Samoan, Tonga,...

why Urdu Or Other Pakistani Languages, Be Lacking In These Position. What Is The Remedy? The Answer Is "localization".

localization (sometimes Shortened To "l10n") Is The Process Of Adapting A Product Or Service To A Particular Language, Culture, And Desired Local "look-and-feel." Ideally, A Product Or Service Is Developed So That Localization Is Relatively Easy To Achieve - For Example, By Creating Technical Illustrations For Manuals In Which The Text Can Easily Be Changed To Another Language And Allowing Some Expansion Room For This Purpose.

in Localizing A Product, In Addition To Idiomatic Language Translation,. A Successfully Localized Service Or Product Is One That Appears To Have Been Developed Within The Local Culture.

language Translation, Which Is A Large Part Of Localization, Can Sometimes Be Facilitated With automatic Language Translation.

the Localization Research Centre (lrc) Is The Information, Educational, And Research Centre For The Localization Community, established In 1995 At University College Dublin Under The Irish Government And European Union Funded Technologies Centers Programme.
localization (education & Research) Facilities Are Now Spreading.
new Postgraduate Localisation Courses At The University Of Limerick2005-2006 Localisation Reader Available Now Star Servicios Lingüísticos Launch New certified Localisation Professional Course.phd Research Opportunity With The Localisation Research.

mobile Phone Companies Like Nokia And Motorola Corporation Are Also Localizing Their Applications. Urdu, Pashto, Sindhi Sms Are Now In Use. A Pakistani Company Inksoft Made It Possible.
nla's Keyboard Is Also Done By The Google (urdu), Wikipedia (urdu) Localized.

msc. In Software Localization Program Is Available In University Of Limerick

there Are Some Other International Localization Institutes Also Working In This World. Localization Is Now A Global Business. Total Business In 2005 Was $ 8.8 Billion (urdu Is One Of The 4 Topmost Microsoft Localization Languages) Microsoft- Nokia- Motorola- Panasonic- Ibm Etc. Are In This Business Now. It Means That Urdu Is Already In The Mainstream Of The It Industry For Localization.
automatic Language Translation Is Basic In The Field Of Localization. It Is The Use Of A Computer Program To Translate Input Text From One National Language To Another While Maintaining The Original Document Format. Yahoo And Some Other Sites Offer What Is Sometimes Called instant Translation Using Such A Tool. Since Language Is Heavily Dependent On Context And Connoted As Well As Denoted Meaning, A Program Needs To Have Access To Such Context As Well As The Ability To Use It. Since Providing Enough Context Is Difficult, Automatic Language Translation Thus Far Seems To Be Successful Only In Limited And Well-understood Situations And As A First Time-saving Step Toward Translation (or "post-editing") By A Human Being.
(a) machine Translation Software’s Are Required For Automatic Language Translation. Nla Is In Search Of A Common Lingual Code To Fulfill These Requirements.
(b) steps In M.t. Found In English-urdu Lingual Code Are To Go Down From Complex To Simple Sentence Translation.

lingual Equivalents And Properties/ Diction/ Simple Sentences Are To Be Found Or Standardized.

clause Equivalents And Rearrangements – Basic Formula For Rearrangement Of A Complex Sentence From English To Urdu Is Devised:

1/an+ N+ (n-1) + (n-2) + (n-a) + 1/bv

here N=noun, V= Verb, N= Total Number Of Clauses. 1= 1st Clause Divided Into Noun And Verb Portions.

induction, Gender, Plurality, Conjunctions Etc. Are Also To Be Taken Into Considerations And Additives Are Required In Editing And Polishing Of Translated Urdu Sentences.

another Project Of Nla Is To Develop Urdu Database (software) For Urdu Databank Like English Banks. It May Have The Following Features:

search Items (from Idea/ Meaning To Word/ Phrase).

filing Arrangements.

synonyms And Homonyms.

meaning– Relationship/ Thesauri

data Development

testing

urdu Databank (corpus) Development.

both The Localization And Mt Need Computational Linguistics To Be Researched. computational Linguistics (cl) is A Discipline Between Linguistics And Computer Science Which Is Concerned With The Computational Aspects Of The Human Language Faculty. It Belongs To The Cognitive Sciences And Overlaps With The Field Of artificial Intelligence (ai), A Branch Of computer Science Aiming At Computational Models Of Human Cognition. Computational Linguistics Has Applied And Theoretical Components.these Are Localization And Translation Challenges Presented In Different Microsoft Conferences E.g. Pdc 2006 And 2007. A Conference On Language And Technology Was Arranged By Peshawar University In August 2007. It Revoked That The Areas Of Research Are As Follows:

ambiguity Resolution

anaphora Resolution

character Recognition

corpus Linguistics

discourse Analysis

ellipses Resolution

fonts

information Retrieval

localization

machine Translation

morphology

ocr

part-of-speech Tagging

pattern Recognition

phonology

semantics

speech Recognition

syntax

text To Speech

future Needs Of These Researches Are To Be Translated As Follows:

r&d- A Continuous Process For Computational Linguistics.

urdu Data Bank- A Continuous Process- With An Interval Of 5 Years.(90% Completion Is Expected In 10 Years)

e-teaching/ Learning- Teaching Courses, Certification, Training.

e- Publishing- A Continuous Process.

laboratory Product- Software/ Cd’s/ Programs, Tools Etc.

this Type Of Institute For Urdu Informatics Is Needed For R&d And Teaching For This New And Emerging Field. It Started With The Project: "center Of Excellence For Urdu Informatics". Its Phase I Was For The Development Of Standards; Phase Ii For The Establishment Of Urdu Informatics Center; Phase Iii For Urdu Laboratory Products And The Institute.
its Activities Are Now For:

font Development- Pak Nastaleeq Font Is Released.

urdu Database– Data Development Is Being Carried Out.

machine Translation For Official Urdu Is Under Construction.

ocr- Writing Pad Is Being Developed For Urdu Writers To Process On Word/ Office.

e- Learning– Prototype Training Courses Are To Be Launched.

e- Publishing – Website And Data Ware House Is Being Established.

building– Pre Qualification/ Designing Is Under Way.

In the Cabinet Committee meeting of February 1, 2007 on the adoption of Urdu as the official language, it was decided that a project for the Urdu Informatics Institute be prepared. HEC may provide funds for the purpose.
"Urdu informatics" is a new field established in IT. Wikipedia, the free encyclopedia, describes it as follows: "Urdu Informatics (اردو اطلاعیات) relates to the cutting-edge research and efforts in bringing the utilities and usage of national language to the modern information and communication technologies in education and businesses.
National Language Authority has been at the forefront in introducing Urdu Informatics as a tool for wider standardization of the language. Apart from development of Urdu keyboard, one of the key steps in this respect has been the establishment of a Centre of Excellence for Urdu Informatics in Islamabad, Pakistan.
Works in Urdu Informatics also relate to the use of Urdu for internet applications, input controls, dynamic association of keyboard buttons to Urdu alphabets and use of proper character coding schemes."

The Directorate of Admissions, University of Peshawar, has announced that Master's, M.Phil and Ph.D courses in Urdu are already in progress in the department, that it intends to start diploma and certificate courses for people from outside Pakistan, and that it is also working on degree courses in "Urdu Informatics" for the near future.

In the new Urdu curriculum for M.Phil and Ph.D developed by HEC, Urdu informatics, or the use of Urdu on computers, has been recognized.

An institute for Urdu informatics is an important and emerging need of the era, and it will soon be established. It is the need of the time that an institute of Urdu informatics in the public sector should enhance and boost the localization activities already being carried out in the business world. Cabinet Secretary Ejaz Raheem (now Health Minister) expressed these views at the display of the "Pak Nastaleeq" font developed by CEUI on August 20, 2006. He was addressing a ceremony before the initial, or beta, release of the Unicode character based, highly efficient "Pak Nastaleeq" font. He said that the Prime Minister had decided to move Urdu quickly into the cyber world. Following the successful development of Urdu Office and Urdu Windows in collaboration with Microsoft, he admired the development of the Pak Nastaleeq font, which is 90 times faster than other Nastaleeq fonts.

The "Institute for Urdu Informatics" will provide a platform for linguists and IT professionals to initiate more research and efforts in the field of Urdu informatics.

In a review session in January 2008, the Cabinet Secretary in the chair, Mr. Syed Masoos Ahmed Rizvi, appreciated the remarkable work done so far in developing Urdu as a computer language. The future of this work is the R&D and the Institute.

In fact, it is the voice of the era that an institute for Urdu informatics should be established, so that our graduates may get higher education in the field of Urdu informatics and so that it provides a strong platform for research and efforts to have the national language used in information and communication technologies.

It is a futuristic approach to the course of language development for Urdu. Urdu informatics is the future of Urdu and the Institute is the guarantor. We have contributed our share; now, what are the deliberations of our nation?

..........................
*Project Director, Centre of Excellence for Urdu Informatics (CEUI), National Language Authority, Cabinet Division, Islamabad.

Mehboob, would you care to explain why Inksoft is mentioned in this thread, when you have posted this here in an official capacity?

Mohsin Hijazi · June 18, 2008

Exactly! The members of this forum should be told what is going on here.

Mehboob Khan · June 19, 2008

[quote=Zahoor Ahmed Solangi;290970]You have posted this here in an official capacity?[/quote]

Respected Zahoor Ahmed Solangi Sahib,

I post articles here in my personal capacity. I request a correction: those who take this to be an official capacity should correct their understanding.

I post articles on Urdu Mehfil so that those who love Urdu can benefit from them. I would submit that if research articles are treated as the first word rather than the last word, the matter becomes a little easier to understand.

This is a research article, and I would submit that references are given in research articles.

Nabeel · June 19, 2008

I think it is better for this topic to reach its conclusion in the thread where it is already under way. After that I will lock all topics of this kind.

shoaibnawaz · June 19, 2008

Post in whatever capacity you like. But do tell us: could you find no more suitable person or institution than this one to introduce in a personal capacity?

Zahoor Ahmed Solangi · June 20, 2008

Inksoft is also being discussed in the machine-readable thread, and it should be explained there what the connection is between Inksoft and the Centre of Excellence for Urdu Informatics. Mehboob Sahib is not at fault; you perhaps do not even know that you are being used for personal publicity. You have also published this article on a number of other forums; the links are being given in the other thread.

Mohsin Hijazi · June 20, 2008

Mehboob Khan said:
Ms. Tanveer Fatima*

Ghost Characters Theory for Orthographic Representation of the Arabic Block

There are many constraints on the spread of knowledge, the most important of which is the language/communication problem. About 45% of the volume of knowledge is in English, and most people cannot understand English. With a literacy rate of 35%, only 2% of that can read and understand English. This is a really big obstacle to reaching the unreached. The solution to this problem is localization, i.e. all IT products and other computer operations should be converted into the user's native language. History also reflects that localization is now a global business.
Mark Davis (2003) of Unicode states in his Unicode research paper that many people in the software industry don't realize how important it is to localize products for different languages around the world. While English is a major language, it only accounts for around 30% of the world gross domestic product (GDP), and is likely to account for less in the future. Neglecting other languages means ignoring quite significant potential markets.
His short article provides a picture of the economic significance of different languages, with a breakdown of the percentages of world GDP by language. Not only does it show the current breakdown, but it also provides data for the years 1975 to 2002 to show modern trends. The most notable feature is the steady rise of Chinese and the slow relative decline of Japanese and most European languages. Korean and Indic languages also show growth over that period, though slower than Chinese.

The GDP values are expressed in terms of purchasing power parity (PPP), which accounts for price differences between countries.
The "Other" field is the accumulated total for languages for which there is data, but where each has less than 0.9% of the world GDP. While each language separately corresponds to a small percentage, their total is significant (about the same as Chinese). In general, the data is less reliable for smaller languages, so the order should not be taken as significant.

For localization we need many technical approaches, e.g. translation. But the problem here is that only 10% of the computer-literate ever went beyond word processing or touched the other functions included in MS Office or other applications. Although MS VOLT has become a necessity for all Arabic-based languages, for both their present and future characteristics, fonts are never considered a tool of localization. Once localization is considered in practice, another problem raises its head: orthographic or script processing on the computer in relation to the fonts of the languages concerned. As far as the Arabic base script of these languages is concerned, there is an ever-growing need for characters in the Arabic script, but there is no room left in the different code pages of the computer standards. Unicode allotted the 06 place for this purpose, then went on to 07, and is now entering page 08. Space is a big problem for the ever-growing characters of the Arabic-based languages. But every problem has its solution. How? It is possible only on a new basis, i.e. the ghost character theory: only 44 ghost characters can do the whole job, with no need to find extra space for new characters. There are a few common items or fractions of the characters/letters in any script.
J. Kew et al. (2003) state that the true structure of the script is better understood as a small set of underlying "skeleton" letter forms, to which patterns of dots ("nuqtas") are added to differentiate sounds and letters needed to write a particular language.
It will have many benefits, e.g. universality of the block's Arabic coverage, limiting the block explosion, and ease of data entry operations, especially on limited devices. This idea was demonstrated by Dr. Attash Durrani after the ASCII code plate for Urdu was devised in 1999. Full atomization was presented in its second version.
It may introduce normalization issues in the code development process. The normalization transformations are not transient in nature; they are always there. A user is expected to type in a hybrid of both forms. A character may be in either of two cases, (i) the collapsed case (characters having the diacritic (nuqta)) or (ii) the spread case (characters without the diacritic (nuqta)), and text may be in (iii) the hybrid case (a mixture of the collapsed and spread cases).
Arabic script was historically a "dotless" script. By this we mean that a single shape may have different sounds depending on the word. Here is an example:

· In the figure above, a native Arabic speaker is able to comprehend the meaning of the text based on context and his or her vocabulary. However, anyone less familiar with the Arabic language will not be able to understand the correct meaning of the text, because of a limited vocabulary and an inability to understand the context. The main reason such a text cannot be read is that the sound of a character depends heavily on the context and content of the text.
· To overcome this problem, a Muslim caliph introduced nuqtas. The sole purpose of the dot was to sit on a shape (which we call the basic or ghost shape, or kashti) and to indicate its phonetic status. Below is the "dotted" version of the above-mentioned text.

Arabic phrase with dots. The sound of the characters does not have to be "guessed".

· Now, after the placement of the dots, even a non-native reader can easily understand the text without any trial and error, because the dots sufficiently indicate the exact sound of each character.
Later on, when new languages adopted the Arabic script as their script of choice, a new problem arose: unavailable sounds (phonemes). For example, Urdu has a sound exactly equal to the sound of "p" in English, but the Arabic language has no such sound and no means to depict it. Again the nuqta came to the rescue: taking the basic shape of bey and placing three dots beneath it solved this problem. Here is how it looks:

These characters and the dots were included in the ASCII code plate of the National Language Authority.

This is the point of the present and future of the Urdu alphabet as well as of other Pakistani languages. Dr. Durrani drew on pages from Amir Khusro's "Khaliq Bari", Maulvi Abdul Haq's dictionary, and pedagogical needs from the Urdu primers of NWFP.

The reasons for encoding the new letterforms as units, and not encoding combining modifier forms separately, are historic and, given the evolution of the Unicode Standard, simple: while vowels and punctuation marks have been encoded as combining marks, the consonantal base letters have consistently been encoded in Unicode as units. To change this practice would open the door to multiple representations for the same letters.
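The distinction drawn above can be inspected with Python's standard `unicodedata` module. The following is only a minimal sketch: the three code points are picked here for illustration and are not taken from the article. It shows that a dotted consonant such as peh is stored as one indivisible unit, while a vowel mark such as fatha is a combining mark.

```python
import unicodedata

PEH = "\u067E"          # ARABIC LETTER PEH: bey skeleton with three dots below
DOTLESS_BEH = "\u066E"  # ARABIC LETTER DOTLESS BEH: the bare skeleton
FATHA = "\u064E"        # ARABIC FATHA: a vowel mark

def describe(ch):
    """Collect the Unicode properties relevant to the unit-vs-combining question."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),            # Lo = letter, Mn = nonspacing mark
        "combining_class": unicodedata.combining(ch),    # 0 for base letters
        "decomposition": unicodedata.decomposition(ch),  # empty string if atomic
    }

for ch in (PEH, DOTLESS_BEH, FATHA):
    print(f"U+{ord(ch):04X}", describe(ch))
```

The empty decomposition for peh is what "encoded as a unit" means in practice: normalization never splits it into a skeleton plus dots.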
Some new additions were also made to make it simpler.
Arabic Triple Nuqta Above = Arabic Double Nuqta Above + Arabic Single Nuqta Above

Arabic Triple Inverted Nuqta Above = Arabic Single Nuqta Above + Arabic Double Nuqta Above

Sindhi Quadruple Nuqta Above = Arabic Double Nuqta Above + Arabic Double Nuqta Above
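The three identities above can be read as a small algebra of dot rows. The sketch below is one possible way to model it, under an assumption that is not stated in the article: each mark is a tuple of rows, and the order of the operands gives the stacking order. All names are hypothetical.

```python
# Assumption: a mark is a tuple of rows, each row holding a number of dots,
# and "A + B" stacks the rows of A before the rows of B.
SINGLE = (1,)
DOUBLE = (2,)

def stack(*marks):
    """Concatenate the rows of the given marks in the order written."""
    rows = ()
    for mark in marks:
        rows += mark
    return rows

def dot_count(mark):
    """Total number of dots in a mark, ignoring their arrangement."""
    return sum(mark)

TRIPLE = stack(DOUBLE, SINGLE)           # (2, 1)
TRIPLE_INVERTED = stack(SINGLE, DOUBLE)  # (1, 2)
QUADRUPLE = stack(DOUBLE, DOUBLE)        # (2, 2)

for label, mark in [("triple", TRIPLE),
                    ("triple inverted", TRIPLE_INVERTED),
                    ("quadruple", QUADRUPLE)]:
    print(label, mark, "dots:", dot_count(mark))
```

Under this reading, triple and triple inverted carry the same number of dots but remain distinct marks, which is why the two identities list the same operands in opposite order.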

Dr. Durrani's ghost characters were included in Unicode, the international standard for fonts and characters, but only partially: the dotless character set was completed by including dotless bey, fey and qaf in Unicode version 3.1, but there was no room for the dots, and no Unicode numbers were allotted to the dots and other atoms. The theory's request was for the addition of 22 new combining characters to the Arabic block of the Unicode Standard, which would make it possible to typeset almost all regional languages written in the Arabic script:

There were different constraints during the development of this project, i.e. feasibility/development constraints, financial constraints, resource constraints, personal constraints, and system (hardware and software) constraints.
According to some researchers, the introduction of separate nuqta diacritics for Arabic would be problematic in the development of Unicode. They could not be added to the standard normalized forms because of the stability requirements, and having separate nuqta diacritics without normalization would be a security problem for which the technical committee has not found a solution.
These characters have an individual existence in the script and are often needed in the generation of electronic texts such as pedagogical material. Unicode had already added many entries from the ASCII code plate notification of NLA, including the notion of ghost characters, thus completing the set of ghost characters of the Arabic script. It is now complementary to add support for the nuqta characters alongside these ghost characters in the code blocks, to realize the real benefit of the set.
Nuqtas are also present in the Quran as separate characters, such as 2, 3 and 4 nuqtas above used separately. In these circumstances, the need for these nuqta marks as separate characters is of immense importance. Another rationale was also given by Dr. Durrani in the following examples, where the nuqtas are red in color:

The project was rejected in 2003 by the Unicode Technical Committee (UTC) on the grounds that the addition of the combining nuqta characters would change the encoding model for Arabic. It was not intended to change the system or to introduce a parallel or duplicate encoding system in the Arabic block. It was just the addition of these nuqta characters along with the proposed properties, and if it introduces a parallel system, then that is an additional benefit, yielding self-sufficiency of the Arabic script.
But it was resolved later, and it was accepted that it would constitute an untenable destabilization of the Unicode Standard. It was precisely for that reason that the UTC was forced to reject the proposal, even though the committee as a whole agreed that a decomposed representation for the Arabic script would have been preferable, had it been done from the outset before stability became a limiting factor.
Unicode could restrict the usage of the combining nuqtas in such a way that letters that already exist in their own right cannot be encoded as sequences. Thus, the sequence <dotless bey, nuqta below> would be defined not to combine and form a letter looking like bey. No new ambiguities are therefore introduced; any given Arabic letter still has only one Unicode representation. There is no impact whatsoever on normalization.
It also requires implementers to deal with a specific "exclusion list" of apparently typical sequences that must not be rendered "normally", nor interpreted as if they meant what they "ought" to mean. This would represent an unwelcome burden on every implementation that wants to handle the Arabic script in any way.
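To make the "exclusion list" idea concrete, here is a minimal sketch of the check an implementation would have to carry. The symbolic names, the single table entry, and the function are all hypothetical; no real code points or Unicode properties are assumed, and the non-excluded example is the "hah with two dots above" that the article lists as missing from Unicode 4.0.

```python
# Hypothetical exclusion list: (skeleton, nuqta) sequences that duplicate a
# letter already encoded in its own right, mapped to that existing letter.
EXCLUDED_SEQUENCES = {
    ("DOTLESS BEY", "SINGLE NUQTA BELOW"): "BEY",
}

def check_sequence(skeleton, nuqta):
    """Return the existing letter a sequence would duplicate, or None if it is new."""
    return EXCLUDED_SEQUENCES.get((skeleton, nuqta))

def must_render_normally(skeleton, nuqta):
    """Only sequences that are NOT on the exclusion list may combine as usual."""
    return check_sequence(skeleton, nuqta) is None

print(must_render_normally("DOTLESS BEY", "SINGLE NUQTA BELOW"))  # excluded sequence
print(must_render_normally("HAH", "DOUBLE NUQTA ABOVE"))          # new combination
```

Every renderer, search routine, and input method would need the same table, which is the burden the paragraph above describes.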
The answer to this was that the ghost characters theory already exists in Unicode on different pages and there was no restriction on the usage of nuqtas, so no ambiguities would be introduced. It was suggested that the 08 place be given to this new set, i.e. that nuqtas are separate characters. The examples on page 06 were like:

To achieve this, it is proposed that rather than adding the decompositions of the current precomposed Arabic letters to the UCD as canonical decompositions (which seems natural, but contravenes published Unicode stability policy), a new property, which could be named "required", should be defined. The existing precomposed Arabic letters would have their "decomposed forms" defined here. The intention is that the required composition property gives compositions that must always be used during normalization, even in NFD.
The Unicode Standard allows for the dynamic composition of accented forms and Hangul syllables. Combining characters used to create composite forms are productive. Because the process of character composition is open-ended, new forms with modifying marks may be created from a combination of base characters followed by combining characters. For example, the diacritic "¨" (diaeresis) may be combined with all vowels and a number of consonants in languages using the Latin script and several other scripts.
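Dynamic composition can be tried directly with Python's `unicodedata` module. This is a small sketch; the choice of "u" and "q" as base letters is an illustration added here, not something the article specifies.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS

def compose(base):
    """Attach a combining diaeresis to a base letter and normalize to NFC."""
    return unicodedata.normalize("NFC", base + DIAERESIS)

u_form = compose("u")  # a precomposed letter exists, so NFC yields one code point
q_form = compose("q")  # no precomposed letter exists; the sequence stays as it is

for form in (u_form, q_form):
    print([f"U+{ord(c):04X}" for c in form])
```

The second case is the "productive" one: the combination is valid text even though the standard never assigned it a code point of its own.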

There are many ways to categorize code points. This illustrates some of the categorizations and basic terminology used in the Unicode Standard. Not all assigned code points represent abstract characters; only graphic, format, control and private-use code points do. Surrogates and noncharacters are assigned code points but are not assigned to abstract characters. Reserved code points are assignable: any may be assigned in a future version of the standard. The General Category provides a finer breakdown of required character codes following the base character. For combining characters placed below a base character, the situation is reversed, with the combining characters starting from the base character and stacking downward.

Another example of multiple combining characters above the base character can be found in Thai, where a consonant letter can have above it one of the vowels U+0E34 through U+0E37 and, above that, one of four tone marks U+0E48 through U+0E4B. The order of character codes that produces this graphic display is base consonant character + vowel character + tone mark character.
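The Thai ordering can be checked with a short sketch. The particular consonant, vowel, and tone mark below are picked for illustration from the ranges named above; the article does not single them out.

```python
import unicodedata

KO_KAI = "\u0E01"  # THAI CHARACTER KO KAI, a base consonant
SARA_I = "\u0E34"  # THAI CHARACTER SARA I, a vowel written above the consonant
MAI_EK = "\u0E48"  # THAI CHARACTER MAI EK, a tone mark written above the vowel

# Stored order: base consonant + vowel + tone mark.
syllable = KO_KAI + SARA_I + MAI_EK

for ch in syllable:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), unicodedata.combining(ch))

# Normalization leaves this order untouched.
is_stable = unicodedata.normalize("NFC", syllable) == syllable
print("unchanged by NFC:", is_stable)
```

Both marks are nonspacing (category Mn), so a renderer stacks them upward from the consonant in the order they are stored.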

Ligated base characters with multiple combining marks do not commonly occur in most scripts. However, in some scripts, such as Arabic, this situation occurs quite often when vowel marks are used. It arises because of the large number of ligatures in Arabic, where each element of a ligature is a consonant, which in turn can have a vowel mark attached to it. Ligatures can even occur with three or more characters merging; vowel marks may be attached to each part.

In cases involving two or more sequences considered to be equivalent, the Unicode Standard does not prescribe one particular sequence as being the correct one; instead, each sequence is merely equivalent to the others. The figure illustrates the two major forms of equivalent sequences formally defined by the Unicode Standard. In the first example, the sequences are canonically equivalent. Both sequences should display and be interpreted the same way. The second and third examples illustrate different compatibility sequences. Compatibility-equivalent sequences may have format differences in display and may be interpreted differently in some contexts.
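Since the figure is not reproduced in this post, the two kinds of equivalence can be illustrated with a short sketch instead. The sample characters (A with ring above and the "fi" ligature) are chosen here and may differ from those in the original figure.

```python
import unicodedata

# Canonical equivalence: one precomposed code point versus base + combining mark.
precomposed = "\u00C5"  # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "A\u030A"  # "A" followed by COMBINING RING ABOVE

canonically_equivalent = (
    unicodedata.normalize("NFD", precomposed)
    == unicodedata.normalize("NFD", decomposed)
)

# Compatibility equivalence: the ligature differs in form from plain "fi".
ligature = "\uFB01"  # LATIN SMALL LIGATURE FI
kept_by_nfc = unicodedata.normalize("NFC", ligature) == ligature
folded_by_nfkc = unicodedata.normalize("NFKC", ligature)

print(canonically_equivalent, kept_by_nfc, folded_by_nfkc)
```

Canonical normalization (NFC/NFD) treats the first pair as the same text, while the ligature is only folded into "fi" by the compatibility forms (NFKC/NFKD).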

A key part of normalization is to provide a unique canonical order for visually nondistinct sequences of combining characters. The figure shows the effect of canonical ordering for multiple combining marks applied to the same base character.

When combining characters do not interact typographically, the relative ordering of contiguous combining marks cannot result in any visual distinction and thus is insignificant.
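Canonical ordering can be observed with a mark above and a mark below the same base letter. This is a sketch with Latin marks chosen here for illustration, since the article's figure is not available.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230 (above)
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220 (below)
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, combining class 230 (above)

def nfd(text):
    return unicodedata.normalize("NFD", text)

# Marks above and below do not interact, so both orders normalize identically,
# with the lower combining class placed first.
above_then_below = nfd("a" + ACUTE + DOT_BELOW)
below_then_above = nfd("a" + DOT_BELOW + ACUTE)

# Two marks of the same class do interact, so their order is preserved.
acute_first = nfd("a" + ACUTE + DIAERESIS)
diaeresis_first = nfd("a" + DIAERESIS + ACUTE)

print(above_then_below == below_then_above, acute_first == diaeresis_first)
```

The combining class is what tells the normalizer which marks may be reordered, which is why the class assigned to a new mark such as a nuqta matters.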
It was then suggested by Dr. Durrani that the 08 or another place might be given to this new decomposed set, so that there would be no duplication or problem of normalization. Later this place, i.e. 08, was allotted to Dr. Durrani's proposal.
The question of appropriate combining classes for the nuqtas requires some attention. Given that the nuqtas are closely associated with the base letter, it seems natural to assign them a low combining class value; this would keep them close to the base letter in NFD, which could benefit analytical processes and rendering systems. It could also tend to help the efficiency of the NFC/NFD algorithms, which need to recombine base + nuqta sequences.
There is already a combining class, 7, used for "nuqtas" in Indic scripts; these are consonant modifiers that go below the basic consonant, and are thus very similar to the proposed Arabic nuqtas. It was suggested, therefore, that this same combining class value be used for the nuqtas that are positioned below the base or ghost letter, and 6 (8 is already in use) for those that go above. Nuqta-like marks that actually attach to the base letter (a ring, as seen on U+067C and others; a stroke through, as seen on U+06C5) could have combining class 1 (also used for combining overlays). A drawback of these values is that they differ from the class of the combining hamza marks that are already in Unicode. U+0681 shows that the hamza form has been used as a nuqta-like mark to create a new letter in at least one instance, in addition to its conventional use on alef, waw, and yeh. It therefore seems unfortunate for it not to share the combining class value of the other nuqtas.
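The existing values that this argument relies on can be read from the Unicode Character Database through Python's `unicodedata` module. The sketch below looks up only characters that are already encoded; the proposed values 6 and 7 for Arabic nuqtas are part of the proposal and cannot be queried this way.

```python
import unicodedata

EXISTING_MARKS = {
    "Indic nukta": "\u093C",     # DEVANAGARI SIGN NUKTA
    "hamza above": "\u0654",     # ARABIC HAMZA ABOVE
    "hamza below": "\u0655",     # ARABIC HAMZA BELOW
    "stroke overlay": "\u0336",  # COMBINING LONG STROKE OVERLAY
}

combining_classes = {
    label: unicodedata.combining(ch) for label, ch in EXISTING_MARKS.items()
}
for label, value in combining_classes.items():
    print(f"{label}: combining class {value}")

# The letter cited above as using hamza like a nuqta is itself an atomic unit.
HAH_WITH_HAMZA = "\u0681"
print(unicodedata.name(HAH_WITH_HAMZA),
      repr(unicodedata.decomposition(HAH_WITH_HAMZA)))
```

The output shows the mismatch the paragraph regrets: the Indic nukta sits in class 7, while the combining hamza marks sit in the general above and below classes.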
Here there was no need for more discussion, as nuqtas are now declared to be separate characters.
Jonathan Kew (2003) has also shown a variety of letters that are not represented in Unicode 4.0. Some of the more "interesting" letters are highlighted. Note that in many cases several different writing conventions for the same language are mentioned. Even if some characters are eventually dropped during orthographic standardization or reform of these languages, the fact that they have been traditionally used by some writers means that they need to be taken into consideration; otherwise existing texts cannot be encoded. Examples are a bey skeleton with two dots vertically above its right end, and a noon or bey skeleton (ambiguous, because the chart shows a linked initial form) with a dot above and two dots below, i.e. Songhoy:

Hah with two dots above; ain with two dots above. Songhoy language.

After all this effort the success story, to some extent, begins: Dr. Durrani's proposed "ghost characters theory" was accepted, and all the proposed characters were given the 08 place in Unicode. The following codes were assigned to the ghost characters of nuqtas.
According to the proposal, 22 additions were requested. They took the names and tried to align them with the notion of being spacing characters. The names were also updated to the usual style for such characters, beginning with "Arabic" for the script, and then annotated where appropriate for the particular language.
·0880 Arabic Single Nuqta Above
·0881 Arabic Single Nuqta Below
·0882 Arabic Double Nuqta Above
·0883 Arabic Double Nuqta Below
·0884 Arabic Triple Nuqta Above
·0885 Arabic Triple Nuqta Below
·0886 Arabic Triple Inverted Nuqta Above
·0887 Arabic Triple Inverted Nuqta Below
·0888 Arabic Quadruple Nuqta Above
·* Sindhi
·0889 Arabic Quadruple Nuqta Below
·* Sindhi
·088A Arabic Double Danda Above
·* Sindhi
·088B Arabic Double Danda Below
·088C Arabic Double Nuqta Vertical Above
·* Sindhi
·088D Arabic Double Nuqta Vertical Below
·* Sindhi
·088E Arabic Single Kashida Above
·* Urdu
·088F Arabic Single Kashida Below
·* Urdu
·0890 Arabic Double Kashida Above
·* Urdu
·0891 Arabic Double Kashida Below
·* Urdu
·0892 Arabic Single Circle Above
·* Pashto
·0893 Arabic Single Circle Below
·* Pashto
·0894 Arabic Tota Above
·* Urdu
·0895 Arabic Tota Below
·* Urdu
This is a turning point in the history of Arabic fonts. There are only 44 atomized or ghost characters, and any character or letter of any language based on the Arabic script can be normalized or formed from these 44 characters; hence there is no need for a different font for each language. Any Pakistani language font developer or linguist can derive any character having any combination of atoms. Dr. Attash Durrani's ghost theory is a revolutionary step in the field of font development.
Here is an example of NLA's Pak Nastaleeq font depicting Arabic, Urdu, Pushto, Persian and Sindhi processed with a single font, and that is the fruit.

Urdu, Arabic and Persian
بِسْمِ اللّٰہِ الرَّحْمٰنِ الرَّحِیْمِ تمام تعریفیں اللّٰہ ربُّ الْعِزَّت کے لیے ہیں جو تمام جہانوں کا ربّ ہے۔ وہی سزا وار حمدو ثنا ہے۔ اسی کے قبضے میں تمام کائنات ہے۔ اے اہلِ ایمان! صلٰوۃ و زکوٰۃ کا اہتمام کرو۔ نبی اکرمؐ۔ حضرت نوحؑ۔ حضرت عثمانؓ۔ قائد اعظمؒ۔ غالبؔ۔ ویکی‌پدیا پروژه‌ای چندزبانه برای گردآوری دانشنامه‌ای جامع و با محتویات آزاد است. این پروژه (به زبان انگلیسی) از ژانویهٔ ۲۰۰۱ آغاز شده و اکنون ۱۳٬۷۰۰ مقاله به زبان فارسی دارد. شما هم می‌توانید مقالات را ویرایش کنید. برای فراگیری و تمرین این کار می‌توانید نخست به صفحهٔ راهنما رفته و سپس در گودال ماسه‌بازی آزمایش کنید. لأن مصطلح الديمقراطية يستخدم لوصف أشكال الحكم و المجتمع الحر بالتناوب، فغالباً ما يُساء فهمه لأن المرء يتوقع عادة أن تعطيه زخارف حكم الأغلبيا كل مزايا المجتمع الحر. إذ في الوقت الذي يمكن فيه أن يكون للمجتمع الديمقراطي حكومتہ ديمقراطتہ فإن وجود حكومتہ ديمقراطيا لا يعني بالضرورة وجود مجتمع ديمقراطي. لقد إكتسب مصطلح الديمقراطيتہ إيحاءً إيجابياً جداً خلال النصف الثاني من القرن العشرين الى حد دفع بالحكام الدكتاتوريين الشموليين للتشدق بدعم "الديمقراطيا" وإجراء

Sindhi and Pushto
هي دڙو ڪوٽ ڏيجي جي قلعي جي اتر طرف کان ڪراچيءُ واري شاهراهه ڀرسان آهي. هن دڙي ۾ دفن ٿيل تهذيب جي نشاندهي 1935ع ۾ بمبئي يونيورسٽيءَ جي هڪ اسڪالر ماڌو سروپ ڪئي هئي. سندس خيال موجب اها تهذيب تاريخ کان اڳ واري زماني جي هئي. ان کان پوءِ 1955ع ۾ قديم آثارن جي محڪمي، نومبر مهيني ۾ کوٽائي جو ڪم شروع ڪي. وري سال 1957ع ۾ آڪٽوبر، نومبر ۽ ڊسمبر جا ٽي مهينا دوباره کوٽائي ڪرائي وئي. انهيءَ کوٽائي جو ڪم لاءِ مير علي مراد خان ثانيءَطرفان ويهه هزار روپين جو عطيو ڏنو ويو. کوٽائي مان جيڪي شيون هٿ آيون، تن کي ڏسي قديم آثارن جي ماهرن اندازو لڳايو تہ ڪوٽ ڏيجي جي تهذيب موهن جي دڙي واري تهذيب کان ست سئو سال پراڻي آهي. انهيءَ دڙي جي ڊيگه اوڀر کان اوله طرف تي ڇه سئو فوٽ ويڪر چار سئو فوٽ ۽ اچائي چاليه فوٽ آهيپه حکومت کې د اطلاعاتو او کلتور سرپرست وزير سيد مخدوم رهين ، د دولت په استازيتوب په خپل غبرگون کې بي بي سي ته ويلي چې په ځينو دغو تبصرو کې يا د افغانستان د حالاتو ځانگړتياوې له پامه ويستل کيږي او يا په تر لاسه شويو برياوو سترگې پټيږي . د ډيرو په باور دا داسې يو څه دي چې هم يې د حکومت د واکمنۍ اصل کمزورى کړى او هم يې ولسمشر کرزى د خپلو کړو ژمنو په پوره کولو کې پاتې راوستی دی په دې کې د نيوزويک ، نيويارک ټايمز او واشنگټن پوست په څيرد معتبره امريکايې ورځپاڼو هغو شننو ته گوته نيول کيږي چې د حالاتو په راکابو کولو کې افغان حکومت پاتې بولي .

The total number of ghost characters is 44, of which 22 are kashtis.

References:
Attash Durrani, Dr. Letter to Jonathon (rationale for nuqta proposal).
Attash Durrani, Dr. Letter to Mark Davis.
Attash Durrani, Dr. 2006. Nuqta Marks in Arabic: Detailed Character Properties.
Jonathan Kew. 2003. Images of Potential Extended Arabic Characters.
Mark Davis, Kamal Mansour. 2002. Proposal to Amend Arabic Repertoire.
Mark Davis. 2003. Unicode Technical Note #13, pp. 1-5.
--------------------------------------------------------------------------------------------------
*Assistant Informatics Officer, Center of Excellence for Urdu Informatics, National Language Authority, Islamabad, Pakistan

This article was posted in the capacity of Assistant Informatics Officer; please look carefully at its last line. Later, under the doctrine of necessity, that capacity changes.
The owners of Inksoft and their employees recruited into the Centre have probably realized the gravity of the situation, which is why the missing spokesman had to clarify at around seven in the evening, four hours after office hours, that these are his personal views and have nothing to do with the institution.
And Allah is the best of planners.

Zahoor Ahmed Solangi · June 20, 2008

As I said, Mehboob Khan Sahib has been making posts of this kind on other forums as well; as proof, see this link:
http://www.paklinks.com/gs/showthread.php?t=278837
It received a reply from only one member, which reads:
This is an article to read after Sunday morning breakfast, taking a full two hours to read it at leisure. Reading it right now is impossible for me.

In other words, no one took serious notice. This was a full propaganda campaign in which he posted on at least eight to ten forums in an official capacity, and in which an attempt was also made to spread Inksoft's name. Here on Urdu Web too, the purpose of his posts is not to inform people; rather, if anyone applauds, the idea is to show that to those higher up and obtain more funds. And when he was caught here, he said that he was doing it in a personal capacity, even though Mehboob Khan had been given the responsibility of carrying out this kind of propaganda on the internet.

Zahoor Ahmed Solangi · June 20, 2008

Here is another link:
http://www.hallagulla.com/urdu/computers-64/institute-urdu-informatics-dr-attash-durrani-184934.html
Here too Mehboob Sahib has posted the same article under his own name. I have already said that he was given the responsibility of publicizing Inksoft indirectly on the internet.
I will also post a link about Inksoft so that people can see what Inksoft is and what its connection to the Centre of Excellence is.

Zahoor Ahmed Solangi · June 20, 2008

Another forum

Here is another forum where Mehboob Khan has posted this Inksoft article; note that this forum is not one for people who love Urdu.

http://www.paktribune.com/pforums/posts.php?t=6383&start=1#112554

Zahoor Ahmed Solangi · June 20, 2008

Note also that beneath every article the name and designation of the head of the institution appear in an official capacity; see:
*Project Director, Centre of Excellence for Urdu Informatics (CEUI), National Language Authority, Cabinet Division, Islamabad.
How can anyone now claim that this is not in an official capacity?
