Unicode support for multilanguage data in Outlook 2007

Microsoft Outlook supports Unicode and provides full support for multilingual data. If you work in a multinational organization or share messages and items with people who use Outlook on systems that run in other languages, you can take advantage of Unicode support in Outlook.

Outlook can run in one of two mailbox modes with Exchange accounts: Unicode or non-Unicode. Unicode mode is recommended and is the default mode if the configurations of your profile, the Exchange account, and administrator settings allow it. The mode is automatically determined by Outlook based on these settings and cannot be changed manually.

Running Outlook in Unicode mode will enable you to work with messages and items that are composed in different languages. If Outlook is running in non-Unicode mode with your Exchange account and if you would like to switch to Unicode mode, contact your administrator.

Note: Earlier versions of Outlook provided support for multilingual Unicode data in the body of Outlook items. However, other Outlook data, such as the To and Subject lines of messages and the ContactName and BusinessTelephoneNumber properties of contact items, was limited to characters defined by your system code page. This limitation no longer applies in Microsoft Office Outlook 2003 and Microsoft Office Outlook 2007, provided Outlook is running in Unicode mode with an Exchange account.

POP3 accounts also have the capability to support multilingual Unicode data in Microsoft Office Outlook 2003 and Office Outlook 2007, provided the items are delivered to a Personal Folders file (.pst) that can support multilingual Unicode data. By default, new POP3 profiles that deliver to a new .pst file created in Microsoft Office Outlook 2003 and Office Outlook 2007 support multilingual Unicode data.

Note: Other accounts such as IMAP and HTTP do not support Unicode.

Scripts

Multilingual messages and items can contain text in languages that require different scripts. A single script can be used to represent many languages.

For example, the Latin or Roman script has character shapes — glyphs — for the 26 letters (both uppercase and lowercase) of the English alphabet, as well as accented (extended) characters used to represent sounds in other Western European languages.

The Latin script has glyphs to represent all of the characters in most European languages and a few others. Other European languages, such as Greek or Russian, have characters for which there are no glyphs in the Latin script; these languages have their own scripts.

Some Asian languages use ideographic scripts that have glyphs based on Chinese characters. Other languages, such as Thai and Arabic, use scripts that have glyphs that are composed of several smaller glyphs or glyphs that must be shaped differently depending on adjacent characters.

A common way to store plain text is to represent each character by using a single byte. The value of each byte is a numeric index, or code point, in a table of characters; a code point corresponds to a character in the default code page of the computer on which the text document is created. For example, the byte value 189 (decimal) is a code point that represents different characters in different code pages.
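As a minimal sketch of this point, the following Python snippet decodes the byte value 189 under several code pages. The codec names (cp1252, cp1250, cp1251, cp874) are Python's names for Windows code pages and were picked here only for illustration; the article does not say which code pages Outlook uses on a given system.

```python
import unicodedata

BYTE_VALUE = 189  # decimal 189, hexadecimal 0xBD

# Illustrative selection of single-byte Windows code pages (Python codec names).
CODE_PAGES = {
    "cp1252": "Western European (Latin)",
    "cp1250": "Central European",
    "cp1251": "Cyrillic",
    "cp874": "Thai",
}


def decode_in_code_pages(byte_value, code_pages):
    """Return {codec name: character} for a single byte value."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in code_pages}


decoded = decode_in_code_pages(BYTE_VALUE, CODE_PAGES)
for name, char in decoded.items():
    # Print the Unicode code point and character name rather than the glyph,
    # so the output is readable on any console.
    print(f"{name:7} {CODE_PAGES[name]:25} U+{ord(char):04X} {unicodedata.name(char)}")
```

The same stored byte comes back as a different character depending on which table is used to read it, which is why text must travel with knowledge of its code page.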

Code pages

A table of characters grouped together is called a code page. For single-byte code pages, each code page contains a maximum of 256 byte values; because each character in the code page is represented by a single byte, a code page can contain as many as 256 characters.
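To make the idea of a 256-entry table concrete, here is a small sketch that builds the table for a single-byte code page by decoding every possible byte value. The helper name code_page_table is hypothetical, and cp1252 and cp1253 are Python's codec names for the Windows Latin and Greek code pages, used here as assumptions for illustration.

```python
def code_page_table(codec_name):
    """Map each assigned byte value (0-255) of a single-byte code page to its character."""
    table = {}
    for byte_value in range(256):
        try:
            table[byte_value] = bytes([byte_value]).decode(codec_name)
        except UnicodeDecodeError:
            # Some byte values are unassigned in a given code page.
            pass
    return table


latin = code_page_table("cp1252")
greek = code_page_table("cp1253")

print(len(latin), "assigned entries in cp1252")
print(len(greek), "assigned entries in cp1253")

# Greek lowercase zeta exists in the Greek table but not in the Latin one.
ZETA = "\u03b6"
print("zeta in Greek table:", ZETA in greek.values())
print("zeta in Latin table:", ZETA in latin.values())
```

Neither table can exceed 256 entries, so each one covers only the characters of the scripts it was designed for.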

One code page with its limit of 256 characters cannot accommodate all languages because all languages together use far more than 256 characters. Therefore, different scripts use separate code pages. There is one code page for Greek, another for Japanese, and so on.

In addition, single-byte code pages cannot accommodate most Asian languages, which commonly use more than 5,000 Chinese-based characters. Double-byte code pages were developed to support these languages.
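The difference between single-byte and double-byte code pages can be sketched as follows. The example assumes Python's cp932 codec as a stand-in for the Japanese double-byte code page and cp1252 for a single-byte Latin code page; the helper fits_in_code_page is a hypothetical name introduced for illustration.

```python
def fits_in_code_page(text, codec_name):
    """Return True if every character in text can be encoded in the given code page."""
    try:
        text.encode(codec_name)
        return True
    except UnicodeEncodeError:
        return False


japanese = "日本語"  # three characters
ascii_text = "abc"   # three characters

# In the double-byte code page, each Japanese character takes two bytes,
# while ASCII letters still take one byte each.
print(len(japanese.encode("cp932")), "bytes for the Japanese text in cp932")
print(len(ascii_text.encode("cp932")), "bytes for the ASCII text in cp932")

# The single-byte Latin code page has no entries for these characters at all.
print("fits in cp1252:", fits_in_code_page(japanese, "cp1252"))
```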

The Unicode character encoding standard enables the sharing of messages and other items in a multilingual environment when the languages involved span multiple code pages.

Non-Unicode systems typically use a code page–based environment, in which each script has its own table of characters. Items based on the code page of one operating system rarely map well to the code page of another operating system. In some cases, the items cannot contain text that uses characters from more than one script.

For example, consider two people — one is running the English version of the Microsoft Windows XP operating system with the Latin code page and the second person is running the Japanese version of the Microsoft Windows XP operating system with the Japanese code page. The second person creates a meeting request in the Japanese version of Microsoft Outlook 2002 with Japanese characters in the Location field and sends it to the first person. When the person using the English version of Outlook 2002 opens the meeting request, the code points of the Japanese code page are mapped to unexpected or nonexistent characters in the Latin script, and the resulting text is unintelligible.
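The scenario above can be approximated in a few lines of Python. This is only a sketch: it assumes the sender's text is stored with the cp932 (Japanese) code page and read back with cp1252 (Latin), and the function name open_with_wrong_code_page is hypothetical. It does not reproduce how Outlook itself stores or converts item properties.

```python
def open_with_wrong_code_page(text, sender_codec="cp932", reader_codec="cp1252"):
    """Encode text with the sender's code page, then decode it with the reader's."""
    raw = text.encode(sender_codec)
    # Bytes with no entry in the reader's code page become U+FFFD replacement characters.
    return raw.decode(reader_codec, errors="replace")


location = "会議室"  # "meeting room", as typed in the Location field
garbled = open_with_wrong_code_page(location)

print("characters sent:    ", [f"U+{ord(c):04X}" for c in location])
print("characters received:", [f"U+{ord(c):04X}" for c in garbled])

# Reading the same bytes with the sender's code page recovers the original text.
round_trip = location.encode("cp932").decode("cp932")
print("round trip intact:", round_trip == location)
```

The received text has twice as many characters as the original, because each byte of a double-byte character is looked up separately in the single-byte table.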

Note: Since Microsoft Outlook 2000, the body of Outlook items is Unicode, and the body of the item can be read irrespective of the language in which the item was created. However, all the other item properties such as the To, Location, and Subject lines of messages and meeting items and the ContactName and BusinessTelephoneNumber properties of contact items will be unintelligible in versions earlier than Outlook 2003.

The universal character set provided by Unicode eliminates this problem. Unicode was developed to create a universal character set that can accommodate most known scripts. Unicode uses a unique, multi-byte encoding for every character; so in contrast to code pages, every character has its own unique code point. For example, the Unicode code point of Greek lowercase zeta (ζ) is the hexadecimal value 03B6, and Cyrillic lowercase zhe (ж) is 0436.
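The two code points mentioned above can be checked with a short Python sketch. The describe helper is a hypothetical name; the UTF-16 and UTF-8 byte sequences are shown only to illustrate that one code point can be serialized in more than one multi-byte form, which is an addition to what the article states.

```python
import unicodedata


def describe(char):
    """Return (code point as hex, Unicode name, UTF-16 bytes, UTF-8 bytes) for one character."""
    return (
        f"{ord(char):04X}",
        unicodedata.name(char),
        char.encode("utf-16-be").hex(),
        char.encode("utf-8").hex(),
    )


zeta = describe("ζ")
zhe = describe("ж")

for code_point, name, utf16, utf8 in (zeta, zhe):
    print(f"U+{code_point}  {name:28} UTF-16: {utf16}  UTF-8: {utf8}")
```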

Microsoft Office Outlook 2003 and Outlook 2007 are fully capable of using Unicode. The code page system of representing text also exists in Outlook. However, Unicode mode is recommended and is the default mode if the configurations of your profile, Exchange account, and administrator settings allow it. Also, the mode is automatically determined by Outlook based on these settings and cannot be changed manually.

Running Outlook in Unicode mode with an Exchange account ensures that, by default, the Offline Folder files (.ost) and Personal Folders files (.pst) used for the profile can store multilingual Unicode data and offer greater storage capacity for items and folders. If Outlook is running in non-Unicode mode with an Exchange account, and you would like to switch to Unicode mode, contact your administrator.

If you do not share messages and items with people who use Outlook on systems that run in other languages, you can run Outlook in Unicode or non-Unicode mode with an Exchange account.

If you work in a multinational organization or share messages and items with people who use Outlook on systems that run in other languages, Outlook should run in Unicode mode with an Exchange account. To switch to Unicode mode, contact your administrator.

When Outlook runs in non-Unicode mode with an Exchange account, the code page-based system is used for character mapping. In a code page-based system, a character entered in one language may not map to the same character in another language. Therefore, you are likely to see incorrect characters, including question marks.
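One way to see where the question marks come from is to force text through a code page that lacks some of its characters. The sketch below uses Python's errors="replace" behavior, which substitutes "?" for characters the target code page cannot represent. The function name store_in_code_page and the choice of cp1252 are assumptions for illustration; the article does not describe Outlook's internal conversion.

```python
def store_in_code_page(text, codec_name="cp1252"):
    """Round-trip text through a code page, replacing unrepresentable characters with '?'."""
    return text.encode(codec_name, errors="replace").decode(codec_name)


subject = "Budget review \u03b6"  # ends with Greek lowercase zeta
stored = store_in_code_page(subject)

print(stored)  # the Latin letters survive; the Greek letter does not
```

Once a character has been replaced this way, the original cannot be recovered from the stored text.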


Arial Unicode MS font is a full Unicode font. It contains all of the characters, ideographs, and symbols defined in the Unicode 2.1 standard. This universal font is automatically installed if you are using Windows Vista or Microsoft Windows XP.

Because of its considerable size and the typographic compromises required to make such a font, Arial Unicode MS should be used only when you can't use multiple fonts tuned for different writing systems. For example, if you have multilingual data from many different writing systems in Microsoft Office Access, you can use Arial Unicode MS as the font to display the data tables, because Access can't accept many different fonts.

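To create a Personal Folders file (.pst) that supports multilingual Unicode data, do the following:

  1. On the File menu, point to New, and then click Outlook Data File.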

  2. To create a Personal Folders File (.pst) that offers greater storage capacity for items and folders and supports multilingual Unicode data, click the Personal Folders File for your version of Outlook, and click OK.

  3. In the File name box, type a name for the file, and then click OK.

  4. In the Name box, type a display name for the .pst folder.

  5. Select any other options you want, and then click OK.

    The name of the folder associated with the data file appears in the Folder List. To view the Folder List, on the Go menu, click Folder List. By default, the folder will be called Personal Folders.

If you are experiencing any problems using Unicode, check the following list for some solutions.

I upgraded to Outlook 2003 or Outlook 2007, but Outlook isn't running in Unicode mode with an Exchange account

There could be several reasons why Outlook is still not running in Unicode mode.

  • If your profile was configured to run in offline mode before you upgraded to Microsoft Office Outlook 2003 or Outlook 2007, you are still using the old Offline Folder file (.ost) that does not support Unicode. This will result in Outlook running in non-Unicode mode with an Exchange account. To switch to Unicode mode, disable the use of offline folders, then create a new Offline Folder file (.ost) and synchronize your data.

  • If your profile was configured to deliver to a Personal Folders file (.pst) before you upgraded to Microsoft Office Outlook 2003 or Outlook 2007, you are still using the old Personal Folders file (.pst) that does not support Unicode. This will result in Outlook running in non-Unicode mode with the Exchange account. To switch to Unicode mode, you should change the delivery location to a Personal Folders file that supports multilingual Unicode data, or you should change the default delivery location to the Exchange account.

  • If your profile was configured to use an AutoArchive Personal Folders file (.pst) before you upgraded to Microsoft Office Outlook 2003 or Outlook 2007, you are still using the old AutoArchive Personal Folders file (.pst) that does not support Unicode. We recommend that you create a new AutoArchive Personal Folders file that supports multilingual Unicode data.

  • The Exchange version or the policies set by your administrator may be preventing Outlook from running in Unicode mode.

If none of the above helped you switch to Unicode mode, contact your Exchange administrator.

I upgraded to Outlook 2003 or Outlook 2007, but my POP3 account still doesn't support multilingual Unicode data

If your profile was configured to deliver to a Personal Folders file (.pst) before you upgraded to Microsoft Office Outlook 2003 or Outlook 2007, you are still using the old Personal Folders file (.pst) that does not support Unicode for storing items delivered from the POP3 account. To resolve this, you should change the delivery location to a Personal Folders file (.pst) that supports multilingual Unicode data.

The Offline Folders file I selected caused Outlook to switch to non-Unicode mode, and now some items display '?' characters and are unreadable

When Outlook runs in non-Unicode mode with an Exchange account, the code page-based system is used for character mapping. In a code page-based system, a character entered in one language may not map to the same character in another language. Therefore, you are likely to see incorrect characters, including question marks.

For example, consider two people — one is running the English version of the Microsoft Windows XP operating system with the Latin code page and the second person is running the Japanese version of the Microsoft Windows XP operating system with the Japanese code page. The second person creates a meeting request in the Japanese version of Outlook 2002 and sends it to the first person. When the person using the English version of Outlook 2002 opens the meeting request, the code points of the Japanese code page are mapped to unexpected or nonexistent characters in the Latin script, and the resulting text is unintelligible. Therefore, in multilingual environments, we recommend that Outlook run in Unicode mode with an Exchange account.

To resolve this, disable offline folders, close and restart Outlook, and then create a new Offline Folder file and synchronize the data.

How will using a non-Unicode data file or running Outlook in non-Unicode mode with an Exchange account affect me?

If you do not share messages and items with people who use Outlook on computers that run in other languages, you can run Outlook in Unicode or non-Unicode mode with an Exchange account. A disadvantage of running in non-Unicode mode is that the Offline Folder file used for the profile is created in the older format, which does not offer the greater storage capacity for items and folders. Therefore, if the size limit of the Offline Folder file is a concern for you, you should run Outlook in Unicode mode with an Exchange account.

However, if you work in a multinational organization or share messages and items with people who use Outlook on systems that run in other languages, Outlook should run in Unicode mode with an Exchange account. Running in Unicode mode also ensures that the .pst files used for the profile can store multilingual Unicode data. To switch to Unicode mode, see the "I upgraded to Outlook 2003 or Outlook 2007, but Outlook isn't running in Unicode mode with an Exchange account" section above.

When Outlook runs in non-Unicode mode with an Exchange account, the code page-based system is used for character mapping. In a code page-based system, a character entered in one language may not map to the same character in another language. Therefore, you are likely to see incorrect characters, including question marks.


Note: Since Outlook 2000, the body of Outlook items has been Unicode, and the body can be read irrespective of the language in which the item was created. However, other Outlook data, such as the To and Subject lines of messages and the ContactName and BusinessTelephoneNumber properties of contact items, is limited to characters defined by your code page if Outlook runs in non-Unicode mode with an Exchange account.

See also

Add a language or set language preferences in Office 2007

Check spelling and grammar in a different language
