Appendix A. Character Codes

HTTP is a protocol used mainly to download data from the Web server to the browser, but it is also used to upload data, for example when an HTML form is submitted.

The Servlet APIs also explicitly define how the character code of downloaded content is passed from the Web server to the browser. However, the earlier Servlet APIs (Servlet API 2.2 and earlier) did not clearly define how the character codes of HTTP query strings and the HTTP request body are handled, so each vendor handled them differently.

From Servlet API 2.3 onward, you can specify the character code when you reference HTTP query strings and the HTTP request body through the Servlet APIs (except for an HTTP request body with the multipart/form-data encoding type). The data is interpreted as being in the specified character code, converted to Unicode, which is the internal representation used by Java, and then passed to the application. If the character code of a resource or the character code conversion to Unicode is incorrect, garbled characters might occur. You must therefore take the execution processes and the resource character codes into account during application development.
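The effect of specifying the character code can be sketched with the standard Java library alone. The following is a minimal illustration, not the container's actual implementation: the class name QueryDecodingSketch and its decode method are hypothetical, and the sample value assumes a browser that sent the character U+3042 as UTF-8 bytes. In a servlet, the character code would typically be specified with ServletRequest.setCharacterEncoding, which was introduced in Servlet API 2.3.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical illustration class; it is not part of the Servlet APIs.
final class QueryDecodingSketch {

    // Decodes a percent-encoded query value, interpreting the bytes
    // in the given character code and converting them to Unicode.
    static String decode(String encodedValue, String charsetName) {
        try {
            return URLDecoder.decode(encodedValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(
                    "Unsupported character code: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample: the UTF-8 bytes E3 81 82 (U+3042), percent-encoded.
        String raw = "%E3%81%82";

        // Matching character code: one Unicode character is recovered.
        String correct = decode(raw, "UTF-8");

        // Mismatched character code: each byte becomes a separate
        // character, which is how garbled characters arise.
        String garbled = decode(raw, "ISO-8859-1");

        System.out.println(correct.equals("\u3042")); // true
        System.out.println(correct.length());         // 1
        System.out.println(garbled.length());         // 3
    }
}
```

The same bytes yield different strings depending on the character code used for the conversion, so the character code specified on the application side must match the one that the browser used to send the data.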

This section describes the character codes used in applications and notes on using them. It also provides notes on character code conversion when data is exchanged with the browser.

Organization of this section
A.1 Character codes handled in an application
A.2 Character code conversion between the browser and application