Appendix A.1 Character codes handled in an application

The following figure shows the flow of character code conversion on the basis of the application configuration.

Figure A-1 Flow of character code conversion in an application

[Figure]

The numbers in the figure ((1) to (5)) indicate the programs and resources for which character codes must be taken into account. The letters in the figure (A to E) indicate the flow of data during character code conversion. Table A-1 describes the character codes handled in each program and resource, with notes. Table A-2 describes the operations performed at each conversion location, with notes.

Table A-1 Character codes handled in the programs and resources and notes

Program or resource name | Item | Handled character codes and notes
(1) Input page | URL path | The character code of the URL path must be ISO-8859-1. Non-ASCII characters, such as Shift JIS characters, cannot be used.
(1) Input page | Query string, POST data | Characters sent using the HTML FORM tag are URL-encoded using the same character code as the HTML page that displays the form.
(2) JSP | JSP | A JSP can be created using any character code. For JSP documents in Web applications of version 2.4 or later (JSP 2.0 specifications or later), specify the character encoding in the XML declaration. With the JSP 1.2 specifications, the JSP character code must be specified in the pageEncoding attribute of the page directive.
(3) HTML | HTML | An HTML page can be created using any character code.
(4) Database | Database | Determine the character code of the data stored in the database, taking into account the character code displayed in the browser.
(5) Output page | Response header | The character code must be ISO-8859-1. Non-ASCII characters must be URL-encoded before they are used.
(5) Output page | Response body | A user program can use any character code.
Note
With a Web container, you cannot use character codes that represent alphanumeric characters with multiple bytes (two or more). The following character codes represent alphanumeric characters this way:
  • UCS-2 (ISO/IEC 10646)
  • UCS-4 (ISO/IEC 10646-1)
  • UTF-16
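
The restriction in the note above can be checked before a character code is chosen. The following is a minimal sketch that uses only the Java standard library. The class name AlphanumericWidthCheck and its method are hypothetical helpers introduced for illustration and are not part of any Web container API. It tests whether every ASCII letter and digit is encoded as exactly one byte in a given character code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any product API.
final class AlphanumericWidthCheck {

    // Returns true if every sampled ASCII letter and digit encodes to exactly
    // one byte in the given charset.
    static boolean usesSingleByteAlphanumerics(Charset charset) {
        String sample = "ABCXYZabcxyz0123456789";
        for (int i = 0; i < sample.length(); i++) {
            byte[] encoded = sample.substring(i, i + 1).getBytes(charset);
            if (encoded.length != 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Charset[] candidates = {
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE
        };
        for (Charset cs : candidates) {
            // Prints the charset name and whether it passes the single-byte check.
            System.out.println(cs.name() + ": " + usesSingleByteAlphanumerics(cs));
        }
    }
}
```

A character code for which this check returns false falls under the restriction described in the note.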

Table A-2 Operations and notes on character code conversion

Conversion location | Target | Operations and notes on character code conversion
A Browser to J2EE server | URL path | The Web container processes the URL path as ISO-8859-1.
A Browser to J2EE server | Query string, POST data | The character code for the query string and POST data is determined arbitrarily by the application. Servlets and JSPs handle characters as Unicode; therefore, convert the received data so that the character codes are consistent within the application.
B In the J2EE server | JSP files | The file is read using the encoding specified in the pageEncoding attribute of the page directive. If the pageEncoding attribute is omitted, the character code specified in the contentType attribute is used.
B In the J2EE server | HTML files | The file is sent to the browser as is, in the character code of the HTML file.
C J2EE server to database | Database | The JDBC driver converts Unicode to the database storage character code.
D Database to J2EE server | Database | The JDBC driver converts the database storage character code to Unicode.
E J2EE server to browser | Response header | The Web container converts the response header to ISO-8859-1.
E J2EE server to browser | Response body | For servlets: specify the character code using the setContentType method of the ServletResponse interface. For JSPs: specify the character code in the contentType attribute of the page directive.
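
Flows A and E in the table above can be illustrated without a Web container. The following is a minimal sketch that uses only java.net.URLEncoder, java.net.URLDecoder, and java.nio.charset.Charset. The class name RequestResponseCharsetSketch and its methods are hypothetical and introduced for illustration. The sketch assumes that the form page is written in UTF-8. In an actual servlet, the request side is typically handled by calling the setCharacterEncoding method of ServletRequest before the parameters are read, rather than by decoding the value manually.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of any product API.
final class RequestResponseCharsetSketch {

    // Simulates the browser: URL-encodes a form value in the page's charset.
    static String encodeAsBrowser(String text, String pageCharset) {
        try {
            return URLEncoder.encode(text, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + pageCharset, e);
        }
    }

    // Flow A: decodes the raw parameter value into a Java (Unicode) string,
    // using the charset that the application assumes the browser used.
    static String decodeParameter(String rawValue, String assumedCharset) {
        try {
            return URLDecoder.decode(rawValue, assumedCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + assumedCharset, e);
        }
    }

    // Flow E: converts the Unicode string to response body bytes.
    static byte[] encodeBody(String text, String responseCharset) {
        return text.getBytes(Charset.forName(responseCharset));
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as Unicode escapes so that the
        // source file itself does not depend on the compiler's encoding.
        String typed = "\u65E5\u672C\u8A9E";

        String raw = encodeAsBrowser(typed, "UTF-8");
        System.out.println("sent: " + raw);

        // Decoding with the charset of the form page restores the string.
        System.out.println("matching charset restores text: "
                + decodeParameter(raw, "UTF-8").equals(typed));

        // Decoding with a different charset yields a different string.
        System.out.println("mismatched charset restores text: "
                + decodeParameter(raw, "ISO-8859-1").equals(typed));

        System.out.println("body bytes in UTF-8: " + encodeBody(typed, "UTF-8").length);
    }
}
```

The sketch shows why the character code assumed on the server side must match the character code of the page that displays the form.
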
Reference note
Note the following about possible garbled characters:
In the execution environment of an application, characters are handled as Unicode. Therefore, a string sent from the browser is first converted to Unicode. In addition, when the database is accessed, the characters must be converted between Unicode and the database storage character code, and when a response is sent, the characters must be converted from Unicode to the response character code. If any of these conversions is not performed appropriately, the characters are garbled.
Garbling occurs either because the data contains machine-dependent characters that are used with the character code called Shift JIS, or because some characters are converted to Unicode differently depending on the character code#. For example, if the browser sends character data containing machine-dependent characters, that data is converted to Unicode, and the string is then converted to Shift JIS for the response, the result contains garbled characters.
If the client OS can be assumed to be Windows, you can avoid garbled characters by specifying MS932 or Windows-31J as the character code instead of Shift JIS.
# Includes characters such as -, ~, //, 「, and 」.
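
The two causes described in this note can be reproduced with the Java standard library. The following is a minimal sketch. The class name ShiftJisGarbleSketch and its methods are hypothetical and introduced for illustration. It assumes a JDK that provides both the Shift_JIS and windows-31j charsets, and it uses CIRCLED DIGIT ONE (U+2460) as an example of a machine-dependent character and the byte pair 0x81 0x60 as an example of a code point whose Unicode conversion differs between the two charsets.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of any product API.
final class ShiftJisGarbleSketch {

    // Encodes a Unicode string with the given charset and decodes the bytes
    // again, as happens when a response is written and then read by a browser.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    // Decodes raw bytes into a Unicode string with the given charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // CIRCLED DIGIT ONE, used here as a sample machine-dependent character.
        String circledOne = "\u2460";
        System.out.println("windows-31j keeps the character: "
                + roundTrip(circledOne, "windows-31j").equals(circledOne));
        System.out.println("Shift_JIS keeps the character: "
                + roundTrip(circledOne, "Shift_JIS").equals(circledOne));

        // The same two bytes decode to different Unicode code points
        // depending on which charset is named.
        byte[] tildeBytes = {(byte) 0x81, (byte) 0x60};
        String asShiftJis = decode(tildeBytes, "Shift_JIS");
        String asWindows31j = decode(tildeBytes, "windows-31j");
        System.out.printf("Shift_JIS: U+%04X, windows-31j: U+%04X%n",
                (int) asShiftJis.charAt(0), (int) asWindows31j.charAt(0));
    }
}
```

If the round trip through Shift_JIS does not return the original string, the same loss can be expected when a response containing that character is converted to Shift JIS.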