uCosminexus Stream Data Platform - Application Framework Setup and Operation Guide


9.10.2 HTTP packet input connector definition

You specify an HTTP packet input connector definition (HttpPacketInputConnectorDefinition tag) as a child element of a CB definition for input (InputCBDefinition tag).

For details about HTTP packet input processing, see 10.3 HTTP packet input.

Organization of this subsection
(1) Format
(2) Details of definition
(3) Example
(4) Information to be specified in the command tag when WinDump is used
(5) Identifiers that can be specified as field names
(6) Java data types that are stored as field values

(1) Format

<HttpPacketInputConnectorDefinition>
  <input buffersize="I/O-buffer-size"
   assemblingtime="packet-segment-assembly-time">
    <packetdata
     globalheader="global-header-area-size"
     packetheader="packet-header-area-size"
     packetoffset="offset-for-packet-data-size-area"
     packetlength="packet-data-size-area-size"
     timeoffset="offset-for-timestamp-area"/>
    <command path="command-path-name"
     parameter="command-parameter"/>
  </input>
  <output unit="maximum-output-unit">
    <record name="record-name" type="{REQUEST|RESPONSE}">
      <field name="field-name"/>
    </record>
  </output>
</HttpPacketInputConnectorDefinition>

(2) Details of definition

HttpPacketInputConnectorDefinition tag (all definition information)
Defines all HTTP packet input connector definition information. You specify this definition only once.

input tag (input definition)
Defines the HTTP packet input or output information. You specify this definition only once.
buffersize="I/O-buffer-size"
Specifies the maximum number of HTTP packets that can be stored in the input buffer, or the maximum number of common records that can be stored in the output buffer, as an integer from 1 to 12288. If this attribute is omitted, 4096 is assumed.
assemblingtime="packet-segment-assembly-time"
Specifies the packet segment assembly time (in milliseconds), as an integer from 1 to 5000. If this attribute is omitted, 2000 is assumed.
Packet segments are pieces of packet data that have been split at the TCP layer according to the maximum segment size (MSS). If the data collected by the HTTP packet input connector consists of packet segments, the connector assembles them into packets on the basis of the TCP header information. The packet segment assembly time is the maximum time allowed, measured from the arrival of a new packet segment, for linking the received packet segments into a complete packet. The counter is reset each time packet segments are assembled into a packet. Note that packet data fragmented at the IP layer is not assembled.
If packet segments cannot be assembled within the time specified here, the packet segments undergoing assembly are discarded. The timestamp of an assembled packet is the most recent timestamp among the packet segments used to assemble it.
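
The following Python sketch models the assembly-time rule described above. It is not the connector's implementation: the names SegmentAssembler, add_segment, and expire are hypothetical, segments are keyed by TCP sequence number, and the caller is assumed to already know the expected length of the complete packet. The timer starts when the first segment of a packet arrives, an incomplete packet is discarded once assemblingtime has elapsed, and an assembled packet carries the most recent segment timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class PendingPacket:
    start_ms: int                                 # arrival time of the first segment
    segments: dict = field(default_factory=dict)  # TCP sequence number -> payload bytes
    latest_ts_ms: int = 0                         # most recent segment timestamp


class SegmentAssembler:
    """Hypothetical model of timeout-based TCP segment assembly."""

    def __init__(self, assembling_time_ms: int = 2000):
        if not 1 <= assembling_time_ms <= 5000:
            raise ValueError("assemblingtime must be an integer from 1 to 5000")
        self.limit_ms = assembling_time_ms
        self.pending = {}  # connection key -> PendingPacket

    @staticmethod
    def _join(segments):
        """Join segments in sequence-number order; return None on a gap or overlap."""
        data, next_seq = b"", None
        for seq in sorted(segments):
            if next_seq is not None and seq != next_seq:
                return None
            data += segments[seq]
            next_seq = seq + len(segments[seq])
        return data

    def expire(self, now_ms):
        """Discard packets whose segments were not assembled within the limit."""
        for key in [k for k, p in self.pending.items() if now_ms - p.start_ms > self.limit_ms]:
            del self.pending[key]

    def add_segment(self, key, seq, payload, ts_ms, expected_len):
        """Add one segment; return (packet, timestamp_ms) once complete, else None."""
        self.expire(ts_ms)
        p = self.pending.setdefault(key, PendingPacket(start_ms=ts_ms))
        p.segments[seq] = payload
        p.latest_ts_ms = max(p.latest_ts_ms, ts_ms)
        data = self._join(p.segments)
        if data is not None and len(data) >= expected_len:
            del self.pending[key]  # assembled: the timer starts again with the next segment
            return data[:expected_len], p.latest_ts_ms
        return None
```

Keying the pending data by connection keeps the timers of concurrent TCP connections independent of one another.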

packetdata tag (HTTP packet definition)
Defines the format of the HTTP packets to be acquired. You specify this definition only once.
globalheader="global-header-area-size"
Specifies the size (in bytes) of the global header area, as an integer from 1 to 128. If this attribute is omitted, 24 is assumed.
packetheader="packet-header-area-size"
Specifies the size (in bytes) of the packet header area, as an integer from 9 to 128. If this attribute is omitted, 16 is assumed.
packetoffset="offset-for-packet-data-size-area"
Specifies the offset (in bytes) from the beginning of the packet header to the packet data size area, as an integer from 0 to 127. If this attribute is omitted, 8 is assumed.
packetlength="packet-data-size-area-size"
Specifies the size (in bytes) of the packet data size area, as an integer from 1 to 4. If this attribute is omitted, 4 is assumed.
timeoffset="offset-for-timestamp-area"
Specifies the offset (in bytes) from the beginning of the packet header to the timestamp area, as an integer from 0 to 120. If this attribute is omitted, 0 is assumed.
The timestamp area, which is located in the packet header area, stores the timestamp obtained when the packet analyzer captured the HTTP packet. The following figure shows the format of the timestamp area.

Figure 9-2 Format of timestamp area
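
The default packetdata values (24, 16, 8, 4, and 0) appear to correspond to the libpcap capture file layout that WinDump writes with -w. In that layout, a 24-byte global header is followed by records that each begin with a 16-byte header holding the timestamp seconds and microseconds at offsets 0 and 4 and the captured length at offset 8. The following Python sketch reads such a stream using the packetdata attributes. The function name read_packets is hypothetical, and the sketch assumes little-endian byte order and a timestamp area of two 4-byte fields (seconds, then microseconds). Figure 9-2 defines the actual timestamp format.

```python
import io
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_packets(stream, globalheader=24, packetheader=16, packetoffset=8,
                 packetlength=4, timeoffset=0):
    """Yield (timestamp, packet_bytes) pairs from a capture stream (hypothetical helper)."""
    stream.read(globalheader)  # skip the global header area
    while True:
        header = stream.read(packetheader)
        if len(header) < packetheader:
            return  # end of stream
        # Packet data size area: packetlength bytes at packetoffset (little-endian assumed).
        size = int.from_bytes(header[packetoffset:packetoffset + packetlength], "little")
        # Timestamp area: 4-byte seconds + 4-byte microseconds (assumed layout).
        sec, usec = struct.unpack_from("<II", header, timeoffset)
        data = stream.read(size)
        if len(data) < size:
            return  # truncated final record
        yield EPOCH + timedelta(seconds=sec, microseconds=usec), data


# Example: one 3-byte packet captured 1.5 seconds after the epoch.
sample = io.BytesIO(b"\x00" * 24 + struct.pack("<IIII", 1, 500000, 3, 3) + b"abc")
for ts, data in read_packets(sample):
    print(ts.isoformat(), data)
```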

command tag (command definition)
Defines the start command for the packet analyzer that is to be used. You specify this definition only once.
Stream Data Platform - AF supports WinDump as the packet analyzer. For details about the information to be defined here when WinDump is used, see (4) Information to be specified in the command tag when WinDump is used.
path="command-path-name"
Specifies the absolute path of the packet analyzer start command, as 1 to 100 single-byte characters (ASCII codes 32 to 126).
parameter="command-parameter"
Specifies a parameter to be passed to the packet analyzer start command, as 1 to 100 single-byte characters (ASCII codes 32 to 126).

output tag (output definition)
Defines the output information for common records when the HTTP packet input connector converts HTTP packets to common records. You specify this definition only once.
unit="maximum-output-unit"
Specifies the maximum number of output units (in records) for common records, as an integer from 1 to 1000. If this attribute is omitted, 100 is assumed.
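
The numeric attributes of the input, packetdata, and output tags all follow the same pattern: a permitted integer range and a value that is assumed when the attribute is omitted. The following Python sketch collects those ranges and defaults from this subsection and applies them to one tag's attributes. The NUMERIC_ATTRIBUTES table and the resolve function are hypothetical helpers for checking a definition before use; the product performs its own validation, which might report errors differently.

```python
# (tag, attribute) -> (minimum, maximum, value assumed when omitted), as stated in this subsection
NUMERIC_ATTRIBUTES = {
    ("input", "buffersize"): (1, 12288, 4096),
    ("input", "assemblingtime"): (1, 5000, 2000),
    ("packetdata", "globalheader"): (1, 128, 24),
    ("packetdata", "packetheader"): (9, 128, 16),
    ("packetdata", "packetoffset"): (0, 127, 8),
    ("packetdata", "packetlength"): (1, 4, 4),
    ("packetdata", "timeoffset"): (0, 120, 0),
    ("output", "unit"): (1, 1000, 100),
}


def resolve(tag, attrs):
    """Return the effective numeric attribute values of one tag (hypothetical helper)."""
    result = {}
    for (tag_name, name), (low, high, default) in NUMERIC_ATTRIBUTES.items():
        if tag_name != tag:
            continue
        raw = attrs.get(name)
        value = default if raw is None else int(raw)  # omitted -> assumed value
        if not low <= value <= high:
            raise ValueError(f"{tag} {name}={value} is outside {low} to {high}")
        result[name] = value
    return result
```

The attrs argument can be the attrib dictionary of an element parsed with xml.etree.ElementTree, because omitted attributes are simply absent from it.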

record tag (record definition)
Defines record information for common records. You may specify 1 or 2 record definitions.
name="record-name"
Specifies a name for the common record, as 1 to 100 single-byte alphanumeric characters and underscores (_). The name must begin with a single-byte alphabetic character. This attribute cannot be omitted. Each record name must be unique within the output definition.
type="{REQUEST|RESPONSE}"
Specifies the type of the common record. This attribute cannot be omitted.
The permitted values are as follows:
  • "REQUEST"
    Indicates a request record. This is a type of common record used to store data generated during request transmission from client to host in HTTP protocol communication.
  • "RESPONSE"
    Indicates a response record. This is a type of common record used to store data generated during response transmission from host to client in HTTP protocol communication.

field tag (field definition)
Defines information about the fields that constitute the common record. You may specify 1 to 16 field definitions.
name="field-name"
Specifies a field name.
As the field name, specify the identifier of the data to be extracted from HTTP packets. The data corresponding to this identifier is stored as the field's value.
Each field name must be unique within the record definition.
The identifiers that can be specified as field names depend on the type of common record specified in the type attribute in the record tag. For details about the identifiers that can be specified as field names, see (5) Identifiers that can be specified as field names.
Extracted data is converted to a Java data type and then stored as the field value. For details about the Java data types of field values, see (6) Java data types that are stored as field values.
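
The following Python sketch checks the record and field rules above, using the identifier sets from Table 9-9 in (5). The validate_output function and its input format (a list of record name, type, and field-name list) are hypothetical. The sketch assumes that record names must be unique across all record tags in the output definition.

```python
import re

# Identifiers from Table 9-9 that can be used in both record types
COMMON = {"TIME", "PACKET_LENGTH", "SEND_MAC", "SEND_IP", "RECEIVE_IP",
          "SEND_PORT", "RECEIVE_PORT", "MESSAGE_TYPE", "COOKIE",
          "CONNECTION", "CONTENT_LENGTH", "CONTENT_TYPE"}
ALLOWED = {
    "REQUEST": COMMON | {"METHOD_NAME", "TARGET_URI", "REFERER", "MESSAGE_BODY"},
    "RESPONSE": COMMON | {"STATUS_CODE"},
}
# 1 to 100 single-byte alphanumerics or underscores, starting with a letter
RECORD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,99}")


def validate_output(records):
    """Return a list of rule violations (an empty list means the definition passes)."""
    problems = []
    if not 1 <= len(records) <= 2:
        problems.append("specify 1 or 2 record definitions")
    names = [r[0] for r in records]
    if len(set(names)) != len(names):
        problems.append("record names must be unique")
    for name, rtype, fields in records:
        if not RECORD_NAME.fullmatch(name):
            problems.append(f"invalid record name: {name!r}")
        if rtype not in ALLOWED:
            problems.append(f"{name}: type must be REQUEST or RESPONSE")
            continue
        if not 1 <= len(fields) <= 16:
            problems.append(f"{name}: specify 1 to 16 field definitions")
        if len(set(fields)) != len(fields):
            problems.append(f"{name}: field names must be unique")
        for f in fields:
            if f not in ALLOWED[rtype]:
                problems.append(f"{name}: {f} cannot be used in a {rtype} record")
    return problems
```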

(3) Example

<?xml version="1.0" encoding="UTF-8"?>
<root:AdaptorCompositionDefinition
 xmlns:hpicon="http://www.hitachi.co.jp/soft/xml/sdp/adaptor/definition/callback/HttpPacketInputConnectorDefinition">
<!-- Omitted -->
 
<!-- CB definition for input -->
<cb:InputCBDefinition
 class="jp.co.Hitachi.soft.sdp.adaptor.callback.io.packetinput.HttpPacketInputCBImpl" name="inputer2">
  <!-- HTTP packet input connector definition -->
  <hpicon:HttpPacketInputConnectorDefinition>
    <!-- Input definition -->
    <hpicon:input buffersize="4096" assemblingtime="2000">
      <!-- HTTP packet definition -->
      <hpicon:packetdata globalheader="24" packetheader="16" packetoffset="8"
       packetlength="4" timeoffset="0"/>
      <!-- Command definition -->
      <hpicon:command path="C:\Program Files\WinDump\WinDump.exe"
       parameter=" -i 1 -s 2048 -w - -n &quot;tcp port 80 and host 133.145.224.19&quot;"/>
    </hpicon:input>
    <!-- Output definition -->
    <hpicon:output unit="100">
      <!-- Record definition -->
      <hpicon:record name="RECORD1" type="REQUEST">
        <!-- Field definition -->
        <hpicon:field name="SEND_IP"/>
        <hpicon:field name="RECEIVE_IP"/>
        <hpicon:field name="SEND_PORT"/>
        <hpicon:field name="RECEIVE_PORT"/>
        <hpicon:field name="MESSAGE_TYPE"/>
        <hpicon:field name="TARGET_URI"/>
      </hpicon:record>
    </hpicon:output>
  </hpicon:HttpPacketInputConnectorDefinition>
</cb:InputCBDefinition>

(4) Information to be specified in the command tag when WinDump is used

This subsection discusses the information to be specified in the command tag when WinDump is used as the packet analyzer.

The example presented here uses WinDump version 3.9.5. For details about the WinDump start command, see the WinDump documentation.

The HTTP packet input connector supports the WinDump start command (WinDump.exe) specified in the following format.

WinDump.exe␣-i␣network-device-number␣-s␣internal-buffer-size␣-w␣-␣-n␣"tcp␣port␣port-number␣and␣host␣IP-address"

Legend:
␣: Single-byte space. WinDump itself might accept the start command with some of these delimiters omitted, but the single-byte spaces cannot be omitted from the parameter attribute in the command tag.

The following explains each value in the format.

WinDump.exe
This is the WinDump start command. Specify the absolute path of WinDump.exe in the path attribute in the command tag.

-i␣network-device-number
This option specifies the number of the network device connected to the target computer that is to be analyzed. Specify this option and value in the parameter attribute in the command tag.

-s␣internal-buffer-size
This option specifies the size (in bytes) of the internal buffer for storing the captured packet data. Capturing HTTP data requires a larger size than capturing TCP headers alone; normally, 2,048 bytes is sufficient. Specify this option and value in the parameter attribute in the command tag.

-w␣-
This option specifies a file or the standard output as the output destination of the captured packets. When you use an HTTP packet input connector, specify -w␣- so that the captured packets are output to the standard output. Specify this option and value in the parameter attribute in the command tag.

-n␣"tcp␣port␣port-number␣and␣host␣IP-address"
The -n option suppresses the conversion of addresses and port numbers to names, and the quoted string that follows it specifies the capture filter in the filter format of the packet capture library (libpcap). When you use an HTTP packet input connector, specify in the filter the port number used for the HTTP protocol and the IP address of the computer to be analyzed. Specify this option and value in the parameter attribute in the command tag.
Note that a double quotation mark (") is treated as a special character in the command tag, so code it as &quot;.
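
The following Python sketch builds a parameter attribute value in the supported format and escapes it for the XML definition file. The build_parameter function is hypothetical. It applies the 1-to-100-character and ASCII 32-to-126 limits of the parameter attribute to the unescaped value, which is an assumption about how the limit is counted, and uses xml.sax.saxutils.escape with an extra entity mapping to convert each double quotation mark to &quot;.

```python
from xml.sax.saxutils import escape


def build_parameter(device, snaplen, port, host):
    """Build an XML-escaped WinDump parameter value in the supported format (hypothetical)."""
    capture_filter = f"tcp port {port} and host {host}"
    value = f'-i {device} -s {snaplen} -w - -n "{capture_filter}"'
    # Limits of the parameter attribute, checked here on the unescaped value (assumption).
    if not 1 <= len(value) <= 100 or any(not 32 <= ord(c) <= 126 for c in value):
        raise ValueError("parameter must be 1 to 100 characters in ASCII 32 to 126")
    # escape() handles &, <, and >; the extra mapping turns " into &quot;.
    return escape(value, {'"': "&quot;"})
```

For example, build_parameter(1, 2048, 80, "133.145.224.19") produces the parameter value used in (3) Example, without its leading space.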

(5) Identifiers that can be specified as field names

The table below lists and describes the identifiers that can be specified as field names in the name attribute in the field tag.

Table 9-9 Identifiers that can be specified as field names

                                                                                                 Specifiable by record type
No.  Identifier      Data                    Description                               Protocol  Request  Response
1    TIME            Time#1                  Time the packet data arrived              --        Y        Y
2    PACKET_LENGTH   Packet size#2           Size of packet data (bytes)               --        Y        Y
3    SEND_MAC        Sending MAC address     MAC address at the packet sending end     Ethernet  Y        Y
4    SEND_IP         Sending IP address      IP address at the packet sending end      IP        Y        Y
5    RECEIVE_IP      Receiving IP address    IP address at the packet receiving end    IP        Y        Y
6    SEND_PORT       Sending port number     Port number at the packet sending end     TCP       Y        Y
7    RECEIVE_PORT    Receiving port number   Port number at the packet receiving end   TCP       Y        Y
8    MESSAGE_TYPE    Message type            Request or Response                       HTTP      Y        Y
9    METHOD_NAME     Method information      Method information, such as GET and POST  HTTP      Y        N
10   TARGET_URI      URI information#3       Target URI information                    HTTP      Y        N
11   REFERER         Referer#3               Link source URI information               HTTP      Y        N
12   COOKIE          Cookie#3, #4            Cookie information#5                      HTTP      Y        Y
13   STATUS_CODE     Status code             Result of request processing              HTTP      N        Y
14   CONNECTION      Connection              Connection persistency information        HTTP      Y        Y
15   CONTENT_LENGTH  Content-Length          Content size (bytes)                      HTTP      Y        Y
16   CONTENT_TYPE    Content-Type            Content type                              HTTP      Y        Y
17   MESSAGE_BODY    Message body#3          Actual data in the message body           HTTP      Y#6      N

Legend:
Y: Identifier can be specified
N: Identifier cannot be specified
--: Not applicable

#1
The time is obtained from the timestamp in the packet header.

#2
The packet size is the sum of the size of the HTTP message start line, the size of the HTTP headers, and the value of the HTTP header "Content-Length". If there is no "Content-Length" header, its value is treated as 0.

#3
Percent-encoded character strings are decoded as UTF-8.

#4
Cookie data is acquired differently for request records and response records.
For a request record, the data referenced by the HTTP header "Cookie" is acquired.
For a response record, the data referenced by the HTTP header "Set-Cookie" is acquired. The data referenced by the HTTP headers "Cookie2" and "Set-Cookie2" is not acquired.

#5
The cookie information might contain multiple cookies. If you want to treat each cookie as a field, specify regexsubstring in the function attribute in the map tag (mapping definition) and acquire a character string using a regular expression.

#6
You can obtain the message body only when the method information is POST, the media type in Content-Type is text (regardless of the value of subtype), and Content-Length exists.
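
Footnotes #2, #3, #4, and #6 describe how several field values are derived from an HTTP message. The following Python sketch applies those rules to one complete, already-assembled message. The functions parse_http and extract are hypothetical, and the sketch makes these simplifying assumptions: the start line and header sizes include their CRLF line terminators and the blank line that ends the headers, a repeated header (such as several Set-Cookie lines) keeps only its last occurrence, and the message body is decoded as UTF-8.

```python
from urllib.parse import unquote


def parse_http(message: bytes):
    """Split an HTTP message into start line, headers (lowercased names), and body."""
    head, _, body = message.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # a repeated header keeps the last value
    return lines[0], headers, body


def extract(message: bytes, record_type: str):
    """Derive a few field values following footnotes #2, #3, #4, and #6 (hypothetical)."""
    start, headers, body = parse_http(message)
    content_length = int(headers.get("content-length", "0"))  # #2: absent -> 0
    # #2: start line + headers (CRLFs included, assumed) + Content-Length value
    packet_size = len(message) - len(body) + content_length
    cookie = headers.get("cookie" if record_type == "REQUEST" else "set-cookie")  # #4
    fields = {
        "MESSAGE_TYPE": "Response" if start.startswith("HTTP/") else "Request",
        "PACKET_LENGTH": packet_size,
        "COOKIE": unquote(cookie) if cookie is not None else None,  # #3: UTF-8 decoding
    }
    if record_type == "REQUEST":
        method, _, rest = start.partition(" ")
        fields["METHOD_NAME"] = method
        fields["TARGET_URI"] = unquote(rest.rpartition(" ")[0])  # #3
        media_type = headers.get("content-type", "").split(";")[0].split("/")[0].strip().lower()
        # #6: body only for POST, text/* media type, and a Content-Length header
        if method == "POST" and media_type == "text" and "content-length" in headers:
            fields["MESSAGE_BODY"] = body[:content_length].decode("utf-8", "replace")
    return fields
```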

(6) Java data types that are stored as field values

Each protocol data item extracted according to the identifier specified in the name attribute in the field tag is converted to a Java data type, as shown in the table below. During conversion, if the protocol data exceeds the maximum shown in the Value range column, only the portion up to that maximum is stored as the field value. If the data corresponding to the specified identifier is not found in the protocol data, the null character is stored as the field value for the String type, and -1 is stored for the Integer type.

Table 9-10 Java data types that are stored as field values

No.  Data                    Protocol  Java data type  Value range
1    Time#                   --        Timestamp       1970/01/01 00:00:00.000000 to 2261/12/31 23:59:59.999999
2    Packet size             --        Integer         0 to 2147483647
3    Sending MAC address     Ethernet  String          17 characters (00:00:00:00:00:00 to FF:FF:FF:FF:FF:FF)
4    Sending port number     TCP       Integer         0 to 65535
5    Receiving port number   TCP       Integer         0 to 65535
6    Sending IP address      IP        String          IPv4: 7 to 15 characters (0.0.0.0 to 255.255.255.255)
                                                       IPv6: 40 characters (0000:0000:0000:0000:0000:0000:0000:0000 to FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF)
7    Receiving IP address    IP        String          IPv4: 7 to 15 characters (0.0.0.0 to 255.255.255.255)
                                                       IPv6: 40 characters (0000:0000:0000:0000:0000:0000:0000:0000 to FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF)
8    Message type            HTTP      String          7 or 8 characters (Request or Response)
9    Method information      HTTP      String          1 to 127 characters (such as GET or CONNECT)
10   URI information         HTTP      String          1 to 255 characters
11   Referer                 HTTP      String          1 to 255 characters
12   Cookie                  HTTP      String          1 to 4,096 characters
13   Status code             HTTP      String          3 characters (such as 200 or 404)
14   Connection              HTTP      String          1 to 127 characters (close or Keep-Alive)
15   Content-Length          HTTP      Integer         0 to 2147483647
16   Content-Type            HTTP      String          3 to 255 characters
17   Message body            HTTP      String          0 to 2,048 characters

Legend:
--: Not applicable

#
When a field stores time data, specify a precision of at least 6 fractional-second digits for the corresponding CQL data type (TIMESTAMP) in the query definition file.
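
The following Python sketch applies the conversion rules of this subsection to several identifiers from Table 9-10. Strings longer than the maximum are truncated, integers above the maximum are capped (an assumption about how "only up to the maximum value" applies to numbers), and missing data becomes -1 for Integer fields. Python None stands in for the null character stored for missing String fields. The LIMITS table and the to_field_value function are hypothetical.

```python
# identifier -> (Java data type, maximum length or maximum value), taken from Table 9-10
LIMITS = {
    "PACKET_LENGTH": ("Integer", 2147483647),
    "SEND_PORT": ("Integer", 65535),
    "RECEIVE_PORT": ("Integer", 65535),
    "CONTENT_LENGTH": ("Integer", 2147483647),
    "METHOD_NAME": ("String", 127),
    "TARGET_URI": ("String", 255),
    "REFERER": ("String", 255),
    "COOKIE": ("String", 4096),
    "CONNECTION": ("String", 127),
    "CONTENT_TYPE": ("String", 255),
    "MESSAGE_BODY": ("String", 2048),
}
MISSING_STRING = None  # assumption: stands in for the null character stored for String fields


def to_field_value(identifier, raw):
    """Convert one extracted value according to its Table 9-10 limits (hypothetical)."""
    kind, limit = LIMITS[identifier]
    if raw is None:  # data not found in the protocol data
        return -1 if kind == "Integer" else MISSING_STRING
    if kind == "Integer":
        return min(int(raw), limit)  # cap at the maximum value (assumption)
    return str(raw)[:limit]  # keep only up to the maximum length
```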