4.13.3 Division of data from a CSV/Excel file
Function
This plug-in divides a CSV (Comma Separated Values) file or a spreadsheet created in Microsoft Excel, into multiple files containing the specified number of rows. In a Windows environment, the plug-in can divide a CSV file or Excel file. In a UNIX environment, the plug-in can divide a CSV file.
You can also specify column filter conditions so that only rows that meet those conditions are output to the files resulting from the division (hereinafter referred to as the output files).
This plug-in can be used for files that meet the following conditions:
- In a Windows environment, the file can be edited in Microsoft Excel.
- The column names are entered in the first row. The items in the first row are copied, as column names, into each of the output files.
- For CSV files, the file must be a text file with the csv extension.
- For Excel files, the file must have the xls extension (Excel format saved as "Excel 97 - 2003 book") or the xlsx or xlsm extension (Excel book or Excel macro enabled book of Excel 2007 or later).
- In a UNIX environment, the CSV file must be written with the encoding specified for the LANG user environment variable, and must use LF as the line break code.
- In a UNIX environment built with Red Hat Enterprise Linux Server 6 or CentOS 6, this plug-in can divide a file into no more than 100 output files. If the number of output files exceeds 100, the plug-in terminates abnormally.
This plug-in requires the following server:
- Execution-target server
The server on which this plug-in is to be executed. In a Windows environment, the prerequisite version of Microsoft Excel must be installed on this server.
In a Windows environment, the script of this plug-in performs the following processes:
(1) The input file specified for the file name(common.fileName property) is copied to an output file (serial number 0)#.
(2) If a list of columns to undergo filter processing (excel.filterColumnList property) has been specified, rows that do not meet the column data's filter conditions (properties from excel.filterColumnString1 to excel.filterColumnString10) are deleted from an output file (serial number 0)#. If multiple conditions are specified, only rows that meet all conditions are extracted, and the rest of the rows are deleted.
(3) The rows are copied in batches of the number of rows specified for number of rows to be divided (common.divideLineCount property), from a temporary file (serial number 0)# to each of the other output files (serial number 1 onward)#. The first row of the temporary file (serial number 0)# is assumed as the column name, and is copied to the first row of each output file (serial number 1 onward)#. When dividing an Excel file, the sheet name of the divided file is "Sheet1".
In a UNIX environment, the script of this plug-in performs the following processes:
(1) Deletes existing files whose names use the same naming convention as output files (all of elements (a), (b), and (c) are the same)#.
(2) Copies the input file specified in the file name (common.fileName property) to generate a temporary file (serial number 0)# at the destination folder path for file output (common.outputFolderPath property).
(3) If a list of columns to undergo filter processing (excel.filterColumnList property) has been specified, rows that do not meet the column data's filter conditions (properties from excel.filterColumnString1 to excel.filterColumnString10) are deleted from a temporary file (serial number 0)#. If multiple conditions are specified, only rows that meet all conditions are extracted, and the rest of the rows are deleted.
(4) The rows are copied in batches of the number of rows specified for number of rows to be divided (common.divideLineCount property), from a temporary file (serial number 0)# to each of the other output files (serial number 00 onward)#. The first row of the temporary file (serial number 0)# is assumed as the column name, and is copied to the first row of each output file (serial number 00 onward)#.
#: The temporary file and output files are assigned a name that is made by concatenating the following elements:
(a) Path of the folder to which the file is output (value of the common.outputFolderPath property) (Example:C:\temp)
(b) Folder delimiter character "\" or "/"
(c) File prefix (value of the common.filePrefix property) (Example:result)
(d) Serial number based on the character string associated with the repeated execution plug-in (value of the common.loopInputProperties property) (Example:0, 1)
(e) File extension (value of the common.fileExtension property) (Example:.csv)
In the preceding example, the output files will have names "C:\temp\result0.csv" and "C:\temp\result1.csv".
If the folder specified as the destination folder path for file output (common.outputFolderPath property) does not exist, the folder is created and is used as the output destination for the output files. If an existing file has the same name as an output file, that file is overwritten.
In addition, when using 32-bit version of Microsoft Excel, if the following folder does not exist, the folder is created.
%SystemRoot%\SysWOW64\config\systemprofile\Desktop
When using 64-bit version of Microsoft Excel, you need to create the following folder in advance.
%SystemRoot%\System32\config\systemprofile\Desktop
%SystemRoot% mentioned above specifies the same folder as the environment variable "SystemRoot".
In a UNIX environment, the attributes of output files created by this plug-in do not depend on the target file.
- The local execution function is disabled
The attributes of output files are set according to the user who runs the plug-in and umask settings. If you change the umask, specify the umask settings by using the .bashrc file of the user who runs the plug-in.
- The local execution function is enabled
Execution user of a plug-in is root user, and the umask is fixed to "0066".
Use situation
When values entered in the rows of a CSV/Excel file are processed by using a repeated execution plug-in, dividing the CSV/Excel file by using this plug-in allows you to process a greater number of rows than the maximum number of repetitions allowed by the repeated execution plug-in.
In addition, by specifying filter conditions for columns of your choice, you can choose to extract and process only rows that meet those conditions.
Prerequisites
For the most recent information about the prerequisite product in the system, prerequisite products on the execution-target server, and supported OSs for the execution-target server, see the Release Notes.
Prerequisite product in the system:
JP1/Automatic Operation 11-10 or later
Prerequisite products on the execution-target server:
In a Windows environment
(1) Microsoft Excel 2007 or later
Supported OSs for the execution-target server:
(1) Windows Server 2008 R2 Standard/Enterprise/Datacenter
(2) Windows Server 2012 Standard/Datacenter, Windows Server 2012 R2 Standard/Datacenter
(3) Windows Server 2016 Standard/Datacenter
(4) Red Hat Enterprise Linux Server 6 (32-bit x86), Red Hat Enterprise Linux Server 6 (64-bit x86_64)
(5) Red Hat Enterprise Linux Server 7 (64-bit x86_64)
(6) CentOS 6 (64-bit x86_64)
(7) CentOS 7 (64-bit x86_64)
Conditions for using the prerequisite products on the execution-target server:
None
Cautions
- Notes when the server is running in a Windows environment
(1) Do not specify character strings containing double quotation marks (") or single quotation marks (') for properties of this plug-in.
(2) If a file with the number of columns of 257 or more is specified for the file name (common.fileName property), the component is abnormally terminated.
(3) Specify an integer value of 65535 or less for the number of rows to be divided (common.divideLineCount property). The component is abnormally terminated if a value of 65536 or more is specified.
(4) If no files are created, for attention because no rows met the filter conditions, the string for the repeated execution plug-in (serial numbers of output file names) (common.loopInputProperties property) is left empty. Therefore, if this property is passed on as is to a repeated execution plug-in, the repeated execution plug-in ends abnormally.
(5) In this plug-in, the number of the column to be filtered also includes the empty columns on the left side of the column where the column name or the data is described first. But it does not include them in "Acquisition of column data from a CSV/Excel file" plug-in and "Acquisition of row data from a CSV/Excel file" plug-in. Therefore, be careful when using in combination with "Acquisition of column data from a CSV/Excel file" plug-in and "Acquisition of row data from a CSV/Excel file" plug-in.
(6) If this plug-in terminated abnormally and "80080005 Server execution failed" was output in the task log, the Excel file might be open on the execution-target server. Confirm the status of the Excel file. At this time, the process "OfficeC2RClient.exe" might start up. If you want to terminate this process, operate with Task Manager, etc.
(7) Do not run two or more instances of this plug-in if the temporary files or output files created by the instances will have the same name.
- Notes when the server is running in a UNIX environment
(1) Do not specify character strings containing double quotation marks (") or single quotation marks (') for properties of this plug-in.
(2) The data of each cell in a CSV file cannot include a comma (,) or line break. Cells that include a comma (,) or line break are not correctly handled as columns.
(3) This plug-in does not ignore any halfwidth spaces and tabs at the beginning and end of each row in a CSV file. However, the "Acquisition of row data from a CSV/Excel file" plug-in ignores those spaces and tabs. Keep this in mind when using both of these plug-ins.
(4) If no files are created, for example because no rows met the filter conditions, the string for the repeated execution plug-in (serial numbers of output file names) (common.loopInputProperties property) is left empty. Therefore, if this property is passed on as is to a repeated execution plug-in, the repeated execution plug-in ends abnormally.
(5) If a locale environment variable (such as LC_ALL or LC_MESSAGES) that has higher priority than LANG has been specified, the higher-priority locale environment variable will take precedence over LANG. Therefore, when executing a plug-in, cancel the higher-priority locale environment variable (such as LC_ALL or LC_MESSAGES) or change its value to the same value as the LANG environment variable.
(6) Do not run two or more instances of this plug-in if the temporary files or output files created by the instances will have the same name.
Execution privilege
Windows environment: User with Administrator permission
UNIX environment: User with read permission for the CSV file and write permission for the output destination directory. If the output destination directory does not exist, write permission for the parent directory of the output destination directory is required.
Version
02.51.00
Plug-in tags
Control OS,File Operations,Windows,Linux
Plug-in name displayed in the task log
osDivideCSVExcelFile
Return code
0: Normal
12: Error (mistake by user) Invalid property
21: Error (invalid environment) Command cannot be found (error detected in the plug-in script)
27: Error (check task logs for the nature of error)
Property list
The following table lists the properties:
Property key |
Property name |
Description |
Default value |
I/O type |
Required |
---|---|---|---|---|---|
plugin.destinationHost |
Host name of the execution target server |
Specify the host name or IP address of the server on which this plugin will be executed. IPv6 addresses are not supported. |
-- |
Input |
R |
common.fileName |
File name |
Specify the full path of the target CSV or Excel file to be divided. |
-- |
Input |
R |
common.sheetName |
Worksheet name in Excel file |
Specifies the name of the worksheet in the Excel file. Does not need to be specified in the case of a CSV file. |
Sheet1 |
Input |
O |
common.outputFolderPath |
Destination folder path for file output |
Specify the full path of the folder to which to output the output files. |
-- |
Input |
R |
common.filePrefix |
File prefix |
Specify the prefix of the output files. |
-- |
Input |
R |
common.divideLineCount |
Number of rows to be divided |
Specify the number of rows (excluding the header) in each output file, in halfwidth numerical values. |
-- |
Input |
R |
excel.isNumericHeader |
Specified category of the header |
If columns to undergo filter processing are specified by using numbers, specify TRUE. If columns are specified by name (string), specify FALSE. |
True |
Input |
O |
excel.filterColumnList |
List of columns to undergo filter processing |
In a Windows environment, specify the columns to undergo filter processing by either their column numbers or their column names (string), separated by commas. You can specify up to 10 column numbers (or column names). In a UNIX environment, specify a comma-separated list of column numbers. You can specify up to 10 column numbers. |
-- |
Input |
O |
excel.filterColumnString1 |
Filter condition 1 for column data |
Specify the regular expression of the filter to be applied to the first column specified in excel.filterColumnList. |
-- |
Input |
O |
excel.filterColumnString2 |
Filter condition 2 for column data |
Specify the regular expression of the filter to be applied to the second column specified in excel.filterColumnList. |
-- |
Input |
O |
excel.filterColumnString3 |
Filter condition 3 for column data |
Specify the regular expression of the filter to be applied to the third column specified in excel.filterColumnList. |
-- |
Input |
O |
excel.filterColumnString4 |
Filter condition 4 for column data |
Specify the regular expression of the filter to be applied to the fourth column specified in excel.filterColumnList. |
-- |
Input |
O |
excel.filterColumnString5 |
Filter condition 5 for column data |
Specify the regular expression of the filter to be applied to the fifth column specified in excel.filterColumnList. |
-- |
Input |
O |
excel.filterColumnString6 |
Filter condition 6 for column data |
Specify the regular expression of the filter to be applied to the sixth column specified in excel.filterColumnList. |
-- |
Input |
O |
excel.filterColumnString7 |
Filter condition 7 for column data |
Specify the regular expression of the filter to be applied to the seventh column specified in excel.filterColumnList. |
-- |
Input |
O |
excel.filterColumnString8 |
Filter condition 8 for column data |
Specify the regular expression of the filter to be applied to the eighth column specified in excel.filterColumnList. |
-- |
Input |
O |
excel.filterColumnString9 |
Filter condition 9 for column data |
Specify the regular expression of the filter to be applied to the ninth column specified in excel.filterColumnList. |
-- |
Input |
O |
excel.filterColumnString10 |
Filter condition 10 for column data |
Specify the regular expression of the filter to be applied to the tenth column specified in excel.filterColumnList. |
-- |
Input |
O |
common.loopInputProperties |
String for the repeated execution plug-in (serial numbers of output file names) |
The serial number of an output file name can be stored as one or more numerical values separated by commas. You can process each output file by passing this property on to a repeated execution plug-in. |
-- |
Output |
O |
common.fileExtension |
File extension |
This stores the output file extension including the initial period (.). |
-- |
Output |
O |
common.returnValue |
Return value for the plugin |
The return value of this plugin stored. |
-- |
Output |
O |