CSV Formats
WorldServer supports two general CSV formats: Simple and Advanced.
In Simple CSV format, which might be used to contain a glossary, each row is a term entry. Columns might contain locale instances (terms) for the term entry. Other columns might contain term entry attribute values.
In Advanced format, each row contains a locale instance (that is, one term) for an entry, with columns for the language, the term, and each attribute. An entry therefore consists of multiple rows, one for each term in the entry.
Example of a Simple CSV File
A Simple CSV file, in Excel, might look like this:
Example of an Advanced CSV File
An Advanced CSV file, in Excel, might look like this:
CSV Conventions
There is no formal internationally recognized standard for CSV format. Because of this, SDL uses the de facto standard developed by Microsoft Corporation.
A description of this format follows and can be found on the Web at http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm.
CSV records have the following conventions:
  • Each record is one line, with one exception
    A record separator may consist of a line feed (ASCII/LF=0x0A), or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A).
    The exception: fields may contain embedded line breaks (see below), so a record may span more than one line.
  • Fields are separated with commas
    Example: John,Doe,120 Main St.,"Anytown, WW",08123
  • Leading and trailing space-characters adjacent to comma field separators are ignored
    So, John , Doe ,... resolves to "John" and "Doe", etc. Space characters can be spaces or tabs.
  • Fields with embedded commas must be delimited with double-quote characters
    In the above example, "Anytown, WW" had to be delimited in double quotes because it contained an embedded comma.
  • If a field contains a double-quote character, surround the entire field with double quotes and represent the original double quote with two consecutive double quotes.
    So, John "Lefty" Doe would convert to "John ""Lefty"" Doe", 120 Main St.,...
  • A field that contains embedded line-breaks must be surrounded by double-quotes
    So:
    Field 1: Conference room 1
    Field 2: John,
    Please bring the M. Mathers file for review.
    -J.L.
    Field 3: 10/18/2002 ...
    would convert to:
    Conference room 1, "John,
    Please bring the M. Mathers file for review.
    -J.L.
    ",10/18/2002,...
    Note that this is a single CSV record, even though it takes up more than one line in the CSV file. This works because the line breaks are embedded inside the double quotes of the field.
  • Fields with leading or trailing spaces must be delimited with double-quote characters
    So to preserve the leading and trailing spaces around the last name above:
    John ," Doe ",...
  • Fields may always be delimited with double quotes
    The delimiters will always be discarded.
  • The first record in a CSV file may be a header record containing column (field) names
    There is no mechanism for automatically discerning whether the first record is a header row, so in the general case, this information must be provided by an outside process (such as prompting the user). The header row is encoded just like any other CSV record in accordance with the rules above. A header row for the multi-line example above might be:
    Location, Notes, "Start Date", ...
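The quoting conventions above can be tried out with a short sketch. It uses Python's standard csv module, which is an assumption of this example and not something WorldServer itself relies on. The sample field values are taken from the examples above. Note that Python's csv module does not trim spaces around unquoted fields by default, so that convention is not demonstrated here.

```python
import csv
import io

# One record whose fields exercise the quoting rules described above:
# an embedded double quote, an embedded comma, and an embedded line break.
record = ['John "Lefty" Doe', "120 Main St.", "Anytown, WW", "line one\nline two"]

# Encode the record. newline="" stops Python from translating line endings.
buffer = io.StringIO(newline="")
csv.writer(buffer, lineterminator="\r\n").writerow(record)
encoded = buffer.getvalue()

# Decode it again. The quoted line break does not end the record,
# and the surrounding double-quote delimiters are discarded.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))

print(encoded)
print(decoded == record)
```

The writer quotes only the fields that need it, and the reader returns the original values with the doubled double quotes collapsed back to single ones.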
Definitions of WorldServer Advanced Format CSV Records
WorldServer Advanced format CSV terminology database import files have two types of records: header records and data records.
The Header Record
The inclusion of this record is mandatory. There is only one such record, and it must appear as the first record in the file. The values of this record provide field names for the records that follow. These field names are used by WorldServer to identify the meaning of the field data. The ordering of the field names is not specified, but of course, the ordering must be consistent with the field values in subsequent records.
There are some well-known field names. These correspond to WorldServer system defined attributes. Some are mandatory; others are optional with default values provided. These are described in the table below.
Name               Meaning                                     Required  Default
Language           The Locale associated with the term value.  Yes       N/A
Term               The value (text) of the term.               Yes       N/A
Created On         Creation time of the term.                  No        Current timestamp
Created By         User who created the term.                  No        Current user
Modified On        Modification time of the term.              No        Current timestamp
Modified By        User who modified the term.                 No        Current user
Created On-Entry   Creation time of the term entry.            No        Current timestamp
Created By-Entry   User who created the term entry.            No        Current user
Modified On-Entry  Modification time of the term entry.        No        Current timestamp
Modified By-Entry  User who modified the term entry.           No        Current user
All other field names are assumed to be user-defined within WorldServer. Because user-defined attributes on terms and term entries can have the same name, by convention all field names for term entry attributes must end with the suffix -Entry. For example, if there is a user-defined term entry attribute named MyAttribute, then the corresponding field name in the header record must be MyAttribute-Entry. This is true regardless of whether there is a term attribute defined with the same name.
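The naming convention can be illustrated with a small Python sketch that sorts header field names into term and term entry attributes. The function name classify_field and its return shape are hypothetical and chosen only for this illustration. The system field names come from the table above.

```python
ENTRY_SUFFIX = "-Entry"

# System-defined field names listed in the table above.
SYSTEM_FIELDS = {
    "Language", "Term",
    "Created On", "Created By", "Modified On", "Modified By",
    "Created On-Entry", "Created By-Entry",
    "Modified On-Entry", "Modified By-Entry",
}

def classify_field(name):
    """Return (scope, attribute_name, is_system) for one header field name.

    scope is "entry" when the name ends with -Entry, otherwise "term".
    """
    name = name.strip()
    if name.endswith(ENTRY_SUFFIX):
        scope, attribute = "entry", name[: -len(ENTRY_SUFFIX)]
    else:
        scope, attribute = "term", name
    return scope, attribute, name in SYSTEM_FIELDS

for field in ["Term", "MyAttribute", "MyAttribute-Entry", "Created On-Entry"]:
    print(field, "->", classify_field(field))
```

With this convention, MyAttribute and MyAttribute-Entry resolve to two different attributes that share the same attribute name but differ in scope.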
The Data Record
The data records provide the field values that correspond to the field names as defined in the header record. The important aspects of data records are listed below.
  • Values for entry attributes are only collected on the first record for each term entry.
  • A new term entry is recognized when one or more blank lines are encountered. A blank line is a line with only whitespace followed by an end of line. All records that follow are assumed to belong to the same term entry until another blank line is encountered or end of file is reached.
  • Timestamp values for all WorldServer defined term and term entry attribute values must use the following format: mm/dd/yy hh:mm am/pm (or the equivalent format for the Regional Setting you have selected).
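The grouping and timestamp rules above can be sketched in Python as follows. This is a minimal illustration and not WorldServer's own import logic. The function names, the locale labels, and the Definition-Entry attribute are hypothetical, and the timestamp format assumes the default mm/dd/yy hh:mm am/pm setting.

```python
import csv
import io
from datetime import datetime

# mm/dd/yy hh:mm am/pm, assuming the default Regional Setting.
TIMESTAMP_FORMAT = "%m/%d/%y %I:%M %p"

def parse_timestamp(text):
    """Parse a timestamp such as '01/15/24 02:30 PM'."""
    return datetime.strptime(text.strip(), TIMESTAMP_FORMAT)

def read_entries(text):
    """Group Advanced-format data records into term entries.

    Returns a list of entries; each entry is a list of dicts, one per term.
    One or more blank (whitespace-only) lines start a new entry.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header = [name.strip() for name in next(reader)]
    entries, current = [], []
    for row in reader:
        # An empty line parses to [], a whitespace-only line to one blank field.
        if len(row) <= 1 and "".join(row).strip() == "":
            if current:
                entries.append(current)
                current = []
            continue
        current.append(dict(zip(header, row)))
    if current:
        entries.append(current)
    return entries

# Hypothetical sample data: two entries separated by a blank line.
sample = (
    "Language,Term,Created On,Definition-Entry\r\n"
    "English (United States),file,01/15/24 02:30 PM,A named collection of data\r\n"
    "French (France),fichier,01/15/24 02:31 PM,\r\n"
    "\r\n"
    "English (United States),folder,01/16/24 09:00 AM,A container for files\r\n"
)

entries = read_entries(sample)
for entry in entries:
    # Entry attribute values are read from the first record of the entry only.
    print(entry[0]["Definition-Entry"], [term["Term"] for term in entry])
```

In this sketch, the Definition-Entry value on the second record of the first entry is left empty, because entry attribute values are collected only from the first record of each term entry.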
Additional Notes
Although the ordering of the fields is not important for importing term data, SDL recommends the following field ordering to make viewing and editing of data in a spreadsheet easier. This is the field ordering that is used by the WorldServer CSV Term Export Function.
Field #    Field Name         Field Description
1          Language           The Locale of the term.
2          Term               The text value of the term.
3-N        User defined       All user-defined term attributes.
N+1        Created On         Creation time of the term.
N+2        Created By         User who created the term.
N+3        Modified On        Modification time of the term.
N+4        Modified By        User who modified the term.
(N+4)+I    User defined       All user-defined term entry attributes.
(N+4)+I+1  Created On-Entry   Creation time of the term entry.
(N+4)+I+2  Created By-Entry   User who created the term entry.
(N+4)+I+3  Modified On-Entry  Modification time of the term entry.
(N+4)+I+4  Modified By-Entry  User who modified the term entry.
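For files prepared outside WorldServer, the recommended ordering can be applied with a short Python sketch. The function name recommended_order is hypothetical, and the sketch assumes that user-defined fields keep their original relative order within their group.

```python
TERM_SYSTEM = ["Created On", "Created By", "Modified On", "Modified By"]
ENTRY_SYSTEM = [name + "-Entry" for name in TERM_SYSTEM]

def recommended_order(field_names):
    """Return the given field names arranged in the recommended column order."""
    names = list(field_names)
    fixed = {"Language", "Term", *TERM_SYSTEM, *ENTRY_SYSTEM}
    # User-defined fields: term attributes first, then -Entry attributes.
    user_term = [n for n in names if n not in fixed and not n.endswith("-Entry")]
    user_entry = [n for n in names if n not in fixed and n.endswith("-Entry")]
    ordered = (["Language", "Term"] + user_term + TERM_SYSTEM
               + user_entry + ENTRY_SYSTEM)
    # Keep only the fields that are actually present in the input.
    return [n for n in ordered if n in names]

print(recommended_order(
    ["Status-Entry", "Modified By", "Term", "Part of Speech",
     "Language", "Created On-Entry"]
))
```

Here Part of Speech and Status-Entry stand in for user-defined attributes and are not names defined by WorldServer.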
Also, CSV files exported by WorldServer include two special system-defined attributes as the last two field values of the record – namely Entry Id and Term Id. These values are private to WorldServer exported CSV files. The values of these fields should not be altered. CSV term import files created by means other than through the WorldServer export function must not include these fields.
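A header check for import files created by hand might look like the following Python sketch. The function name check_import_header and the message wording are hypothetical. The check covers only the two rules stated in this topic: Language and Term are required, and Entry Id and Term Id belong only in files exported by WorldServer.

```python
REQUIRED_FIELDS = ("Language", "Term")
EXPORT_ONLY_FIELDS = ("Entry Id", "Term Id")

def check_import_header(field_names):
    """Return a list of problems found in a hand-made import header."""
    names = [name.strip() for name in field_names]
    problems = []
    for required in REQUIRED_FIELDS:
        if required not in names:
            problems.append(f"missing required field: {required}")
    for reserved in EXPORT_ONLY_FIELDS:
        if reserved in names:
            problems.append(f"export-only field present: {reserved}")
    return problems

print(check_import_header(["Language", "Term", "Created On"]))
print(check_import_header(["Term", "Entry Id"]))
```

An empty list means the header passes both checks. The check should not be run against files exported by WorldServer, where the two Id fields are expected.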