Introduction

Data availability is an important asset in every ONE DATA project, but the type in which the data is stored matters just as much. Understanding datatype specifications is the first step of every dataset creation procedure and can prevent unnecessary complications, because different datatypes cannot and should not be handled the same way.

The ONE DATA Platform offers numerous datatypes. This article presents some of the major differences between them and explains how to choose the one that best represents your data.


Background

Before we dive into specific ONE DATA datatypes, it is best to have a general idea of the commonly used datatypes.

The following section explains the four fundamental levels of measurement used in our datatype definition:


  • Nominal: A nominal scale describes a variable with categories that do not have a natural order or ranking.
  • Ordinal: An ordinal scale is one where the order matters but not the difference between values. 
  • Interval: An interval scale is one where there is order and the difference between two values is meaningful. 
  • Ratio: A ratio variable has all the properties of an interval variable and also has a clear definition of 0.0. When the variable equals 0.0, there is none of that variable.
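
The four levels build on each other, which can be sketched in a few lines of Python. This is only an illustration of the definitions above; the names Level, MIN_LEVEL and is_meaningful are made up for this example and are not part of ONE DATA.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most structure."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level at which each kind of operation becomes meaningful.
MIN_LEVEL = {
    "equality": Level.NOMINAL,     # same category or not
    "ordering": Level.ORDINAL,     # less than / greater than
    "difference": Level.INTERVAL,  # a - b carries meaning
    "ratio": Level.RATIO,          # a / b carries meaning (true zero)
}


def is_meaningful(operation: str, level: Level) -> bool:
    """Return True if the operation makes sense at the given level."""
    return level >= MIN_LEVEL[operation]


print(is_meaningful("ordering", Level.NOMINAL))     # False
print(is_meaningful("difference", Level.INTERVAL))  # True
```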



Further information can be found under the following link.


Datatypes in ONE DATA

Datatypes in ONE DATA are encountered in a number of resources, such as Data Tables, where they can be viewed, and Workflows, where they can be manipulated by certain Processors.


String Datatypes

The default value for a declared string is an empty string.

string / ordinal

A string that imposes a qualitative order on the dataset. It is usually used for classifying and labeling groups; its values are meant to be ordered and typically repeat across rows.

Examples include: 

  • Degree level: "BS", "MS", "PhD"
  • Income classification: "Low income", "Middle income", "High income"
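
Because the order of an ordinal string is defined by its meaning and not by its spelling, alphabetical sorting usually gives the wrong result. The following Python sketch shows the difference using the income classification above; INCOME_ORDER and RANK are hypothetical helper names for this example.

```python
# The intended order of the categories, from lowest to highest.
INCOME_ORDER = ["Low income", "Middle income", "High income"]
RANK = {label: position for position, label in enumerate(INCOME_ORDER)}

values = ["High income", "Low income", "Middle income", "Low income"]

# Alphabetical sorting ignores the meaning of the labels.
alphabetical = sorted(values)
# Sorting by rank respects the ordinal scale.
by_rank = sorted(values, key=RANK.__getitem__)

print(alphabetical)  # ['High income', 'Low income', 'Low income', 'Middle income']
print(by_rank)       # ['Low income', 'Low income', 'Middle income', 'High income']
```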

string / nominal

A string that provides a description or label where the order doesn't matter. Sorting can be performed but is meaningless.

Examples include gender, blood type, city names, etc.


Integer Datatypes

int / interval

A numerical datatype representing discrete values. Meaningful operations include addition, subtraction and multiplication.

Examples include employee count, number of sold records, etc.


The interval of integers allowed in Custom Input Tables is [-9007199254740991, 9007199254740991], i.e. [-(2^53 - 1), 2^53 - 1].


When integers are displayed in other Processors (e.g. Result Table), they are rounded if they fall outside the interval [-(2^53 - 1), 2^53 - 1].
Furthermore, integers are written in their standard format as long as they have at most 20 digits. Values outside the range [-1e+20, 1e+20] are represented in scientific notation.
The scientific notation's precision is 16 digits.


Big integers are calculated correctly in ONE DATA; only their displayed representation loses precision.
If big numbers need to be displayed in full precision (e.g. integers representing unique identifiers), converting them to the string format is recommended.
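
The limit 2^53 - 1 is the largest integer that a 64-bit floating-point number can represent without gaps. The Python sketch below assumes, without confirmation from ONE DATA, that the display rounding comes from such a conversion. It shows how an identifier just above the limit changes its value, while its string form stays intact. MAX_SAFE and is_safe_integer are hypothetical names.

```python
MAX_SAFE = 2**53 - 1  # 9007199254740991


def is_safe_integer(n: int) -> bool:
    """True if n lies within [-(2^53 - 1), 2^53 - 1]."""
    return -MAX_SAFE <= n <= MAX_SAFE


big_id = 9007199254740993  # 2^53 + 1, e.g. a unique identifier

# Assumption: the display goes through a 64-bit double, which rounds the value.
as_double = float(big_id)

print(is_safe_integer(big_id))  # False
print(int(as_double))           # 9007199254740992
print(str(big_id))              # 9007199254740993
```

Keeping identifiers as strings avoids the rounding entirely, since no arithmetic is expected on them anyway.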


Within a Custom Input Table Processor, negative integers are represented in red and entered decimal numbers are rounded.


Double Datatypes

The default value for doubles is 0.

double / ratio

Doubles represent decimal numbers that require high precision. The datatype provides additional storage space so that more digits can be kept.

Examples include percentage representations where values can be close to each other and add up to 100%.


The interval of double values allowed in Custom Input Tables is [-1e+308, 1e+308].
Values outside of this range (in both directions) are considered as Infinity.
Furthermore, double values are written in their standard format as long as they have at most 20 natural digits. Values out of the range [-1e+20, 1e+20] are represented in their scientific notation.
The scientific notation's precision is 16 digits.
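
The overflow to Infinity and the switch to scientific notation can be reproduced in Python, whose float type is a 64-bit double. This is a sketch under the assumption that ONE DATA doubles behave like standard IEEE 754 doubles; the helper uses_scientific_notation is hypothetical and only encodes the display rule stated above.

```python
import math


def uses_scientific_notation(value: float) -> bool:
    """Display rule from the text: outside [-1e+20, 1e+20] means scientific."""
    return abs(value) > 1e20


# A value far beyond the allowed range overflows to infinity.
too_large = float("1e309")
print(math.isinf(too_large))  # True

# Inside the range the standard format is used, outside it is not.
print(uses_scientific_notation(123.0))   # False
print(uses_scientific_notation(1.5e21))  # True

# Scientific notation with 16 significant digits (1 before, 15 after the point).
print(f"{1.5e21:.15e}")  # 1.500000000000000e+21
```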


Numeric Datatypes

Numeric variables cover more representational use cases but have less precision than doubles. The default value for numerics is 0.


Numeric values are written in the form x.x, with a dot as the decimal separator. Using a comma to separate the integer and decimal parts leads to an error.
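
If source data uses a decimal comma, it can be normalized before it is entered. The following Python sketch illustrates the idea; parse_numeric and normalize_decimal_comma are hypothetical helpers, and the normalization assumes that the input contains no thousands separators.

```python
def parse_numeric(text: str) -> float:
    """Parse a value written with a dot as the decimal separator."""
    if "," in text:
        raise ValueError(f"use a dot as decimal separator: {text!r}")
    return float(text)


def normalize_decimal_comma(text: str) -> str:
    """Turn a decimal comma into a dot; assumes no thousands separators."""
    return text.strip().replace(",", ".")


print(parse_numeric("3.14"))                           # 3.14
print(parse_numeric(normalize_decimal_comma("3,14")))  # 3.14

try:
    parse_numeric("3,14")
except ValueError as error:
    print(error)  # use a dot as decimal separator: '3,14'
```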


numeric / interval

Numeric intervals represent data where the difference between values is the most interesting feature.

Examples include temperature in degrees Celsius, altitude, pH, etc.

numeric / ratio

The numeric ratio covers all properties of a numeric interval plus a clear definition of the 0.0.

Examples include weight, length, temperature in Kelvin, etc.

numeric / ordinal

Numeric ordinals describe a set of numbers where the order counts but not the difference between values.

Examples include annual income, expenses, ratings, etc.

numeric / nominal

Numeric nominals assign a numeric label to a variable; the numbers identify rather than quantify.

Examples include price tags and any type of code.


The interval of numeric values allowed in Custom Input Tables is [-1e+308, 1e+308].
Values outside of this range (in both directions) are considered as Infinity.
Furthermore, numeric values are written in their standard format as long as they have at most 20 natural digits. Values out of the range [-1e+20, 1e+20] are represented in their scientific notation.
The scientific notation's precision is 16 digits.


Datetime Datatypes

datetime / interval 

Datetime variables represent both a date and a time, allowing specific operations through the Mathematical Timestamp Operation Processor. The representation follows the form of the default datetime value, which is 1970-01-01 01:00:00 (YYYY-MM-DD HH:MM:SS).
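
The YYYY-MM-DD HH:MM:SS form corresponds to the pattern %Y-%m-%d %H:%M:%S in Python, as the sketch below shows. The second half rests on an assumption that the article does not confirm: the default value looks like the Unix epoch (1970-01-01 00:00:00 UTC) displayed in a UTC+1 time zone.

```python
from datetime import datetime, timedelta, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS

# Parse and re-format the default datetime value.
default = datetime.strptime("1970-01-01 01:00:00", FORMAT)
print(default.strftime(FORMAT))  # 1970-01-01 01:00:00

# Assumption: the default is the Unix epoch (timestamp 0) shown in UTC+1.
utc_plus_one = timezone(timedelta(hours=1))
epoch_local = datetime.fromtimestamp(0, tz=utc_plus_one)
print(epoch_local.strftime(FORMAT))  # 1970-01-01 01:00:00
```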


Usage information can be found under the following link.


The Custom Input Table Processor can only accept Datetime entries within a certain interval that goes from 0100-01-01 00:00:00 to 9999-12-31 23:59:59.
Nonetheless, dates outside of this interval can be produced using Mathematical Timestamp Operation Processors.
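
The accepted interval and the effect of a timestamp operation can be sketched in Python. MIN_ENTRY, MAX_ENTRY and is_valid_entry are hypothetical names for this example. Only the lower bound is crossed here, because Python's own datetime type cannot go beyond the year 9999.

```python
from datetime import datetime, timedelta

# Bounds accepted by the Custom Input Table Processor, as stated above.
MIN_ENTRY = datetime(100, 1, 1, 0, 0, 0)
MAX_ENTRY = datetime(9999, 12, 31, 23, 59, 59)


def is_valid_entry(value: datetime) -> bool:
    """True if the value could be typed into a Custom Input Table."""
    return MIN_ENTRY <= value <= MAX_ENTRY


# Start from the earliest accepted entry and subtract one day.
earliest = MIN_ENTRY
shifted = earliest - timedelta(days=1)

print(is_valid_entry(earliest))  # True
print(is_valid_entry(shifted))   # False
print(shifted)                   # 0099-12-31 00:00:00
```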


Hint: to access the default datetime value in a Custom Input Table Processor, change the datatype, save the Processor changes and open the edit configuration again.


Relevant Processors

Custom Input Table

The Custom Input Table Processor allows the user to set up a new customizable dataset/table with various options. 


Data Type Conversion

The Data Type Conversion Processor takes a given input and converts it, if possible, to the specified data type. 
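
What "if possible" means can be illustrated with a small Python sketch. It is not the Processor's implementation: CONVERTERS and convert are hypothetical names, and returning None for a value that cannot be converted is an assumption made for this example, not documented behavior.

```python
from typing import Callable, Optional

# Hypothetical mapping from target datatype names to Python converters.
CONVERTERS: dict[str, Callable[[str], object]] = {
    "int": int,
    "double": float,
    "string": str,
}


def convert(value: str, target: str) -> Optional[object]:
    """Convert value to the target type, or return None if that fails."""
    try:
        return CONVERTERS[target](value)
    except ValueError:
        # Assumption: an unconvertible value yields an empty result.
        return None


print(convert("42", "int"))      # 42
print(convert("1.5", "double"))  # 1.5
print(convert("abc", "int"))     # None
```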


Mathematical Operation MC

The Mathematical Operation MC (Multiple Column) Processor executes an arithmetic operation on the input columns and overwrites them with the result.


Mathematical Operation SC

The Mathematical Operation SC (Single Column) Processor executes an arithmetic operation on the chosen input column and delivers the result in a new column. 
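
The difference between the two Mathematical Operation Processors lies in where the result ends up. The Python sketch below imitates both behaviors on a small table stored as a list of dictionaries; the function names and the multiplication by a constant are chosen for illustration only and do not reflect the Processors' actual configuration.

```python
rows = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]


def multiply_multi_column(rows, columns, factor):
    """MC style: overwrite each listed column with its value times factor."""
    return [
        {key: (value * factor if key in columns else value)
         for key, value in row.items()}
        for row in rows
    ]


def multiply_single_column(rows, column, factor, new_column):
    """SC style: keep the input column and add the result as a new column."""
    return [{**row, new_column: row[column] * factor} for row in rows]


print(multiply_multi_column(rows, ["a", "b"], 2))
# [{'a': 2, 'b': 20}, {'a': 4, 'b': 40}]
print(multiply_single_column(rows, "a", 2, "a_doubled"))
# [{'a': 1, 'b': 10, 'a_doubled': 2}, {'a': 2, 'b': 20, 'a_doubled': 4}]
```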


Mathematical Timestamp Operation

The Mathematical Timestamp Operation Processor adds or subtracts a user-specified time value from one or several selected columns.
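
Adding or subtracting a time value across several columns can be pictured with the following Python sketch. shift_columns is a hypothetical helper, and the table is modeled as a list of dictionaries purely for illustration.

```python
from datetime import datetime, timedelta


def shift_columns(rows, columns, delta):
    """Add delta to every listed datetime column; other columns stay as-is."""
    return [
        {key: (value + delta if key in columns else value)
         for key, value in row.items()}
        for row in rows
    ]


rows = [{
    "start": datetime(2024, 1, 1, 8, 0, 0),
    "end": datetime(2024, 1, 1, 17, 30, 0),
    "label": "shift A",
}]

# Subtract two hours from both datetime columns.
shifted = shift_columns(rows, ["start", "end"], timedelta(hours=-2))
print(shifted[0]["start"])  # 2024-01-01 06:00:00
print(shifted[0]["end"])    # 2024-01-01 15:30:00
```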


Timestamp Difference

The Timestamp Difference Processor calculates the difference between two datetime values (located in different columns of the same dataset) and returns the result in an additional column.
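
As a rough illustration, the same calculation looks like this in Python. add_difference is a hypothetical helper, and expressing the result in seconds is an assumption made for the example; the unit used by the Processor is not stated here.

```python
from datetime import datetime


def add_difference(rows, start, end, result):
    """Add a column holding end minus start, expressed in seconds."""
    return [
        {**row, result: (row[end] - row[start]).total_seconds()}
        for row in rows
    ]


rows = [{
    "start": datetime(2024, 1, 1, 8, 0, 0),
    "end": datetime(2024, 1, 1, 17, 30, 0),
}]

out = add_difference(rows, "start", "end", "duration_s")
print(out[0]["duration_s"])  # 34200.0
```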