Data availability is an important asset in every ONE DATA project, but the type in which the data is stored matters just as much. Understanding datatype specifications is the first step of every dataset creation procedure and can prevent unnecessary complications. Different datatypes cannot and should not be handled in the same way.
The ONE DATA Platform offers numerous datatypes. This article presents some of the major differences between them and explains how to choose the one that best represents your data.
Before we dive into specific ONE DATA datatypes, it is best to have a general idea of the commonly used datatypes.
The following section explains the four fundamental levels of measurement used in our datatype definitions:
- Nominal: A nominal scale describes a variable with categories that do not have a natural order or ranking.
- Ordinal: An ordinal scale is one where the order matters but not the difference between values.
- Interval: An interval scale is one where there is order and the difference between two values is meaningful.
- Ratio: A ratio variable has all the properties of an interval variable and also has a clear definition of 0.0. When the variable equals 0.0, there is none of that variable.
Further information can be found under the following link.
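The four levels above build on one another: each level keeps the meaningful operations of the previous one and adds one more. The following Python sketch summarizes this. The names `MEANINGFUL_OPERATIONS` and `is_meaningful` are hypothetical and only serve as illustration; they are not part of ONE DATA.

```python
# Hypothetical lookup table: which operations can be interpreted
# at each level of measurement. Each level extends the one before it.
MEANINGFUL_OPERATIONS = {
    "nominal": {"equality"},
    "ordinal": {"equality", "ordering"},
    "interval": {"equality", "ordering", "difference"},
    "ratio": {"equality", "ordering", "difference", "ratio"},
}


def is_meaningful(level: str, operation: str) -> bool:
    """Return True if `operation` can be interpreted for data at `level`."""
    return operation in MEANINGFUL_OPERATIONS[level]


# Ordinal data can be ranked, but differences between ranks carry no meaning.
print(is_meaningful("ordinal", "ordering"))
print(is_meaningful("ordinal", "difference"))
```

For example, "twice as much" is only a valid statement for ratio data, because only there does 0.0 mean "none of that variable".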
Datatypes in ONE DATA
Datatypes in ONE DATA are encountered directly in a number of resources, such as Data Tables, where they can be visualized, and Workflows, where they can be manipulated by some Processors.
The default value for a declared string is an empty string.
string / ordinal
A string that imposes a qualitative order on the dataset. It is usually used for classifying and labeling groups, and its values are meant to be ordered and to repeat across rows.
- Degree level: "BS", "MS", "PhD"
- Income classification: "Low income", "Middle income", "High income"
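Because the order of an ordinal string is qualitative, plain alphabetical sorting generally does not reproduce it. A minimal Python sketch of sorting by an explicit rank is shown here, using the income labels from the example above. The names `INCOME_ORDER` and `RANK` are hypothetical and not part of ONE DATA.

```python
# The intended qualitative order, from lowest to highest (illustrative).
INCOME_ORDER = ["Low income", "Middle income", "High income"]

# Map each label to its position in the intended order.
RANK = {label: position for position, label in enumerate(INCOME_ORDER)}

incomes = ["Middle income", "High income", "Low income"]

# Alphabetical sorting ignores the qualitative order.
alphabetical = sorted(incomes)

# Sorting by rank respects it.
by_rank = sorted(incomes, key=lambda label: RANK[label])

print(alphabetical)  # ['High income', 'Low income', 'Middle income']
print(by_rank)       # ['Low income', 'Middle income', 'High income']
```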
string / nominal
A string that provides a description or label where the order does not matter. Sorting can be performed but is meaningless.
Examples include gender, blood type, city names, etc.
int / interval
A numerical datatype representing discrete values. Supported operations are addition, subtraction, and multiplication.
Examples include employee count, number of sold records, etc.
The allowed interval of integers is [-9007199254740991, 9007199254740991], that is, [-(2^53 - 1), 2^53 - 1].
Integers are calculated correctly in ONE DATA; only their representation lacks precision. If big numbers need to be displayed in full precision (e.g. integers representing unique identifiers), converting them to the string format is recommended.
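The bound 2^53 - 1 is also the largest integer that a double-precision floating-point number can represent without gaps. The article does not state that this is the cause of the display limit, so the following Python sketch should be read as an assumption: it shows how an integer beyond the bound loses its last digit when stored as a double, while the string form keeps every digit. `MAX_SAFE_INTEGER` and `big_id` are hypothetical names.

```python
# Largest integer in the range given above: 2**53 - 1.
MAX_SAFE_INTEGER = 2**53 - 1

# A hypothetical unique identifier just outside the range (2**53 + 1).
big_id = 9007199254740993

# Stored as a double, the value is rounded to the nearest representable number.
as_double = float(big_id)
print(int(as_double))  # 9007199254740992 (last digit changed)

# Stored as a string, every digit is preserved.
as_string = str(big_id)
print(as_string)       # 9007199254740993
```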
Within a Custom Input Table Processor, negative integers are represented in red and entered decimal numbers are rounded.
The default value for doubles is 0.
double / ratio
Doubles are a datatype for decimal numbers that require high precision. The type provides extra storage space for precision purposes.
Examples include percentage representations where values can be close to each other and add up to 100%.
The allowed interval of double values is [-1e+308, 1e+308]. Values outside of this range (in both directions) are considered as Infinity. Furthermore, double values are written in their standard format as long as they have at most 20 natural digits. Values outside the range [-1e+20, 1e+20] are represented in scientific notation. The precision of the scientific notation is 16 digits.
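The display rules above can be sketched as a small Python function. This is an illustration of the described behavior only, not ONE DATA's implementation; the function name `notation_for` and the returned labels are hypothetical. The value `1e309` is used because it overflows a double-precision number and therefore becomes infinity in Python as well.

```python
import math


def notation_for(value: float) -> str:
    """Classify how a double would be displayed under the rules above."""
    if math.isinf(value):
        return "Infinity"
    if abs(value) > 1e20:
        return "scientific"
    return "standard"


# A literal beyond the double range is parsed as infinity.
too_large = float("1e309")
print(notation_for(too_large))  # Infinity
print(notation_for(1e21))       # scientific
print(notation_for(-12.5))      # standard

# Scientific notation with 16 significant digits (1 before, 15 after the point).
sixteen_digits = f"{1e21:.15e}"
print(sixteen_digits)           # 1.000000000000000e+21
```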
Numeric variables cover more representational use cases but have less precision than doubles. The default value for doubles is 0.
Numeric datatypes are represented in the form x.x and using a comma to separate integer and decimal fields leads to an error.
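When preparing data outside of ONE DATA, the separator rule above can be checked before upload. The following Python sketch uses a hypothetical helper, `parse_numeric`, that accepts the x.x form and rejects a comma as the decimal separator. The error message is made up for illustration and is not the one shown by ONE DATA.

```python
def parse_numeric(text: str) -> float:
    """Parse a numeric value written with a dot as the decimal separator."""
    if "," in text:
        # Mirrors the rule above: a comma separator is treated as an error.
        raise ValueError(f"use a dot as the decimal separator: {text!r}")
    return float(text)


print(parse_numeric("3.14"))  # 3.14

try:
    parse_numeric("3,14")
except ValueError as error:
    print(error)
```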
numeric / interval
Numeric intervals represent data where the difference between values is the most interesting feature.
Examples include distance, altitude, pH, etc.
numeric / ratio
The numeric ratio covers all properties of a numeric interval plus a clear definition of 0.0.
Examples include weight, length, temperature, etc.
numeric / ordinal
Numeric ordinals describe a set of numbers where the order counts but not the difference between values.
Examples include annual income, expenses, ratings, etc.
numeric / nominal
Numeric nominals are meant to give a digital description to certain variables.
Examples include price tags and any type of code.
The allowed interval of numeric values is [-1e+308, 1e+308]. Values outside of this range (in both directions) are considered as Infinity. Furthermore, numeric values are written in their standard format as long as they have at most 20 natural digits. Values outside the range [-1e+20, 1e+20] are represented in scientific notation.
datetime / interval
Datetime variables represent both a date and a time, allowing certain specific operations using Processors. The representation follows the form of the default datetime value, which is 1970-01-01 01:00:00 (YYYY-MM-DD HH:MM:SS).
Usage information can be found under the following link.
A hint to access the default datetime value in a Processor
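The default value 1970-01-01 01:00:00 matches the Unix epoch (timestamp 0) expressed in a UTC+1 time zone. The article does not confirm this origin, so the following Python sketch is an assumption: it reproduces the default value and its YYYY-MM-DD HH:MM:SS form from timestamp 0. The name `UTC_PLUS_ONE` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed time zone: one hour ahead of UTC.
UTC_PLUS_ONE = timezone(timedelta(hours=1))

# Timestamp 0 is 1970-01-01 00:00:00 in UTC, so 01:00:00 in UTC+1.
default_value = datetime.fromtimestamp(0, tz=UTC_PLUS_ONE)

# Format as YYYY-MM-DD HH:MM:SS, the representation described above.
formatted = default_value.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 1970-01-01 01:00:00

# As an interval type, the difference between two datetimes is meaningful.
one_day_later = default_value + timedelta(days=1)
print(one_day_later - default_value)  # 1 day, 0:00:00
```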
Custom Input Table
The Processor allows the user to set up a new customizable dataset/table with various options.
Data Type Conversion
The Processor takes a given input and converts it, if possible, to the specified data type.
Mathematical Operation MC
The (Multiple Column) Processor executes an arithmetic operation on the input columns and overwrites them with the result.
Mathematical Operation SC
Mathematical Timestamp Operation