The performance of your App is a vital part of its user experience. This page discusses several optimization techniques and tips for improving it.

Optimizing Transformation Sequences

You can define a sequence of transformations on your Datasources via the transformations property of the Datasource. This provides a view on the data by, for example, filtering, sorting, or applying a custom SQL query. Further transformations can also be defined in the sourceOptions of your Elements. More information about the definition can be found in the documentation on Datasources and Elements.

The transformations defined on a Datasource or Element are executed on the referenced data table one after the other. It is thus possible to optimize the overall execution time by taking a close look at the defined transformation steps. This is especially important if you are using custom SQL transformations on data tables backed by a Postgres database in version 11.x or older. This section discusses optimization techniques for the transformation sequence that apply to any Postgres- or Spark-backed Datasource. The next section discusses optimizations specific to Postgres 11 and earlier.


When writing the transformation sequence for your Datasources, always consider the amount of data that needs to be processed by each step. The more data a specific step has to operate on, the longer its execution will take. Therefore, if steps are interchangeable, put the steps that reduce the amount of data the most first. Take the following transformation sequence as an example: it computes the sum of the assets of companies located in either Germany or Austria.


  "transformations": [
      {
          "type": "aggregation",
          "config": [
              {
                  "column": "country",
                  "aggregationOperation": "GROUP_OPERATION"
              },
              {
                  "column": "assets",
                  "aggregationOperation": "SUM_OPERATION"
              }
          ]
      },
      {
          "type": "filter",
          "config": {
              "column": "country",
              "columnType": "STRING",
              "values": [
                  "Germany",
                  "Austria"
              ]
          }
      }
    ]


In this transformation sequence the aggregation step comes first, followed by a filter step on country. To compute the aggregation result, the whole data table needs to be read. This is quite an expensive operation, as the complete data has to be fetched from disk; in case of a Spark-based execution, the data also has to be shuffled over the network. Only afterwards is the filter applied to this intermediate result, reducing the data to those records that have Germany or Austria as their country.

For improved performance it makes sense to reorder the steps here. If the filter step comes first, the amount of data is typically reduced considerably compared to the overall data size. Furthermore, if you are using a data table backed by a database, the filter operation can benefit greatly from an index on country (for information on how to set indices, have a look here). Since the filter step reduces the amount of data, the subsequent aggregation step operates on far less data than before, and the overall query execution is sped up.

In general it pays off to rethink the transformation steps on your Datasources and Elements to get the best performance. The steps that reduce the data the most should be as early in the sequence as possible. It also makes sense to put filters first and to define indices on the filtered columns. If the filter steps are on different columns, a composite index should be added covering all of these columns.
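Reordered in this way, the example sequence from above becomes:


  "transformations": [
      {
          "type": "filter",
          "config": {
              "column": "country",
              "columnType": "STRING",
              "values": [
                  "Germany",
                  "Austria"
              ]
          }
      },
      {
          "type": "aggregation",
          "config": [
              {
                  "column": "country",
                  "aggregationOperation": "GROUP_OPERATION"
              },
              {
                  "column": "assets",
                  "aggregationOperation": "SUM_OPERATION"
              }
          ]
      }
    ]


The filter now runs directly on the data table, where it can make use of an index on country, and the aggregation only has to process the remaining rows.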


Reordering of transformation steps

  • Have a look at your data and your transformations and try to estimate how much the data is reduced by each step.
  • Put transformations that reduce the number of rows the most in front of other steps, if possible.
  • Try to put filter transformations first and define an index on the filtered columns.
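As a sketch of the last point: if you manage the backing Postgres table yourself (rather than through the index configuration linked above), a composite index covering filter steps on two columns could be created like this. The table name company_data and the second column region are placeholders for illustration; the country column is taken from the example above.


  -- Composite index covering filter steps on both columns.
  -- "company_data" and "region" are placeholder names for this sketch.
  CREATE INDEX idx_company_country_region ON company_data (country, region);
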


Speeding up SQL Transformations with Postgres 11

When you apply transformations to a data table based on PostgreSQL, these transformations are translated into a SQL query and executed by the database. Queries sent to a database system are processed by the database's query optimizer: it creates a plan for how the query should actually be executed and determines whether e.g. indices or other metadata can be used to speed up the execution. Up to and including version 11 of Postgres, the optimizer could only optimize the query up to the first SQL transformation. The evaluation of the custom SQL query, and of all transformation steps after it, therefore could not benefit from any optimization potential. Starting with version 12 of Postgres, the optimizer can optimize across custom SQL queries, which speeds up the overall execution time.

The following section gives some pointers on how SQL queries can be sped up when using Postgres 11. If you are using Postgres in version 12 or later, these optimizations are not necessary. For information about the version used by your ONE DATA instance, please contact your instance administrator.

As mentioned before, in Postgres 11 the query can only be optimized up to the first SQL transformation step. It therefore makes sense to put as many transformations as possible in front of the SQL transformation, thereby reducing the amount of data that the SQL transformation itself needs to process. This way the database can optimize and speed up the query until the custom SQL query is executed. Thus, as much of the SQL query as possible should be expressed as preceding transformations of other types. By splitting up your SQL query and applying the optimizations discussed in the previous section, the query execution time can be reduced, resulting in better loading performance of your App.

A common scenario for SQL transformations is, for example, to aggregate over the data and apply some functions in the projection. An example of this is shown in the following query.
It computes the rounded sum of the total company assets per country.


  "transformations": [
      {
          "type": "sql",
          "config": "SELECT country, ROUND(SUM(assets)) FROM inputTable GROUP BY country"
      }
  ]


This query is not entirely representable by other transformation types, as the ROUND function can only be applied via a SQL transformation. The aggregation part, however, can be expressed as an aggregation transformation. The SQL transformation can thus be restructured in the following way.


  "transformations": [
      {
          "type": "aggregation",
          "config": [
              {
                "column": "country",
                "aggregationOperation": "GROUP_OPERATION"
              },
              {
                "column": "assets",
                "aggregationOperation": "SUM_OPERATION"
              }
          ]
      },
      {
          "type": "sql",
          "config": "SELECT country, ROUND(assets) FROM inputTable"
      }
    ]


In this sequence the aggregation is done by a transformation of type aggregation before the SQL transformation doing the rounding is executed. The aggregation itself can therefore be optimized by the database, and it reduces the data that is passed to the SQL transformation. This way the overall query execution time is sped up, as the database can apply further optimization measures.

Optimizing SQL transformations with Postgres 11

  • A lot of common SQL transformations used in apps are composed of multiple sub-transformations (e.g. aggregations, filters, etc.).
  • For better performance these sub-transformations should be represented as preceding transformation steps.
  • Some parts of a SQL query can be represented differently with standard transformations. For example, a COUNT(DISTINCT col) can be expressed by first grouping over col with an aggregation transformation and then doing a COUNT(*) with a SQL transformation on the result.
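Using the transformation schema from the examples above, the COUNT(DISTINCT col) case from the last bullet could be sketched as follows. The column name customer is hypothetical, and this assumes a grouping-only aggregation step is supported:


  "transformations": [
      {
          "type": "aggregation",
          "config": [
              {
                  "column": "customer",
                  "aggregationOperation": "GROUP_OPERATION"
              }
          ]
      },
      {
          "type": "sql",
          "config": "SELECT COUNT(*) FROM inputTable"
      }
    ]
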


This section discussed why SQL transformations can act as optimization barriers in Postgres databases prior to version 12. By representing parts of your query with preceding transformations of other types, this optimization barrier can be partially circumvented. By rethinking the transformation sequences used in your App and restructuring them so that intermediate results are small early in the sequence, the overall loading speed of your App can be increased.