This is an advanced feature. To use it efficiently, you should be familiar with CSV import.


General Description

To get an overview of which user accessed which Data Table at what time, data accesses can be logged. For each access, an entry is written to a log file containing the date and time of the access, the user ID, and the Data Table ID. Data access logging thus allows accesses to data sets to be reconstructed, which can be important for reasons of IT security, for example.


Necessary Rights

Enabling the feature requires DevOps support. Users can also import the logs back into ONE DATA for further analysis; enabling this requires a Super Admin.


Users who can enable access logging: DevOps

Users who can enable access log import: Super Admins 

 

How to Enable Access Logging

By default, logging is disabled. It can be turned on by setting the environment variable DATAACCESS_LOGGING_ACTIVATED to true ("true" (string) for Kubernetes). 

If access logging is activated, all data accesses are written to a log file. Each day produces its own log file, which is archived as gzip. Archived logs are deleted automatically after 14 days.


Configure access logging

Use DATAACCESS_LOGGING_PATTERN to change the file pattern. You can for example set yyyy-MM for one file per month, yyyy-ww for one file per calendar week, or yyyy-MM-dd-HH for one file per hour.


Use DATAACCESS_LOGGING_MAXHISTORY to change the delay until archived logs are deleted. The value represents the number of archived files and therefore depends on the pattern. For example, pattern yyyy-MM with maxHistory 6 keeps logs for 6 months, while pattern yyyy-MM-dd-HH with maxHistory 6 keeps logs for 6 hours.
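The relationship between the pattern and maxHistory can be sketched as follows. This is a minimal Python illustration of the examples above, not part of the product; the mapping table only covers the patterns mentioned in this document:

```python
# Illustration of how DATAACCESS_LOGGING_MAXHISTORY relates to
# DATAACCESS_LOGGING_PATTERN: one archived file per rollover period,
# so total retention = maxHistory rollover periods.
ROLLOVER_PERIOD = {
    "yyyy-MM": "months",       # one file per month
    "yyyy-ww": "weeks",        # one file per calendar week
    "yyyy-MM-dd": "days",      # one file per day (default)
    "yyyy-MM-dd-HH": "hours",  # one file per hour
}

def retention(pattern: str, max_history: int) -> str:
    """Return how long archived logs are kept for a given pattern."""
    return f"{max_history} {ROLLOVER_PERIOD[pattern]}"

print(retention("yyyy-MM", 6))        # 6 months
print(retention("yyyy-MM-dd-HH", 6))  # 6 hours
```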


What accesses are actually logged

All accesses to a Database Connection, Filesystem Connection, Data Tables (from upload) and Virtual Data Tables are logged. Failed attempts, for example due to missing access rights, are also logged. 

The table below shows all verified scenarios. The access to Data Tables via Apps is also covered. 



Data Table Type           | Open (DT, Statistics, Sample, Apps) | Usage in Workflow | Analysis Authorization Preview
Virtual Data Table        | yes                                 | yes               | not available
Filesystem Connection     | yes                                 | yes               | yes
Database Connection       | yes                                 | yes               | yes
Data Tables (from upload) | yes                                 | yes               | yes



Step-by-step

In this section, we will explain how to use the functionality step by step.


1. Set environment variables. This step needs to be done by DevOps.

    a. Depending on your setup (Docker or Kubernetes), set the environment variables as shown in the following snippet. Note that Kubernetes expects the values as strings:


#Docker
  onedata-server:
    [...]
    environment:
      [...]
      - "DATAACCESS_LOGGING_ACTIVATED=true"     # optional, default "false"
      - "DATAACCESS_LOGGING_PATTERN=yyyy-MM-dd" # optional, default "yyyy-MM-dd"
      - "DATAACCESS_LOGGING_MAXHISTORY=14"      # optional, default "14"
  
# Kubernetes
  onedata:
    version:
      server: random
      [...]
    environment:
      server:
        [...]
        DATAACCESS_LOGGING_ACTIVATED: "true"      # optional, default "false"
        DATAACCESS_LOGGING_PATTERN: "yyyy-MM-dd"  # optional, default "yyyy-MM-dd"
        DATAACCESS_LOGGING_MAXHISTORY: "14"       # optional, default "14"

                    


2. Check that logback.xml is up-to-date. This step needs to be done by DevOps. 

    a. Location and other logging options are configurable in ${onedata.root}/logback.xml.

    b. Make sure that the log output folder persists (for example, via a Docker mounted volume).
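For Docker setups, persisting the folder could look like the following docker-compose sketch. The host path and the in-container location of ${onedata.root} (shown as /onedata) are assumptions for illustration; adjust them to your installation:

```yaml
  onedata-server:
    [...]
    volumes:
      # Persist the data access logs outside the container.
      # "/onedata" stands in for ${onedata.root} here; adjust to your setup.
      - "./data_access_logs:/onedata/logs/data_access"
```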

    c. There are two possibilities to get the logback.xml:

        i. Use the provided logback.xml and replace yours. Make sure you have no custom local changes.

        ii. Merge the following appender and logger into your logback.xml:


logback.xml (changes)
<!--
    Data access logging can be turned on by setting the environment variable DATAACCESS_LOGGING_ACTIVATED to true ("true" (string) for Kubernetes).
    If access logging is activated, all data accesses are written to a log file. Each day will produce its own log file and is archived as gzip.
    Archived logs are automatically deleted after 14 days.
-->
<variable name="DATAACCESS_LOGGING_ACTIVATED" value="${DATAACCESS_LOGGING_ACTIVATED:-false}" />
<!--
    Use DATAACCESS_LOGGING_PATTERN to change the file pattern.
    E.g., yyyy-MM for one file per month, yyyy-ww for one file per calendar week, or yyyy-MM-dd-HH for one file per hour.
-->
<variable name="DATAACCESS_LOGGING_PATTERN" value="${DATAACCESS_LOGGING_PATTERN:-yyyy-MM-dd}" />
<!--
    Use DATAACCESS_LOGGING_MAXHISTORY to change delay until archived logs are deleted.
    The value represents the number of files and is therefore dependent on the pattern.
    E.g., pattern yyyy-MM and maxHistory 6 will keep logs for 6 months, pattern yyyy-MM-dd-HH and maxHistory 6 will keep logs for 6 hours
-->
<variable name="DATAACCESS_LOGGING_MAXHISTORY" value="${DATAACCESS_LOGGING_MAXHISTORY:-14}" />
 
 
<!-- Log messages with marker DATA_ACCESS to a separate file (use MarkerFactory.getMarker("DATA_ACCESS") to obtain) -->
<appender name="DATA_ACCESS_FILE" class="de.onelogic.onedata.util.RollingFileAppenderWithHeader">
    <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
        <evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">
            <marker>DATA_ACCESS</marker>
        </evaluator>
        <onMismatch>DENY</onMismatch>
        <onMatch>NEUTRAL</onMatch>
    </filter>
    <header>Timestamp,Timezone,UserId,ResourceId,ResourceType</header>
    <file>${onedata.root}/logs/data_access/data_access.log</file>
    <append>true</append>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%d{yyyy-MM-dd HH:mm:ss},UTC,%m%n</pattern>
    </encoder>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>${onedata.root}/logs/data_access/data_access.%d{${DATAACCESS_LOGGING_PATTERN}}.log.gz</fileNamePattern>
        <maxHistory>${DATAACCESS_LOGGING_MAXHISTORY}</maxHistory>
    </rollingPolicy>
</appender>
 
<if condition='"${DATAACCESS_LOGGING_ACTIVATED}".equals("true")'>
    <then>
        <logger name="de.onelogic" level="DEBUG" additivity="false">
            <appender-ref ref="DATA_ACCESS_FILE" />
        </logger>
    </then>
</if>

   

3. Whitelist the data access log folder for CSV import. This step needs to be done by a Super Admin.

    a. Go to Settings => Filesystem Connections (more information can be found in the FILESYSTEM Connections documentation)

4. Import the logs via a merged file Connection. This step is done by the user.

    a. Create a new file Connection (part of the FILESYSTEM Connection feature)


    b. Set the merge rule:

[
  {
    "mergeRule": "data_access.*",
    "fileName": "mergedAccessLog.csv"
  }
]
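The merge rule above collects all rolled-over log files into one CSV. Assuming glob-style matching (an assumption; the exact matching semantics are defined by the FILESYSTEM Connection feature), the pattern behaves like this sketch:

```python
import fnmatch

# Example file names as produced by the logback rolling policy above.
files = [
    "data_access.log",
    "data_access.2023-05-01.log.gz",
    "data_access.2023-05-02.log.gz",
    "server.log",
]

# "data_access.*" interpreted as a glob pattern (assumption).
matched = [f for f in files if fnmatch.fnmatch(f, "data_access.*")]
print(matched)  # all data_access files, but not server.log
```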

    c. Create a Data Table from mergedAccessLog.csv and use it as desired
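Once imported, the merged log is plain CSV with the columns configured in logback.xml (Timestamp, Timezone, UserId, ResourceId, ResourceType), so it can also be analyzed outside ONE DATA. A minimal Python sketch; the user IDs, resource IDs, and ResourceType values below are made-up sample data:

```python
import csv
import io

# Hypothetical excerpt of a merged access log; the column layout matches the
# header configured in logback.xml.
sample = """Timestamp,Timezone,UserId,ResourceId,ResourceType
2023-05-01 08:15:02,UTC,user-42,dt-1001,DATA_TABLE
2023-05-01 09:30:11,UTC,user-42,conn-7,DATABASE_CONNECTION
2023-05-02 10:05:45,UTC,user-7,dt-1001,DATA_TABLE
"""

# Count accesses per user, e.g. as a starting point for an IT security review.
accesses_per_user = {}
for row in csv.DictReader(io.StringIO(sample)):
    accesses_per_user[row["UserId"]] = accesses_per_user.get(row["UserId"], 0) + 1

print(accesses_per_user)  # {'user-42': 2, 'user-7': 1}
```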



Boundaries and current restrictions

Boundaries

  • Logs for the Data Connection Load Processor (reading data from an external database without persisting it in ONE DATA) are not included for now. This means we can only track that the Connection was used, not which table.
  • Since only data access within ONE DATA is tracked, REST API calls in the Flexible API Processor are also not covered. These should be covered by other means.
            

Current restrictions

  • Access to individual rows within a Data Table is not logged; only table-level access is recorded.
  • Logs are lost when the instance is restarted unless DevOps mounts persistent storage into the container.