SAP HANA Sample Interview Questions

SAP HANA Interview Questions with Answers & Explanations Sample Questions

SAP HANA interview questions can cover a very wide range.

Here is a sample of questions that a typical HANA Consultant can be expected to face.

Sample Questions

Q1. What are the different ways in which you may perform a SAP HANA database data backup? What happens, if a scheduled data backup starts before the previous backup is completed?

Answer;
There are different ways and tools to perform a data backup.

a)        Performing a data backup using SAP HANA studio
1. In the Navigator view, select the database that you want to back up.

2. From the context menu, choose Back Up…

3. Select a destination type, File or Backint:

o        File Backs up the database to files in the file system. Each SAP HANA service writes backup data to the specified destination in the file system.

o        Backint Backs up the database using a third-party backup tool. Each SAP HANA service starts the Backint agent and sends the backup data to the third-party backup tool.

4. Specify the backup destination and the backup prefix.

The default destination is the path specified in the Backup section. The backup destination can be the same, regardless of whether you select File or Backint.

The third-party backup tool can distinguish between multiple backups with the same name. You do not need to use a different name for each backup.

For file-based backups, you need to ensure that there is sufficient space at the specified backup destination.

For backups with third-party backup tools, you can only change the backup prefix. The destination is always /usr/sap/<SID>/SYS/global/hdb/backint

5. Choose Next.
A summary of the backup settings is displayed.

6. When all the settings are correct, choose Finish. The backup then starts. The progress of the backup is shown for all the services (for example, the statistics server, name server, and index servers).

When all the volumes have been backed up, a confirmation message is displayed.

b) Performing a data backup using SQL commands
You can enter SQL commands either by using the SQL editor in SAP HANA studio, or by using the command line program hdbsql. You can start the backup using the following SQL command:

For file-based backup:
           BACKUP DATA USING FILE (‘<path><prefix>’)
           If no absolute path is specified, the default backup destination is prepended to the location. If the directory does not exist, you need to create it before you start the backup.

For third-party backup tools:
           BACKUP DATA USING BACKINT (‘/usr/sap/<SID>/SYS/global/hdb/backint/<prefix>’)

           For third-party backup tools, you can optionally omit the path specification.

c) Performing a data backup in batch mode
           To perform data backups in batch mode, we recommend that you use the command line tool hdbsql on operating system level. hdbsql enables you to trigger backups through crontab.

d) Performing a data backup using DBACockpit (from the ABAP system only)
You can find the DBA Cockpit documentation in the SAP Help Portal http://help.sap.com/

If a scheduled data backup starts before the previous backup is completed, the first data backup will continue normally; the second data backup will not start and you will receive an appropriate error message.

Q2. In case of a file-based backup can I configure the directories where the backups should be located? What are the naming conventions for data backup files?

Answer;
Yes. The location of file-based data and log backups can be changed.

For improved data safety, it is recommended that you specify a path to an external backup destination. The backup destination should never be on the same file system as the data or log areas.

It is recommended that you create the destination directory structures before the backup is started.

a) Data backup
By default, data backups are written to the following destination: $DIR_INSTANCE/backup/data.

You can specify a different destination when you perform the backup. Alternatively, you can change the default backup destination in the Backup section in SAP HANA studio.

To change the default destination for data backups:
1. In SAP HANA studio navigation pane go to the Backup section of your system.

2. Go to the Configuration tab and choose Data Backup Settings.

3. Specify the new default destination in the Destination field & Save.

Different data backups can be written to different destinations, but all the parts of one particular data backup are written to the same destination.

b) Log backup
By default, log backups are written to the following destination: $DIR_INSTANCE/backup/log.

You can change the backup destination in the Backup section in SAP HANA studio.

To change the default destination for log backups:

1. In SAP HANA studio navigation pane go to the Backup section of your system.

2. Go to the Configuration tab and choose Log Backup Settings.

3. Specify the new default destination in the Destination field & Save.

In general changes to the data or log backup destinations take effect immediately. Each backup naming convention comprises the following elements: <path><prefix>

If no complete path is specified, the default backup destination is prepended to the backup name. For file-based backups, the backups are written to a destination in the file system. For backups made using third-party backup tools, a pipe is created in the file system.

The pipe is always created in the directory /usr/sap/<SID>/SYS/global/hdb/backint. The backup tool decides where the backups are actually written.

You can use the prefix proposed by the system or you can specify a different prefix for the backup name. During backup, each service backs up its data to the specified backup destination.

Below is an example of a set of backup files from one data backup created with SAP HANA studio. The backups can have different names.

•        /backup/data/COMPLETE_DATA_BACKUP_databackup_0_1 (name server topology)

•        /backup/data/COMPLETE_DATA_BACKUP_databackup_1_1 (name server)

•        /backup/data/COMPLETE_DATA_BACKUP_databackup_2_1 (for example, index server)…

In the above example, the <path> is /backup/data/, the <prefix> is COMPLETE_DATA_BACKUP. databackup_0_1 is the suffix, which is automatically added by the system.

Q3. What general configuration options do I have for saving the log area? Where can I find the backup catalog?

Answer;
You can configure the log backup behaviour using the Backup section in SAP HANA studio except for the log mode. In that case you have to change the configuration parameter inthe global.ini configuration file.

Automatic Log Backup
Automatic log backup can be enabled or disabled using parameter enable_auto_log_backup. Per default the automatic log backup is enabled.

During normal system operation (log mode normal), it is recommended that automatic log backup be kept activated. If automatic log backup is disabled, the log area grows until the file system is full. If the file system is full, the database will freeze until the situation is solved.

Log Modes
After changing the log mode parameter log_mode, you must restart the SAP HANA database to activate the changes.

We also recommend that you create a full data backup of the database. This parameter can have the following values:

o        log_mode = normal (default)
        Log segments are automatically backed up if automatic log backup is enabled. log_mode = normal is recommended to provide support for point-in-time recovery.

o        log_mode = overwrite
        Log segments are freed by savepoints and no log backup is performed. This can be useful for test installations that do not need to be backed up or recovered.

log_mode = overwrite is not recommended for production systems. No point-in-time recovery is possible. For recovery, only data backups are used; the logs must be disregarded.

o        log_mode = legacy
        log_mode = legacy is still supported, but not recommended by SAP

Log Backup Timeout
        You can specify a timeout for log backups. The timeout determines the intervals at which the logs are backed up. If the log segments become full before the timeout, the logs will be backed up anyway. The log backup timeout is controlled by the parameter log_backup_timeout_s.

Specifying an appropriate time interval for log backups enables you to recover a SAP HANA database with minimum data loss. For example, if you need to recover the database in a situation where the log area is unusable, and only the data and log backups are available.

Recommended: Specify a time interval. For example 900s.

If you specify a timeout 0, the log backups are only made when a log segment is full and when services are restarted.

Note: The backup timeout only takes effect if automatic log backups are enabled.

The backup catalog is stored in the SAP HANA database and is part of the log backups. This allows access to the backup catalog during recovery.

Even in situations such as when log mode overwrite is set, where logs backups are not written, the backup catalog is still backed up. The backup catalog is backed up and versioned after every completed backup operation. If the backup catalog is saved using a third-party backup tool, the versioning of the backup catalog is handled by the backup tool.

Q4. How do I recover my SAP HANA database? How do I recover my SAP HANA database, if the data/log area is unusable? Or if a logical error occurred (e. g. some data were deleted by mistake)?

Answer;
The SAP HANA studio provides point-in-time recovery with GUI support.

In a recovery situation, you can use SAP HANA studio to restore customer-specific changes (configuration settings) either before or at the end of the recovery. To recover your database from SAP HANA studio -> navigator view -> open context menu of the database -> choose “Recover…”. Confirm that the database can be shut down – recovery is only done when the database is offline. In the recovery wizard choose one of the recovery types:

•        Recover the database to its most recent state.

•        Recover the database to a certain point in time.

•        Recover the database to a specific data backup. (Just a specific data backup is used for recovery; Log entries are not replayed either from the log backups or from the log area.)

Note: The SAP HANA database will be initialized with the specified data backup. This backup becomes the first backup of a new database life cycle. Older backups are not compatible with the log written after the recovery.

•        Recover the database to a certain log position. (Advanced option to be used in exceptional cases if a previous recovery failed.)

The SAP HANA database uses the backup catalog to identify the destination of the backups. You do not need to specify whether File or Backint was used for the backup.

If the data area is unusable, and all the data changes after the last complete data backup are still available in the log backups and log area, the data from committed transactions that was in memory at the time of failure can be recovered.

No committed data is lost.

For recovery, all relevant log backups made after the data backup are needed.

Perform a database recovery to the most recent state.

If the log area is unusable, it is only possible to replay the log backups. As a consequence, any changes that were made after the most recent log backup will be lost. In addition, all the transactions that were open during the log backup will be rolled back.

For recovery, the data backups and the log backups are needed.

Perform a database recovery to the most recent state and in the Recovery Wizard choose the option Initialize log area to prevent the recovery of entries from the unusable log area.

To reset the database to a particular point in time, you need a data backup from before the point in time to recover to, the subsequent log backups, and the log area. All changes made after the recovery time will be lost.

Perform a database recovery to a point-in-time before the logical error.

Q5.  When modeling with Analytic Views in SAP HANA, you need to deal with billions of rows of data. What features/capabilities of HANA could you use?

Answer;
Partitioning SAP HANA database tables
When modeling with Analytic Views in SAP HANA, you can use the table partitioning feature of the database. This allows tables to be partitioned horizontally into disjunctive sub-tables or partitions as they are also known.

Partitioning supports the creation of very large tables by decomposing them into smaller and more manageable pieces.

It is transparent for most SQL queries and Data Manipulation Language (DML) statements.

SAP HANA database tables with more than two billion rows
A non-partitioned SAP HANA database table can store 2 billion rows. By utilizing SAP HANA partitioning as described in the guide SAP HANA Database – Partitioning and Distributing Large Tables; this limitation of 2 billion rows for each database table can be overcome by distributing the table rows to several partitions.

Please note, each partition can contain up to 2 billion rows. For example a 3 billion row table can be partitioned into 2 separate tables each containing 1.5 billion rows. Having more than 2 billion rows in an internal table (like for intermediate results) is not supported.

Characteristics of modeling view types with big data tables
By utilizing SAP HANA database partitioning large datasets of over 2 billion rows can be operated upon. When modeling with, or querying on, these large data sets you should use Analytic Views.

Queries on this large amount of data might lead into an “out of memory” situation if a large (intermediate) result set is requested.

It is recommended to  avoid the following query types/modeling constructs in analytic views as they might lead into an “out of memory” situation:

•        Do not use group by on key or key like constructs

•        Do not reference to key or key like constructs

Do not use formulas, which are based on calculated attributes/keys/key like constructs

Q6. A user has written a complex SQL statement (or it was autogenerated by a BI Client Tool) and would like to find out where most of the execution time is spent.

Answer;
You can use the functionality of EXPLAIN PLAN to find out where execution time is spent. The code structure is as below;

Populating QUERY_PLANS view
EXPLAIN PLAN SET PLAN_ID = <string> FOR <SQL statement>
Querying QUERY_PLANS view
SELECT
LPAD (‘ ‘, LEVEL * 2) || OPERATOR_NAME,  <additional columns>
FROM
QUERY_PLANS
WHERE
PLAN_ID = <string>
ORDER BY
OPERATOR_ID

Example:
alter session set current_schema=tpchsmoke;
delete from query_plans;
explain plan for select * from r_lineitem left outer join r_part on l_partkey = p_partkey where l_quantity < 300;

select lpad(‘ ‘, level) || operator_name, operator_details from query_plans order by operator_id;

Q7. Alerts are triggered in HANA regarding memory usage and upon further investigation, HANA Studio and DBACockpit  seem to provide 2 different views on the resident memory. Can you explain what may be going on?

Answer;
Before proceeding with analysing the different HANA specific memory areas, it has to be clarified that indeed the database and not processes running outside the SAP Software are responsible.

This involves calculating how much resident memory is used by the HANA database. On the HANA appliance, resident memory used outside HANA (OS, 3rd party processes) is typically quite small, not exceeding 2 GB.

HANA Studio and DBACockpit provide 2 different views on the resident memory:

HANA Studio:

The following SQL statements are behind this output:
‘Database Resident’:

SELECT SUM(PHYSICAL_MEMORY_SIZE) FROM M_SERVICE_MEMORY

‘Total Resident’:
SELECT T1.HOST, T1.USED_PHYSICAL_MEMORY + T2.SHARED_MEMORY_ALLOCATED_SIZE

FROM M_HOST_RESOURCE_UTILIZATION AS T1

JOIN (SELECT M_SERVICE_MEMORY.HOST, SUM(M_SERVICE_MEMORY.SHARED_MEMORY_ALLOCATED_SIZE) AS
SHARED_MEMORY_ALLOCATED_SIZE

FROM SYS.M_SERVICE_MEMORY

GROUP BY M_SERVICE_MEMORY.HOST) AS T2

ON T2.HOST = T1.HOST;

Since the difference between ‘Total Resident’ and ‘Database Resident’ is well below 2 GB, there is no indication that processes outside the database contribute significantly to memory consumption.

DBACockpit:

This is effectively the output of the following statement:
SELECT HOST,

ROUND(USED_PHYSICAL_MEMORY/1024/1024/1024, 2) AS “Resident GB”,

ROUND((USED_PHYSICAL_MEMORY + FREE_PHYSICAL_MEMORY)/1024/1024/1024, 2) AS “Physical Memory GB”

FROM PUBLIC.M_HOST_RESOURCE_UTILIZATION

DBACockpit does not currently consider shared memory as part of the resident memory, hence the difference to what HANA Studio reports.

Q8. An ‘out of memory’ error is not yet triggered, but further investigation on the current memory usage of the database might be required.

How would you proceed with the detailed analysis?

Answer;
There are 2 cases that have to be distinguished in general:

1.        An ‘out of memory’ error has already occurred.

2.        An ‘out of memory’ error was not yet triggered, but further investigation on the current memory usage of the database might be required.

For case (1), the foundation for further analysis is the oom trace file which follows the naming convention <processname>_<hostname>.<number>.rtedump.<number>.oom.trc (for example indexserver_hdbnode1.39503.rtedump.72290.oom.trc).

In case (2), such a trace file is not yet available and an rte dump has to be generated which has a structure similar to the oom trace.

For case (2), further analysis can also be done ‘top-down’, making use of the information provided by the different administrative frontends (HANA Studio / DBACockpit) and monitoring views.

2. Top down approach

2.1 Which HANA process on which host is using most of the memory?

SQL Statement:

SELECT TOP 3 HOST, PORT, SERVICE_NAME, TOTAL_MEMORY_USED_SIZE

FROM M_SERVICE_MEMORY

ORDER BY TOTAL_MEMORY_USED_SIZE DESC

2.2 Is shared or heap memory using the largest share?
So far, the sum of used heap and shared memory has been determined. Now, this total has to be split:

SQL Statement:
SELECT TOP 3 HOST, PORT, SERVICE_NAME, HEAP_MEMORY_USED_SIZE, SHARED_MEMORY_USED_SIZE,
TOTAL_MEMORY_USED_SIZE
FROM M_SERVICE_MEMORY
ORDER BY SHARED_MEMORY_USED_SIZE DESC

If it is shared memory, further analysis is required on ‘Shared Memory Usage’, otherwise, further analyse ‘Heap memory usage’ (see further details below)

3. Trace file approach
The following sections of those trace files are usually relevant:
[IPMM_MEMORY]

The first part of this section lists the local (heap) memory the processes that make up the HANA database are currently using.

For each process, the allocation limit (PAL), the amount of memory currently allocated (AB) and used (U) are displayed.

An AB value that is significantly larger than an U value does not necessarily mean a problem: If necessary, allocated, but not used memory will be released.

Of particular interest are lines where AB, U and PAL have approximately the same value. This particular process has then reached its allocation limit. Further analysis based on the output of M_HEAP_MEMORY in section [PROCESS_INFO] has to be done then.

The second part of section [IPMM_MEMORY] contains information regarding the shared memory.

Shared Memory is used for various purposes. For practical purposes, it is sufficient to assume that mainly row store tables occupy shared memory. In case of the ‘early exits since’ output in the oom trace, the error was triggered by the inability to further increase the TPAL (temporary allocation limit) of the failing process.

The allocation limit of a process is not set to the maximum right at process start, but increased (and decreased) over time. In this specific case, ‘sum of temp process allocation limits’ + SHARED_MEMORY exceeds the GLOBAL_MAX_ALLOCATION_LIMIT which consequently causes the error.

MEMORY_OOM
Before any detailed analysis is done, the information from section [MEMORY_OOM] has to be reviewed as well. It might be the case that the request was that large that it was anyway impossible to fulfil it:

The following checks can then be done based on the information available so far:

4. Shared Memory Usage
Contrary to heap memory, which is allocated using the ‘malloc’ system call, shared memory is provided using the ‘shmget’ call. The results of shared memory creation can then be viewed on OS level using the ipcs command.

4.1 Memory usage of the row store

1.        The row store is organized in 64MB segments. Deleting large number of rows can lead to sparse segments and unnecessary use of memory.

2.        Indirectly, high memory usage of the row store can cause problems when parameter client_distribution_mode is set to ‘off’ in distributed environments. When default value ‘statement’ is used, the statement is sent to the node where table is located. With the setting ‘off’, the statement might then also be directed to the master node.

Since all row store tables are usually located on the master node and consequently, the row store uses a lot of memory, materializing a large amount of data on the master node (from a table that is actually located on another node) can then simply be too much.

3.        Too many large tables were created as row store tables. The largest tables currently contained in the row store can be retrieved by the following SQL Statement:

SELECT TOP 50 *

FROM M_RS_TABLES

ORDER BY (ALLOCATED_FIXED_PART_SIZE +

ALLOCATED_VARIABLE_PART_SIZE) DESC

It is however important to keep in mind that in many scenarios, switching a table from rowstore to columnstore and vice versa must not be done. This applies to Suite on HANA / BW systems which are delivered with particular settings for different table types.

5. Heap memory usage
If it can be ruled out that the rowstore is responsible for high memory consumption, the heap memory allocation of the individual processes has to be analyzed next. In most cases, the process of interest is the indexserver, since it usually allocates the greatest share of the available memory. The following possibilities exist to do this:

1.        If an oom error has already occurred or a rte dump was explicitly triggered, the content of the system view M_HEAP_MEMORY (section[PROCESS_INFO]) that was written into the trace file can be used.

2.        If an adhoc analysis of the heap memory consumption should be done, the view M_HEAP_MEMORY can be queried directly, for example using HANA Studio.

In case of (1), the output can be directly copied into a csv file and further be analyzed for example using excel. Each line represents an allocator, which is a facility for allocating heap memory in HANA.

Of interest are those pools that are on the top of the list and have an inclusive size close to the exclusive size. Depending on the pool names, the following known issues exist: Pool/parallel/ihm

There are cases where this allocator used 90% of the available memory. If such a problem is observed, the system should be upgraded to revision 54 when available.

Pool/itab
This allocator is needed for the intermediate result of join operation and translation table used for join operation. This could indicate use of a suboptimal model.

Pool/FemsCompression/CompositeFemsCompressionPool/FemsCompression/CompositeFemsCompression
A known issue is associated with high memory usage that is solved with revision 53. FEMS compression is used in BW systems to keep the size of the dataset that is transferred to the application server as small as possible.

Pool/ValueArray
This is the allocator which keeps the complete resultset in uncompressed form. A known issue has been solved with revision 55 which addresses memory used by this allocator not released after oom situations, thus steadily increasing over time.

Pool/JoinEvaluator/TranslationTable
Translation tables are created for caching join column entries of join engine queries. The number of translation tables that are kept in memory is determined by the following parameter in indexserver.ini:

[joins]

translator_cache_size = n

The default value is 2000. In case of excessive memory usage for this allocator, the parameter can be set to 500. From SP6 onwards it will be possible to adjust a memory limit for these tables instead of specifying a number.

6. Memory Usage of the Statistics server
Even though the statisticsserver can also run into oom errors, it should not consume a large share of the available memory. The statisticsserver is supposed to have a much smaller allocation limit compared to the indexserver. This following two configuration parameters are relevant:
allocationlimit

minallocationlimit

The allocation limit of the statisticsserver is either a percentage of the availably RAM or just the value of minallocationlimit, depending on what is the bigger value.

In case of oom errors that indicate that the statisticsserver has reached its process allocation limit, it can make sense to increase the allocationlimit to 10% or 15% to have at least a workaround for the time being.

More Interview Questions? Have a look at:

151 SAP HANA Interview Questions with Answers & Explanations – eBook