
Hive Variable Dictionary

Hive skills improvement series

· Hive, bigdata

Rough Chinese version

  1. hive.server2.session.check.interval = 1 hour
    - Interval at which to check whether a session/operation has timed out, in milliseconds; set to zero or a negative value to disable.
    - For example, "3600000" means a check runs every hour.
  2. hive.server2.idle.operation.timeout = 1 day
    - An operation/session is closed if it is not accessed within this duration.
    - In milliseconds; set to zero or a negative value to disable.
  3. hive.server2.idle.session.timeout = 3 days

  4. hive.exec.scratchdir
    - Directory that stores the temporary files produced by the intermediate map/reduce stages of submitted Hive jobs.

  5. hive.execution.engine = spark
    - Sets the Hive execution engine. Besides spark, it can be set to tez or mr; mr is the default. Related sub-parameters:
    - spark.executor.memory: amount of memory to use per executor process.
    - spark.executor.cores: number of cores used by each executor.

  6. hive.default.fileformat
    - Default file format when Hive creates a table (default: TextFile); options include TextFile, SequenceFile, RCFile, and Orc.

  7. hive.exec.dynamic.partition.mode
    - In strict mode, fully dynamic partition inserts (with no static partition specified) are rejected.
    - Set it to nonstrict before performing fully dynamic partition inserts.

English version

1). hive.server2.idle.session.timeout
Session will be closed when not accessed for this duration of time, in milliseconds; disable by setting to zero or a negative value.
For example, a value of “86400000” indicates that the session will be timed out after 1 day of inactivity.
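
A minimal way to check the live value from a Beeline session (the setting itself is server-wide, configured in hive-site.xml):

```sql
-- 1 day in milliseconds: 24 * 3600 * 1000 = 86,400,000.
-- SET with no value prints the effective setting as key=value.
SET hive.server2.idle.session.timeout;
```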

2). hive.server2.session.check.interval
The check interval for session/operation timeout, in milliseconds, which can be disabled by setting to zero or a negative value.
For example, a value of “3600000” indicates that sessions are checked every hour.
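
To confirm what interval a running HiveServer2 is actually using:

```sql
-- Prints, e.g., hive.server2.session.check.interval=3600000 (hourly checks).
SET hive.server2.session.check.interval;
```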

3) hive.server2.idle.operation.timeout
Operation will be closed when not accessed for this duration of time, in milliseconds; disable by setting to zero. With a positive value, only operations in a terminal state (FINISHED, CANCELED, CLOSED, ERROR) are checked; with a negative value, all operations are checked regardless of state.
For example, a value of “7200000” indicates that a query/operation in a terminal state is timed out 2 hours after it was last accessed; to also time out operations that are still running, use a negative value.
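
A quick sketch of the sign semantics, inspecting the value from a client:

```sql
-- Positive (e.g.  7200000): only terminal-state operations are timed out.
-- Negative (e.g. -7200000): running operations are timed out as well.
SET hive.server2.idle.operation.timeout;
```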

4) hive.exec.scratchdir

Scratch space for Hive jobs. This directory is used by Hive to store the plans for different map/reduce stages of the query as well as to store the intermediate outputs of these stages.
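
The directory can also be overridden for a single session; the path below is a hypothetical example:

```sql
-- Redirect this session's intermediate outputs to a custom HDFS path.
SET hive.exec.scratchdir=/tmp/hive_scratch_example;
```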

5) hive.execution.engine

Selects the engine Hive uses to execute queries: mr (the default), tez, or spark. For Hive on Spark, the sub-parameters below apply (see the sketch after this list).

  • spark.executor.memory: Amount of memory to use per executor process.
  • spark.executor.cores: Number of cores per executor.
  • spark.yarn.executor.memoryOverhead: The amount of off-heap memory (in megabytes) to be allocated per executor when running Spark on YARN. This memory accounts for things like VM overheads, interned strings, and other native overheads. In addition to the executor's memory, the container in which the executor is launched needs some extra memory for system processes, and that is what this overhead is for.
  • spark.executor.instances: The number of executors assigned to each application.
  • spark.driver.memory: The amount of memory assigned to the Remote Spark Context (RSC). We recommend 4GB.
  • spark.yarn.driver.memoryOverhead: We recommend 400 (MB).
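
A minimal sketch of switching one session to the Spark engine; the executor numbers are illustrative assumptions for a small cluster, not recommendations from this post:

```sql
SET hive.execution.engine=spark;  -- the default engine is mr

-- Illustrative executor sizing; tune to the cluster.
SET spark.executor.memory=4g;
SET spark.executor.cores=2;
SET spark.executor.instances=10;
SET spark.yarn.executor.memoryOverhead=400;

-- Driver (RSC) sizing per the recommendations above.
SET spark.driver.memory=4g;
SET spark.yarn.driver.memoryOverhead=400;
```
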
6) hive.default.fileformat
Default file format for CREATE TABLE statements. Options are TextFile, SequenceFile, RCFile, and Orc.
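
For example, to default new tables to ORC for the current session (the table name is hypothetical):

```sql
SET hive.default.fileformat=Orc;

-- Created without a STORED AS clause, so it picks up the ORC default.
CREATE TABLE events_example (id INT, payload STRING);
```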

7) hive.exec.dynamic.partition.mode

One thing to protect against with dynamic partition insert is the user accidentally specifying all partitions to be dynamic partitions without specifying one static partition, when the original intention was just to overwrite the sub-partitions of one root partition. Setting hive.exec.dynamic.partition.mode=strict prevents this all-dynamic case: in strict mode, you have to specify at least one static partition. The default mode is strict.

In addition, the parameter hive.exec.dynamic.partition=true/false controls whether to allow dynamic partitioning at all. The default value is false prior to Hive 0.9.0 and true in Hive 0.9.0 and later.
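
A minimal sketch of a fully dynamic partition insert; the tables are hypothetical, with page_views_example assumed to be partitioned by dt:

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Every partition column (dt) is dynamic, so nonstrict is required;
-- in strict mode this INSERT would fail for lack of a static partition.
INSERT OVERWRITE TABLE page_views_example PARTITION (dt)
SELECT user_id, url, view_date AS dt
FROM raw_page_views_example;
```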
