
Azkaban

A coordination and management tool for ETL jobs

· Technology, ETL

Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the ordering through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.

 

Azkaban is an interface I currently open almost every day, and http://192.168.79.xxx:8443/history is a page I browse almost daily.

I took over this Azkaban ETL tooling in September 2018. The version currently in use is 2.5.x.

Azkaban official documentation

Features and Sources

  • Compatible with any version of Hadoop
  • Easy to use web UI
  • Simple web and http workflow uploads
  • Project workspaces
  • Scheduling of workflows
  • Modular and pluggable
  • Authentication and Authorization
  • Tracking of user actions
  • Email alerts on failures and successes
  • SLA alerting and auto killing
  • Retrying of failed jobs
Sources

Configurations

Azkaban Web Server Configurations

General Properties

- azkaban.name

The name of the Azkaban instance that will show up in the web UI. Default: Local

- azkaban.label

A label describing the Azkaban instance. Default: My Local Azkaban

- azkaban.color

Hex value that sets the style color for the Azkaban web UI. Default: #FF3601

- web.resource.dir

Sets the directory for the UI's CSS and JavaScript files. Default: web/

- default.timezone

The timezone that will be displayed by Azkaban. Default: America/Los_Angeles

- viewer.plugin.dir

Directory where viewer plugins are installed. Default: plugins/viewer

- job.max.Xms

The maximum initial amount of memory each job can request.

This validation is performed at project upload time. Default: 1GB

- job.max.Xmx

The maximum amount of memory each job can request.

This validation is performed at project upload time. Default: 2GB
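
As a rough sketch, the general properties above could be collected in the web server's properties file as shown below. The values are simply the defaults listed above. The file name `azkaban.properties` and the exact spelling of the memory sizes (`1GB` versus `1G`) are assumptions and should be checked against the release actually in use.

```properties
# Hypothetical excerpt of the web server's azkaban.properties.
# Every value below is the documented default, so this fragment
# changes nothing until a value is edited.

# Instance name and label shown in the web UI
azkaban.name=Local
azkaban.label=My Local Azkaban

# UI style color as a hex value ('#' only starts a comment at the beginning of a line)
azkaban.color=#FF3601

# Location of the UI's CSS and JavaScript files
web.resource.dir=web/

# Timezone displayed by Azkaban
default.timezone=America/Los_Angeles

# Directory where viewer plugins are installed
viewer.plugin.dir=plugins/viewer

# Upper bounds on the memory a job may request, validated at project upload time
# (unit spelling is an assumption; verify the accepted format for your version)
job.max.Xms=1GB
job.max.Xmx=2GB
```

For a team outside the US, `default.timezone` is probably the first value worth changing, for example to `Asia/Shanghai`.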

Multiple Executor Mode Parameters

- azkaban.use.multiple.executors

Whether Azkaban should run in multi-executor mode. Default: false

- azkaban.executorselector.filters

A comma-separated list of hard filters to be used while dispatching.

To be chosen from StaticRemainingFlowSize, MinimumFreeMemory and CpuStatus.

- azkaban.executorselector.comparator.{ComparatorName}

Integer weight used to rank the available executors for a given flow.

Currently, {ComparatorName} can be NumberOfAssignedFlowComparator, Memory, LastDispatched or CpuUsage.

For example: azkaban.executorselector.comparator.Memory=2

 

- azkaban.queueprocessing.enabled

Whether the queue processor should be enabled when the web server initializes. Default: true
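
A minimal sketch of how the multi-executor parameters might look together, assuming a release that supports multi-executor mode. The filter and comparator names follow the spelling used in the upstream Azkaban configuration. The weights are illustrative only, not recommended values.

```properties
# Hypothetical multi-executor section of the web server's azkaban.properties.

# Turn on multi-executor mode (the default is false)
azkaban.use.multiple.executors=true

# Hard filters applied while dispatching, as a comma-separated list
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus

# Integer weights used to rank the executors that pass the filters.
# Illustrative values: free memory is weighted twice as heavily as the rest.
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=2
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1

# Keep the queue processor enabled at web server start-up (the default)
azkaban.queueprocessing.enabled=true
```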
