Data extractor

DX offers a self-hosted Data Extractor for customers who need to keep API credentials within their network or cannot allowlist incoming requests from DX. It connects to on-prem tools like GitLab and Jira, and pushes metadata to your Data Cloud database.

The Extractor is distributed as a Docker image. You’ll run a separate instance (e.g., a K8s pod) for each data source. For example, to connect both GitLab and Jira, you would deploy two Extractor instances, each configured with environment variables for its respective tool.

Docker images are distributed via GitHub Package Registry:

https://github.com/orgs/get-dx/packages/container/package/extractor

Requirements

  • 2 GiB RAM and 1 vCPU per extractor instance
  • Each instance must:
    • Run in the same security context as the data source
    • Have outbound access to https://yourinstance.getdx.net
  • DX Data Cloud credentials (API key)
  • Tokens and credentials for each data source
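
The sizing requirement above can be expressed as a container resource request. This is a minimal sketch, assuming you want the scheduler to reserve the documented 2 GiB and 1 vCPU for each instance; the memory limit is an assumption, not a DX recommendation. The fragment belongs under the container entry of a Deployment:

```yaml
# Fragment of a container spec (spec.template.spec.containers[0]).
resources:
  requests:
    memory: "2Gi"   # 2 GiB RAM per extractor instance
    cpu: "1"        # 1 vCPU per extractor instance
  limits:
    memory: "2Gi"   # assumption: cap memory at the documented requirement
```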

Deployment

Recommended method: Kubernetes (GKE, EKS, AKS)

  1. Create a new Kubernetes cluster
  2. Set up logging for support/debugging
  3. Copy and customize the appropriate deployment YAML (see below)
  4. Run kubectl apply to deploy
  5. Use kubectl logs to verify startup
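
The deployment templates read credentials through secretKeyRef, so the referenced Secret has to exist in the namespace before the Deployment is applied. A minimal sketch for the GitHub template follows; the Secret name and keys are taken from that template, and every value is a placeholder to replace with your own:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: github-connector-secrets   # must match secretKeyRef.name in the Deployment
type: Opaque
stringData:                        # plain-text input; Kubernetes stores it base64-encoded
  DATACLOUD_URL: "https://yourinstance.getdx.net"
  DATACLOUD_KEY: "<your Data Cloud API key>"
  GITHUB_APP_ID: "<your GitHub App ID>"
  GITHUB_PEM_64: "<base64-encoded PEM content>"   # already base64; passed to the pod unchanged
```

Apply this manifest with kubectl apply before the Deployment, and keep real values out of version control.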

Monitoring

DX monitors import success. For additional monitoring:

  • Check logs for crashes or failed imports
  • Monitor pod restartCount
  • Alert on log patterns
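
One way to alert on restartCount is a Prometheus rule. The sketch below assumes that the Prometheus Operator and kube-state-metrics are installed in the cluster, which this guide does not require. The rule name, alert name, and threshold are hypothetical; the container name dx-extractor comes from the templates in this guide:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dx-extractor-restarts          # hypothetical name
spec:
  groups:
  - name: dx-extractor
    rules:
    - alert: DxExtractorRestarting     # hypothetical alert name
      # Fires when a dx-extractor container restarts more than 3 times within an hour.
      expr: increase(kube_pod_container_status_restarts_total{container="dx-extractor"}[1h]) > 3
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "DX extractor pod {{ $labels.pod }} is restarting repeatedly"
```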

YAML Templates

GitHub

Required environment variables

Name                  Description                                                Example
EXTRACTION_TYPE       Must be set to github.                                     github
DATACLOUD_URL         Your Data Cloud instance URL.                              https://yourinstance.getdx.net
DATACLOUD_KEY         Data Cloud API key.                                        mPB5sf6w3JahSLMherWp8B7nTps13FKY
GITHUB_URL            API base URL of your GitHub instance.                      https://github.myteam.com/api/v3/
GITHUB_APP_ID         GitHub App ID.                                             320840
GITHUB_PEM_64         Base64-encoded content of your PEM file.
EXTRACTOR_PROXY_URL   Optional. Proxy URL; forwards API requests to Data Cloud.  proxy.getdx.net
EXTRACTOR_PROXY_PORT  Proxy port.                                                80
EXTRACTOR_PROXY_USER  Proxy username.                                            dxuser
EXTRACTOR_PROXY_PASS  Proxy password.
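
If the Extractor has to reach Data Cloud through a proxy, the proxy variables listed above can be added to the container's env list. A minimal sketch, using the example values from the table: the Secret name matches the GitHub template, while the two Secret keys for the proxy credentials are assumptions that you would need to add to that Secret yourself.

```yaml
# Additional entries for the container's env list.
- name: EXTRACTOR_PROXY_URL
  value: "proxy.getdx.net"
- name: EXTRACTOR_PROXY_PORT
  value: "80"                       # env values must be strings, so quote the port
- name: EXTRACTOR_PROXY_USER
  valueFrom:
    secretKeyRef:
      name: github-connector-secrets
      key: EXTRACTOR_PROXY_USER     # assumed key, not part of the original template
- name: EXTRACTOR_PROXY_PASS
  valueFrom:
    secretKeyRef:
      name: github-connector-secrets
      key: EXTRACTOR_PROXY_PASS     # assumed key, not part of the original template
```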

Kubernetes deployment YAML template (GitHub)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-github
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-github
  template:
    metadata:
      labels:
        app: dx-extractor-github
    spec:
      containers:
      - name: dx-extractor
        image: ghcr.io/get-dx/extractor:latest
        env:
        - name: DATACLOUD_URL
          valueFrom:
            secretKeyRef:
              name: github-connector-secrets
              key: DATACLOUD_URL
        - name: DATACLOUD_KEY
          valueFrom:
            secretKeyRef:
              name: github-connector-secrets
              key: DATACLOUD_KEY
        - name: EXTRACTION_TYPE
          value: "github"
        - name: GITHUB_PEM_64
          valueFrom:
            secretKeyRef:
              name: github-connector-secrets
              key: GITHUB_PEM_64
        - name: GITHUB_URL
          value: "https://api.github.com"
        - name: GITHUB_APP_ID
          valueFrom:
            secretKeyRef:
              name: github-connector-secrets
              key: GITHUB_APP_ID
        - name: LOG_LEVEL
          value: "DEBUG"
        - name: LOG_FORMAT
          value: "json"
      restartPolicy: Always

GitLab

Required environment variables

Name                  Description                                                Example
EXTRACTION_TYPE       Must be set to gitlab.                                     gitlab
DATACLOUD_URL         Your Data Cloud instance URL.                              https://yourinstance.getdx.net
DATACLOUD_KEY         Data Cloud API key.                                        mPB5sf6w3JahSLMherWp8B7nTps13FKY
GITLAB_URL            API base URL of your GitLab instance.                      https://gitlab.com/
GITLAB_API_TOKEN      GitLab API token.                                          glpat-31RAZpMWxzX_m9BBnLyY
EXTRACTOR_PROXY_URL   Optional. Proxy URL; forwards API requests to Data Cloud.  proxy.getdx.net
EXTRACTOR_PROXY_PORT  Proxy port.                                                80
EXTRACTOR_PROXY_USER  Proxy username.                                            dxuser
EXTRACTOR_PROXY_PASS  Proxy password.

Kubernetes deployment YAML template (GitLab)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-gitlab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-gitlab
  template:
    metadata:
      labels:
        app: dx-extractor-gitlab
    spec:
      containers:
      - name: dx-extractor
        image: ghcr.io/get-dx/extractor:latest
        env:
        - name: DATACLOUD_URL
          valueFrom:
            secretKeyRef:
              name: gitlab-connector-secrets
              key: DATACLOUD_URL
        - name: DATACLOUD_KEY
          valueFrom:
            secretKeyRef:
              name: gitlab-connector-secrets
              key: DATACLOUD_KEY
        - name: EXTRACTION_TYPE
          value: "gitlab"
        - name: GITLAB_URL
          value: "https://gitlab.com/"
        - name: GITLAB_API_TOKEN
          valueFrom:
            secretKeyRef:
              name: gitlab-connector-secrets
              key: GITLAB_API_TOKEN
        - name: LOG_LEVEL
          value: "DEBUG"
        - name: LOG_FORMAT
          value: "json"
      restartPolicy: Always

Bitbucket Data Center

Required environment variables

Name                              Description                                                           Example
EXTRACTION_TYPE                   Must be set to bitbucket_data_center.                                 bitbucket_data_center
DATACLOUD_URL                     Your Data Cloud instance URL.                                         https://yourinstance.getdx.net
DATACLOUD_KEY                     Data Cloud API key.                                                   mPB5sf6w3JahSLMherWp8B7nTps13FKY
BITBUCKET_URL                     API base URL of your Bitbucket Data Center instance.                  https://bitbucket.somehost.net
BITBUCKET_USERNAME                Username of your Bitbucket service account (if using basic auth).     dxuser
BITBUCKET_PASSWORD                Password of your Bitbucket service account (if using basic auth).     password
BITBUCKET_API_KEY                 API key of your Bitbucket service account (if not using basic auth).  api_key
EXTRACTOR_PROXY_URL               Optional. Proxy URL; forwards API requests to Data Cloud.             proxy.getdx.net
EXTRACTOR_PROXY_PORT              Proxy port.                                                           80
EXTRACTOR_PROXY_USER              Proxy username.                                                       dxuser
EXTRACTOR_PROXY_PASS              Proxy password.
BITBUCKET_PROJECT_KEYS_ALLOWLIST  Optional. Comma-delimited list of project keys for DX to import.      PROJ1,PROJ2
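
To restrict the import to specific projects, the allowlist variable can be added to the container's env list. A minimal sketch using the example value from the table; PROJ1 and PROJ2 are placeholders for your own project keys:

```yaml
# Additional entry for the container's env list.
- name: BITBUCKET_PROJECT_KEYS_ALLOWLIST
  value: "PROJ1,PROJ2"   # comma-delimited project keys, formatted as in the example above
```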

Kubernetes deployment YAML template (Bitbucket Data Center)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-bitbucket
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-bitbucket
  template:
    metadata:
      labels:
        app: dx-extractor-bitbucket
    spec:
      containers:
      - name: dx-extractor
        image: ghcr.io/get-dx/extractor:latest
        env:
        - name: DATACLOUD_URL
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: DATACLOUD_URL
        - name: DATACLOUD_KEY
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: DATACLOUD_KEY
        - name: EXTRACTION_TYPE
          value: "bitbucket_data_center"
        - name: BITBUCKET_URL
          value: "https://bitbucket.somehost.net"
        - name: BITBUCKET_API_KEY
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: BITBUCKET_API_KEY
        - name: BITBUCKET_USERNAME    # Required for basic auth only
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: BITBUCKET_USERNAME
        - name: BITBUCKET_PASSWORD    # Required for basic auth only
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: BITBUCKET_PASSWORD
        - name: LOG_LEVEL
          value: "DEBUG"
        - name: LOG_FORMAT
          value: "json"
      restartPolicy: Always

Jira Data Center

Required environment variables

Name                  Description                                                   Example
EXTRACTION_TYPE       Must be set to jira_data_center.                              jira_data_center
DATACLOUD_URL         Your Data Cloud instance URL.                                 https://yourinstance.getdx.net
DATACLOUD_KEY         Data Cloud API key.                                           mPB5sf6w3JahSLMherWp8B7nTps13FKY
JIRA_URL              API base URL of your Jira Data Center instance.               https://jira.somehost.net/rest/api/2/
JIRA_API_TOKEN        Personal Access Token (PAT) for your Jira service account.    mPB5sf6w3JahSLMherWp8B7nTps13FKY
JIRA_USERNAME         Username of your Jira service account (if using basic auth).  dxuser
JIRA_PASSWORD         Password of your Jira service account (if using basic auth).  password
EXTRACTOR_PROXY_URL   Optional. Proxy URL; forwards API requests to Data Cloud.     proxy.getdx.net
EXTRACTOR_PROXY_PORT  Proxy port.                                                   80
EXTRACTOR_PROXY_USER  Proxy username.                                               dxuser
EXTRACTOR_PROXY_PASS  Proxy password.

User Linking

Unlike other Jira integrations, the Jira extractor does not extract user data itself. Instead, as Jira issues come in, DX looks at each issue's creator and assignee and creates or updates the corresponding Jira user record in the database. As a result, user data may sync with a delay, and some Jira usernames may remain unlinked.

Kubernetes deployment YAML template (Jira Data Center)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-jira
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-jira
  template:
    metadata:
      labels:
        app: dx-extractor-jira
    spec:
      containers:
      - name: dx-extractor
        image: ghcr.io/get-dx/extractor:latest
        env:
        - name: DATACLOUD_URL
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: DATACLOUD_URL
        - name: DATACLOUD_KEY
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: DATACLOUD_KEY
        - name: EXTRACTION_TYPE
          value: "jira_data_center"
        - name: JIRA_URL
          value: "https://jira.somehost.net/rest/api/2/"
        - name: JIRA_API_TOKEN
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: JIRA_API_TOKEN
        - name: JIRA_USERNAME    # Required for basic auth only
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: JIRA_USERNAME
        - name: JIRA_PASSWORD    # Required for basic auth only
          valueFrom:
            secretKeyRef:
              name: dx-secrets
              key: JIRA_PASSWORD
        - name: LOG_LEVEL
          value: "DEBUG"
        - name: LOG_FORMAT
          value: "json"
      restartPolicy: Always