Data extractor
DX offers a self-hosted Data Extractor for customers who need to keep API credentials within their network or cannot allowlist incoming requests from DX. It connects to on-prem tools like GitLab and Jira, and pushes metadata to your Data Cloud database.
The Extractor is distributed as a Docker image. You’ll run a separate instance (e.g., a K8s pod) for each data source. For example, to connect both GitLab and Jira, you would deploy two Extractor instances, each configured with environment variables for its respective tool.
Docker images are distributed via GitHub Package Registry:
https://github.com/orgs/get-dx/packages/container/package/extractor
Requirements
- 2 GiB RAM and 1 vCPU per extractor instance
- Each instance must:
  - Run in the same security context as the data source
  - Have outbound access to https://yourinstance.getdx.net
- DX Data Cloud credentials (API key)
- Tokens and credentials for each data source (listed in the environment variable tables under YAML Templates)
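The sizing requirement above can be expressed as resource requests on the extractor container. The fragment below is a minimal sketch that would sit under the container entry in any of the deployment templates; the templates in this document do not include it, so treat it as an optional addition.

```yaml
# Sketch: goes under the container entry, alongside `image:` and `env:`.
resources:
  requests:
    memory: "2Gi"   # 2 GiB RAM per extractor instance
    cpu: "1"        # 1 vCPU per extractor instance
```

Requests tell the scheduler to place the pod only on a node with that much free capacity. No limits are set here, since the document states a requirement rather than a ceiling.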
Deployment
Recommended method: Kubernetes (GKE, EKS, AKS)
- Create a new Kubernetes cluster
- Set up logging for support/debugging
- Copy and customize the appropriate deployment YAML (see below)
- Run `kubectl apply` to deploy
- Use `kubectl logs` to verify startup
Monitoring
DX monitors import success. For additional monitoring:
- Check logs for crashes or failed imports
- Monitor pod `restartCount`
- Alert on log patterns
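One way to watch restart counts is an alert rule. The sketch below assumes your cluster runs the Prometheus Operator and kube-state-metrics, neither of which is required by DX. The rule name, alert name, time windows, and severity label are hypothetical. The pod name pattern relies on the deployment names used in the templates in this document.

```yaml
# Sketch only: requires the Prometheus Operator CRDs and kube-state-metrics.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dx-extractor-restarts            # hypothetical name
spec:
  groups:
    - name: dx-extractor
      rules:
        - alert: DxExtractorRestarting   # hypothetical alert name
          # True while any extractor container has restarted within the last 15 minutes.
          expr: increase(kube_pod_container_status_restarts_total{pod=~"dx-extractor-.*"}[15m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "DX extractor pod {{ $labels.pod }} is restarting"
```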
YAML Templates
GitHub
Required environment variables
Name | Description |
---|---|
EXTRACTION_TYPE | Must be set to github. Example: github |
DATACLOUD_URL | Your Data Cloud instance URL. Example: https://yourinstance.getdx.net |
DATACLOUD_KEY | Data Cloud API key. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
GITHUB_URL | API base URL of your GitHub instance. Example: https://github.myteam.com/api/v3/ |
GITHUB_APP_ID | GitHub App ID. Example: 320840 |
GITHUB_PEM_64 | Base64-encoded content of your PEM file. |
EXTRACTOR_PROXY_URL | Optional. URL of a proxy that acts as middleware and forwards API requests to Data Cloud. Example: proxy.getdx.net |
EXTRACTOR_PROXY_PORT | Proxy port. Example: 80 |
EXTRACTOR_PROXY_USER | Proxy username. Example: dxuser |
EXTRACTOR_PROXY_PASS | Proxy password. |
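The deployment templates in this document do not set the proxy variables. If you route Data Cloud traffic through a proxy, entries like the following could be appended to the container's `env:` list. This is a sketch: storing the proxy username and password as extra keys in `github-connector-secrets` is an assumption, and you would need to add those keys to the secret yourself.

```yaml
# Sketch: extra entries for the container's `env:` list when a proxy is used.
- name: EXTRACTOR_PROXY_URL
  value: "proxy.getdx.net"
- name: EXTRACTOR_PROXY_PORT
  value: "80"                      # env values must be strings, so quote the port
- name: EXTRACTOR_PROXY_USER
  valueFrom:
    secretKeyRef:
      name: github-connector-secrets
      key: EXTRACTOR_PROXY_USER    # assumption: key added to the secret by you
- name: EXTRACTOR_PROXY_PASS
  valueFrom:
    secretKeyRef:
      name: github-connector-secrets
      key: EXTRACTOR_PROXY_PASS    # assumption: key added to the secret by you
```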
Kubernetes deployment YAML template (GitHub)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-github
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-github
  template:
    metadata:
      labels:
        app: dx-extractor-github
    spec:
      containers:
        - name: dx-extractor
          image: ghcr.io/get-dx/extractor:latest
          env:
            - name: DATACLOUD_URL
              valueFrom:
                secretKeyRef:
                  name: github-connector-secrets
                  key: DATACLOUD_URL
            - name: DATACLOUD_KEY
              valueFrom:
                secretKeyRef:
                  name: github-connector-secrets
                  key: DATACLOUD_KEY
            - name: EXTRACTION_TYPE
              value: "github"
            - name: GITHUB_PEM_64
              valueFrom:
                secretKeyRef:
                  name: github-connector-secrets
                  key: GITHUB_PEM_64
            - name: GITHUB_URL
              value: "https://api.github.com"
            - name: GITHUB_APP_ID
              valueFrom:
                secretKeyRef:
                  name: github-connector-secrets
                  key: GITHUB_APP_ID
            - name: LOG_LEVEL
              value: "DEBUG"
            - name: LOG_FORMAT
              value: "json"
      restartPolicy: Always
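The GitHub template reads four values from a secret named `github-connector-secrets`, which must exist before the pod can start. A minimal sketch of that secret follows. The angle-bracket values are placeholders, and the App ID is the example value from the table above.

```yaml
# Sketch: the secret referenced by the GitHub deployment template.
apiVersion: v1
kind: Secret
metadata:
  name: github-connector-secrets
type: Opaque
stringData:                          # plain values; Kubernetes stores them encoded
  DATACLOUD_URL: "https://yourinstance.getdx.net"
  DATACLOUD_KEY: "<your Data Cloud API key>"
  GITHUB_APP_ID: "320840"            # quoted, because secret values must be strings
  GITHUB_PEM_64: "<Base64-encoded content of your PEM file>"
```

Because `stringData` takes values as-is, `GITHUB_PEM_64` should already hold the Base64-encoded PEM content that the extractor expects. Keep a manifest like this out of version control, or create the secret through your usual secret management tooling instead.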
GitLab
Required environment variables
Name | Description |
---|---|
EXTRACTION_TYPE | Must be set to gitlab. Example: gitlab |
DATACLOUD_URL | Your Data Cloud instance URL. Example: https://yourinstance.getdx.net |
DATACLOUD_KEY | Data Cloud API key. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
GITLAB_URL | API base URL of your GitLab instance. Example: https://gitlab.com/ |
GITLAB_API_TOKEN | GitLab API token. Example: glpat-31RAZpMWxzX_m9BBnLyY |
EXTRACTOR_PROXY_URL | Proxy URL used to send API requests to Data Cloud. Example: proxy.getdx.net |
EXTRACTOR_PROXY_PORT | Proxy port. Example: 80 |
EXTRACTOR_PROXY_USER | Proxy username. Example: dxuser |
EXTRACTOR_PROXY_PASS | Proxy password. |
Kubernetes deployment YAML template (GitLab)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-gitlab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-gitlab
  template:
    metadata:
      labels:
        app: dx-extractor-gitlab
    spec:
      containers:
        - name: dx-extractor
          image: ghcr.io/get-dx/extractor:latest
          env:
            - name: DATACLOUD_URL
              valueFrom:
                secretKeyRef:
                  name: gitlab-connector-secrets
                  key: DATACLOUD_URL
            - name: DATACLOUD_KEY
              valueFrom:
                secretKeyRef:
                  name: gitlab-connector-secrets
                  key: DATACLOUD_KEY
            - name: EXTRACTION_TYPE
              value: "gitlab"
            - name: GITLAB_URL
              value: "https://gitlab.com/"
            - name: GITLAB_API_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-connector-secrets
                  key: GITLAB_API_TOKEN
            - name: LOG_LEVEL
              value: "DEBUG"
            - name: LOG_FORMAT
              value: "json"
      restartPolicy: Always
Bitbucket Data Center
Required environment variables
Name | Description |
---|---|
EXTRACTION_TYPE | Must be set to bitbucket_data_center. Example: bitbucket_data_center |
DATACLOUD_URL | Your Data Cloud instance URL. Example: https://yourinstance.getdx.net |
DATACLOUD_KEY | Data Cloud API key. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
BITBUCKET_URL | API base URL of your Bitbucket Data Center instance. Example: https://bitbucket.somehost.net |
BITBUCKET_USERNAME | Username of your Bitbucket service account (if using basic auth). Example: dxuser |
BITBUCKET_PASSWORD | Password of your Bitbucket service account (if using basic auth). Example: password |
BITBUCKET_API_KEY | API key of your Bitbucket service account (if not using basic auth). Example: api_key |
EXTRACTOR_PROXY_URL | Proxy URL used to send API requests to Data Cloud. Example: proxy.getdx.net |
EXTRACTOR_PROXY_PORT | Proxy port. Example: 80 |
EXTRACTOR_PROXY_USER | Proxy username. Example: dxuser |
EXTRACTOR_PROXY_PASS | Proxy password. |
BITBUCKET_PROJECT_KEYS_ALLOWLIST | Optional. Comma-delimited list of project keys for DX to import. Example: PROJ1,PROJ2 |
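The Bitbucket deployment template does not set the project allowlist. To restrict the import to specific projects, an entry like this sketch could be added to the container's `env:` list, using the example project keys from the table:

```yaml
# Sketch: optional entry for the container's `env:` list.
- name: BITBUCKET_PROJECT_KEYS_ALLOWLIST
  value: "PROJ1,PROJ2"   # comma-delimited project keys, no spaces
```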
Kubernetes deployment YAML template (Bitbucket Data Center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-bitbucket
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-bitbucket
  template:
    metadata:
      labels:
        app: dx-extractor-bitbucket
    spec:
      containers:
        - name: dx-extractor
          image: ghcr.io/get-dx/extractor:latest
          env:
            - name: DATACLOUD_URL
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: DATACLOUD_URL
            - name: DATACLOUD_KEY
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: DATACLOUD_KEY
            - name: EXTRACTION_TYPE
              value: "bitbucket_data_center"
            - name: BITBUCKET_URL
              value: "https://bitbucket.somehost.net"
            - name: BITBUCKET_API_KEY
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: BITBUCKET_API_KEY
            - name: BITBUCKET_USERNAME # Required for basic auth only
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: BITBUCKET_USERNAME
            - name: BITBUCKET_PASSWORD # Required for basic auth only
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: BITBUCKET_PASSWORD
            - name: LOG_LEVEL
              value: "DEBUG"
            - name: LOG_FORMAT
              value: "json"
      restartPolicy: Always
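The Bitbucket template references both the API key and the basic-auth credentials. Kubernetes will not start a container whose `secretKeyRef` points at a missing key unless the reference is marked optional. The sketch below assumes you authenticate with an API key and leave the basic-auth keys out of `dx-secrets`. Whether the extractor treats unset basic-auth variables as "not using basic auth" is an assumption to confirm with DX; removing the unused entries from the template is the simpler alternative.

```yaml
# Sketch: basic-auth entries marked optional so the pod starts without them.
- name: BITBUCKET_USERNAME
  valueFrom:
    secretKeyRef:
      name: dx-secrets
      key: BITBUCKET_USERNAME
      optional: true   # variable is left unset if the key is absent
- name: BITBUCKET_PASSWORD
  valueFrom:
    secretKeyRef:
      name: dx-secrets
      key: BITBUCKET_PASSWORD
      optional: true   # variable is left unset if the key is absent
```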
Jira Data Center
Required environment variables
Name | Description |
---|---|
EXTRACTION_TYPE | Must be set to jira_data_center. Example: jira_data_center |
DATACLOUD_URL | Your Data Cloud instance URL. Example: https://yourinstance.getdx.net |
DATACLOUD_KEY | Data Cloud API key. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
JIRA_URL | API base URL of your Jira Data Center instance. Example: https://jira.somehost.net/rest/api/2/ |
JIRA_API_TOKEN | Personal Access Token (PAT) for your Jira service account. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
JIRA_USERNAME | Username of your Jira service account (if using basic auth). Example: dxuser |
JIRA_PASSWORD | Password of your Jira service account (if using basic auth). Example: password |
EXTRACTOR_PROXY_URL | Proxy URL used to send API requests to Data Cloud. Example: proxy.getdx.net |
EXTRACTOR_PROXY_PORT | Proxy port. Example: 80 |
EXTRACTOR_PROXY_USER | Proxy username. Example: dxuser |
EXTRACTOR_PROXY_PASS | Proxy password. |
User Linking
Unlike other Jira integrations, the Jira extractor does not extract user data itself. Instead, as Jira issues come in, DX looks at each issue's creator and assignee and creates or updates the corresponding Jira user record in the database. As a result, user data may sync with a delay, and some Jira usernames may remain unlinked.
Kubernetes deployment YAML template (Jira Data Center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-jira
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-jira
  template:
    metadata:
      labels:
        app: dx-extractor-jira
    spec:
      containers:
        - name: dx-extractor
          image: ghcr.io/get-dx/extractor:latest
          env:
            - name: DATACLOUD_URL
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: DATACLOUD_URL
            - name: DATACLOUD_KEY
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: DATACLOUD_KEY
            - name: EXTRACTION_TYPE
              value: "jira_data_center"
            - name: JIRA_URL
              value: "https://jira.somehost.net/rest/api/2/"
            - name: JIRA_API_TOKEN
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: JIRA_API_TOKEN
            - name: JIRA_USERNAME # Required for basic auth only
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: JIRA_USERNAME
            - name: JIRA_PASSWORD # Required for basic auth only
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: JIRA_PASSWORD
            - name: LOG_LEVEL
              value: "DEBUG"
            - name: LOG_FORMAT
              value: "json"
      restartPolicy: Always