Data extractor
DX provides a self-hosted data extractor for customers who need to keep their API credentials within their own networks or who cannot allowlist API requests from DX. The self-hosted data extractor connects to on-premises instances such as GitLab and Jira and pushes metadata to DX servers.
DX data extractors are provided as a Docker image (please contact your DX account manager for access). You’ll run a separate instance (e.g., a Kubernetes pod) of the data extractor for each data source you want to import from. For example, to connect to both GitLab and Jira, you would set up two data extractor instances, each configured with environment variables specific to its data source.
Requirements
Each data extractor instance requires 2 GiB of memory and 1 vCPU, so the total requirement is those figures multiplied by the number of instances you deploy. Each data extractor instance should run in the same security context as the systems it will make API requests to (e.g., GitLab or Jira). Each instance also needs to be able to make outbound requests to your Data Cloud instance.
If you have Kubernetes expertise, we recommend deploying on a managed service such as GKE, EKS, or AKS by following the steps below:
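As a quick worked example of the sizing rule above, the following sketch computes the totals for two instances (say, one for GitLab and one for Jira). The variable names are illustrative only; the per-instance figures come from the requirement stated above.

```shell
# Example value: one extractor instance per data source (GitLab + Jira = 2).
instances=2

# Per-instance requirements stated above.
mem_per_instance_gib=2
vcpu_per_instance=1

# Totals scale linearly with the number of instances.
total_memory_gib=$((instances * mem_per_instance_gib))
total_vcpu=$((instances * vcpu_per_instance))

echo "Total: ${total_memory_gib} GiB memory, ${total_vcpu} vCPU"
```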
- Create a new Kubernetes cluster.
- Set up logging so logs can be easily retrieved for support.
- Make a copy of the appropriate Deployment YAML template below and set up secrets.
- Run kubectl apply to create the Deployment.
- Tail the logs to verify successful execution.
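As a rough sketch of the last three steps, assuming kubectl already points at your cluster and that you saved your copy of the GitHub template as dx-extractor-github.yaml (a hypothetical file name), the commands might look like the following. The secret name and keys match the GitHub template later in this document; every value is a placeholder.

```shell
# Create the secret that the Deployment reads its credentials from.
# All values below are placeholders; substitute your own.
kubectl create secret generic github-connector-secrets \
  --from-literal=DATACLOUD_URL='https://yourinstance.getdx.net' \
  --from-literal=DATACLOUD_KEY='<your Data Cloud API key>' \
  --from-literal=GITHUB_APP_ID='<your GitHub App ID>' \
  --from-literal=GITHUB_PEM_64='<Base64-encoded PEM content>'

# Create the Deployment from your copy of the template
# (dx-extractor-github.yaml is an assumed file name).
kubectl apply -f dx-extractor-github.yaml

# Wait for the rollout to finish, then tail the logs.
kubectl rollout status deployment/dx-extractor-github
kubectl logs -f deployment/dx-extractor-github
```

The same pattern should apply to the other data sources; only the secret name, keys, and Deployment name change.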
DX monitors data imports to ensure that the data extractor is running successfully. If you would like additional system monitoring of the data extractor itself, we recommend monitoring for signs of process failures or crashes, such as error output in the logs or an increasing pod restartCount.
The remainder of this document lists the environment variables and Kubernetes YAML templates for each supported data source. Remember that you’ll set up a separate data extractor instance for each data source you want to import from.
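One way to spot crashes is to read the restart count directly from the pod status. The sketch below assumes the GitHub extractor is deployed with the label app=dx-extractor-github, as in the template later in this document; adjust the label for other data sources.

```shell
# Print each extractor pod's name and the restart count of its first container.
# The label selector is taken from the GitHub Deployment template.
kubectl get pods -l app=dx-extractor-github \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'
```

A count that keeps growing suggests the process is repeatedly failing, at which point the pod logs are the next place to look.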
GitHub
Required environment variables
Name | Description |
---|---|
EXTRACTION_TYPE | Must be set to github. Example: github |
DATACLOUD_URL | Your Data Cloud instance URL. Example: https://yourinstance.getdx.net |
DATACLOUD_KEY | Data Cloud API key. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
GITHUB_URL | API base URL of your GitHub instance. Example: https://github.myteam.com/api/v3/ |
GITHUB_APP_ID | GitHub App ID Example: 320840 |
GITHUB_PEM_64 | Base64 encoded content of your PEM file. |
EXTRACTOR_PROXY_URL | (optional) URL of a proxy that forwards API requests to Data Cloud. Example: proxy.getdx.net |
EXTRACTOR_PROXY_PORT | Proxy port Example: 80 |
EXTRACTOR_PROXY_USER | Proxy username Example: dxuser |
EXTRACTOR_PROXY_PASS | Proxy password |
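GITHUB_PEM_64 expects the Base64-encoded content of your GitHub App's PEM file. A minimal sketch of producing that value is shown below; encode_pem is a hypothetical helper name, and the file path in the usage comment is a placeholder. Stripping newlines is an assumption that the extractor expects the value on a single line.

```shell
# Hypothetical helper: print the Base64 encoding of a file as a single line.
# Piping through tr removes the line wrapping that some base64 builds add.
encode_pem() {
  base64 < "$1" | tr -d '\n'
}

# Usage (the path is a placeholder for your GitHub App private key):
# GITHUB_PEM_64="$(encode_pem ./dx-github-app.private-key.pem)"
```

The resulting value can then be stored under the GITHUB_PEM_64 key of the secret referenced by the Deployment.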
Kubernetes deployment YAML template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-github
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-github
  template:
    metadata:
      labels:
        app: dx-extractor-github
    spec:
      containers:
        - name: dx-extractor
          image: ghcr.io/get-dx/extractor:latest
          env:
            - name: DATACLOUD_URL
              valueFrom:
                secretKeyRef:
                  name: github-connector-secrets
                  key: DATACLOUD_URL
            - name: DATACLOUD_KEY
              valueFrom:
                secretKeyRef:
                  name: github-connector-secrets
                  key: DATACLOUD_KEY
            - name: EXTRACTION_TYPE
              value: "github"
            - name: GITHUB_PEM_64
              valueFrom:
                secretKeyRef:
                  name: github-connector-secrets
                  key: GITHUB_PEM_64
            - name: GITHUB_URL
              value: "https://api.github.com"
            - name: GITHUB_APP_ID
              valueFrom:
                secretKeyRef:
                  name: github-connector-secrets
                  key: GITHUB_APP_ID
            - name: LOG_LEVEL
              value: "DEBUG"
            - name: LOG_FORMAT
              value: "json"
      restartPolicy: Always
GitLab
Required environment variables
Name | Description |
---|---|
EXTRACTION_TYPE | Must be set to gitlab. Example: gitlab |
DATACLOUD_URL | Your Data Cloud instance URL. Example: https://yourinstance.getdx.net |
DATACLOUD_KEY | Data Cloud API key. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
GITLAB_URL | API base URL of your GitLab instance. Example: https://gitlab.com/ |
GITLAB_API_TOKEN | GitLab API access token. Example: glpat-31RAZpMWxzX_m9BBnLyY |
EXTRACTOR_PROXY_URL | URL of a proxy used to send API requests to Data Cloud. Example: proxy.getdx.net |
EXTRACTOR_PROXY_PORT | Proxy port Example: 80 |
EXTRACTOR_PROXY_USER | Proxy username Example: dxuser |
EXTRACTOR_PROXY_PASS | Proxy password |
Kubernetes deployment YAML template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-gitlab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-gitlab
  template:
    metadata:
      labels:
        app: dx-extractor-gitlab
    spec:
      containers:
        - name: dx-extractor
          image: ghcr.io/get-dx/extractor:latest
          env:
            - name: DATACLOUD_URL
              valueFrom:
                secretKeyRef:
                  name: gitlab-connector-secrets
                  key: DATACLOUD_URL
            - name: DATACLOUD_KEY
              valueFrom:
                secretKeyRef:
                  name: gitlab-connector-secrets
                  key: DATACLOUD_KEY
            - name: EXTRACTION_TYPE
              value: "gitlab"
            - name: GITLAB_URL
              value: "https://gitlab.com/"
            - name: GITLAB_API_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-connector-secrets
                  key: GITLAB_API_TOKEN
            - name: LOG_LEVEL
              value: "DEBUG"
            - name: LOG_FORMAT
              value: "json"
      restartPolicy: Always
Jira Data Center
Required environment variables
Name | Description |
---|---|
EXTRACTION_TYPE | Must be set to jira_data_center Example: jira_data_center |
DATACLOUD_URL | Your Data Cloud instance URL. Example: https://yourinstance.getdx.net |
DATACLOUD_KEY | Data Cloud API key. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
JIRA_URL | API base URL of your Jira Data Center instance. Example: https://jira.somehost.net/rest/api/2/ |
JIRA_API_TOKEN | Personal Access Token (PAT) for your Jira service account. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
JIRA_USERNAME | Username of your Jira service account (if using basic auth). Example: dxuser |
JIRA_PASSWORD | Password of your Jira service account (if using basic auth). Example: password |
EXTRACTOR_PROXY_URL | URL of a proxy used to send API requests to Data Cloud. Example: proxy.getdx.net |
EXTRACTOR_PROXY_PORT | Proxy port Example: 80 |
EXTRACTOR_PROXY_USER | Proxy username Example: dxuser |
EXTRACTOR_PROXY_PASS | Proxy password |
Kubernetes deployment YAML template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-jira
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-jira
  template:
    metadata:
      labels:
        app: dx-extractor-jira
    spec:
      containers:
        - name: dx-extractor
          image: ghcr.io/get-dx/extractor:latest
          env:
            - name: DATACLOUD_URL
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: DATACLOUD_URL
            - name: DATACLOUD_KEY
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: DATACLOUD_KEY
            - name: EXTRACTION_TYPE
              value: "jira_data_center"
            - name: JIRA_URL
              value: "https://jira.somehost.net/rest/api/2/"
            - name: JIRA_API_TOKEN
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: JIRA_API_TOKEN
            - name: JIRA_USERNAME # Required for basic auth only
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: JIRA_USERNAME
            - name: JIRA_PASSWORD # Required for basic auth only
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: JIRA_PASSWORD
            - name: LOG_LEVEL
              value: "DEBUG"
            - name: LOG_FORMAT
              value: "json"
      restartPolicy: Always
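The Jira template reads its credentials from a secret named dx-secrets. A minimal sketch of creating that secret for token-based authentication might look like this; all values are placeholders.

```shell
# Create the secret referenced by the Jira Deployment template.
# This sketch assumes Personal Access Token (PAT) authentication.
kubectl create secret generic dx-secrets \
  --from-literal=DATACLOUD_URL='https://yourinstance.getdx.net' \
  --from-literal=DATACLOUD_KEY='<your Data Cloud API key>' \
  --from-literal=JIRA_API_TOKEN='<PAT for your Jira service account>'
```

Note that the template also references the JIRA_USERNAME and JIRA_PASSWORD keys. Kubernetes will not start a container whose secretKeyRef points at a missing key (unless the reference is marked optional), so either remove the env entries you do not use from your copy of the template or add those keys to the secret.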
Bitbucket Data Center
Required environment variables
Name | Description |
---|---|
EXTRACTION_TYPE | Must be set to bitbucket_data_center Example: bitbucket_data_center |
DATACLOUD_URL | Your Data Cloud instance URL. Example: https://yourinstance.getdx.net |
DATACLOUD_KEY | Data Cloud API key. Example: mPB5sf6w3JahSLMherWp8B7nTps13FKY |
BITBUCKET_URL | API base URL of your Bitbucket Data Center instance. Example: https://bitbucket.somehost.net |
BITBUCKET_USERNAME | Username of your Bitbucket service account (if using basic auth). Example: dxuser |
BITBUCKET_PASSWORD | Password of your Bitbucket service account (if using basic auth). Example: password |
BITBUCKET_API_KEY | API key of your Bitbucket service account (if not using basic auth). Example: api_key |
EXTRACTOR_PROXY_URL | URL of a proxy used to send API requests to Data Cloud. Example: proxy.getdx.net |
EXTRACTOR_PROXY_PORT | Proxy port Example: 80 |
EXTRACTOR_PROXY_USER | Proxy username Example: dxuser |
EXTRACTOR_PROXY_PASS | Proxy password |
BITBUCKET_PROJECT_KEYS_ALLOWLIST | (optional) Comma-delimited list of project keys for DX to import. Example: PROJ1,PROJ2 |
Kubernetes deployment YAML template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dx-extractor-bitbucket
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dx-extractor-bitbucket
  template:
    metadata:
      labels:
        app: dx-extractor-bitbucket
    spec:
      containers:
        - name: dx-extractor
          image: ghcr.io/get-dx/extractor:latest
          env:
            - name: DATACLOUD_URL
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: DATACLOUD_URL
            - name: DATACLOUD_KEY
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: DATACLOUD_KEY
            - name: EXTRACTION_TYPE
              value: "bitbucket_data_center"
            - name: BITBUCKET_URL
              value: "https://bitbucket.somehost.net"
            - name: BITBUCKET_API_KEY
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: BITBUCKET_API_KEY
            - name: BITBUCKET_USERNAME # Required for basic auth only
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: BITBUCKET_USERNAME
            - name: BITBUCKET_PASSWORD # Required for basic auth only
              valueFrom:
                secretKeyRef:
                  name: dx-secrets
                  key: BITBUCKET_PASSWORD
            - name: LOG_LEVEL
              value: "DEBUG"
            - name: LOG_FORMAT
              value: "json"
      restartPolicy: Always
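The template above does not set the optional BITBUCKET_PROJECT_KEYS_ALLOWLIST variable. One way to add it to an existing Deployment, sketched here with the example project keys from the table, is kubectl set env; you could equally add the variable to the env list in your copy of the template.

```shell
# Restrict the import to specific Bitbucket projects (example keys from the table).
kubectl set env deployment/dx-extractor-bitbucket \
  BITBUCKET_PROJECT_KEYS_ALLOWLIST='PROJ1,PROJ2'

# List the environment variables now defined on the Deployment.
kubectl set env deployment/dx-extractor-bitbucket --list
```

Changing the environment updates the pod template, so Kubernetes rolls out a new pod with the new setting.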