DX Managed for AWS
DX Managed for AWS is a single-tenant deployment of DX that runs in an isolated sub-account within your AWS organization.
Your DX cluster resides in a private VPC that is peered with your corporate networks and with the VPCs that contain data sources such as GitHub and GitLab. When needed, your DX cluster can also connect to cloud tools that reside outside of your network.
The DX cluster consists of several Kubernetes services that run in AWS EKS. In addition, DX uses managed AWS services including RDS Aurora, S3, Bedrock, and CloudWatch. See the next section for details on the required AWS services.
Required services
The table below lists the AWS services required for your DX cluster. All AWS services are in compliance with FedRAMP High (GovCloud).
Service | Purpose | Minimum Requirements |
---|---|---|
ELB | Application Load Balancer | 1 instance |
EKS | Managed Kubernetes service | 1 cluster (2+ EC2 instances) |
RDS Aurora PostgreSQL | Application database and data lake | 1 cluster (2 EC2 instances) |
S3 | Application data storage | Usage-based |
Bedrock | DX AI product features | Usage-based |
CloudWatch | Logging and monitoring | Usage-based |
The DX EKS cluster is configured to use two or more M8g or M7g instances for all of our application services. This cluster automatically scales to fit your data connection and user reporting workloads, starting with 4 vCPUs and 32 GiB of RAM for each Kubernetes node.
Each node also allocates 100 GiB of EBS ephemeral storage, and there is one additional 50 GiB stateful storage volume per cluster. DX uses the latest-generation AWS Graviton4 (M8g) instances where available; as of May 2025, the M8 generation is only offered with reserved-instance pricing in some regions, in which case M7g (Graviton3) instances are used instead.
Org Size | EKS Instance Type | vCPU | RAM (GiB) |
---|---|---|---|
0–500 | M8g.xlarge | 4 | 32 |
500–2,500 | M8g.2xlarge | 8 | 64 |
5,000+ | M8g.4xlarge | 16 | 128 |
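Once the cluster is running, you can confirm the node instance types and zones from your workstation. This is an optional check, assuming your kubeconfig already points at the DX EKS cluster:
$ kubectl get nodes -L node.kubernetes.io/instance-type -L topology.kubernetes.io/zone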
RDS Aurora PostgreSQL is configured to run with 2 R8g or R7g memory-optimized instances in high-availability (HA) mode. The base instance sizing for the Aurora cluster is 8 vCPUs and 64 GiB of RAM.
RDS Aurora is configured with I/O-Optimized storage. See below for recommended database requirements based on organization size. We configure a backup retention period of 14 days by default, which typically amounts to 2–3x your database storage volume.
Org Size | RDS Instance Type | vCPUs | RAM (GiB) | Storage (GB) |
---|---|---|---|---|
0–500 | R8g.2xlarge | 8 | 64 | 200 |
500–2,500 | R8g.4xlarge | 16 | 128 | 500 |
5,000+ | R8g.8xlarge | 32 | 256 | 1,000 |
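After provisioning, you can confirm the storage type and backup retention on the Aurora cluster with the AWS CLI. This is a sketch; the query fields are standard RDS output, not DX-specific:
$ aws rds describe-db-clusters --query 'DBClusters[].{Cluster:DBClusterIdentifier,Storage:StorageType,RetentionDays:BackupRetentionPeriod}'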
By default, Amazon Simple Email Service (Amazon SES) is used to send emails securely. As an alternative, you can configure your own email service using SMTP.
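If you want to confirm that SES sending is enabled in the target account before relying on email delivery, one way is the check below (assuming us-east-1; substitute your region):
$ aws ses get-account-sending-enabled --region us-east-1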
Setup and installation
Prerequisites
Before starting the DX installation process, you'll need a macOS computer with the following tools installed (a quick version check follows this list):
- AWS CLI
$ brew install awscli
- Terraform
$ brew tap hashicorp/tap
$ brew install hashicorp/tap/terraform
- kubectl
$ brew install kubectl
- Helm
$ brew install helm
- Git
$ brew install git
- jq
$ brew install jq
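A quick way to verify that everything is installed is to print each tool's version:
$ aws --version
$ terraform version
$ kubectl version --client
$ helm version
$ git --version
$ jq --version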
You will need access to an AWS user account with Administrator capabilities. We recommend creating a new AWS member account (i.e., dedicated AWS sub-account), if possible. You'll need an ~/.aws/config file containing the profile you'll be using for the setup process, for example:
[profile dx-setup]
sso_start_url = https://d-1234567890.awsapps.com/start
sso_region = us-east-1
sso_account_id = 123456789012
sso_role_name = AdministratorAccess
region = us-east-1
output = json
To verify your access, log in to the profile and then check the caller identity from the admin user's computer:
$ aws sso login --profile dx-setup
$ aws sts get-caller-identity --profile dx-setup
Please note: as part of the installation process, you will be asked which AWS account ID you wish to install to. This may be different from the one you are logged in as. You will also create a VPC for your DX cluster to be hosted within. This VPC will need to be able to connect to any data sources that DX data connectors import data from.
Installation steps
Installation is intended to be completed jointly with a DX forward-deployed engineer (FDE). Setup can be performed exclusively within your AWS organization or managed using a cross-account AWS role, depending on your security requirements.
1/ System configuration
Clone the Git repository provided to you by DX. The FDE will have this information for you.
Once you have the installation folder, enter the install folder and run the ./configure.sh command (see the example after this list). This is a guided process meant to verify connectivity to the AWS account and other prerequisites. It will ask for the following information:
- The name of the AWS CLI profile you'll be using for the setup (default: dx-setup)
- The AWS region you wish to install the cluster into
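For reference, this step looks like the following, assuming you are at the root of the cloned repository:
$ cd install
$ ./configure.sh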
2/ Infrastructure provisioning
Once your system is configured, follow the steps below to provision the infrastructure needed for your DX cluster:
- Initialize the Terraform project with terraform init.
- Preview the changes that will be made with terraform plan.
- Apply the Terraform plan with the terraform apply command. This will create all the AWS resources necessary for the DX installation.
- Follow any on-screen instructions at the end of the apply process.
- Once successful, verify that the EKS cluster and Aurora RDS instance are healthy in the AWS console.
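If you prefer the CLI over the console for this health check, the following is a sketch using the dx-setup profile; the cluster name placeholder is hypothetical, so use the name created by Terraform:
$ aws eks describe-cluster --name <your-dx-eks-cluster-name> --query 'cluster.status' --profile dx-setup
$ aws rds describe-db-clusters --query 'DBClusters[].Status' --profile dx-setup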
3/ DX application installation
Run the application installation script from the repository and follow the on-screen instructions:
$ ./install.sh
This sets up the required databases and application configuration, then installs the applications.
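One way to confirm that the application services came up is to list the pods in the cluster (a general Kubernetes check, not a DX-specific command):
$ kubectl get pods --all-namespaces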
4/ DX application update
Run this command to update your DX application to the latest release:
$ ./update.sh
Your application will be available at an allowlisted load balancer URL, which will be printed on-screen once the update is complete.
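If you need to look up the load balancer DNS name again later, one option is the AWS CLI (a sketch using the dx-setup profile):
$ aws elbv2 describe-load-balancers --query 'LoadBalancers[].DNSName' --profile dx-setup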
5/ Finalize setup
Once initial setup is complete, the next step is to configure DNS and SSL and enable users in your organization to access the DX application. This process might involve direct VPC-to-VPC peering and connections with VPN gateways, or allowlisting internet gateway IP addresses.
Network connectivity will also need to be established with any data sources your organization has (GitHub, GitLab, Jira, etc.) that reside in their own VPCs or that require IP allowlisting. In addition, you may want to expose your DX data lake to direct connections.
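As an illustration only, a VPC peering connection between the DX VPC and a data source VPC can be created and accepted with the AWS CLI; the IDs below are hypothetical placeholders, and route tables and security groups still need to be updated on both sides:
$ aws ec2 create-vpc-peering-connection --vpc-id <dx-vpc-id> --peer-vpc-id <data-source-vpc-id>
$ aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id <peering-connection-id>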