기사

Connect Teradata QueryGrid to Azure HDInsight

Many Teradata customers are interested in integrating Vantage with Microsoft Azure first party services. This guide will help you connect Teradata QueryGrid to Azure HDInsight.

Rupal Shah
Rupal Shah
2021년 11월 17일 11 최소 읽기
Teradata QueryGrid and Azure HDInsight
Many Teradata customers are interested in integrating Teradata Vantage with Microsoft Azure first party services. This Getting Started Guide will help you connect Teradata QueryGrid to Azure HDInsight service.

Although this approach has been implemented and tested internally, it is offered on an as-is basis. Teradata QueryGrid does support Azure HDInsight and it is highly encouraged to work with Teradata to determine the optimal configuration and architecture. This paper approach does not replace official product documentation.

We encourage your feedback. We want to understand what you found useful and how we can improve this guide. Please send your feedback to rupal.shah@teradata.com.

Disclaimer: This guide includes content from both Microsoft and Teradata product documentation.

Overview

This article describes the minimal steps to connect Teradata QueryGrid to a HDInsight (Hadoop) cluster and query it from Teradata Vantage on Azure. It will highlight deployment requirements for Teradata (DIY) on Azure, Teradata Viewpoint, Teradata QueryGrid and Azure HDInsight. We will use Teradata Viewpoint to configure these services to connect seamlessly between platforms. Finally, we will create a foreign server connection and leverage QueryGrid fabric to execute SQL from Vantage to a HDInsight cluster.

This is a diagram of the workflow.
Screen-Shot-2021-11-18-at-10-49-14-AM.png

About Azure HDInsight

Azure HDInsight is a Big Data service from Microsoft that brings 100% Apache Hadoop and other popular Big Data solutions to the cloud. A modern, cloud-based data platform that manages data of any type. Whether your data is structured or unstructured, and of any size, HDInsight makes it possible for you to gain the full value of Big Data.

With HDInsight, you can seamlessly process data of all types through Microsoft’s modern data platform. Our platform provides simplicity, ease of management, and an open Enterprise-ready Big Data solution. HDInsight provides a platform for all of your Big Data needs including Batch, Interactive, No SQL and Streaming. It also comes with a strong eco-system of tools and developer environment.

Supported cluster types include: Hadoop (Hive), HBase, Storm, Spark, Kafka, Interactive Hive (LLAP), and ML Services.

For more information see documentation.

About Teradata Vantage

Vantage is the platform for Pervasive Data Intelligence; delivering real-time intelligent answers to users and systems across all parts of an organization. It leverages 100 percent of a business’s data, regardless of scale, volume, or complexity. Running Teradata Database software on Azure is similar to running a Teradata Database in your on-site data center.
Vantage combines descriptive, predictive, prescriptive analytics, autonomous decision-making, ML functions, and visualization tools into a unified, integrated platform that uncovers real-time business intelligence at scale, no matter where the data resides.
Vantage enables companies to start small and elastically scale compute or storage, paying only for what they use, harnessing low-cost object stores and integrating their analytic workloads.
Vantage supports R, Python, Teradata Studio, and any other SQL-based tools. You can deploy Vantage across public clouds, on-premises, on optimized or commodity infrastructure, or as-a-service.
For more information see documentation.

About Teradata QueryGrid

Teradata QueryGrid 2.x is a data analytics fabric that provides seamless, high-performing data access, processing, and movement across one or more data sources. Teradata QueryGrid supports the following connectors:
  • Teradata Database
  • Hive
  • Spark SQL
  • Oracle (only as a target connector)
  • Starburst Enterprise Presto
For more information see documentation.

Prerequisites

You are expected to be familiar with Teradata Vantage, Teradata QueryGrid and Azure HDInsight
You will need the following accounts, and systems:
  • Teradata Vantage (DIY) on Azure instance (version 16.20) or later.

Procedure

Once you have met the prerequisites, follow these steps:
  1. Requirements
  2. Deploy Teradata (DIY) on Azure with Teradata Viewpoint and Teradata QueryGrid Manager
  3. Create subnet for Azure HDInsight cluster
  4. Create Azure HDInsight cluster
  5. Create public IP address and inbound port
  6. Configure Teradata QueryGrid Manager
  7. Configure Teradata Viewpoint
  8. Configure QueryGrid in the Viewpoint QueryGrid Portlet
  9. Configuring and using QueryGrid connectors
  10. Load Data to HDInsight
  11. Query HDInsight data from Vantage
  12. Cleanup
Note: Capture all usernames, passwords, private and public IP addresses for later use.

Step 1: Requirements
Before deploying a Teradata ecosystem, your Azure subscription must have sufficient level of permissions assigned with Contributor role at the subscription level and sufficient quota limits.
  1. Logon to Azure portal, click on Subscription icon for quick check. Otherwise, on the left pane click on Cost Management + Billing > Cost Management > Azure subscriptions
Screen-Shot-2021-11-18-at-10-52-05-AM.pngFor more information, see the Azure Documentation Center and search for Contributor role azure subscription.
  1. Next, check if your subscription has sufficient quota limits (recommended: 128 cores). On your subscription page, Click Usage + quotas, filter on Select a Provider (e.g., Microsoft Compute) and All locations (e.g., deployment region). If nothing shows up, change Show only items with usage to Show All.
Screen-Shot-2021-11-18-at-10-52-53-AM.png
  1. Click Request Increase if quota limit is not sufficient, click Quota type and choose Compute-VM (cores-vCPUs) subscription limit increases and follow instructions to submit request.
For more information see this link for instructions on requesting a core quota increase and on Azure subscription and service limits, quotas, and constraints, read this

Step 2: Deploy Teradata on Azure (DIY)

A solution template allows you to deploy multiple Teradata products simultaneously. At deployment, you can create a new resource group or use an existing, empty resource group that contains the VNet, VMs, storage accounts, network security groups, and so on.
  1. In the portal, click Screen-Shot-2021-11-18-at-10-54-15-AM.png Create a resource sign located in the upper left and search for Teradata. A list of Teradata products appears under the Results pane.
  2. Select the Teradata Vantage (DIY) solution template and click Create
  3. Configure Basic blade settings – Enter new Resource group click Create new, choose Region, enter Password for VM and click Next
  4. Configure Database blade settings – Enter DBC Password, choose Database Tier: Enterprise and Number of nodes: 2 and click Next
  5. Configure Viewpoint blade settings – Choose Yes and enter Password and Viewpoint Image version: Multiple Systems and click Next through to QueryGrid Manager blade
  6. Configure QueryGrid Manager blade settings – Choose Yes and click Next  through to General Settings (blade11)
  7. Configure General Settings – Choose defaults and click Next
  8. In Review + create blade, click Create after validation.
Deployment process can take anywhere between 40 to 60 minutes depending on your configuration and the availability of resources in your region.

Note: Unlike Teradata Vantage (DIY) Developer tier, Enterprise tier software is not free and Azure infrastructure for VM, Virtual Network and Storage cost exists. 

For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.

Step 3: Create Subnet for Azure HDInsight Cluster

Before we deploy HDInsight, we need to create a subnet for HDInsight cluster in the Teradata ecosystem virtual network.
  1. In the portal, open the Teradata resource group or click on Go to resource group
  2. Click on vnet-teradataSubnets and click + Subnet and add subnet name (e.g., HDISubnet) and leave options at their default values and click OK
Screen-Shot-2021-11-18-at-10-55-41-AM.png
3. Refresh Subnets to view newly added HDISubnet

Step 4: Create Azure HDInsight cluster

Next, create HDInsight (Hadoop) cluster with 2 headnodes and 2 workernodes.
  1. In the portal, click Screen-Shot-2021-11-18-at-10-54-15-AM-(1).png Create a resource sign and search for HDInsight
  2. Select the Azure HDInsight solution template and click Create
  3. Configure Basics blade settings – Enter new Resource group click Create new, enter Cluster name, choose Region (same as Teradata deployment), select Cluster type and version (e.g., Hadoop 2.7.3/HDI 3.6) and enter Cluster login password and click Next: Storage
Screen-Shot-2021-11-18-at-10-55-41-AM-(1).pngNote: Check Teradata® QueryGrid™ Components and Connectors Compatibility Matrix for HDInsight version support details.

4.Configure Storage blade settings – Choose Primary storage account select (New) <HDI Cluster name> and leave other properties default (blank) values and click Next: Security + networking
Screen-Shot-2021-11-18-at-10-57-50-AM.png
5. Configure Security + networking blade settings – Choose Virtual network vnet-teradata and the Subnet you created in step 3 (e.g., HDISubnet) and click Next: Configuration + pricing
Screen-Shot-2021-11-18-at-10-58-25-AM.png
6Configure Configuration + pricing blade settings – Change Worker node Number of nodes from 4 to 2 for this article and click Review + create

Screen-Shot-2021-11-18-at-10-59-04-AM.png
7. Click Create once validation passes. Expect deployment to take 15-20 minutes.

Step 5: Create public IP address and inbound port

For outside clients to gain access to Teradata Database and VMs you need to create a public IP address to a NIC interface and create an Inbound security rule for port 1025. By default, Teradata on Azure and Teradata products (Teradata Viewpoint and Teradata QueryGrid Manager) does not deploy with a public IP address or inbound port to access the database.

Create Public IP addresses
  1. In the portal, open the Teradata on Azure resource group and click on a NIC interface: database-nic00
  2. Click on IP configurations and click on an ip Name
    1. Set Public IP address to Associate
    2. Under the Choose public IP address dropdown, click Create new
    3. In the Add a public IP address prompt enter Name and change Assignment to Static
    4. Click OK and Save
Screen-Shot-2021-11-18-at-11-00-26-AM.png
Screen-Shot-2021-11-18-at-11-02-24-AM.png
4. Refresh Inbound security rules to view new port entry

Step 6: Configure Teradata QueryGrid Manager

After deploying a QueryGrid Manager VM, you need to logon to the VM to add a user and set a password for the Viewpoint account to access the QueryGrid Manager VM. You will need the QueryGrid Manager VM username and password created during deployment.
  1. Log on to the QueryGrid Manager VM using PuTTY and public IP address created in the previous step.
  2. Switch to the root user environment.
> sudo su -
  1. Add user ‘tdqgm’ to the QueryGrid Manager user group to prevent permission errors.
# sudo usermod -A tdqgm azureuser
  1. Set the default viewpoint password
# /opt/teradata/tdqgm/bin/reset-password.sh
For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.

Step 7: Configure Teradata Viewpoint

Before we can configure QueryGrid, we first need to add Teradata and QueryGrid Manager systems to the Viewpoint Monitored Systems portlet.
  1. From your browser, open the Viewpoint portal logon screen using public IP address created in step 4 for Viewpoint VM.
Note: Ignore Your connection is not private message and click on Advance and click Proceed to link
  1. Log in to the Viewpoint portal as admin and password created during VM deployment.
  2. Click admin portlet (gear icon) and click on Monitored Systems portlet.
Screen-Shot-2021-11-18-at-11-03-53-AM.png
For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.

       e. Add QueryGrid Manager in the Monitored Systems portlet:First, install a QueryGrid Root Certificate
  1. Click Select Portlet dropdown
  2. Click the Certificates.
  3. From the Setup list, click Certificate Authority.
  4. Click Install Certificate.
  5. Type an alias (e.g., cert) for the certificate, up to 30 characters.
  6. Select the A trusted SSL-enabled service option
  7. Type the private-IP address for hostname of a Teradata QueryGrid Manager instance (e.g.10.0.0.9)
  8. Type 9443 as the port number.
  9. Click Install
Screen-Shot-2021-11-18-at-11-17-48-AM.png
Next, add QueryGrid Manager in the Monitored Systems portlet
  1. Click Select Portlet dropdown
  2. Click Monitored Systems.
  3. Click + next to Systems and select Add QueryGrid.
  4. Under General System Details, enter a system nickname (e.g., QGM), up to 8 characters.
  5. Select the Enable system check box.
  6. Enter the private IP address for Host ID of the Teradata QueryGrid Manager.
  7. Under Login, enter the credentials viewpoint and the reset password performed in step 6 to enable Viewpoint to access QueryGrid Manager.
  8. Click Apply and Close Admin portlet
Screen-Shot-2021-11-18-at-11-18-19-AM.pngFor more information see Teradata® QueryGrid™ Installation and User Guide.

Step 8: Configure QueryGrid in the Viewpoint QueryGrid Portlet

In this section, we will add the QueryGrid portlet to Viewpoint and configure QueryGrid to access the HDInsight cluster. This will include adding a Data Center, Data Source Systems for Teradata and HDInsight cluster and add appropriate Teradata and HDInsight nodes and QueryGrid software to each Data Source System. Next, define a Fabric and add Connectors for Teradata and HDInsight. Finally, create a Link to define Teradata (as the initiating source) and HDInsight (as the target).  

Note: This article will use the QueryGrid software version deployed in the solution template. We will not cover uploading the latest version.
  1. Add QueryGrid portlet to the Viewpoint Portal
  1. In the portal, click Add Content.
  2. Locate and click the QueryGrid portlet.
  3. Click Add.
Screen-Shot-2021-11-18-at-11-19-14-AM.pngb. Review Data Center
  1. Under Fabric Components, select Data Centers.
  2. (Optional) Click Edit on default Data Center created during QueryGrid Manager deployment to rename and click Save
Screen-Shot-2021-11-18-at-11-20-51-AM.pngc. Adding a Data Source System
  1. Under Fabric Components, select Systems.
  2. Click + next to Systems.
  3. At Add System, enter system name (e.g.TD) for Teradata Data Source System
  4. Select Data center
  5. Select Node software version deployed from solution template
  6. Resource Allocation, set Max memory per node to 1GB
  7. Click Save
  8. Repeat to add HDInsight Data Source System (e.g., HDI) with same properties
d. Adding Nodes to a System

You will need to add the private IP addresses of your Nodes to the respective Data Source Systems created in the previous step. Teradata nodes to the TD Data Source System and HDInsight (Hadoop) nodes (head and worker nodes) to the HDI Data Source System. This will enable the tdqg-node package to auto install across all nodes.
  1. Under Fabric Components, select Systems.
  2. Select the Data Source System you want to add nodes to. (e.g., TD)
  3. On the Nodes tab, click + next to Nodes.
Screen-Shot-2021-11-18-at-11-21-49-AM.png
Screen-Shot-2021-11-18-at-11-23-46-AM.pngScreen-Shot-2021-11-18-at-11-24-20-AM.png
Screen-Shot-2021-11-18-at-1-10-07-PM.pngScreen-Shot-2021-11-18-at-1-12-14-PM.pngScreen-Shot-2021-11-18-at-1-12-45-PM.pngScreen-Shot-2021-11-18-at-1-13-25-PM.pngScreen-Shot-2021-11-18-at-1-13-54-PM.pngFor more information see Teradata® QueryGrid™ Installation and User Guide.

Step 9: Configuring and Using Connectors

For this article and simple demonstration, use any Teradata client tool to create an authorization object, grant privileges and create a foreign server connection to our HDInsight cluster.
  1. Log on as an Administrator, such as dbc, to the initiating Teradata Database system, and create an authorization object for the target server, for example:

CREATE AUTHORIZATION td_server_db.hiveservice AS DEFINER TRUSTED USER 'hive' PASSWORD 'hive' ;
An authorization object is created in the td_server_db database. Using the DEFINER clause make the authorization available globally to all users.
  1. Grant the CREATE SERVER and EXECUTE FUNCTION privileges on the td_server_db database to the Administrator user, for example:

GRANT CREATE SERVER ON td_server_db TO dbc;
GRANT EXECUTE FUNCTION ON TD_SYSFNLIB TO dbc;
  1. Create the foreign server connection to HDInsight (Hadoop) cluster

CREATE FOREIGN SERVER target_server_name
EXTERNAL SECURITY DEFINER TRUSTED target_server_auth
USING
LINK('linkname')
VERSION ('version')
DO IMPORT WITH TD_SYSFNLIB.QGInitiatorImport,
DO EXPORT WITH TD_SYSFNLIB.QGInitiatorExport;

For example, where Teradata Database is the initiating system and HDInsight cluster is the target/remote system using the QueryGrid Link definition called TD2HDI.

CREATE FOREIGN SERVER remotehdi
EXTERNAL SECURITY DEFINER TRUSTED hiveservice
USING
LINK('TD2HDI')
VERSION('active')
DO IMPORT WITH TD_SYSFNLIB.QGInitiatorImport,
DO EXPORT WITH TD_SYSFNLIB.QGInitiatorExport;

For more information see Teradata® QueryGrid™ Installation and User Guide.

Step 10: Load Data to HDInsight

Before we can query our HDInsight (Hadoop) cluster, we need to create a table and some data.
  1. In the portal, open HDInsight resource group and click on publicIpheadnode- for the public IP address for your HDI headnode0 VM.
  2. Log on as sshuser and password using Putty
  3. Change to root user: sudo su –
  4. At the prompt use Beeline (Hive server2 command line interface), type:
beeline -u "jdbc:hive2://<headnode0 private IP address>:10001/default;transportMode=http"
  1. create table t_test(id int);
  2. insert into t_test values(100);
  3. insert into t_test values(101);

Step 11: Query HDInsight Data from Vantage

Finally, use any Teradata client tool to query HDInsight data.
  1. Log on as dbc to the initiating Teradata Database system using the public IP address
  2. Insert and Select data from HDInsight
    1. insert into t_test@remotehdi values(1);
    2. select * from t_test@remotehdi;
Screen-Shot-2021-11-18-at-1-15-05-PM.png
3. Join data from HDInsight and Vantage
  1. create table td_server_db.localtbl (c1 int)
  2. insert into td_server_db.localtbl values (200)
  3. insert into td_server_db.localtbl values (201)
  4. insert into td_server_db.localtbl values (2)
  5. select * from t_test@remotehdi union all select * from td_server_db.localtbl
Screen-Shot-2021-11-18-at-1-16-02-PM.png
4. Create a View
  1. create view td_server_db.t_test_v as select * from t_test@remotehdi;
  2. select * from td_server_db.t_test_v;
Users do not even know they are accessing a foreign server

5. Show Foreign Server - Allows you to see a server object definition that contains the name value pairs that the associated table operators use to connect to the foreign serverSHOW FOREIGN SERVER remotehdi

6. Drop Foreign Server - In addition to deleting the server object and its associated information from the dictionary tables, all dependent entries on the associated table operators are deleted.DROP FOREIGN SERVER TD_SERVER_DB.remotehdi;

Step 12: Cleanup

To avoid incurring charges to your Azure subscription for the resources used, follow these steps.
Delete Resource Group
  1. Logon in Azure portal, click on Resource groups in left pane.
  2. Click on Resource group and click on Delete Resource Group in menu
  3. In Delete Resource group pane type in resource group name to verify deletion.
  4. Click Delete
  5. Repeat to delete other Resource Groups
Tags

약 Rupal Shah

Rupal Shah is a member of Teradata Partners Technical Consultant team. Prior to consulting on the Microsoft partnership, he was a technical consultant for the IBM Cognos and Oracle Hyperion partnerships. Along with his extensive experience working with business intelligence and ‘in-database’ solutions, Rupal has worked with various Teradata application organizations for whom he provided database consulting. He received his B.A. in Math and Computer Science from the University of California at San Diego, and he is currently based in San Diego. 모든 게시물 보기Rupal Shah

알고 있어

테라데이트의 블로그를 구독하여 주간 통찰력을 얻을 수 있습니다



I consent that Teradata Corporation, as provider of this website, may occasionally send me Teradata Marketing Communications emails with information regarding products, data analytics, and event and webinar invitations. I understand that I may unsubscribe at any time by following the unsubscribe link at the bottom of any email I receive.

Your privacy is important. Your personal information will be collected, stored, and processed in accordance with the Teradata Global Privacy Policy.