Question AWS Glue Knowledge Catalog views utilizing Amazon Athena and Amazon Redshift

[ad_1]

In the present day’s information lakes are increasing throughout traces of enterprise working in numerous landscapes and utilizing varied engines to course of and analyze information. Historically, SQL views have been used to outline and share filtered information units that meet the necessities of those traces of enterprise for simpler consumption. Nevertheless, with clients utilizing completely different processing engines of their information lakes, every with its personal model of views, they’re creating separate views per engine, including to upkeep overhead. Moreover, accessing these engine-defined views requires clients to have elevated entry ranges, granting them entry to each the SQL view itself and the underlying databases and tables referenced within the view’s SQL definition. This method impedes granting constant entry to a subset of knowledge utilizing SQL views, hampering productiveness and rising administration overhead.

Glue Knowledge Catalog views is a brand new characteristic of the AWS Glue Knowledge Catalog that clients can use to create a standard view schema and single metadata container that may maintain view-definitions in numerous dialects that can be utilized throughout engines similar to Amazon Redshift and Amazon Athena. By defining a single view object that may be queried from a number of engines, Knowledge Catalog views allow clients to handle permissions on a single view schema constantly utilizing AWS Lake Formation. A view could be shared throughout completely different AWS accounts as effectively. For querying these views, customers want entry to the view object solely and don’t want entry to the referenced databases and tables within the view definition. Additional, all requests towards the Knowledge Catalog views, similar to requests for entry credentials on underlying sources, might be logged as AWS CloudTrail administration occasions for auditing functions.

On this weblog put up, we’ll present how one can outline and question a Knowledge Catalog view on prime of open supply desk codecs similar to Iceberg throughout Athena and Amazon Redshift. We can even present you the configurations wanted to limit entry to the underlying database and tables. To observe alongside, we now have supplied an AWS CloudFormation template.

Use case

An Instance Corp has two enterprise models: Gross sales and Advertising. The Gross sales enterprise unit owns buyer datasets, together with buyer particulars and buyer addresses. The Advertising enterprise unit needs to conduct a focused advertising and marketing marketing campaign primarily based on a most well-liked buyer listing and has requested information from the Gross sales enterprise unit. The Gross sales enterprise unit’s information steward (AWS Identification and Entry Administration (IAM) position: product_owner_role), who owns the client and buyer tackle datasets, plans to create and share non-sensitive particulars of most well-liked clients with the Advertising unit’s information analyst (business_analyst_role) for his or her marketing campaign use case. The Advertising staff analyst plans to make use of Athena for interactive evaluation for the advertising and marketing marketing campaign and later, use Amazon Redshift to generate the marketing campaign report.

On this answer, we show how you need to use Knowledge Catalog views to share a subset of buyer particulars saved in Iceberg format filtered by the most well-liked flag. This view could be seamlessly queried utilizing Athena and Amazon Redshift Spectrum, with information entry centrally managed by AWS Lake Formation.

Conditions

For the answer on this weblog put up, you want the next:

  • An AWS account. For those who don’t have an account, you may create one.
  • You’ve gotten created a knowledge lake administrator Pay attention to this position’s Amazon Useful resource Identify (ARN) to make use of later. For simplicity’s sake, this put up will use IAM Admin position because the Datalake Admin and Redshift Admin however ensure that in your surroundings you observe the precept of least privilege.
  • Below Knowledge Catalog settings, have the default settings in place. Each of the next choices must be chosen:
    • Use solely IAM entry management for brand new databases
    • Use solely IAM entry management for brand new tables in new databases

Get began

To observe the steps on this put up, check in to the AWS Administration Console because the IAM Admin and deploy the next CloudFormation stack to create the mandatory sources:

  1. Select to deploy the CloudFormation template.
    Launch Cloudformation Stack
  2. Present an IAM position that you’ve already configured as a Lake Formation administrator.
  3. Full the steps to deploy the template. Depart all settings as default.
  4. Choose I acknowledge that AWS CloudFormation would possibly create IAM sources, then select Submit.

The CloudFormation stack creates the next sources. Make an observation of those values—you’ll use them later.

  • Amazon Easy Storage Service (Amazon S3) buckets that retailer the desk information and Athena question end result
  • IAM roles: product_owner_role and business_analyst_role
  • Digital non-public cloud (VPC) with the required community configuration, which might be used for compute
  • AWS Glue database: customerdb, which incorporates the buyer and customer_address tables in Iceberg format
  • Glue database: customerviewdb, which can include the Knowledge Catalog views
  • Redshift Serverless cluster

The CloudFormation stack additionally registers the information lake bucket with Lake Formation in Lake Formation entry mode. You possibly can confirm this by navigating to the Lake Formation console and choosing Knowledge lake places below Administration.

Resolution overview

The next determine exhibits the structure of the answer.

Question AWS Glue Knowledge Catalog views utilizing Amazon Athena and Amazon Redshift

As a requirement to create a Knowledge Catalog view, the information lake S3 places for the tables (buyer and customer_address) must be registered with Lake Formation and granted full permission to product_owner_role.

The Gross sales product proprietor: product_owner_role can also be granted permission to create views below customerviewdb utilizing Lake Formation.

After the Glue Knowledge Catalog View (customer_view) is created on the client dataset with the required subset of buyer data, the view is shared with the Advertising analyst (business_analyst_role), who can then question the popular buyer’s non delicate data as outlined by the view with out getting access to underlying buyer tables.

  1. Allow Lake Formation permission mode on the customerdbdatabase and its tables.
  2. Grant the database (customerdb) and tables (buyer and customer_address) full permission to product_owner_role utilizing Lake Formation.
  3. Allow Lake Formation permission mode on the database (customerviewdb) the place the a number of dialect Knowledge Catalog view might be created.
  4. Grant full database permission to product_owner_role utilizing Lake Formation.
  5. Create Knowledge Catalog views as product_owner_role utilizing Athena and Amazon Redshift so as to add engine dialects.
  6. Share the database and Knowledge Catalog views learn permission to business_analyst_role utilizing Lake Formation.
  7. Question the Knowledge Catalog view utilizing business_analyst_role from Athena and Amazon Redshift engine.

With the stipulations in place and an understanding of the general answer, you’re able to arrange the answer.

Arrange Lake Formation permissions for product_owner_role

Check in to the LakeFormation console as a knowledge lake administrator. For the examples on this put up, we use the IAM Admin position, Admin as the information lake admin.

Allow Lake Formation permission mode on customerdb and its tables

  1. Within the Lake Formation console, below Knowledge Catalog within the navigation pane, select Databases.
  2. Select customerdb and select Edit.
  3. Below Default permissions for newly created tables, clear Use solely IAM entry management for brand new tables on this database.
  4. Select Save.
  5. Below Knowledge Catalog within the navigation pane, select Databases.
  6. Choose customerdb and below Motion, choose View
  7. Choose the IAMAllowedPrincipal from the listing and select Revoke.
  8. Repeat the identical for all tables below the database customerdb.

Grant the product_owner_role entry to customerdb and its tables

Grant product_owner_role all permissions to the customerdb database.

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permissions.
  2. Select Grant.
  3. Below Principals, choose IAM customers and roles.
  4. Choose product_owner_role.
  5. Below LF-Tags or catalog sources, choose Named Knowledge Catalog sourcesand choose customerdb for Databases.
  6. Choose SUPER for Database permissions.
  7. Select Grant to use the permissions.

Grant product_owner_role all permissions to the buyer and customer_address tables.

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permission
  2. Select Grant.
  3. Below Principals, choose IAM customers and roles.
  4. Select the product_owner_role.
  5. Below LF-Tags or catalog sources, select Named Knowledge Catalog sourcesand choose customerdb for databases and buyer and customer_address for tables.
  6. Select SUPER for Desk permissions.
  7. Select Grant to use the permissions.

Allow Lake Formation permission mode

Allow Lake Formation permission mode on the database the place the Knowledge Catalog view might be created.

  1. Within the Lake Formation console, below Knowledge Catalog within the navigation pane, select Databases.
  2. Choose customerviewdb and select Edit.
  3. Below Default permissions for newly created tables, clear Use solely IAM entry management for brand new tables on this database.
  4. Select Save.
  5. Select Databases from Knowledge Catalog within the navigation pane.
  6. Choose customerviewdb and below Motion choose View.
  7. Choose the IAMAllowedPrincipal from the listing and select Revoke.

Grant the product_owner_role entry to customerviewdb utilizing Lake Formation mode

Grant product_owner_role all permissions to the customerviewdb database.

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permissions.
  2. Select Grant
  3. Below Principals, choose IAM customers and roles.
  4. Select product_owner_role
  5. Below LF-Tags or catalog sources, select Named Knowledge Catalog sourcesand choose customerviewdb for Databases.
  6. Choose SUPER for Database permissions.
  7. Select Grant to use the permissions.

Create Glue Knowledge Catalog views as product_owner_role

Now that you’ve Lake Formation permissions set on the databases and tables, you’ll use the product_owner_role to create Knowledge Catalog views utilizing Athena and Amazon Redshift. This can even add the engine dialects for Athena and Amazon Redshift.

Add the Athena dialect

  1. Within the AWS console, both check in utilizing product_owner_role or, if you happen to’re already signed in as an Admin, swap to product_owner_role.
  2. Launch question editor and choose the workgroup athena_glueview from the higher proper aspect of the console. You’ll create a view that mixes information from the buyer and customer_address tables, particularly for purchasers who’re marked as most well-liked. The tables embrace private details about the client, similar to their identify, date of beginning, nation of beginning, and e mail tackle.
  3. Run the next within the question editor to create the customer_view view below the customerviewdb database.
    create protected multi dialect view customerviewdb.customer_view
    safety definer
    as
    choose c_customer_id, c_first_name, c_last_name, c_birth_day, c_birth_month,
    c_birth_year, c_birth_country, c_email_address,
    ca_country,ca_zip
    from customerdb.buyer, customerdb.customer_address
    the place c_current_addr_sk = ca_address_sk and c_preferred_cust_flag='Y';

  4. Run the next question to preview the view you simply created.
    choose * from customerviewdb.customer_view restrict 10;

  5. Run following question to seek out the highest three beginning years with the very best buyer counts from the customer_view view and show the beginning yr and corresponding buyer depend for every.
    choose c_birth_year,
    	depend(*) as depend
    from "customerviewdb"."customer_view"
    group by c_birth_year
    order by depend desc
    restrict 3

Output:

  1. To validate that the view is created, go to the navigation pane and select Views below Knowledge catalog on the Lake Formation console
  2. Choose customer_view and go to the SQL definition part to validate the Athena engine dialect.

Whenever you created the view in Athena, it added the dialect for Athena engine. Subsequent, to assist the use case described earlier, the advertising and marketing marketing campaign report must be generated utilizing Amazon Redshift. For this, you have to add the Redshift dialect to the view so you may question it utilizing Amazon Redshift as an engine.

Add the Amazon Redshift dialect

  1. Check in to the AWS console as an Admin, navigate to Amazon Redshift console and check in to Redshift Qurey editor v2.
  2. Connect with the Serverless cluster as Admin (federated consumer) and run the next statements to grant permission on the Glue automount database (awsdatacatalog) entry to product_owner_role and business_analyst_role.
    create consumer  "IAMR:product_owner_role" password disable;
    create consumer  "IAMR:business_analyst_role" password disable;
    
    grant utilization on database awsdatacatalog to "IAMR:product_owner_role";
    grant utilization on database awsdatacatalog to "IAMR:business_analyst_role";

  3. Check in to the Amazon Redshift console as product_owner_role and check in to the QEv2 editor utilizing product_owner_role (as a federated consumer). You’ll use the next ALTER VIEW question so as to add the Amazon Redshift engine dialect to the view created beforehand utilizing Athena.
  4. Run the next within the question editor:
    alter exterior view awsdatacatalog.customerviewdb.customer_view AS
    choose c_customer_id, c_first_name, c_last_name, c_birth_day, c_birth_month,
    c_birth_year, c_birth_country, c_email_address,
    ca_country, ca_zip
    from awsdatacatalog.customerdb.buyer, awsdatacatalog.customerdb.customer_address
    the place c_current_addr_sk = ca_address_sk and c_preferred_cust_flag='Y'

  5. Run following question to preview the view.
    choose * from awsdatacatalog.customerviewdb.customer_view restrict 10;

  6. Run the identical question that you just ran in Athena to seek out the highest three beginning years with the very best buyer counts from the customer_view view and show the beginning yr and corresponding buyer depend for every.
    choose c_birth_year,
    	depend(*) as depend
    from awsdatacatalog.customerviewdb.customer_view
    group by c_birth_year
    order by depend desc
    restrict 3

By querying the identical view and operating the identical question in Redshift, you obtained the identical end result set as you noticed in Athena.

Validate the dialects added

Now that you’ve added all of the dialects, navigate to the Lake Formation console to see how the dialects are saved.

  1. On the Lake Formation console, below Knowledge catalog within the navigation pane, select Views.
  2. Choose customer_view and go to SQL definitions part to validate that the Athena and Amazon Redshift dialects have been added.

Alternatively, you can too create the view utilizing Redshift so as to add Redshift dialect and replace in Athena so as to add the Athena dialect.

Subsequent, you will note how the business_analyst_role can question the view with out getting access to question the underlying tables and the Amazon S3 location the place the information exists.

Arrange Lake Formation permissions for business_analyst_role

Check in to the Lake Formation console because the DataLake administrator (For this weblog, we use the IAM Admin position, Admin, because the Datalake admin).

Grant business_analyst_role entry to the database and view utilizing Lake Formation

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permissions.
  2. Select Grant
  3. Below Principals, choose IAM customers and roles.
  4. Choose business_analyst_role.
  5. Below LF-Tags or catalog sources, choose Named Knowledge Catalog sources and choose customerviewdb for Databases.
  6. Choose DESCRIBE for Database permissions.
  7. Select Grant to use the permissions.

Grant the business_analyst_role SELECT and DESCRIBE permissions to customer_view

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permission.
  2. Select Grant.
  3. Below Principals, choose IAM customers and roles.
  4. Choose  business_analyst_role.
  5. Below LF-Tags or catalog sources, select Named Knowledge Catalog sources and choose customerviewdb for Databases and customer_view for Views.
  6. Select SELECT and DESCRIBE for View permissions.
  7. Select Grant to use the permissions.

Question the Knowledge Catalog views utilizing business_analyst_role

Now that you’ve arrange the answer, check it by querying the information utilizing Athena and Amazon Redshift.

Utilizing Athena

  1. Check in to the Athena console as business_analyst_role.
  2. Launch question editor and choose the workgroup athena_glueview. Choose database customerviewdb from the dropdown on the left and you must be capable of see the view created beforehand utilizing product_owner_role. Additionally, discover that no tables are proven as a result of business_analyst_role doesn’t have entry granted for the bottom tables.
  3. Run the next within the question editor to question the view question.
    choose * from customerviewdb.customer_view restrict 10

As you may see within the previous determine, business_analyst_role can question the view with out getting access to the underlying tables.

  1. Subsequent, question the desk buyer on which the view is created. It ought to give an error.
    SELECT * FROM customerdb.buyer restrict 10

Utilizing Amazon Redshift

  1. Navigate to the Amazon Redshift console and check in to Amazon Redshift question editor v2. Connect with the Serverless cluster as business_analyst_role (federated consumer) and run the next within the question editor to question the view.
  2. Choose the customerviewdb on the left aspect of the console. It’s best to see the view customer_view. Additionally, observe that you just can’t see the tables from which the view is created. Run the next within the question editor to question the view.
    SELECT * FROM "awsdatacatalog"."customerviewdb"."customer_view";

The enterprise analyst consumer can run the evaluation on the Knowledge Catalog view while not having entry to the underlying databases and tables on from which the view is created.

Glue Knowledge Catalog views provide options for varied information entry and governance situations. Organizations can use this characteristic to outline granular entry controls on delicate information—similar to personally identifiable data (PII) or monetary information—to assist them adjust to information privateness rules. Moreover, you need to use Knowledge Catalog views to implement row-level, column-level, and even cell-level filtering primarily based on the particular privileges assigned to completely different consumer roles or personas, permitting for fine-grained information entry management. Moreover, Knowledge Catalog views can be utilized in information mesh patterns, enabling safe, domain-specific information sharing throughout the group for self-service analytics, whereas permitting customers to make use of most well-liked analytics engines like Athena or Amazon Redshift on the identical views for governance and constant information entry.

Clear up

To keep away from incurring future costs, delete the CloudFormation stack. For directions, see Deleting a stack on the AWS CloudFormation console. Be sure that the next sources created for this weblog put up are eliminated:

  • S3 buckets
  • IAM roles
  • VPC with community elements
  • Knowledge Catalog database, tables and views
  • Amazon Redshift Serverless cluster
  • Athena workgroup

Conclusion

On this put up, we demonstrated the right way to use AWS Glue Knowledge Catalog views throughout a number of engines similar to Athena and Redshift. You possibly can share Knowledge Catalog views in order that completely different personas can question them. For extra details about this new characteristic, see Utilizing AWS Glue Knowledge Catalog views.


Concerning the Authors

Pathik Shah is a Sr. Analytics Architect on Amazon Athena. He joined AWS in 2015 and has been focusing within the massive information analytics house since then, serving to clients construct scalable and strong options utilizing AWS analytics companies.

Srividya Parthasarathy is a Senior Huge Knowledge Architect on the AWS Lake Formation staff. She enjoys constructing information mesh options and sharing them with the neighborhood.

Paul Villena is a Senior Analytics Options Architect in AWS with experience in constructing trendy information and analytics options to drive enterprise worth. He works with clients to assist them harness the facility of the cloud. His areas of pursuits are infrastructure as code, serverless applied sciences, and coding in Python.

Derek Liu is a Senior Options Architect primarily based out of Vancouver, BC. He enjoys serving to clients remedy massive information challenges by AWS analytic companies.

[ad_2]

Leave a Reply

Your email address will not be published. Required fields are marked *