Skip to content
Home » Use AWS Glue Knowledge Catalog views to investigate information

Use AWS Glue Knowledge Catalog views to investigate information


On this submit, we present you easy methods to use the brand new views characteristic the AWS Glue Knowledge Catalog. SQL views are a strong object used throughout relational databases. You should utilize views to lower the time to insights of knowledge by tailoring the info that’s queried. Moreover, you should use the ability of SQL in a view to specific advanced boundaries in information throughout a number of tables that may’t be expressed with easier permissions. Knowledge lakes present prospects the flexibleness required to derive helpful insights from information throughout many sources and lots of use circumstances. Knowledge shoppers can eat information the place they should throughout strains of enterprise, growing the speed of insights technology.

Prospects use many alternative processing engines of their information lakes, every of which have their very own model of views with totally different capabilities. The AWS Glue Knowledge Catalog and AWS Lake Formation present a central location to handle your information throughout information lake engines.

AWS Glue has launched a new characteristic, SQL views, which lets you handle a single view object within the Knowledge Catalog that may be queried from SQL engines. You may create a single view object with a distinct SQL model for every engine you wish to question, reminiscent of Amazon Athena, Amazon Redshift, and Spark SQL on Amazon EMR. You may then handle entry to those sources utilizing the identical Lake Formation permissions which can be used to manage tables within the information lake.

Answer overview

For this submit, we use the Girls’s E-Commerce Clothes Assessment. The target is to create views within the Knowledge Catalog so you possibly can create a single frequent view schema and metadata object to make use of throughout engines (on this case, Athena). Doing so helps you to use the identical views throughout your information lakes to suit your use case. We create a view to masks the customer_id column on this dataset, then we are going to share this view to a different person in order that they’ll question this masked view.

Stipulations

Earlier than you possibly can create a view within the AWS Glue Knowledge Catalog, just remember to have an AWS Id and Entry Administration (IAM) position with the next configuration:

  • The next belief coverage:
    {
      "Model": "2012-10-17",
      "Assertion": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": [
               "glue.amazonaws.com",
               "lakeformation.amazonaws.com"
            ]
          },
          "Motion": "sts:AssumeRole"
        }
      ]
    }

  • The next move position coverage:
    {
      "Model": "2012-10-17",
      "Assertion": [
        {
          "Sid": "Stmt1",
          "Action": [
            "iam:PassRole"
          ],
          "Impact": "Permit",
          "Useful resource": "*",
          "Situation": {
             "StringEquals": {
               "iam:PassedToService": [
                 "glue.amazonaws.com",
                 "lakeformation.amazonaws.com"
               ]
             }
           }
         }
       ]
    }

  • Lastly, additionally, you will want the next permissions:
    • "Glue:GetDatabase",
    • "Glue:GetDatabases",
    • "Glue:CreateTable",
    • "Glue:GetTable",
    • "Glue:UpdateTable",
    • "Glue:DeleteTable",
    • "Glue:GetTables",
    • "Glue:SearchTables",
    • "Glue:BatchGetPartition",
    • "Glue:GetPartitions",
    • "Glue:GetPartition",
    • "Glue:GetTableVersion",
    • "Glue:GetTableVersions"

Run the AWS CloudFormation template

You may deploy the AWS CloudFormation template glueviewsblog.yaml to create the Lake Formation database and desk. The dataset will probably be loaded into an Amazon Easy Storage Service (Amazon S3) bucket.

For step-by-step directions, consult with Making a stack on the AWS CloudFormation console.

When the stack is full, you possibly can see a desk referred to as clothing_parquet on the Lake Formation console, as proven within the following screenshot.

Create a view on the Athena console

Now that you’ve your Lake Formation managed desk, you possibly can open the Athena console and create a Knowledge Catalog view. Full the next steps:

  1. Within the Athena question editor, run the next question on the Parquet dataset:
SELECT * FROM "clothing_reviews"."clothing_parquet" restrict 10;

Within the question outcomes, the customer_id column is at the moment seen.

Subsequent, you create a view referred to as hidden_customerID and masks the customer_id column.

  1. Create a view referred to as hidden_customerID:
CREATE PROTECTED MULTI DIALECT VIEW clothing_reviews.hidden_customerid SECURITY DEFINER AS 
SELECT * FROM clothing_reviews.clothing_parquet

Within the following screenshot, you possibly can see a view referred to as hidden_customerID was efficiently created.

  1. Run the next question to masks the primary 4 characters of the customer_id column for the newly generated view:
ALTER VIEW clothing_reviews.hidden_customerid UPDATE DIALECT AS
SELECT '****' || substring(customer_id, 4) as customer_id,clothing_id,age,title,review_text,ranking,recommend_ind,positive_feedback,division_name,department_name,class_name 
FROM clothing_reviews.clothing_parquet

You may see within the following screenshot that the view hidden_customerID has the customer_id column’s first 4 characters masked.

The unique desk clothing_parquet stays the identical unmasked.

Grant entry of the view to a different person to question

Knowledge Catalog views assist you to use Lake Formation to manage entry. On this step, you grant this view to a different person referred to as amazon_business_analyst after which question from that person.

  1. Sign up to the Lake Formation console as admin.
  2. Within the navigation pane, select Views.

As proven within the following screenshot, you possibly can see the hidden_customerid view.

  1. Sign up because the amazon_business_analyst person and navigate to the Views web page.

This person has no visibility to the view.

  1. Grant permission to the amazon_business_analyst person from the info lake admin.
  1. Sign up once more as amazon_business_analyst and navigate to the Views web page.

  1. On the Athena console, question the hidden_customerid view.

You’ve got efficiently shared a view to the person and queried it from the Athena console.

Clear up

To keep away from incurring future fees, delete the CloudFormation stack. For directions, consult with Deleting a stack on the AWS CloudFormation console.

Conclusion

On this submit, we demonstrated easy methods to use the AWS Glue Knowledge Catalog to create views. We then confirmed easy methods to alter the views and masks the info. You may share the view with totally different customers to question utilizing Athena. For extra details about this new characteristic, consult with Utilizing AWS Glue Knowledge Catalog views.


In regards to the Authors

Leonardo Gomez is a Principal Analytics Specialist Options Architect at AWS. He has over a decade of expertise in information administration, serving to prospects across the globe deal with their enterprise and technical wants. Join with him on LinkedIn

Michael Chess – is a Product Supervisor on the AWS Lake Formation group based mostly out of Palo Alto, CA. He focuses on permissions and information catalog options within the information lake.

Derek Liu – is a Senior Options Architect based mostly out of Vancouver, BC. He enjoys serving to prospects resolve massive information challenges by AWS analytic companies.

Leave a Reply

Your email address will not be published. Required fields are marked *