[ad_1]
On this submit, we present you easy methods to use the brand new views characteristic the AWS Glue Knowledge Catalog. SQL views are a strong object used throughout relational databases. You should utilize views to lower the time to insights of knowledge by tailoring the info that’s queried. Moreover, you should use the ability of SQL in a view to specific advanced boundaries in information throughout a number of tables that may’t be expressed with easier permissions. Knowledge lakes present prospects the flexibleness required to derive helpful insights from information throughout many sources and lots of use circumstances. Knowledge shoppers can eat information the place they should throughout strains of enterprise, growing the speed of insights technology.
Prospects use many alternative processing engines of their information lakes, every of which have their very own model of views with totally different capabilities. The AWS Glue Knowledge Catalog and AWS Lake Formation present a central location to handle your information throughout information lake engines.
AWS Glue has launched a new characteristic, SQL views, which lets you handle a single view object within the Knowledge Catalog that may be queried from SQL engines. You may create a single view object with a distinct SQL model for every engine you wish to question, reminiscent of Amazon Athena, Amazon Redshift, and Spark SQL on Amazon EMR. You may then handle entry to those sources utilizing the identical Lake Formation permissions which can be used to manage tables within the information lake.
Answer overview
For this submit, we use the Girls’s E-Commerce Clothes Assessment. The target is to create views within the Knowledge Catalog so you possibly can create a single frequent view schema and metadata object to make use of throughout engines (on this case, Athena). Doing so helps you to use the identical views throughout your information lakes to suit your use case. We create a view to masks the customer_id column on this dataset, then we are going to share this view to a different person in order that they’ll question this masked view.
Stipulations
Earlier than you possibly can create a view within the AWS Glue Knowledge Catalog, just remember to have an AWS Id and Entry Administration (IAM) position with the next configuration:
- The next belief coverage:
- The next move position coverage:
- Lastly, additionally, you will want the next permissions:
"Glue:GetDatabase",
"Glue:GetDatabases",
"Glue:CreateTable",
"Glue:GetTable",
"Glue:UpdateTable",
"Glue:DeleteTable",
"Glue:GetTables",
"Glue:SearchTables",
"Glue:BatchGetPartition",
"Glue:GetPartitions",
"Glue:GetPartition",
"Glue:GetTableVersion",
"Glue:GetTableVersions"
Run the AWS CloudFormation template
You may deploy the AWS CloudFormation template glueviewsblog.yaml to create the Lake Formation database and desk. The dataset will probably be loaded into an Amazon Easy Storage Service (Amazon S3) bucket.
For step-by-step directions, consult with Making a stack on the AWS CloudFormation console.
When the stack is full, you possibly can see a desk referred to as clothing_parquet
on the Lake Formation console, as proven within the following screenshot.
Create a view on the Athena console
Now that you’ve your Lake Formation managed desk, you possibly can open the Athena console and create a Knowledge Catalog view. Full the next steps:
- Within the Athena question editor, run the next question on the Parquet dataset:
Within the question outcomes, the customer_id
column is at the moment seen.
Subsequent, you create a view referred to as hidden_customerID
and masks the customer_id
column.
- Create a view referred to as
hidden_customerID
:
Within the following screenshot, you possibly can see a view referred to as hidden_customerID
was efficiently created.
- Run the next question to masks the primary 4 characters of the
customer_id
column for the newly generated view:
You may see within the following screenshot that the view hidden_customerID
has the customer_id
column’s first 4 characters masked.
The unique desk clothing_parquet
stays the identical unmasked.
Grant entry of the view to a different person to question
Knowledge Catalog views assist you to use Lake Formation to manage entry. On this step, you grant this view to a different person referred to as amazon_business_analyst
after which question from that person.
- Sign up to the Lake Formation console as admin.
- Within the navigation pane, select Views.
As proven within the following screenshot, you possibly can see the hidden_customerid
view.
- Sign up because the
amazon_business_analyst
person and navigate to the Views web page.
This person has no visibility to the view.
- Grant permission to the
amazon_business_analyst
person from the info lake admin.
- Sign up once more as
amazon_business_analyst
and navigate to the Views web page.
- On the Athena console, question the
hidden_customerid
view.
You’ve got efficiently shared a view to the person and queried it from the Athena console.
Clear up
To keep away from incurring future fees, delete the CloudFormation stack. For directions, consult with Deleting a stack on the AWS CloudFormation console.
Conclusion
On this submit, we demonstrated easy methods to use the AWS Glue Knowledge Catalog to create views. We then confirmed easy methods to alter the views and masks the info. You may share the view with totally different customers to question utilizing Athena. For extra details about this new characteristic, consult with Utilizing AWS Glue Knowledge Catalog views.
In regards to the Authors
Leonardo Gomez is a Principal Analytics Specialist Options Architect at AWS. He has over a decade of expertise in information administration, serving to prospects across the globe deal with their enterprise and technical wants. Join with him on LinkedIn
Michael Chess – is a Product Supervisor on the AWS Lake Formation group based mostly out of Palo Alto, CA. He focuses on permissions and information catalog options within the information lake.
Derek Liu – is a Senior Options Architect based mostly out of Vancouver, BC. He enjoys serving to prospects resolve massive information challenges by AWS analytic companies.
[ad_2]