Deltalake with Amazon EMR

This guide helps you quickly explore the main features of Delta Lake. It provides code snippets that show how to read from and write to Delta tables with Amazon EMR.
For more details, see the video "Incremental Data Processing using Delta Lake with EMR".

Quickstart

Create an S3 bucket for Delta Lake (e.g. learn-deltalake-2022).
Create an EMR cluster using AWS CDK (see the instructions for details).
Create an EMR Studio using AWS CDK (see the instructions for details).
Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/
Open the EMR Studio and create an EMR Studio Workspace.
Launch the EMR Studio Workspace.
Attach the EMR cluster to a Jupyter notebook.
Upload deltalake-with-emr-demo.ipynb to the Jupyter notebook.
Set the kernel to PySpark and run each cell.
To run Amazon Athena queries on Delta Lake, see amazon_athena_queries_on_deltalake.md.
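
Once the notebook is attached to the cluster and the Delta Lake settings from Key Configurations are applied, reading and writing a Delta table follows the usual Spark DataFrame pattern. The sketch below is a minimal illustration and is not taken from the demo notebook: the helper names `write_demo_table` and `read_demo_table` and the `delta-demo/` prefix are hypothetical, and `spark` is assumed to be the session that the PySpark kernel provides.

```python
def write_demo_table(spark, path: str) -> None:
    """Write a five-row DataFrame (ids 0 to 4) to `path` in Delta format."""
    data = spark.range(0, 5)
    # "overwrite" replaces any existing table contents at this path.
    data.write.format("delta").mode("overwrite").save(path)


def read_demo_table(spark, path: str):
    """Load the Delta table stored at `path` back into a DataFrame."""
    return spark.read.format("delta").load(path)


# Usage in a notebook cell (`spark` comes from the PySpark kernel):
# DELTA_PATH = "s3://learn-deltalake-2022/delta-demo/"  # hypothetical prefix
# write_demo_table(spark, DELTA_PATH)
# read_demo_table(spark, DELTA_PATH).show()
```

Passing the session in as a parameter keeps the helpers free of any cluster-specific setup, so the same functions work in a notebook cell or a submitted script.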

Key Configurations

Amazon EMR Applications
- Hadoop
- Hive
- JupyterHub
- JupyterEnterpriseGateway
- Livy
- Apache Spark (>= 3.0)

Apache Spark (PySpark)

{
  "conf": {
    "spark.jars.packages": "io.delta:delta-core_2.12:{version}",
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
  }
}

⚠️ YOU MUST REPLACE {version} with the Delta Lake version that matches your cluster's Apache Spark version (see Compatibility with Apache Spark).
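
Filling in the version placeholder can also be scripted instead of edited by hand. The following is a minimal sketch in plain Python: `build_delta_conf` is a hypothetical helper (not part of Delta Lake or Amazon EMR), and "2.0.0" is only an example value. It produces the same three settings with a concrete version, and rejects an unreplaced placeholder.

```python
import json


def build_delta_conf(delta_version: str, scala_version: str = "2.12") -> dict:
    """Return the Spark session settings for Delta Lake with the version filled in."""
    # Refuse empty input or a leftover "{version}" placeholder.
    if not delta_version or "{" in delta_version:
        raise ValueError("delta_version must be a concrete release, e.g. '2.0.0'")
    return {
        "conf": {
            "spark.jars.packages": f"io.delta:delta-core_{scala_version}:{delta_version}",
            "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
            "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        }
    }


if __name__ == "__main__":
    # Example value only; pick the release that matches your Spark version.
    print(json.dumps(build_delta_conf("2.0.0"), indent=2))
```

Serializing with `json.dumps` guarantees valid JSON output, which avoids hand-editing mistakes such as a trailing comma.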

Compatibility with Apache Spark

ℹ️ The following table was last updated on 26 Aug 2022.

Delta Lake version	Apache Spark version
2.0.x	3.2.x
1.2.x	3.2.x
1.1.x	3.2.x
1.0.x	3.1.x
0.7.x and 0.8.x	3.0.x
Below 0.7.x	2.4.2 - 2.4.<latest>

More information: Delta Lake releases
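
The table can be encoded as a small lookup so a notebook can check which Delta Lake release lines fit the cluster's Spark version. This is a sketch under stated assumptions: `COMPATIBILITY` and `compatible_delta_lines` are hypothetical names, the data reflects only the table above (as of 26 Aug 2022), and any suffix after a hyphen in the version string is assumed to be a vendor tag that can be ignored.

```python
# Compatibility table from this README (last updated 26 Aug 2022).
# Keys are Spark major.minor lines; values are Delta Lake release lines.
COMPATIBILITY = {
    "3.2": ["2.0.x", "1.2.x", "1.1.x"],
    "3.1": ["1.0.x"],
    "3.0": ["0.8.x", "0.7.x"],
    "2.4": ["below 0.7.x"],
}


def compatible_delta_lines(spark_version: str) -> list:
    """Return the Delta Lake release lines listed for a Spark version, or []."""
    core = spark_version.split("-")[0]  # drop any suffix after a hyphen
    parts = core.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:3]):
        raise ValueError(f"unrecognized Spark version: {spark_version!r}")
    major_minor = ".".join(parts[:2])
    if major_minor == "2.4":
        # The table lists 2.4.2 and later only.
        patch = int(parts[2]) if len(parts) > 2 else 0
        if patch < 2:
            return []
    return list(COMPATIBILITY.get(major_minor, []))
```

In a notebook, `compatible_delta_lines(spark.version)` would give the candidate release lines; an empty list means the version is not covered by the table.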

References

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
