2023: Fine-grained access control (FGAC) for row, column, and cell-level filtering
2024: The Spark runtime applies Lake Formation permissions directly, with no intermediate proxies
2025: Fully native integration, with Spark directly reading, writing, and managing tables under Lake Formation governance
Benefits of FGAC vs. full table access control (FTAC):
FGAC provides row and column-level data segregation, important for interactive analytics and compliance
FTAC offers complete visibility and flexibility for trusted ETL pipelines, ML preparation, and batch workloads
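The difference between the two modes can be sketched in plain Python. This is an illustration only, not the Lake Formation API: `Grant`, `apply_grant`, and the sample `ORDERS` rows are hypothetical names invented for the example. A grant with no column list and no row filter behaves like FTAC, while a grant that restricts both behaves like FGAC.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission record (an assumption, not a Lake Formation object)."""
    # None means "every column", which models full table access.
    columns: Optional[FrozenSet[str]] = None
    # Predicate deciding which rows are visible; the default lets every row through.
    row_filter: Callable[[Row], bool] = lambda row: True


def apply_grant(rows: List[Row], grant: Grant) -> List[Row]:
    """Return only the rows and columns that the grant allows."""
    visible = []
    for row in rows:
        if not grant.row_filter(row):
            continue  # row-level filtering
        if grant.columns is None:
            visible.append(dict(row))
        else:
            # column-level filtering
            visible.append({k: v for k, v in row.items() if k in grant.columns})
    return visible


# Made-up sample data for the illustration.
ORDERS = [
    {"order_id": 1, "region": "EU", "amount": 120.0, "email": "a@example.com"},
    {"order_id": 2, "region": "US", "amount": 80.0, "email": "b@example.com"},
    {"order_id": 3, "region": "EU", "amount": 45.5, "email": "c@example.com"},
]

# FTAC-style grant: a trusted batch job sees the whole table.
FTAC_GRANT = Grant()

# FGAC-style grant: an analyst sees EU rows only, without the email column.
FGAC_GRANT = Grant(
    columns=frozenset({"order_id", "region", "amount"}),
    row_filter=lambda row: row["region"] == "EU",
)

if __name__ == "__main__":
    print(apply_grant(ORDERS, FTAC_GRANT))
    print(apply_grant(ORDERS, FGAC_GRANT))
```

In the real service the filtering is enforced by the engine under Lake Formation governance rather than by application code; the sketch only shows what each mode lets a caller see.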
Unified SQL Views Across Analytics Engines
Challenge: Teams often rewrite the same business logic separately for Athena, Glue, and EMR Spark, leading to duplicated logic, drift, and governance issues
AWS Glue Data Catalog views provide:
Unified governance: Single logical view definition, with Lake Formation enforcing permissions consistently
Cross-engine consistency: The same view returns the same results in Athena, Glue, and EMR Spark because every engine evaluates one shared definition
Faster collaboration: Teams can reuse the same view logic without rewrites or coordination
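The idea of defining business logic once and reusing it from several consumers can be sketched with Python's built-in `sqlite3` module. SQLite here is only a stand-in for a shared catalog, and the table, view, and function names (`orders`, `completed_revenue`, `interactive_query`, `batch_job`) are assumptions made for the example, not AWS objects.

```python
import sqlite3

# The business rule lives in exactly one place: the view definition.
VIEW_SQL = """
CREATE VIEW completed_revenue AS
SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
"""


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory database that plays the role of the shared catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 100.0, "completed"),
            (1, 50.0, "cancelled"),
            (2, 30.0, "completed"),
            (2, 20.0, "completed"),
        ],
    )
    conn.execute(VIEW_SQL)
    return conn


def interactive_query(conn: sqlite3.Connection):
    """Stand-in for an ad hoc query: per-customer revenue from the view."""
    return conn.execute(
        "SELECT customer_id, revenue FROM completed_revenue ORDER BY customer_id"
    ).fetchall()


def batch_job(conn: sqlite3.Connection) -> float:
    """Stand-in for a batch job: total revenue, computed from the same view."""
    rows = conn.execute("SELECT revenue FROM completed_revenue").fetchall()
    return sum(revenue for (revenue,) in rows)


if __name__ == "__main__":
    catalog = build_catalog()
    print(interactive_query(catalog))
    print(batch_job(catalog))
```

Neither consumer restates the `status = 'completed'` rule, so a change to the view definition reaches both at once. That is the property the shared view is meant to provide across engines.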
Unified Spark Runtime Across AWS
Historical context: EMR originally used a custom EMRFS connector, while the open-source community developed S3A
Recent unification efforts:
EMR 7.12, Glue 5.1, and Athena now use the same Spark 3.5.6 runtime, Iceberg 1.10.0, Hive 1.0.2, and other libraries
S3A is now the default storage connector, with alignment to the open-source community
S3A provides access to all S3 storage classes, including Glacier, enabling new ETL use cases
Performance Optimizations
Materialized Views:
Caching layer that can be used to optimize ETL pipelines by pre-filtering data, caching intermediate results, and maintaining reporting views
Integrated with EMR, Glue, and Athena Spark, with automatic refresh managed by AWS
Can provide up to 8x query performance improvements by rewriting jobs to leverage the materialized view
Other optimizations:
85% reduction in encryption overhead, resulting in 20% faster jobs
10-96x improvements for common string manipulation functions like uppercase, lowercase, trim, length, and reverse
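The materialized-view idea described above, pre-filtering and caching an intermediate result so later queries avoid rescanning the base data, can be sketched in plain Python. This is a conceptual illustration under stated assumptions: `MaterializedRevenue` and `revenue_from_base` are hypothetical names, and the explicit `refresh()` call stands in for the refresh that the managed service is described as handling automatically.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def revenue_from_base(base_rows: List[Row], region: str) -> float:
    """Baseline: scan and filter the whole base table on every query."""
    return sum(
        row["amount"]
        for row in base_rows
        if row["status"] == "completed" and row["region"] == region
    )


class MaterializedRevenue:
    """Hypothetical stand-in for a materialized view over the base rows."""

    def __init__(self, base_rows: List[Row]) -> None:
        self._base_rows = base_rows
        self._totals: Dict[str, float] = {}
        self.refresh()

    def refresh(self) -> None:
        """Recompute the pre-filtered, pre-aggregated result from the base rows."""
        totals: Dict[str, float] = defaultdict(float)
        for row in self._base_rows:
            if row["status"] == "completed":
                totals[row["region"]] += row["amount"]
        self._totals = dict(totals)

    def revenue(self, region: str) -> float:
        """Answer from the cached aggregate without rescanning the base rows."""
        return self._totals.get(region, 0.0)


if __name__ == "__main__":
    # Made-up sample data for the illustration.
    sales = [
        {"region": "EU", "amount": 120.0, "status": "completed"},
        {"region": "US", "amount": 80.0, "status": "completed"},
        {"region": "EU", "amount": 45.5, "status": "completed"},
        {"region": "EU", "amount": 99.0, "status": "cancelled"},
    ]
    view = MaterializedRevenue(sales)
    print(view.revenue("EU"), revenue_from_base(sales, "EU"))
```

The trade-off the sketch makes visible is staleness: rows added to the base data after the last refresh do not appear in the cached result until `refresh()` runs again.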
Key Takeaways
AWS has unified the Spark experience across EMR, Glue, and Athena, providing a consistent runtime, connectors, and capabilities
Security is now built in natively, with fine-grained access control and centralized governance through AWS Lake Formation
Performance has been significantly optimized, with materialized views, Iceberg improvements, and optimizations for common ETL patterns
These enhancements enable enterprises to build scalable, secure, and high-performance ETL pipelines on AWS without the previous complexities and operational overhead