Last Updated: November 21, 2025

Apache Spark

Big data processing framework

Core Concepts

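Concept	Description
`RDD`	Resilient Distributed Dataset: an immutable, partitioned collection of records
`DataFrame`	Structured data with named columns
`Transformation`	Lazy operation that defines a new dataset and runs only when an action is called (map, filter, join)
`Action`	Triggers computation and returns a result (count, collect)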

Common Operations

Method	Description
`select()`	Select specific columns
`filter()`	Filter rows by condition
`groupBy()`	Group data for aggregation
`join()`	Join two DataFrames

SQL Operations

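from pyspark.sql import SparkSession

# Reuse the active session, or start one if none exists
spark = SparkSession.builder.getOrCreate()

# Example DataFrame (illustrative sample rows)
df = spark.createDataFrame([("Alice", 34), ("Bob", 17)], ["name", "age"])

# Register DataFrame as temp view
df.createOrReplaceTempView("people")

# Run SQL query
result = spark.sql("""
    SELECT name, age
    FROM people
    WHERE age > 18
    ORDER BY age DESC
""")

# Print the matching rows
result.show()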

💡 Pro Tips

Quick Reference

Spark keeps intermediate data in memory where possible, which avoids repeated disk I/O and speeds up analytics
