Apache Spark | Sheetly Cheat Sheet

Last Updated: November 21, 2025

Apache Spark

Distributed engine for large-scale data processing

Core Concepts

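Concept          Description
RDD              Resilient Distributed Dataset: low-level, immutable, partitioned collection of records
DataFrame        Distributed collection of rows organized into named columns (has a schema)
Transformation   Lazy operation that returns a new dataset (map, filter, join)
Action           Triggers computation and returns or writes a result (count, collect)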
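The split between transformations and actions is the key idea in the table above. The sketch below is a toy, single-machine model in plain Python. The `LazyDataset` class is hypothetical and is not Spark's API or implementation. Transformations only append a step to a recorded plan (the lineage), and nothing runs until an action replays that plan.

```python
class LazyDataset:
    """Toy model of lazy evaluation; not Spark's API or implementation."""

    def __init__(self, source, plan=()):
        self._source = source
        self._plan = tuple(plan)   # recorded steps: the dataset's lineage

    # Transformations: return a new dataset with one more step; nothing runs yet
    def map(self, fn):
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    # Helper used by actions: replay the recorded plan over the source
    def _run(self):
        rows = iter(self._source)
        for op, fn in self._plan:
            rows = map(fn, rows) if op == "map" else filter(fn, rows)
        return rows

    # Actions: execute the plan and produce a result
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []

def double(x):
    calls.append(x)   # record every time the function actually runs
    return x * 2

ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
print(calls)         # [] : only the plan was recorded
print(ds.collect())  # [6, 8]
print(calls)         # [1, 2, 3, 4] : the action ran the map step
```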

Common Operations

Method       Description
select()     Select specific columns
filter()     Keep rows that match a condition (alias: where())
groupBy()    Group rows for aggregation (e.g. .agg(), .count())
join()       Join two DataFrames on a key or condition
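Here is a plain-Python analogue over lists of dicts that makes the row-level semantics of these four operations concrete without a cluster. The helper names (`select`, `where`, `group_count`, `inner_join`) and the sample data are hypothetical. Each docstring names the PySpark call it mirrors, assuming `F` is `pyspark.sql.functions`. Spark runs the same logic distributed across partitions, typically shuffling rows by key for groupBy() and join().

```python
from collections import defaultdict

# Hypothetical sample data: each dict plays the role of a DataFrame row
people = [
    {"id": 1, "name": "Ana", "dept": "eng", "age": 34},
    {"id": 2, "name": "Ben", "dept": "eng", "age": 17},
    {"id": 3, "name": "Cy", "dept": "ops", "age": 45},
]
depts = [{"dept": "eng", "floor": 3}, {"dept": "ops", "floor": 1}]


def select(rows, *cols):
    """Keep only the named columns.  ~ df.select("name", "age")"""
    return [{c: row[c] for c in cols} for row in rows]


def where(rows, pred):
    """Keep rows matching a predicate.  ~ df.filter(F.col("age") > 18)"""
    return [row for row in rows if pred(row)]


def group_count(rows, key):
    """Count rows per key.  ~ df.groupBy("dept").count()"""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)


def inner_join(left, right, key):
    """Hash-join on one key column.  ~ df.join(other, on="dept", how="inner")"""
    index = defaultdict(list)          # key -> matching rows from the right side
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


adults = where(people, lambda row: row["age"] > 18)
print(select(adults, "name"))                          # [{'name': 'Ana'}, {'name': 'Cy'}]
print(group_count(people, "dept"))                     # {'eng': 2, 'ops': 1}
print(inner_join(people, depts, "dept")[0]["floor"])   # 3
```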

SQL Operations

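from pyspark.sql import SparkSession

# Create (or reuse) a session; the sample rows below are hypothetical
spark = SparkSession.builder.appName("people-sql").getOrCreate()
df = spark.createDataFrame([("Alice", 34), ("Bob", 17)], ["name", "age"])

# Register DataFrame as a temp view scoped to this SparkSession
df.createOrReplaceTempView("people")

# Run SQL query; spark.sql returns a DataFrame and is evaluated lazily
result = spark.sql("""
    SELECT name, age
    FROM people
    WHERE age > 18
    ORDER BY age DESC
""")

# show() is an action: it executes the query and prints the rows
result.show()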

💡 Pro Tips

Quick Reference

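Spark keeps intermediate data in memory where it can, spilling to disk when it does not fit, so iterative and interactive workloads avoid repeated reads from storage. Call cache() or persist() on a DataFrame that several actions reuse.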
