Spark Stream Windowing Cheat Sheet

Event-time, watermarks, and aggregations

Last Updated: November 21, 2025

Focus Areas

Define window duration
Set watermark lateness
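
How the two settings interact can be sketched in plain Python, without Spark. This is a simplified model, not Spark's implementation: the helper names `tumbling_window` and `is_too_late` are hypothetical, windows are assumed to be aligned to the Unix epoch, and the watermark is assumed to be the largest event time seen so far minus the allowed delay. In a real query the watermark only advances between micro-batches, so a record flagged here might still be accepted.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def tumbling_window(event_time: datetime, duration: timedelta):
    """Return the [start, end) tumbling window that contains event_time."""
    elapsed = event_time - EPOCH
    # Integer-divide the elapsed time by the window size to find the bucket.
    start = EPOCH + (elapsed // duration) * duration
    return start, start + duration


def is_too_late(event_time: datetime, max_event_time_seen: datetime,
                duration: timedelta, delay: timedelta) -> bool:
    """True if the event's window closed at or before the current watermark."""
    watermark = max_event_time_seen - delay
    _, window_end = tumbling_window(event_time, duration)
    return window_end <= watermark
```

With a 5-minute window and a 10-minute delay, an event stamped 12:07 belongs to the 12:05 to 12:10 window. It counts as too late once an event stamped 12:20 or later has been seen, because the watermark has then reached the end of that window.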

Commands & Queries

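df.withWatermark('timestamp', '10 minutes')
Set the watermark: accept events up to 10 minutes late
df.groupBy(window('timestamp', '5 minutes')).count()
Count events per 5-minute tumbling window (window comes from pyspark.sql.functions)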
df.writeStream.trigger(processingTime='1 minute')
Run a micro-batch every minute

Summary

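Event-time windowing groups records by when they occurred rather than when they arrived, so results stay consistent even when data is delayed or out of order. The watermark bounds how long Spark waits for late records before it finalizes a window and drops its state.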

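💡 Pro Tip: Track how many records arrive behind the watermark and tune the delay against that number. A longer delay drops fewer late events but holds more state and postpones final results.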
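
One way to get a late-data metric is to read it from the query's progress report. The sketch below assumes a dict-shaped progress record like the one PySpark exposes as `query.lastProgress`, with a `stateOperators` list whose entries carry `numRowsDroppedByWatermark` (reported by Spark 3.1 and later) and an `eventTime` map holding the current `watermark`. The helper names are hypothetical. The dropped-row counter is measured at the stateful operator, so it may not equal the number of late input records exactly.

```python
from typing import Optional


def rows_dropped_by_watermark(progress: dict) -> int:
    """Sum the rows that stateful operators dropped as too late in one progress report."""
    operators = progress.get("stateOperators", [])
    return sum(op.get("numRowsDroppedByWatermark", 0) for op in operators)


def current_watermark(progress: dict) -> Optional[str]:
    """Return the watermark timestamp string, or None if none is reported yet."""
    return progress.get("eventTime", {}).get("watermark")
```

Calling both helpers on `query.lastProgress` after each trigger, once it is no longer `None`, gives a running view of how much data the current delay is discarding.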