Getting ready

As in the previous sections, we use the flights dataset and create both an RDD and a DataFrame from it:

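# Create flights RDD
flights = sc.textFile('/databricks-datasets/flights/departuredelays.csv')\
  .map(lambda line: line.split(","))\
  .zipWithIndex()\
  .filter(lambda row_idx: row_idx[1] > 0)\
  .map(lambda row_idx: row_idx[0])
# zipWithIndex() yields (row, idx) pairs; the filter drops the header (idx 0)
# and the final map keeps only the row. Tuple-unpacking lambdas such as
# `lambda (row, idx): ...` are Python 2 only, so the pair is indexed instead.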

# Create flightsDF DataFrame
flightsDF = spark.read\
  .options(header='true', inferSchema='true')\
  .csv('~/data/flights/departuredelays.csv')
flightsDF.createOrReplaceTempView("flightsDF")
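
To see what the zipWithIndex, filter, and map chain in the RDD snippet does without a running cluster, the following plain-Python sketch mirrors each step on a local list. The three sample lines are made up for illustration and only stand in for the real file; the name flights_local is hypothetical as well.

```python
# Made-up sample lines standing in for the CSV file (hypothetical data)
lines = [
    "date,delay,distance,origin,destination",
    "01011245,6,602,ABE,ATL",
    "01020600,-8,369,ABE,DTW",
]

# Counterpart of .map(lambda line: line.split(","))
rows = [line.split(",") for line in lines]

# Counterpart of .zipWithIndex(): pair each row with its position
indexed = [(row, idx) for idx, row in enumerate(rows)]

# Counterpart of the filter and final map: drop the header pair (idx 0)
# and keep only the row from each remaining pair
flights_local = [row for row, idx in indexed if idx > 0]

print(flights_local)
```

The same idea carries over to the RDD version: the index exists only to identify the header line and is discarded once the header has been filtered out.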