Questions for the CERTIFIED ASSOCIATE DEVELOPER FOR APACHE SPARK were updated on: Jan 11, 2025
Which of the following operations can be used to return a DataFrame with no duplicate rows? Please select the most complete answer.
e
The code block shown below contains an error. The code block is intended to create a single-column DataFrame from the Scala List years, which is made up of integers. Identify the error.
Code block:
spark.createDataset(years)
b
A Spark application has a 128 GB DataFrame A and a 1 GB DataFrame B. If a broadcast join were to be performed on these two DataFrames, which of the following describes which DataFrame should be broadcasted and why?
d
Which of the following Spark properties is used to configure whether DataFrame partitions that do not meet a minimum size threshold are automatically coalesced into larger partitions during a shuffle?
e
The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
assessPerformanceUDF = udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
a
Which of the following code blocks writes DataFrame storesDF to file path filePath as JSON?
b
The code block below contains a logical error resulting in inefficiency. The code block is intended to efficiently perform a broadcast join of DataFrame storesDF and the much larger DataFrame employeesDF using key column storeId. Identify the logical error.
Code block:
storesDF.join(broadcast(employeesDF), "storeId")
a
Which of the following operations can be used to create a new DataFrame that has 12 partitions from an original DataFrame df that has 8 partitions?
a
Which of the following code blocks returns the first 3 rows of DataFrame storesDF?
c
Which of the following statements about the Spark driver is true?
d