Questions for the CERTIFIED ASSOCIATE DEVELOPER FOR APACHE SPARK were updated on: Feb 15, 2025
Which of the following operations can be used to return a DataFrame with no duplicate rows? Please select the most complete answer.
The code block shown below contains an error. The code block is intended to create a single-column DataFrame from the Scala List years, which is made up of integers. Identify the error.
Code block:
A Spark application has a 128 GB DataFrame A and a 1 GB DataFrame B. If a broadcast join were to be performed on these two DataFrames, which of the following describes which DataFrame should be broadcasted and why?
Which of the following Spark properties is used to configure whether DataFrame partitions that do not meet a minimum size threshold are automatically coalesced into larger partitions during a shuffle?
The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
assessPerformanceUDF = udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
Which of the following code blocks writes DataFrame storesDF to file path filePath as JSON?
The code block below contains a logical error that makes it inefficient. The code block is intended to efficiently perform a broadcast join of DataFrame storesDF and the much larger DataFrame employeesDF using key column storeId. Identify the logical error.
Code block:
storesDF.join(broadcast(employeesDF), "storeId")
Which of the following operations can be used to create a new DataFrame that has 12 partitions from an original DataFrame df that has 8 partitions?
Which of the following code blocks returns the first 3 rows of DataFrame storesDF?
Which of the following statements about the Spark driver is true?