Databricks Certified Associate Developer for Apache Spark Exam Questions

Questions for the Certified Associate Developer for Apache Spark exam were updated on: Sep 09, 2024

Page 1 of 11. Viewing questions 1-10 of 102.

Question 1

Which of the following operations can be used to return a DataFrame with no duplicate rows? Please select the most complete answer.

  • A. DataFrame.distinct()
  • B. DataFrame.dropDuplicates() and DataFrame.distinct()
  • C. DataFrame.dropDuplicates()
  • D. DataFrame.drop_duplicates()
  • E. DataFrame.dropDuplicates(), DataFrame.distinct() and DataFrame.drop_duplicates()
Answer: E
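A minimal PySpark sketch of the three equivalent calls (data and column names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "label"])

# All three return the same deduplicated DataFrame: distinct() considers
# every column, dropDuplicates() does the same when called with no
# arguments, and drop_duplicates() is a PySpark alias for dropDuplicates().
df.distinct().show()
df.dropDuplicates().show()
df.drop_duplicates().show()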


Question 2

The code block shown below contains an error. The code block is intended to create a single-column DataFrame from the Scala List years, which is made up of integers. Identify the error.

Code block:

spark.createDataset(years)

  • A. The years list should be wrapped in another list like List(years) to make clear that it is a column rather than a row.
  • B. The data type is not specified; the second argument to createDataset should be IntegerType.
  • C. There is no operation createDataset; the createDataFrame operation should be used instead.
  • D. The result of the above is a Dataset rather than a DataFrame; the toDF operation must be called at the end.
  • E. The column name must be specified as the second argument to createDataset.
Answer: D
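The question targets the Scala API, where the fix is to append toDF to the Dataset returned by createDataset. For comparison only, a minimal PySpark sketch of building a single-column DataFrame from a list of integers (names and values are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

years = [2019, 2020, 2021]
# Each integer is wrapped as a one-field tuple so that it forms a row.
yearsDF = spark.createDataFrame([(y,) for y in years], ["year"])
yearsDF.show()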


Question 3

A Spark application has a 128 GB DataFrame A and a 1 GB DataFrame B. If a broadcast join were to be performed on these two DataFrames, which of the following describes which DataFrame should be broadcasted and why?

  • A. Either DataFrame can be broadcasted; the results will be identical in both output and efficiency.
  • B. DataFrame B should be broadcasted because it is smaller and will eliminate the need for the shuffling of itself.
  • C. DataFrame A should be broadcasted because it is larger and will eliminate the need for the shuffling of DataFrame B.
  • D. DataFrame B should be broadcasted because it is smaller and will eliminate the need for the shuffling of DataFrame A.
  • E. DataFrame A should be broadcasted because it is smaller and will eliminate the need for the shuffling of itself.
Answer: D
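A minimal sketch of the corresponding join; tiny DataFrames stand in for the 128 GB DataFrame A and the 1 GB DataFrame B:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()
dfA = spark.createDataFrame([(1, "x"), (2, "y")], ["id", "a"])  # stand-in for the large DataFrame A
dfB = spark.createDataFrame([(1, "p")], ["id", "b"])            # stand-in for the small DataFrame B

# Broadcasting the smaller side ships one full copy of it to every
# executor, so the larger side is joined in place and never shuffled.
joined = dfA.join(broadcast(dfB), "id")
joined.show()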


Question 4

Which of the following Spark properties is used to configure whether DataFrame partitions that do not meet a minimum size threshold are automatically coalesced into larger partitions during a shuffle?

  • A. spark.sql.shuffle.partitions
  • B. spark.sql.autoBroadcastJoinThreshold
  • C. spark.sql.adaptive.skewJoin.enabled
  • D. spark.sql.inMemoryColumnarStorage.batchSize
  • E. spark.sql.adaptive.coalescePartitions.enabled
Answer: E
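A minimal sketch of setting the property; note that it only takes effect when Adaptive Query Execution as a whole is enabled:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# AQE is the umbrella feature; coalescePartitions is the sub-feature
# that merges undersized shuffle partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")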


Question 5

The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
assessPerformanceUDF = udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))

  • A. The assessPerformance() operation is not properly registered as a UDF.
  • B. The withColumn() operation is not appropriate here; UDFs should be applied by iterating over rows instead.
  • C. UDFs can only be applied via SQL and not through the DataFrame API.
  • D. The return type of the assessPerformanceUDF() is not specified in the udf() operation.
  • E. The assessPerformance() operation should be used on column customerSatisfaction rather than the assessPerformanceUDF() operation.
Answer: D
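A corrected sketch; the body of assessPerformance() is a hypothetical stand-in, and the sample data is illustrative:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()
storesDF = spark.createDataFrame([(1, 4), (2, 5)], ["storeId", "customerSatisfaction"])

def assessPerformance(satisfaction):
    # Hypothetical integer-returning logic.
    return satisfaction * 2

# udf() defaults to a StringType return type, so the integer return
# type must be declared explicitly as the second argument.
assessPerformanceUDF = udf(assessPerformance, IntegerType())
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction"))).show()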


Question 6

Which of the following code blocks writes DataFrame storesDF to file path filePath as JSON?

  • A. storesDF.write.option("json").path(filePath)
  • B. storesDF.write.json(filePath)
  • C. storesDF.write.path(filePath)
  • D. storesDF.write(filePath)
  • E. storesDF.write().json(filePath)
Answer: B
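A minimal sketch; the data and output path are illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
storesDF = spark.createDataFrame([(1, "north"), (2, "south")], ["storeId", "region"])

filePath = "/tmp/storesDF_json"
storesDF.write.json(filePath)

# Equivalent long form using the generic writer API:
# storesDF.write.format("json").save(filePath)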


Question 7

The code block below contains a logical error that results in inefficiency. The code block is intended to efficiently perform a broadcast join of DataFrame storesDF and the much larger DataFrame employeesDF using the key column storeId. Identify the logical error.
Code block:
storesDF.join(broadcast(employeesDF), "storeId")

  • A. The larger DataFrame employeesDF is being broadcasted rather than the smaller DataFrame storesDF.
  • B. There is never a need to call the broadcast() operation in Apache Spark 3.
  • C. The entire line of code should be wrapped in broadcast() rather than just DataFrame employeesDF.
  • D. The broadcast() operation will only perform a broadcast join if the Spark property spark.sql.autoBroadcastJoinThreshold is manually set.
  • E. Only one of the DataFrames is being broadcasted rather than both of the DataFrames.
Answer: A
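A corrected sketch; tiny DataFrames stand in for the two tables:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()
storesDF = spark.createDataFrame([(1, "north")], ["storeId", "region"])             # the smaller table
employeesDF = spark.createDataFrame([(1, "Ava"), (1, "Ben")], ["storeId", "name"])  # much larger in practice

# Broadcast the smaller storesDF so the larger employeesDF is never shuffled.
joined = employeesDF.join(broadcast(storesDF), "storeId")
joined.show()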


Question 8

Which of the following operations can be used to create a new DataFrame that has 12 partitions from an original DataFrame df that has 8 partitions?

  • A. df.repartition(12)
  • B. df.cache()
  • C. df.partitionBy(1.5)
  • D. df.coalesce(12)
  • E. df.partitionBy(12)
Answer: A
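A minimal sketch contrasting repartition() with coalesce():

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100).repartition(8)  # start from 8 partitions

# repartition() performs a full shuffle and can raise the partition
# count; coalesce() avoids a shuffle and can only lower it, so
# coalesce(12) on an 8-partition DataFrame still yields 8 partitions.
print(df.repartition(12).rdd.getNumPartitions())  # 12
print(df.coalesce(12).rdd.getNumPartitions())     # 8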


Question 9

Which of the following code blocks returns the first 3 rows of DataFrame storesDF?

  • A. storesDF.top_n(3)
  • B. storesDF.n(3)
  • C. storesDF.take(3)
  • D. storesDF.head(3)
  • E. storesDF.collect(3)
Answer: C
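A minimal sketch; the data is illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
storesDF = spark.createDataFrame([(i,) for i in range(10)], ["storeId"])

# take(3) returns the first 3 rows to the driver as a list of Row
# objects; in PySpark, head(3) happens to return the same list.
print(storesDF.take(3))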


Question 10

Which of the following statements about the Spark driver is true?

  • A. Spark driver is horizontally scaled to increase overall processing throughput.
  • B. Spark driver is the coarsest level of the Spark execution hierarchy.
  • C. Spark driver is fault tolerant; if it fails, it will recover the entire Spark application.
  • D. Spark driver is responsible for scheduling the execution of data by various worker nodes in cluster mode.
  • E. Spark driver is only compatible with its included cluster manager.
Answer: D
