Databricks Certified Data Engineer Associate Exam Questions

Questions for the Certified Data Engineer Associate exam were updated on Sep 09, 2024.

Questions 1-10 of 90

Question 1

Which of the following queries is performing a streaming hop from raw data to a Bronze table?

  • A-D. (code-block options not reproduced in this dump)
  • E. None
Answer: E
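
For reference, a minimal sketch (not one of the original answer options) of what a streaming hop from raw files into a Bronze table can look like, using Auto Loader in Python; the paths and table name are illustrative and assume the Databricks-provided spark session:

(spark.readStream
    .format("cloudFiles")                       # Auto Loader
    .option("cloudFiles.format", "json")
    .load("/mnt/raw/events")                    # raw landing zone
    .writeStream
    .option("checkpointLocation", "/mnt/checkpoints/bronze_events")
    .outputMode("append")
    .toTable("bronze_events"))                  # Bronze Delta table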


Question 2

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

  • A. They can turn on the Auto Stop feature for the SQL endpoint.
  • B. They can ensure the dashboard's SQL endpoint is not one of the included query's SQL endpoint.
  • C. They can reduce the cluster size of the SQL endpoint.
  • D. They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.
  • E. They can set up the dashboard's SQL endpoint to be serverless.
Answer: A


Question 3

Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?

  • A. The ability to manipulate the same data using a variety of languages
  • B. The ability to collaborate in real time on a single notebook
  • C. The ability to set up alerts for query failures
  • D. The ability to support batch and streaming workloads
  • E. The ability to distribute complex data operations
Answer: D
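
For reference, a minimal sketch of the same Delta table serving both batch and streaming workloads in Python; the table name is illustrative:

batch_df  = spark.read.table("sales_bronze")        # batch read of a Delta table
stream_df = spark.readStream.table("sales_bronze")  # incremental streaming read of the same table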


Question 4

Which of the following data workloads will utilize a Gold table as its source?

  • A. A job that enriches data by parsing its timestamps into a human-readable format
  • B. A job that aggregates uncleaned data to create standard summary statistics
  • C. A job that cleans data by removing malformatted records
  • D. A job that queries aggregated data designed to feed into a dashboard
  • E. A job that ingests raw data from a streaming source into the Lakehouse
Answer: D


Question 5

A data engineer needs to apply custom logic to the string column city in the table stores for a specific use case. To apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).
Which of the following code blocks creates this SQL UDF?

  • A-D. (code-block options not reproduced in this dump)
  • E. None
Answer: E
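
For reference, a hedged sketch of the kind of SQL UDF the question describes, issued here through spark.sql in Python; the function name and logic are illustrative, not one of the original options:

spark.sql("""
    CREATE OR REPLACE FUNCTION format_city(city STRING)
    RETURNS STRING
    RETURN CONCAT(UPPER(city), ' STORE')
""")

# Once created, the UDF can be called like a built-in SQL function:
spark.sql("SELECT city, format_city(city) AS formatted_city FROM stores").show()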


Question 6

A new data engineering team has been assigned to an ELT project. The team will need full privileges on the table sales to fully manage the project.

Which of the following commands can be used to grant full permissions on the table sales to the new data engineering team?

  • A. GRANT ALL PRIVILEGES ON TABLE sales TO team;
  • B. GRANT SELECT CREATE MODIFY ON TABLE sales TO team;
  • C. GRANT SELECT ON TABLE sales TO team;
  • D. GRANT USAGE ON TABLE sales TO team;
  • E. GRANT ALL PRIVILEGES ON TABLE team TO sales;
Answer: A


Question 7

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.
Which of the following tools can the data engineer use to solve this problem?

  • A. Unity Catalog
  • B. Data Explorer
  • C. Delta Lake
  • D. Delta Live Tables
  • E. Auto Loader
Answer: D
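
Delta Live Tables provides expectations, which is what makes it the quality-monitoring tool here. A minimal sketch of an expectation that records violation counts in the pipeline event log, in Python; the dataset and source names are illustrative:

import dlt

@dlt.table
@dlt.expect("non_null_city", "city IS NOT NULL")   # violations are tracked in the event log
def stores_bronze():
    return spark.readStream.table("stores_raw")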


Question 8

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer reads the source table as a stream, transforms it, and writes it out with a blank in the trigger configuration (the original screenshot is not reproduced in this dump).

If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

  • A. processingTime(1)
  • B. trigger(availableNow=True)
  • C. trigger(parallelBatch=True)
  • D. trigger(processingTime="once")
  • E. trigger(continuous="once")
Answer: B
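
For reference, a hedged reconstruction of the kind of code block the question refers to (the original screenshot is not in this dump); the table names and transformation are illustrative, and the trigger line fills in the blank:

from pyspark.sql import functions as F

(spark.readStream
    .table("orders_raw")                                # read the source table as a stream
    .withColumn("order_total", F.col("quantity") * F.col("unit_price"))  # manipulate the data
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/orders_cleaned")
    .trigger(availableNow=True)                         # process all available data, in as many batches as required
    .toTable("orders_cleaned"))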


Question 9

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

  • A. Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.
  • B. Records that violate the expectation cause the job to fail.
  • C. Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.
  • D. Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.
  • E. Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.
Answer: B
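
For reference, a minimal sketch of how the same FAIL UPDATE expectation can be declared in a Python Delta Live Tables definition; the dataset and source names are illustrative:

import dlt

@dlt.table
@dlt.expect_or_fail("valid_timestamp", "timestamp > '2020-01-01'")  # ON VIOLATION FAIL UPDATE
def events_validated():
    # Any record violating the expectation causes the update to fail.
    return dlt.read_stream("raw_events")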


Question 10

In which of the following file formats is data from Delta Lake tables primarily stored?

  • A. Delta
  • B. CSV
  • C. Parquet
  • D. JSON
  • E. A proprietary, optimized format specific to Databricks
Answer: C
