
Snowflake SnowPro Advanced Data Engineer Exam Dumps & Practice Test Questions

Question No 1:

A Snowflake Data Engineer is working with a secure function that retrieves data through an inbound share from another Snowflake account. The engineer attempts to grant USAGE privileges on this secure function to an outbound share, in order to share it with a different account.

What will be the result of this action?

A. The attempt will result in an error because the engineer cannot reshare data that has been shared with them.
B. The attempt will result in an error because only views and secure stored procedures are eligible for sharing via outbound shares.
C. The attempt will result in an error because secure functions can only be shared through inbound shares, not outbound.
D. The secure function will be successfully shared with the target account.

Correct Answer:

A. The attempt will result in an error because the engineer cannot reshare data that has already been shared with them.

Explanation:

In Snowflake, data sharing allows seamless access to data between different accounts without the need to replicate or move the data. This is done through inbound shares (receiving data) or outbound shares (providing data). However, Snowflake enforces strict rules when it comes to resharing data, especially when dealing with secure functions.

A secure function is a user-defined function (UDF) whose definition and implementation details are hidden from consumers, which is what makes it eligible for data sharing. When this function retrieves data via an inbound share, it essentially consumes data shared from another Snowflake provider. Snowflake's resharing policies prohibit the resharing of data that has been received via an inbound share. This rule is in place to preserve data ownership and ensure proper access control between different organizations.

In this case, when the Data Engineer attempts to assign USAGE privileges on a secure function that depends on an inbound share, and then include it in an outbound share, the action fails. This is because resharing is not allowed on data derived from inbound shares, regardless of the object being shared (such as a secure function).
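As a rough illustration of the failing step (the database, schema, function, and share names below are hypothetical), the grant that Snowflake rejects would look something like this:

  -- Hypothetical outbound share and object names, for illustration only
  CREATE SHARE partner_share;
  GRANT USAGE ON DATABASE analytics_db TO SHARE partner_share;
  GRANT USAGE ON SCHEMA analytics_db.reporting TO SHARE partner_share;

  -- Fails: the secure function reads from a database attached to an
  -- inbound share, and Snowflake does not allow resharing that data
  GRANT USAGE ON FUNCTION analytics_db.reporting.shared_metrics(VARCHAR)
    TO SHARE partner_share;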

Therefore, an error will be returned as Snowflake disallows the resharing of data or objects that originate from inbound shares. This policy ensures that shared data is not indirectly exposed to other accounts without permission.

Understanding these limitations is crucial for ensuring compliance and proper data-sharing architecture within Snowflake.

Question No 2:

In data engineering or analytics platforms such as Snowflake or similar SQL-based systems, it's often essential to detect changes in a dataset (e.g., tables, query results, or windows) efficiently. A common approach is to compute a 'fingerprint' — a hash value that represents the contents of the dataset. This allows for quick comparisons and change detection without comparing each row manually.

Which of the following SQL functions can be used to compute a 'fingerprint' across an entire table, query result, or window to efficiently detect data changes? (Choose two correct options)

  • A. HASH(*)

  • B. HASH_AGG(*)

  • C. HASH_AGG(<expr>, <expr>)

  • D. HASH_AGG_COMPARE(*)

  • E. HASH_COMPARE(*)

Correct Answers: B. HASH_AGG(*) and C. HASH_AGG(<expr>, <expr>)

Explanation:

In SQL-based platforms like Snowflake, detecting changes in datasets efficiently is often done by computing a fingerprint — a hash value that summarizes the contents of the data. This fingerprint allows you to easily compare datasets without manually checking each row.

  • HASH_AGG(*): This function computes a single hash value over the entire dataset, essentially creating a "fingerprint" for the whole table or query result. Even a small change in the dataset will result in a different hash, making it an efficient method for change detection.

  • HASH_AGG(<expr>, <expr>): This is similar to HASH_AGG(*), but it lets you specify which columns or expressions to include in the hash calculation. This is useful when only certain attributes of the data matter or when expressions need to be normalized before the hash is generated.

Other options:

  • A. HASH(*) is a scalar function that returns a hash value per row; it does not aggregate an entire table or query result into a single fingerprint.

  • D. HASH_AGG_COMPARE(*) and E. HASH_COMPARE(*) are not valid functions in Snowflake and do not exist as standard SQL functions for change detection.

Thus, HASH_AGG(*) and HASH_AGG(<expr>, <expr>) are the correct choices for efficiently computing a fingerprint of a dataset in Snowflake.
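For example (the table and column names below are illustrative), a fingerprint can be captured and compared later to detect changes:

  -- Fingerprint of the entire table; any row change alters the result
  SELECT HASH_AGG(*) FROM orders;

  -- Fingerprint limited to specific columns of interest
  SELECT HASH_AGG(order_id, order_status) FROM orders;

  -- Fingerprint of a filtered subset (e.g., a single day's data)
  SELECT HASH_AGG(*) FROM orders WHERE order_date = '2024-01-01';

Comparing a stored fingerprint against a freshly computed one then reveals whether anything in the dataset changed, without comparing individual rows.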

Question No 3:

In Snowflake, when working with external tables—which allow querying data stored outside of Snowflake (such as in Amazon S3, Google Cloud Storage, or Azure Blob Storage)—what type of staging location must be used to support these tables?

Which type of stage is required for the creation and usage of external tables in Snowflake?

A. Only internal stages can be used, and only within a single Snowflake account
B. Only internal stages can be used, and accessible from any Snowflake account in the same organization
C. Only external stages can be used, and they can be located in any region and cloud provider
D. Only external stages can be used, and they must reside in the same region and cloud provider as the Snowflake account

Correct Answer:

C. Only external stages can be used, and they can be located in any region and cloud provider

Explanation:

In Snowflake, external tables are designed to query data directly from cloud storage platforms like Amazon S3, Google Cloud Storage, or Azure Blob Storage without the need to load it into Snowflake first. For this to work, external stages are required. These stages reference external storage locations (such as S3 buckets or Azure containers), specifying where and how Snowflake can access the data.

Unlike internal stages (which store data within Snowflake and are used for loading or unloading data), external tables specifically require external stages, which can be located in any supported region or cloud provider. Snowflake provides the flexibility to reference storage from different cloud providers and regions, enabling seamless cross-cloud analytics.
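A minimal sketch of this setup, assuming an S3 bucket and a pre-configured storage integration (all names below are hypothetical):

  -- External stage referencing cloud storage outside Snowflake
  CREATE STAGE my_ext_stage
    URL = 's3://my-bucket/events/'
    STORAGE_INTEGRATION = my_s3_integration;

  -- External table whose data remains in the bucket; AUTO_REFRESH is
  -- disabled here to keep the sketch self-contained
  CREATE EXTERNAL TABLE ext_events
    WITH LOCATION = @my_ext_stage
    AUTO_REFRESH = FALSE
    FILE_FORMAT = (TYPE = PARQUET);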

Thus, the correct answer is C, as it correctly identifies that external stages are necessary for external tables and can span multiple regions and cloud providers.

Question No 4:

A Data Engineer is managing Snowpipe, an automated data ingestion service in Snowflake. They need to verify the current status of a specific pipe called my_pipe. This pipe exists in the test database, under a case-sensitive schema named Extract.

The engineer must use the SYSTEM$PIPE_STATUS function, which takes a fully qualified pipe name in the format: 'database_name.schema_name.pipe_name'. While forming the query, it's important to properly handle case sensitivity, especially for the schema name, using double quotes where needed.

Which of the following SQL queries will correctly return the status of the pipe?

A. SELECT SYSTEM$PIPE_STATUS("test.'extract'.my_pipe");
B. SELECT SYSTEM$PIPE_STATUS('test."Extract".my_pipe');
C. SELECT * FROM SYSTEM$PIPE_STATUS('test."Extract".my_pipe');
D. SELECT * FROM SYSTEM$PIPE_STATUS("test.'extract'.my_pipe");

Correct Answer: B. SELECT SYSTEM$PIPE_STATUS('test."Extract".my_pipe');

Explanation:

In Snowflake, when working with case-sensitive objects like schemas or tables, it's important to use double quotes to reference them correctly. In this case, the schema "Extract" is case-sensitive, so it must be enclosed in double quotes.

The correct syntax to query the status of a pipe using SYSTEM$PIPE_STATUS requires:

  • The database name (test), which is not case-sensitive and can be referenced without quotes.

  • The schema name (Extract), which is case-sensitive, so it must be enclosed in double quotes.

  • The pipe name (my_pipe), which is not case-sensitive and doesn't need special quoting.

Thus, the correct query is:
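  SELECT SYSTEM$PIPE_STATUS('test."Extract".my_pipe');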

This matches option B.

Explanation of Incorrect Options:

  • Option A: The quoting is inverted: the whole argument is wrapped in double quotes instead of single quotes, and the schema is written as lowercase 'extract' in single quotes, so the case-sensitive identifier "Extract" is not preserved.

  • Option C: The query attempts to use SELECT * FROM, but SYSTEM$PIPE_STATUS is a scalar function, not a table function. It does not return a rowset.

  • Option D: This query incorrectly combines double and single quotes around the schema name and is invalid.

Question No 5:

Company A and Company B both use Snowflake for their data warehousing needs but have their accounts hosted on different cloud providers and regions. They belong to different Snowflake organizations, and Company A wants to securely share specific datasets with Company B without migrating the entire database or duplicating data manually.

Which of the following options would allow Company A to share data with Company B under these constraints? (Choose two options)

A. Create a share in Company A's Snowflake account and directly add Company B’s account as a recipient of that share.
B. Create a share in Company A's account and create a reader account as the recipient. Then, provide Company B with access to this reader account.
C. Replicate Company A's database into Company B’s Snowflake account using database replication, then create a share from Company B's account and give access to its users.
D. Set up a new account under Company A’s Snowflake organization that resides in the same cloud and region as Company B. Use database replication to move data there, and then create a share from that new account to Company B.
E. Create a separate database within Company A’s Snowflake account containing only the data intended for Company B. Share it via Snowflake’s secure data sharing mechanism and add Company B as a recipient.

Correct Answers: B and D.

Explanation:

The constraints in this scenario are that the accounts are on different clouds and regions, and they belong to different Snowflake organizations. Snowflake's data sharing capabilities can be limited in certain cases based on these factors.

A. Create a share in Company A's Snowflake account and directly add Company B’s account as a recipient of that share.
This option is incorrect because direct data sharing between accounts in different regions or clouds (and especially from different organizations) is not supported in Snowflake. Secure Data Sharing works within the same region and cloud unless additional configuration, such as replication, is used.

B. Create a share in Company A's account and create a reader account as the recipient. Then, provide Company B with access to this reader account.
This option is correct. Snowflake allows the creation of "reader accounts" that can be used to securely share data with external users. A reader account is managed by the data provider (Company A in this case) and does not require the external company (Company B) to have a full Snowflake account. This method is particularly effective when sharing data across different clouds or regions, as it bypasses the need for both parties to have accounts in the same organization or cloud.
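A hedged sketch of the reader-account approach, using hypothetical account, database, and share names (the password and the account locator would be real values in practice):

  -- Company A provisions a reader account that it manages
  CREATE MANAGED ACCOUNT companyb_reader
    ADMIN_NAME = 'companyb_admin',
    ADMIN_PASSWORD = 'ChangeMe_Str0ng!',
    TYPE = READER;

  -- Create the share and grant access to the objects to be shared
  CREATE SHARE companyb_share;
  GRANT USAGE ON DATABASE shared_db TO SHARE companyb_share;
  GRANT USAGE ON SCHEMA shared_db.public TO SHARE companyb_share;
  GRANT SELECT ON TABLE shared_db.public.orders TO SHARE companyb_share;

  -- Add the reader account (using the locator returned by
  -- CREATE MANAGED ACCOUNT) as a consumer of the share
  ALTER SHARE companyb_share ADD ACCOUNTS = <reader_account_locator>;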

C. Replicate Company A's database into Company B’s Snowflake account using database replication, then create a share from Company B's account and give access to its users.
This option is incorrect because Snowflake's database replication feature only works within the same organization. Since Company A and Company B are in different organizations, direct replication between their accounts is not supported.

D. Set up a new account under Company A’s Snowflake organization that resides in the same cloud and region as Company B. Use database replication to move data there, and then create a share from that new account to Company B.
This option is correct. While direct sharing across regions and clouds isn't supported, this solution works by creating a new account within the same cloud and region as Company B. After replicating the data into this new account, Company A can create a share from this account to Company B. This ensures that the data sharing happens within the same cloud and region, overcoming the initial constraint.
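A hedged sketch of the replication leg of this approach, with hypothetical organization, account, and database names; once the replica is in place, the share is created from the new account just as in option B:

  -- In Company A's source account: allow replication of the database
  -- to the new account located in Company B's cloud and region
  ALTER DATABASE shared_db
    ENABLE REPLICATION TO ACCOUNTS companya_org.companya_target_acct;

  -- In the new target account: create and refresh a local replica
  CREATE DATABASE shared_db
    AS REPLICA OF companya_org.companya_source_acct.shared_db;
  ALTER DATABASE shared_db REFRESH;

  -- Then create a share from this account and add Company B's account
  -- as the consumer, as shown earlier.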

E. Create a separate database within Company A’s Snowflake account containing only the data intended for Company B. Share it via Snowflake’s secure data sharing mechanism and add Company B as a recipient.
This option is incorrect because direct sharing between different regions or clouds is not supported unless both accounts are in the same region and cloud. Even though Company A can create a separate database for the data to be shared, this direct share would not work if Company B is in a different cloud or region.

Question No 6:

A Data Engineer is managing a near real-time data ingestion pipeline using Amazon Kinesis Data Firehose, which delivers streaming data to a Snowflake staging table. The average file size is between 300 MB and 500 MB. The engineer wants to optimize Snowpipe's performance and cost-efficiency when loading the files into Snowflake, as Snowpipe charges per file ingested.

Which approach should the engineer take to achieve optimal Snowpipe performance while keeping costs low?

A. Increase the size of the virtual warehouse used by Snowpipe.
B. Split the files before loading them and set the SIZE_LIMIT option to 250 MB.
C. Change the file compression size and increase the frequency of the Snowpipe loads.
D. Decrease the buffer size to trigger delivery of files sized between 100 to 250 MB in Kinesis Firehose.

Correct Answer: D.

Explanation:

Snowpipe charges based on the number of files ingested, so optimizing for fewer, larger files is beneficial for minimizing costs. Let's evaluate each option.

A. Increase the size of the virtual warehouse used by Snowpipe.
This option is incorrect because Snowpipe is a serverless service: it does not run on a user-managed virtual warehouse, so warehouse size has no effect on its performance or cost. Snowpipe's cost is driven by the number of files ingested and the compute needed to load them, not by warehouse size.

B. Split the files before loading them and set the SIZE_LIMIT option to 250 MB.
This option is incorrect because Snowpipe does not support manual file splitting or the SIZE_LIMIT option for controlling file sizes. Snowpipe is optimized for processing whole files as they are received, so file splitting isn't feasible in this context.

C. Change the file compression size and increase the frequency of the Snowpipe loads.
This option is incorrect because file compression size is related to storage optimization and does not affect the ingestion cost per file. Additionally, increasing the frequency of Snowpipe loads will likely result in more files being ingested, which could increase costs, especially since Snowpipe charges per file.

D. Decrease the buffer size to trigger delivery of files sized between 100 to 250 MB in Kinesis Firehose.
This option is correct. Snowflake recommends ingesting files in the range of 100 MB to 250 MB (compressed) for cost-efficient ingestion. By adjusting the buffer size in Kinesis Firehose to deliver files within this range, the engineer keeps each file small enough to load quickly while avoiding an excessive number of tiny files, whose per-file overhead would inflate Snowpipe costs.

Question No 7:

In a Snowflake account where access controls are managed using custom roles and a hierarchical role structure, you want to implement column-level security by masking sensitive data based on the roles assigned to users. When creating a masking policy, you need to ensure that the policy checks whether the invoking user's role is part of the active role hierarchy, so that only authorized roles can view the unmasked data.

Which of the following Snowflake functions should be used within the masking policy to evaluate access based on the invoking role’s place in the hierarchy of the session’s current roles?

A. CURRENT_ROLE
B. INVOKER_ROLE
C. IS_ROLE_IN_SESSION
D. IS_GRANTED_TO_INVOKER_ROLE

Correct Answer:  D. IS_GRANTED_TO_INVOKER_ROLE

Explanation:

In Snowflake, masking policies are used to mask sensitive data based on user roles. To implement column-level security that takes the role hierarchy into account, you need to evaluate whether the invoking user’s role is authorized to see unmasked data.

The function IS_GRANTED_TO_INVOKER_ROLE is designed specifically to evaluate whether a specific role is part of the role hierarchy of the invoking role. This is essential in environments with custom hierarchical roles, as it allows for dynamic access control based on the inherited roles of the user invoking the query.

Here’s an example of how this might be used in a masking policy (the policy, table, and column names below are illustrative):
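  -- Illustrative policy, table, and column names; PII_ACCESS_ROLE is
  -- the role authorized to see unmasked values
  CREATE MASKING POLICY ssn_mask AS (ssn STRING) RETURNS STRING ->
    CASE
      WHEN IS_GRANTED_TO_INVOKER_ROLE('PII_ACCESS_ROLE') THEN ssn
      ELSE '***-**-****'
    END;

  -- Attach the policy to the sensitive column
  ALTER TABLE customers MODIFY COLUMN ssn SET MASKING POLICY ssn_mask;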

In this case, users whose invoking role is granted PII_ACCESS_ROLE, directly or through the role hierarchy, will see the unmasked SSN, while all other users will see the masked value.

Why not the other options?

  • CURRENT_ROLE simply returns the active role of the session but doesn't evaluate role relationships.

  • INVOKER_ROLE returns the role of the user invoking the query, but it doesn't assess role inheritance or whether the role is authorized based on the hierarchy.

  • IS_ROLE_IN_SESSION checks whether a role is in the session's active role hierarchy, but it evaluates the session's roles rather than the invoking role in the policy context, so it does not tell you whether the role is granted to the invoker.

Thus, IS_GRANTED_TO_INVOKER_ROLE is the correct choice for evaluating access based on the role hierarchy in masking policies.

Question No 8:

A Data Engineer is working within the Snowflake environment and needs to verify the existence and security status of a user-defined function (UDF) named REVENUE_BY_REGION within the SALES schema of the MYDATABASE database. The engineer has the necessary privileges and is operating within the correct context.

Which of the following SQL statements can be used to determine whether the UDF exists and if it is secure?

  • A. SHOW USER FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;

  • B. SELECT IS_SECURE FROM SNOWFLAKE.INFORMATION_SCHEMA.FUNCTIONS WHERE FUNCTION_SCHEMA = 'SALES' AND FUNCTION_NAME = 'REVENUE_BY_REGION';

  • C. SELECT IS_SECURE FROM INFORMATION_SCHEMA.FUNCTIONS WHERE FUNCTION_SCHEMA = 'SALES' AND FUNCTION_NAME = 'REVENUE_BY_REGION';

  • D. SHOW EXTERNAL FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;

  • E. SHOW SECURE FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;

Correct Answers:
A. SHOW USER FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;
C. SELECT IS_SECURE FROM INFORMATION_SCHEMA.FUNCTIONS WHERE FUNCTION_SCHEMA = 'SALES' AND FUNCTION_NAME = 'REVENUE_BY_REGION';

Explanation:

To verify the existence and security status of a user-defined function (UDF), you need to check both whether the function exists and whether it is marked as secure.

  • Option A is correct because the SHOW USER FUNCTIONS command lists all scalar and table functions defined by users in a specified schema. Using LIKE 'REVENUE_BY_REGION' helps filter for the specific function you're looking for, and this command can be used to verify that the function exists in the SALES schema.

  • Option C is correct because querying the INFORMATION_SCHEMA.FUNCTIONS view provides metadata about the functions in the database. The column IS_SECURE indicates whether the function is secure. If the function is secure, its definition cannot be viewed by non-owners. This query allows you to check the security status of the REVENUE_BY_REGION function.

Why not the other options?

  • Option B is incorrect because SNOWFLAKE.INFORMATION_SCHEMA.FUNCTIONS is the information schema of the shared SNOWFLAKE database; it exposes only objects defined in that database, so a UDF created in MYDATABASE.SALES will not appear there.

  • Option D is incorrect because SHOW EXTERNAL FUNCTIONS is used for functions that invoke external services, not standard user-defined SQL functions.

  • Option E is incorrect because there is no SHOW SECURE FUNCTIONS command in Snowflake. Secure functions are identified using metadata queries like in Option C.

Thus, Option A (for existence) and Option C (for security status) are the correct choices.

Question No 9:

Which of the following approaches can be used to optimize a Snowflake query for performance when working with large datasets?

A. Using CLUSTER BY to optimize partitioning of large tables.
B. Increasing the virtual warehouse size for all queries.
C. Using the CACHING option on all queries to avoid recomputation.
D. Joining tables with LEFT OUTER joins instead of INNER joins for faster performance.

Correct Answer: A. Using CLUSTER BY to optimize partitioning of large tables.

Explanation:

When working with large datasets in Snowflake, optimizing query performance becomes crucial to ensure fast and efficient execution. One of the most effective strategies is using the CLUSTER BY option, which allows Snowflake to cluster the data within the tables based on one or more columns. By clustering tables on frequently filtered or joined columns, Snowflake can reduce the number of micro-partitions that need to be scanned during query execution, thereby improving query performance.

For instance, if you are frequently querying by a date column, clustering the table by this column ensures that the data is physically organized in a way that minimizes the number of micro-partitions involved in each query, leading to more efficient querying.
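For illustration (the table and column names are hypothetical), a clustering key can be set at table creation or added to an existing table:

  -- Define a clustering key when creating the table
  CREATE TABLE sales (
    sale_date DATE,
    region    STRING,
    amount    NUMBER(12,2)
  )
  CLUSTER BY (sale_date);

  -- Or add / change the clustering key on an existing table
  ALTER TABLE sales CLUSTER BY (sale_date);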

Why other answers are incorrect:

  • B. Increasing the virtual warehouse size for all queries: While increasing the virtual warehouse size can improve performance by allocating more resources, it is not always the most efficient or cost-effective solution. Scaling up the warehouse increases costs, and Snowflake's elasticity allows you to scale based on the specific needs of each query.

  • C. Using the CACHING option: Snowflake handles result caching automatically, and there is no per-query CACHING option to set. Relying on cached results is also ineffective for highly dynamic data, since changes to the underlying tables invalidate the result cache.

  • D. Using LEFT OUTER joins instead of INNER joins: In most cases, INNER joins are more efficient than LEFT OUTER joins, as they only return rows where there is a match in both tables. A LEFT OUTER join will return all rows from the left table, which often results in additional unnecessary data and processing.

Thus, CLUSTER BY is the optimal choice for enhancing query performance when working with large datasets in Snowflake.

Question No 10:

When designing a schema for a data warehouse in Snowflake, which of the following is the primary benefit of using a star schema over a snowflake schema?

A. A star schema allows for more complex joins between fact and dimension tables.
B. A star schema results in faster query performance due to fewer joins.
C. A snowflake schema simplifies ETL processes.
D. A star schema offers better normalization of data, which reduces redundancy.

Correct Answer: B. A star schema results in faster query performance due to fewer joins.

Explanation:

When designing a data warehouse schema, choosing between a star schema and a snowflake schema is a critical decision. The main advantage of the star schema is that it optimizes query performance by minimizing the number of joins required between the fact and dimension tables.

In a star schema, the fact table (which contains the primary data) is directly linked to the dimension tables (which contain descriptive attributes about the data). These dimension tables are typically denormalized, meaning that all relevant attributes are stored in a single table, resulting in fewer tables to join when performing queries. Since there are fewer joins to perform in the star schema, queries generally execute faster compared to the snowflake schema, which may require more complex joins due to the normalization of dimension tables.
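As an illustrative comparison (all table names are hypothetical), a star-schema query needs only one join per dimension, whereas a snowflake schema would require additional joins through the normalized dimension tables:

  -- Star schema: fact table joined directly to denormalized dimensions
  SELECT d.calendar_month,
         p.product_category,
         SUM(f.sales_amount) AS total_sales
  FROM fact_sales f
  JOIN dim_date    d ON f.date_key    = d.date_key
  JOIN dim_product p ON f.product_key = p.product_key
  GROUP BY d.calendar_month, p.product_category;

  -- In a snowflake schema, product_category would live in a separate
  -- dim_category table, adding another join to reach the same attribute.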

Why other answers are incorrect:

  • A. A star schema does not necessarily allow for more complex joins: In fact, a star schema reduces the need for complex joins due to its denormalized structure. The snowflake schema, on the other hand, might require more intricate joins.

  • C. A snowflake schema simplifies ETL processes: The snowflake schema typically complicates ETL processes because of the normalization. It requires additional transformations and processing steps to maintain the schema's structure.

  • D. A star schema does not offer better normalization: One of the primary features of a star schema is that it is denormalized, not normalized. While normalization reduces redundancy in the snowflake schema, it can also increase the complexity of queries, which is why the star schema is often preferred for performance reasons.

Therefore, using a star schema provides better query performance by reducing the number of joins required, which is essential for optimizing Snowflake data warehouses.