Prepare DSA-C02 Question Answers Free Update With 100% Exam Passing Guarantee [Q19-Q41]


Prepare DSA-C02 Question Answers Free Update With 100% Exam Passing Guarantee [2024]

Dumps Real Snowflake DSA-C02 Exam Questions [Updated 2024]

NEW QUESTION # 19
Which of the following metrics are used to evaluate classification models?

  • A. F1 score
  • B. All of the above
  • C. Area under the ROC curve
  • D. Confusion matrix

Answer: B

Explanation:
Evaluation metrics are tied to machine learning tasks. There are different metrics for the tasks of classification and regression. Some metrics, like precision-recall, are useful for multiple tasks. Classification and regression are examples of supervised learning, which constitutes a majority of machine learning applications. By using different metrics for performance evaluation, we can improve our model's overall predictive power before we roll it out to production on unseen data. Relying only on accuracy, instead of evaluating the model properly with a range of metrics, can lead to problems when the model is deployed on unseen data and may end in poor predictions.
Classification metrics are evaluation measures used to assess the performance of a classification model.
Common metrics include accuracy (proportion of correct predictions), precision (true positives over total predicted positives), recall (true positives over total actual positives), F1 score (harmonic mean of precision and recall), and area under the receiver operating characteristic curve (AUC-ROC).
Confusion Matrix
Confusion Matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table of the combinations of predicted and actual values.
It is extremely useful for computing recall, precision, and accuracy; computing it at many decision thresholds yields the ROC curve and its AUC.
The four commonly used metrics for evaluating classifier performance are:
1. Accuracy: The proportion of correct predictions out of the total predictions.
2. Precision: The proportion of true positive predictions out of the total positive predictions (precision = true positives / (true positives + false positives)).
3. Recall (Sensitivity or True Positive Rate): The proportion of true positive predictions out of the total actual positive instances (recall = true positives / (true positives + false negatives)).
4. F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics (F1 score = 2 * ((precision * recall) / (precision + recall))).
These metrics help assess the classifier's effectiveness in correctly classifying instances of different classes.
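The four formulas above can be computed directly from the confusion-matrix counts. A minimal sketch, assuming a binary problem where 1 is the positive class; the helper names and the sample labels are illustrative only:

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN and TN for a binary problem where 1 is the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

Here accuracy is 0.7 while precision is only 0.5, which is exactly the kind of gap that relying on accuracy alone would hide.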
Understanding how well a machine learning model will perform on unseen data is the main purpose behind working with these evaluation metrics. Metrics like accuracy, precision, recall are good ways to evaluate classification models for balanced datasets, but if the data is imbalanced then other methods like ROC/AUC perform better in evaluating the model performance.
ROC curve isn't just a single number but it's a whole curve that provides nuanced details about the behavior of the classifier. It is also hard to quickly compare many ROC curves to each other.
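When a single number is needed, the area under the ROC curve equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties counting half). A minimal sketch of that pairwise definition; roc_auc is a hypothetical helper, and its O(positives × negatives) loop is meant for illustration, not large datasets:

```python
from itertools import product
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive correctly ranked above negative
        elif p == n:
            wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because it only depends on how scores are ranked, this value does not change with the classification threshold, which is one reason AUC is used to compare classifiers on imbalanced data.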


NEW QUESTION # 20
As a data provider, a Data Scientist wants to let consumers access all databases and database objects in a share by granting a single privilege on the shared databases. Which of the SnowSQL commands she used for this task is incorrect?
Assuming:
A database named product_db exists with a schema named product_agg and a table named Item_agg.
The database, schema, and table will be shared with two accounts named xy12345 and yz23456.
USE ROLE accountadmin;
CREATE DIRECT SHARE product_s;
GRANT USAGE ON DATABASE product_db TO SHARE product_s;
GRANT USAGE ON SCHEMA product_db.product_agg TO SHARE product_s;
GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
SHOW GRANTS TO SHARE product_s;
ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
SHOW GRANTS OF SHARE product_s;

  • A. GRANT USAGE ON DATABASE product_db TO SHARE product_s;
  • B. CREATE DIRECT SHARE product_s;
  • C. GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
  • D. ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;

Answer: C

Explanation:
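The GRANT SELECT statement in option C qualifies the table as sales_db.product_agg.Item_agg, but the database being shared is product_db, so the grant does not target the intended table in the shared database.
The other statements follow the documented provider workflow, although the documented command for creating a share is CREATE SHARE product_s; (the syntax has no DIRECT keyword).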
https://docs.snowflake.com/en/user-guide/data-sharing-provider#creating-a-share-using-sql
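One way to avoid the kind of mismatch in option C (a different database name in one statement) is to derive every fully qualified name from a single set of parameters. A minimal sketch; share_statements is a hypothetical helper, it performs no identifier quoting or validation and should only be used with trusted names, and it only builds the SQL text rather than executing it:

```python
from typing import List, Sequence


def share_statements(share: str, database: str, schema: str, table: str,
                     accounts: Sequence[str]) -> List[str]:
    """Build provider-side SQL for sharing one table.

    Every qualified name is built from the same database and schema
    arguments, so the statements cannot disagree about which database
    is being shared.
    """
    return [
        "USE ROLE accountadmin;",
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share};",
        f"ALTER SHARE {share} ADD ACCOUNTS = {', '.join(accounts)};",
    ]


for stmt in share_statements("product_s", "product_db", "product_agg", "Item_agg",
                             ["xy12345", "yz23456"]):
    print(stmt)
```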


NEW QUESTION # 21
You previously trained a model using a training dataset. You want to detect any data drift in the new data collected since the model was trained.
What should you do?

  • A. Add the new data to the existing dataset and enable Application Insights for the service where the model is deployed.
  • B. Retrain on your training dataset after correcting data outliers, with no need to introduce new data.
  • C. Create a new dataset using the new data and a timestamp column and create a data drift monitor that uses the training dataset as a baseline and the new dataset as a target.
  • D. Create a new version of the dataset using only the new data and retrain the model.

Answer: C

Explanation:
To track changing data trends, create a data drift monitor that uses the training data as a baseline and the new data as a target.
Model drift and decay are concepts that describe the process during which the performance of a model deployed to production degrades on new, unseen data or the underlying assumptions about the data change.
These are important to track once models are deployed to production. Models must be regularly retrained on new data; this is referred to as refitting the model. Refitting can be done on a periodic basis or, ideally, triggered when the model's performance degrades below a predefined threshold.
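A drift monitor ultimately compares the distribution of each feature in the baseline (training) data against the target (new) data. A minimal sketch of one such comparison, the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical CDFs. This is one common technique, not necessarily what any particular monitoring service uses, and the 0.2 threshold in drift_detected is a hypothetical value chosen only for illustration:

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(baseline: Sequence[float], target: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of two samples."""
    base = sorted(baseline)
    tgt = sorted(target)
    max_gap = 0.0
    # The ECDFs only change at observed values, so checking those points suffices.
    for x in base + tgt:
        cdf_base = bisect_right(base, x) / len(base)
        cdf_tgt = bisect_right(tgt, x) / len(tgt)
        max_gap = max(max_gap, abs(cdf_base - cdf_tgt))
    return max_gap


def drift_detected(baseline: Sequence[float], target: Sequence[float],
                   threshold: float = 0.2) -> bool:
    # Hypothetical threshold; in practice it is tuned per feature or replaced by a p-value.
    return ks_statistic(baseline, target) > threshold


training_feature = list(range(100))
new_feature = [x + 50 for x in range(100)]  # same shape, shifted upward
print(ks_statistic(training_feature, new_feature), drift_detected(training_feature, new_feature))
```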


NEW QUESTION # 22
Which statement about providers of direct shares is incorrect?

  • A. As a data provider, you share a database with one or more Snowflake accounts.
  • B. You can create as many shares as you want, and add as many accounts to a share as you want.
  • C. If you want to provide a share to many accounts, you can do the same via Direct Share.
  • D. A data provider is any Snowflake account that creates shares and makes them available to other Snowflake accounts to consume.

Answer: C

Explanation:
Option C is the incorrect one: if you want to provide a share to many accounts, you might want to use a listing or a data exchange rather than a direct share.


NEW QUESTION # 23
What can a Snowflake Data Scientist do in the Snowflake Marketplace as a provider?

  • A. Eliminate the costs of building and maintaining APIs and data pipelines to deliver data to customers.
  • B. Publish listings for datasets that can be customized for the consumer.
  • C. Share live datasets securely and in real-time without creating copies of the data or imposing data integration tasks on the consumer.
  • D. Publish listings for free-to-use datasets to generate interest and new opportunities among the Snowflake customer base.

Answer: A,B,C,D

Explanation:
All four options are correct.
About the Snowflake Marketplace
You can use the Snowflake Marketplace to discover and access third-party data and services, as well as market your own data products across the Snowflake Data Cloud.
As a data provider, you can use listings on the Snowflake Marketplace to share curated data offerings with many consumers simultaneously, rather than maintain sharing relationships with each individual consumer.
With Paid Listings, you can also charge for your data products.
As a consumer, you might use the data provided on the Snowflake Marketplace to explore and access the following:
Historical data for research, forecasting, and machine learning.
Up-to-date streaming data, such as current weather and traffic conditions.
Specialized identity data for understanding subscribers and audience targets.
New insights from unexpected sources of data.
The Snowflake Marketplace is available globally to all non-VPS Snowflake accounts hosted on Amazon Web Services, Google Cloud Platform, and Microsoft Azure, with the exception of Microsoft Azure Government.
Support for Microsoft Azure Government is planned.


NEW QUESTION # 24
What is the formula for measuring skewness in a dataset?

  • A. MODE - MEDIAN
  • B. (MEAN - MODE)/ STANDARD DEVIATION
  • C. (3(MEAN - MEDIAN))/ STANDARD DEVIATION
  • D. MEAN - MEDIAN

Answer: C

Explanation:
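This is Pearson's second (median) skewness coefficient, 3(MEAN - MEDIAN) / STANDARD DEVIATION. It is positive when a long right tail pulls the mean above the median, negative when a long left tail pulls it below, and zero for a symmetric distribution such as the normal curve, where the mean and median coincide. Option B is Pearson's first (mode) coefficient; the median-based form is generally preferred because the mode is unstable for small samples and continuous data.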
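The formula in option C can be evaluated with Python's standard statistics module. A minimal sketch; using the sample standard deviation (n - 1 denominator) rather than the population one is an assumption, and the two sample datasets are made up for illustration:

```python
import statistics
from typing import Sequence


def pearson_median_skewness(values: Sequence[float]) -> float:
    """Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return 3 * (mean - median) / sd


print(pearson_median_skewness([1, 2, 3, 4, 5]))   # symmetric sample: 0.0
print(pearson_median_skewness([1, 1, 1, 2, 10]))  # long right tail: positive (about 1.52)
```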


NEW QUESTION # 25
There are several types of classification tasks in machine learning. Which type best categorizes the following application tasks?
To detect whether email is spam or not
To determine whether or not a patient has a certain disease in medicine.
To determine whether or not quality specifications were met when it comes to QA (Quality Assurance).

  • A. Binary Classification
  • B. Multi-Class Classification
  • C. Logistic Regression
  • D. Multi-Label Classification

Answer: A

Explanation:
Supervised machine learning algorithms can be broadly classified into regression and classification algorithms. Regression algorithms predict continuous values, whereas classification algorithms predict categorical values.
What is the Classification Algorithm?
The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data. In Classification, a program learns from the given dataset or observations and then assigns each new observation to one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes are also called targets, labels, or categories.
Unlike regression, the output variable of Classification is a category, not a value, such as "Green or Blue" or "fruit or animal". Since the Classification algorithm is a Supervised learning technique, it takes labeled input data, which means each input comes with its corresponding output.
In a classification algorithm, a discrete output function (y) is mapped to the input variable (x).
y=f(x), where y = categorical output
The best example of an ML classification algorithm is an Email Spam Detector.
The main goal of the Classification algorithm is to identify the category of a given observation, and these algorithms are mainly used to predict the output for categorical data.
The algorithm which implements the classification on a dataset is known as a classifier. There are two types of Classifications:
Binary Classifier: If the classification problem has only two possible outcomes, it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two outcomes, it is called a Multi-class Classifier.
Examples: classification of types of crops, classification of types of music.
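To make y = f(x) concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid classifier, which learns the mean feature vector of each class and predicts the class whose mean is closest. It is chosen only for brevity, not as the method the text refers to. The spam features (number of links, number of all-caps words) and the training rows are hypothetical; the same code handles a multi-class problem if more than two labels appear in y:

```python
from math import dist
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def fit_centroids(X: Sequence[Point], y: Sequence[str]) -> Dict[str, Point]:
    """Learn one centroid (mean feature vector) per class label."""
    by_label: Dict[str, List[Point]] = {}
    for features, label in zip(X, y):
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*points))
        for label, points in by_label.items()
    }


def predict(centroids: Dict[str, Point], x: Point) -> str:
    """y = f(x): return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical features: (number of links, number of ALL-CAPS words)
X = [(8, 5), (6, 7), (0, 1), (1, 0)]
y = ["spam", "spam", "not spam", "not spam"]
centroids = fit_centroids(X, y)
print(predict(centroids, (7, 4)), predict(centroids, (0, 2)))
```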
Binary classification in machine learning refers to the type of classification where we have two class labels, one normal and one abnormal. Some example uses of binary classification:
To detect whether email is spam or not
To determine whether or not a patient has a certain disease in medicine.
To determine whether or not quality specifications were met when it comes to QA (Quality Assurance).
For example, the normal class label would be that a patient does not have the disease, and the abnormal class label would be that they do.
As with every other type of classification, a binary classifier is only as good as the dataset it is trained on: in general, the more (and the better-labeled) training data it has, the better it performs.


NEW QUESTION # 26
Which of the following additional metadata columns does a stream contain that can be used to build efficient data science pipelines that transform only new or modified data?

  • A. METADATA$DELETE
  • B. METADATA$ACTION
  • C. METADATA$FILE_ID
  • D. METADATA$ROW_ID
  • E. METADATA$ISUPDATE

Answer: B,D,E

Explanation:
A stream stores an offset for the source object and not any actual table columns or data. When queried, a stream accesses and returns the historic data in the same shape as the source object (i.e. the same column names and ordering) with the following additional columns:
METADATA$ACTION
Indicates the DML operation (INSERT, DELETE) recorded.
METADATA$ISUPDATE
Indicates whether the operation was part of an UPDATE statement. Updates to rows in the source object are represented as a pair of DELETE and INSERT records in the stream, with the METADATA$ISUPDATE column set to TRUE.
Note that streams record the differences between two offsets. If a row is added and then updated within the current offset, the delta change is a single new row, and its METADATA$ISUPDATE column records FALSE.
METADATA$ROW_ID
Specifies the unique and immutable ID for the row, which can be used to track changes to specific rows over time.
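The three columns are enough to apply only the changed rows to a downstream table. In Snowflake this is normally a MERGE that consumes the stream; the Python sketch below only illustrates the semantics on an in-memory table keyed by METADATA$ROW_ID. It assumes both halves of an UPDATE carry the same METADATA$ROW_ID; apply_stream_records and the sample columns (ID, STATUS) are hypothetical:

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_stream_records(target: Dict[str, Row], records: Iterable[Row]) -> Dict[str, Row]:
    """Apply stream-shaped change records to a target table keyed by row ID."""
    for rec in records:
        row_id = rec["METADATA$ROW_ID"]
        data = {k: v for k, v in rec.items() if not k.startswith("METADATA$")}
        if rec["METADATA$ACTION"] == "DELETE":
            if rec["METADATA$ISUPDATE"]:
                # Old image of an UPDATE: the paired INSERT carries the new values.
                continue
            target.pop(row_id, None)
        elif rec["METADATA$ACTION"] == "INSERT":
            # Plain insert, or new image of an UPDATE: upsert the row.
            target[row_id] = data
    return target


target = {"a": {"ID": 1, "STATUS": "new"}, "b": {"ID": 2, "STATUS": "new"}}
records = [
    {"METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": True, "METADATA$ROW_ID": "a", "ID": 1, "STATUS": "new"},
    {"METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": True, "METADATA$ROW_ID": "a", "ID": 1, "STATUS": "shipped"},
    {"METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": False, "METADATA$ROW_ID": "b", "ID": 2, "STATUS": "new"},
    {"METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": False, "METADATA$ROW_ID": "c", "ID": 3, "STATUS": "new"},
]
print(apply_stream_records(target, records))
```

Because the DELETE half of an update is skipped and the INSERT half is an upsert, the order of the two halves within the batch does not affect the result.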


NEW QUESTION # 27
A Data Scientist uses streams in ELT (extract, load, transform) processes, where new data inserted into a staging table is tracked by a stream. A set of SQL statements transforms the stream contents and inserts them into a set of production tables. The raw data arrives in JSON format, but for analysis he needs to transform it into relational columns in the production tables. Which of the following SQL functions can he use to achieve this?

  • A. lateral flatten()
  • B. Transpose()
  • C. He could not apply Transformation on Stream table data.
  • D. METADATA$ACTION ()

Answer: A

Explanation:
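LATERAL FLATTEN expands a semi-structured (VARIANT) value, such as a JSON array, into one row per element, which can then be projected into relational columns. For details on using LATERAL with FLATTEN, see: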
https://docs.snowflake.com/en/sql-reference/constructs/join-lateral#example-of-using-lateral-with-flatten
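In Snowflake the transformation would roughly be a query over the stream that joins each record with LATERAL FLATTEN(input => raw:items) and selects fields from the flattened VALUE column. The Python sketch below is only an analogue of that idea, not Snowflake's implementation: it turns one JSON document with a nested array into one relational row per array element. The document layout (order_id, items, sku, qty) is hypothetical:

```python
import json
from typing import Any, Dict, List


def flatten_orders(raw_json: str) -> List[Dict[str, Any]]:
    """Explode the nested "items" array into one flat row per element."""
    doc = json.loads(raw_json)
    rows = []
    for index, item in enumerate(doc["items"]):
        rows.append({
            "order_id": doc["order_id"],  # parent field repeated on every row
            "index": index,               # position of the element in the array
            "sku": item["sku"],
            "qty": item["qty"],
        })
    return rows


raw = '{"order_id": 7, "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]}'
for row in flatten_orders(raw):
    print(row)
```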


NEW QUESTION # 28
Which statement about Snowflake streams and tasks is incorrect?

  • A. Snowflake ensures only one instance of a task with a schedule (i.e. a standalone task or the root task in a DAG) is executed at a given time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.
  • B. Streams support repeatable read isolation.
  • C. A standard-only stream tracks row inserts only.
  • D. Snowflake automatically resizes and scales the compute resources for serverless tasks.

Answer: C

Explanation:
All statements are correct except C. A standard (i.e. delta) stream tracks all DML changes to the source object, including inserts, updates, and deletes (including table truncates). It is an append-only stream that tracks row inserts only.


NEW QUESTION # 29
Skewness of Normal distribution is ___________

  • A. 0
  • B. Undefined
  • C. Positive
  • D. Negative

Answer: A

Explanation:
Since the normal curve is symmetric about its mean, its skewness is zero. This is a theoretical result; for the mathematical proof, you can refer to books or websites that cover it in detail.
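A short version of that proof, using the standard third-moment definition of skewness. The only fact needed is that a normal variable minus its mean is symmetric about zero, and that its third moment is finite:

```latex
\gamma_1 = \frac{\mathbb{E}\left[(X-\mu)^3\right]}{\sigma^3},
\qquad Z = X - \mu \sim \mathcal{N}(0, \sigma^2)

% Z and -Z have the same distribution, so
\mathbb{E}\left[Z^3\right] = \mathbb{E}\left[(-Z)^3\right] = -\mathbb{E}\left[Z^3\right]
\;\Longrightarrow\; \mathbb{E}\left[Z^3\right] = 0
\;\Longrightarrow\; \gamma_1 = 0
```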


NEW QUESTION # 30
Consider a data frame df with columns ['A', 'B', 'C', 'D'] and rows ['r1', 'r2', 'r3']. What does the expression df[lambda x : x.index.str.endswith('3')] do?

  • A. Results in Error
  • B. Returns the third column
  • C. Filters the row labelled r3
  • D. Returns the row name r3

Answer: C

Explanation:
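It filters the rows, returning only the row labelled r3. When a callable is passed to df[...], pandas calls it with the DataFrame and uses the return value as the indexer; here that value is a boolean mask over the row labels (True only for labels ending in '3'), so the result is a one-row DataFrame containing r3.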
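The behaviour can be checked directly. A minimal sketch that builds the 3 x 4 frame from the question; the cell values are arbitrary:

```python
import pandas as pd

# The frame described in the question; the values are arbitrary.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=["r1", "r2", "r3"],
    columns=["A", "B", "C", "D"],
)

# df[callable] calls the lambda with df itself; the lambda returns a boolean
# mask over the row labels, and a boolean mask inside [] selects rows.
result = df[lambda x: x.index.str.endswith("3")]
print(result)
```

Note that the result is still a DataFrame (one row, all four columns), not a single row as a Series.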


NEW QUESTION # 31
Which object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken on the changed data in data science pipelines?

  • A. Dynamic tables
  • B. Task
  • C. Tags
  • D. Delta
  • E. OFFSET
  • F. Stream

Answer: F

Explanation:
A stream object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken using the changed data. This process is referred to as change data capture (CDC). An individual table stream tracks the changes made to rows in a source table. A table stream (also referred to as simply a "stream") makes available a "change table" of what changed, at the row level, between two transactional points of time in a table. This allows querying and consuming a sequence of change records in a transactional fashion.
Streams can be created to query change data on the following objects:
Standard tables, including shared tables.
Views, including secure views
Directory tables
Event tables
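The "change table" a stream exposes can be pictured as the row-level difference between two versions of a table. A real stream stores only an offset rather than full snapshots, so the Python sketch below is purely a mental model of the output shape; diff_snapshots and the sample columns (ID, QTY) are hypothetical:

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_snapshots(before: Dict[str, Row], after: Dict[str, Row]) -> List[Row]:
    """Row-level change records between two table versions keyed by row ID."""
    changes: List[Row] = []
    for row_id in sorted(before.keys() | after.keys()):
        old, new = before.get(row_id), after.get(row_id)
        meta = {"METADATA$ROW_ID": row_id}
        if old is not None and new is None:
            changes.append({"METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": False, **meta, **old})
        elif old is None and new is not None:
            changes.append({"METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": False, **meta, **new})
        elif old != new:
            # An update appears as a DELETE of the old image plus an INSERT of the new one.
            changes.append({"METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": True, **meta, **old})
            changes.append({"METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": True, **meta, **new})
    return changes


before = {"1": {"ID": 1, "QTY": 5}, "2": {"ID": 2, "QTY": 3}}
after = {"1": {"ID": 1, "QTY": 6}, "3": {"ID": 3, "QTY": 1}}
for change in diff_snapshots(before, after):
    print(change)
```

Comparing only the two endpoints also reproduces the rule that a row inserted and then updated between offsets shows up as a single INSERT with METADATA$ISUPDATE set to FALSE.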


NEW QUESTION # 32
Which are the correct steps for saving the contents of a DataFrame to a Snowflake table when moving data from Spark to Snowflake?

  • A. Step 1.Use the PUT() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the NAME() method.
    Step 3.Use the dbtable option to specify the table to which data is written.
    Step 4.Specify the connector options using either the option() or options() method.
    Step 5.Use the save() method to specify the save mode for the content.
  • B. Step 1.Use the write() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the format() method.
    Step 3.Specify the connector options using either the option() or options() method.
    Step 4.Use the dbtable option to specify the table to which data is written.
    Step 5.Use the mode() method to specify the save mode for the content.
  • C. Step 1.Use the PUT() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the format() method.
    Step 3.Specify the connector options using either the option() or options() method.
    Step 4.Use the dbtable option to specify the table to which data is written.
    Step 5.Use the save() method to specify the save mode for the content.
  • D. Step 1.Use the writer() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the format() method.
    Step 3.Use the dbtable option to specify the table to which data is written.
    Step 4.Specify the connector options using either the option() or options() method.
    Step 5.Use the save() method to specify the save mode for the content.

Answer: B

Explanation:
Moving Data from Spark to Snowflake
The steps for saving the contents of a DataFrame to a Snowflake table are similar to writing from Snowflake to Spark:
1. Use the write() method of the DataFrame to construct a DataFrameWriter.
2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.
3. Specify the connector options using either the option() or options() method.
4. Use the dbtable option to specify the table to which data is written.
5. Use the mode() method to specify the save mode for the content.
Examples
df.write
  .format(SNOWFLAKE_SOURCE_NAME)
  .options(sfOptions)
  .option("dbtable", "t2")
  .mode(SaveMode.Overwrite)
  .save()


NEW QUESTION # 33
Which of the following is not a key component when designing external functions in Snowflake?

  • A. API Integration
  • B. UDF Service
  • C. Proxy Service
  • D. Remote Service

Answer: B

Explanation:
What is an External Function?
An external function calls code that is executed outside Snowflake.
The remotely executed code is known as a remote service.
Information sent to a remote service is usually relayed through a proxy service.
Snowflake stores security-related external function information in an API integration.
External Function:
An external function is a type of UDF. Unlike other UDFs, an external function does not contain its own code; instead, the external function calls code that is stored and executed outside Snowflake.
Inside Snowflake, the external function is stored as a database object that contains information that Snowflake uses to call the remote service. This stored information includes the URL of the proxy service that relays information to and from the remote service.
Remote Service:
The remotely executed code is known as a remote service.
The remote service must act like a function. For example, it must return a value.
Snowflake supports scalar external functions; the remote service must return exactly one row for each row received.
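The one-row-out-per-row-in contract is visible in the JSON exchanged with the remote service: Snowflake sends a body whose "data" field is an array of rows, each starting with a row number followed by the function's arguments, and expects a "data" array back containing one [row number, result] pair per input row. A minimal sketch of the processing logic, assuming the proxy passes the raw JSON body through unchanged; handle_request and its upper-casing logic are hypothetical, and HTTP details (status codes, headers, the proxy's event wrapper) are omitted:

```python
import json
from typing import Any, List


def handle_request(body: str) -> str:
    """Process one batch: input rows are [row_number, arg1, ...],
    output rows are [row_number, result], one per input row."""
    rows: List[List[Any]] = json.loads(body)["data"]
    results = []
    for row in rows:
        row_number, *args = row
        value = args[0]
        # Hypothetical scalar logic: upper-case strings, pass other values through as null.
        results.append([row_number, value.upper() if isinstance(value, str) else None])
    return json.dumps({"data": results})


request = json.dumps({"data": [[0, "abc"], [1, "Snow"], [2, None]]})
print(handle_request(request))
```

Echoing the row number back lets Snowflake match each result to its input row.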
Proxy Service:
Snowflake does not call a remote service directly. Instead, Snowflake calls a proxy service, which relays the data to the remote service.
The proxy service can increase security by authenticating requests to the remote service.
The proxy service can support subscription-based billing for a remote service. For example, the proxy service can verify that a caller to the remote service is a paid subscriber.
The proxy service also relays the response from the remote service back to Snowflake.
Examples of proxy services include:
Amazon API Gateway.
Microsoft Azure API Management service.
API Integration:
An integration is a Snowflake object that provides an interface between Snowflake and third-party services.
An API integration stores information, such as security information, that is needed to work with a proxy service or remote service.
An API integration is created with the CREATE API INTEGRATION command.
Users can write and call their own remote services, or call remote services written by third parties. These remote services can be written using any HTTP server stack, including cloud serverless compute services such as AWS Lambda.


NEW QUESTION # 34
A Data Scientist is looking to use a reader account. Which of the following considerations about reader accounts for third-party access is incorrect?

  • A. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
  • B. Each reader account belongs to the provider account that created it.
  • C. Data sharing is only possible between Snowflake accounts.
  • D. Users in a reader account can query data that has been shared with the reader account, but cannot perform any of the DML tasks that are allowed in a full account, such as data loading, insert, update, and similar data manipulation operations.

Answer: C

Explanation:
Data sharing is only supported between Snowflake accounts. As a data provider, you might want to share data with a consumer who does not already have a Snowflake account or is not ready to become a licensed Snowflake customer.
To facilitate sharing data with these consumers, you can create reader accounts. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
Each reader account belongs to the provider account that created it. As a provider, you use shares to share databases with reader accounts; however, a reader account can only consume data from the provider account that created it.
So option C is the incorrect consideration: through reader accounts, a provider can share data with consumers who are not Snowflake customers.


NEW QUESTION # 35
Which of the following is a Python-based web application framework for visualizing data and analyzing results in a more efficient and flexible way?

  • A. StreamBI
  • B. Streamlit
  • C. Streamsets
  • D. Rapter

Answer: B

Explanation:
Streamlit is a Python-based web application framework for visualizing data and analyzing results in a more efficient and flexible way. It is an open source library that assists data scientists and academics to develop Machine Learning (ML) visualization dashboards in a short period of time. We can build and deploy powerful data applications with just a few lines of code.
Why Streamlit?
Real-world data applications are in high demand, and developers keep building new libraries and frameworks that make dashboards easier to build and deploy. Streamlit is a library that reduces dashboard development time from days to hours. Some reasons to choose Streamlit:
It is a free and open-source library.
Installing Streamlit is as simple as installing any other Python package.
It is easy to learn: you don't need any web development experience, and a basic understanding of Python is enough to build a data application.
It is compatible with almost all machine learning frameworks, including TensorFlow, PyTorch, and scikit-learn, and with visualization libraries such as Seaborn, Altair, Plotly, and many others.


NEW QUESTION # 36
Data providers add Snowflake objects (databases, schemas, tables, secure views, etc.) to a share using which of the following options?

  • A. Grant privileges on objects to a share via a third-party role.
  • B. Grant privileges on objects directly to a share.
  • C. Grant privileges on objects to a share via a database role.
  • D. Grant privileges on objects to a share via Account role.

Answer: B,C

Explanation:
What is a Share?
Shares are named Snowflake objects that encapsulate all of the information required to share a database.
Data providers add Snowflake objects (databases, schemas, tables, secure views, etc.) to a share using either or both of the following options:
Option 1: Grant privileges on objects to a share via a database role.
Option 2: Grant privileges on objects directly to a share.
You choose which accounts can consume data from the share by adding the accounts to the share.
After a database is created (in a consumer account) from a share, all the shared objects are accessible to users in the consumer account.
Shares are secure, configurable, and controlled completely by the provider account:
New objects added to a share become immediately available to all consumers, providing real-time access to shared data.
Access to a share (or any of the objects in a share) can be revoked at any time.


NEW QUESTION # 37
Which metric is not used for evaluating classification models?

  • A. Recall
  • B. Accuracy
  • C. Precision
  • D. Mean absolute error

Answer: D

Explanation:
The four commonly used metrics for evaluating classifier performance are:
1. Accuracy: The proportion of correct predictions out of the total predictions.
2. Precision: The proportion of true positive predictions out of the total positive predictions (precision = true positives / (true positives + false positives)).
3. Recall (Sensitivity or True Positive Rate): The proportion of true positive predictions out of the total actual positive instances (recall = true positives / (true positives + false negatives)).
4. F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics (F1 score = 2 * ((precision * recall) / (precision + recall))).
Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are metrics used to evaluate a regression model. They measure how far continuous predictions deviate from the actual values, which is why they do not apply to the discrete class labels produced by a classification model.
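Both regression metrics are simple averages over the prediction errors. A minimal sketch with made-up values:

```python
from math import sqrt
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: squaring weights large errors more heavily."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

With errors of 1, 0, 2 and 1, MAE is 1.0 while RMSE is the square root of 1.5, about 1.22; RMSE is never smaller than MAE and grows faster when a few errors are large.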


NEW QUESTION # 39
Which data science tool is known to provide native connectivity to Snowflake?

  • A. HEX
  • B. DvSUM
  • C. DiYotta
  • D. Denodo

Answer: A

Explanation:
Hex - collaborative data science and analytics platform
Denodo - data virtualization and federation platform
DvSum - data catalog and data intelligence platform
Diyotta - data integration and migration


NEW QUESTION # 40
All aggregate functions except _____ ignore null values in their input collection

  • A. Count(attribute)
  • B. Count(*)
  • C. Sum
  • D. Avg

Answer: B

Explanation:
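COUNT(*) counts every row, including rows whose values are NULL. All other aggregate functions, including COUNT(attribute), SUM, and AVG, ignore NULL values in their input.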
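The NULL-handling rule can be observed with any SQL engine. A minimal sketch using SQLite from Python's standard library, since it needs no server; the table and values are made up, and the rule shown for these four aggregates is standard SQL behaviour rather than anything SQLite-specific:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 10), ("b", None), ("c", 20), ("d", None)],
)
row = conn.execute(
    "SELECT COUNT(*), COUNT(score), SUM(score), AVG(score) FROM scores"
).fetchone()
print(row)  # (4, 2, 30, 15.0)
conn.close()
```

COUNT(*) sees all 4 rows, while COUNT(score), SUM and AVG see only the 2 non-NULL scores, so the average is 30 / 2 rather than 30 / 4.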


NEW QUESTION # 41
......

DSA-C02 Exam Dumps, DSA-C02 Practice Test Questions: https://www.prep4sures.top/DSA-C02-exam-dumps-torrent.html

Free DSA-C02 Exam Dumps to Pass Exam Easily: https://drive.google.com/open?id=1nUL-ye_VwXXS2HYVQAO8qR0nudHgn9U9