MCA Microsoft Certified Associate Azure Data Engineer Study Guide
Exam DP-203

By (author) Benjamin Perkins

ISBN13: 9781119885429

Imprint: Sybex Inc., U.S.

Publisher: John Wiley & Sons Inc

Format:

Published: 06/09/2023

Availability: Available

Description
Prepare for the Azure Data Engineering certification, and an exciting new career in analytics, with this must-have study aid.

In the MCA Microsoft Certified Associate Azure Data Engineer Study Guide: Exam DP-203, accomplished data engineer and tech educator Benjamin Perkins delivers a hands-on, practical guide to preparing for the challenging Azure Data Engineer certification and for a new career in an exciting and growing field of tech.

In the book, you’ll explore all the objectives covered on the DP-203 exam while learning the job roles and responsibilities of a newly minted Azure data engineer. You’ll learn to integrate, transform, and consolidate data from various structured and unstructured data systems into a structure suitable for building analytics solutions, and you’ll get up to speed quickly and efficiently with Sybex’s easy-to-use study aids and tools.

This Study Guide also offers:
  • Career-ready advice for anyone hoping to ace their first data engineering job interview and excel in their first day in the field
  • Indispensable tips and tricks to familiarize yourself with the DP-203 exam structure and help reduce test anxiety
  • Complimentary access to Sybex’s expansive online study tools, accessible across multiple devices, and offering access to hundreds of bonus practice questions, electronic flashcards, and a searchable, digital glossary of key terms

A one-of-a-kind study aid designed to help you get straight to the crucial material you need to succeed on the exam and on the job, the MCA Microsoft Certified Associate Azure Data Engineer Study Guide: Exam DP-203 belongs on the bookshelves of anyone hoping to increase their data analytics skills, advance their data engineering career with an in-demand certification, or make a career change into a popular new area of tech.
Introduction

Part I: Azure Data Engineer Certification and Azure Products

Chapter 1: Gaining the Azure Data Engineer Associate Certification
  • The Journey to Certification
  • How to Pass Exam DP-203
  • Understanding the Exam Expectations and Requirements
  • Use Azure Daily
  • Read Azure Articles to Stay Current
  • Have an Understanding of All Azure Products
  • Azure Product Name Recognition
  • Azure Data Analytics
  • Azure Synapse Analytics
  • Azure Databricks
  • Azure HDInsight
  • Azure Analysis Services
  • Azure Data Factory
  • Azure Event Hubs
  • Azure Stream Analytics
  • Other Products
  • Azure Storage Products
  • Azure Data Lake Storage
  • Azure Storage
  • Other Products
  • Azure Databases
  • Azure Cosmos DB
  • Azure SQL Server Products
  • Additional Azure Databases
  • Other Products
  • Azure Security
  • Azure Active Directory
  • Role-Based Access Control
  • Attribute-Based Access Control
  • Azure Key Vault
  • Other Products
  • Azure Networking
  • Virtual Networks
  • Other Products
  • Azure Compute
  • Azure Virtual Machines
  • Azure Virtual Machine Scale Sets
  • Azure App Service Web Apps
  • Azure Functions
  • Azure Batch
  • Azure Management and Governance
  • Azure Monitor
  • Azure Purview
  • Azure Policy
  • Azure Blueprints (Preview)
  • Azure Lighthouse
  • Azure Cost Management and Billing
  • Other Products
  • Summary
  • Exam Essentials
  • Review Questions

Chapter 2: CREATE DATABASE dbName; GO
  • The Brainjammer
  • A Historical Look at Data
  • Variety
  • Velocity
  • Volume
  • Data Locations
  • Data File Formats
  • Data Structures, Types, and Concepts
  • Data Structures
  • Data Types and Management
  • Data Concepts
  • Data Programming and Querying for Data Engineers
  • Data Programming
  • Querying Data
  • Understanding Big Data Processing
  • Big Data Stages
  • ETL, ELT, ELTL
  • Analytics Types
  • Big Data Layers
  • Summary
  • Exam Essentials
  • Review Questions

Part II: Design and Implement Data Storage

Chapter 3: Data Sources and Ingestion
  • Where Does Data Come From?
  • Design a Data Storage Structure
  • Design an Azure Data Lake Solution
  • Recommended File Types for Storage
  • Recommended File Types for Analytical Queries
  • Design for Efficient Querying
  • Design for Data Pruning
  • Design a Folder Structure That Represents the Levels of Data Transformation
  • Design a Distribution Strategy
  • Design a Data Archiving Solution
  • Design a Partition Strategy
  • Design a Partition Strategy for Files
  • Design a Partition Strategy for Analytical Workloads
  • Design a Partition Strategy for Efficiency and Performance
  • Design a Partition Strategy for Azure Synapse Analytics
  • Identify When Partitioning Is Needed in Azure Data Lake Storage Gen2
  • Design the Serving/Data Exploration Layer
  • Design Star Schemas
  • Design Slowly Changing Dimensions
  • Design a Dimensional Hierarchy
  • Design a Solution for Temporal Data
  • Design for Incremental Loading
  • Design Analytical Stores
  • Design Metastores in Azure Synapse Analytics and Azure Databricks
  • The Ingestion of Data into a Pipeline
  • Azure Synapse Analytics
  • Azure Data Factory
  • Azure Databricks
  • Event Hubs and IoT Hub
  • Azure Stream Analytics
  • Apache Kafka for HDInsight
  • Migrating and Moving Data
  • Summary
  • Exam Essentials
  • Review Questions

Chapter 4: The Storage of Data
  • Implement Physical Data Storage Structures
  • Implement Compression
  • Implement Partitioning
  • Implement Sharding
  • Implement Different Table Geometries with Azure Synapse Analytics Pools
  • Implement Data Redundancy
  • Implement Distributions
  • Implement Data Archiving
  • Azure Synapse Analytics Develop Hub
  • Implement Logical Data Structures
  • Build a Temporal Data Solution
  • Build a Slowly Changing Dimension
  • Build a Logical Folder Structure
  • Build External Tables
  • Implement File and Folder Structures for Efficient Querying and Data Pruning
  • Implement a Partition Strategy
  • Implement a Partition Strategy for Files
  • Implement a Partition Strategy for Analytical Workloads
  • Implement a Partition Strategy for Streaming Workloads
  • Implement a Partition Strategy for Azure Synapse Analytics
  • Design and Implement the Data Exploration Layer
  • Deliver Data in a Relational Star Schema
  • Deliver Data in Parquet Files
  • Maintain Metadata
  • Implement a Dimensional Hierarchy
  • Create and Execute Queries by Using a Compute Solution That Leverages SQL Serverless and Spark Cluster
  • Recommend Azure Synapse Analytics Database Templates
  • Implement Azure Synapse Analytics Database Templates
  • Additional Data Storage Topics
  • Storing Raw Data in Azure Databricks for Transformation
  • Storing Data Using Azure HDInsight
  • Storing Prepared, Trained, and Modeled Data
  • Summary
  • Exam Essentials
  • Review Questions

Part III: Develop Data Processing

Chapter 5: Transform, Manage, and Prepare Data
  • Ingest and Transform Data
  • Transform Data Using Azure Synapse Pipelines
  • Transform Data Using Azure Data Factory
  • Transform Data Using Apache Spark
  • Transform Data Using Transact-SQL
  • Transform Data Using Stream Analytics
  • Cleanse Data
  • Split Data
  • Shred JSON
  • Encode and Decode Data
  • Configure Error Handling for the Transformation
  • Normalize and Denormalize Values
  • Transform Data by Using Scala
  • Perform Exploratory Data Analysis
  • Transformation and Data Management Concepts
  • Transformation
  • Data Management
  • Azure Databricks
  • Data Modeling and Usage
  • Data Modeling with Machine Learning
  • Usage
  • Summary
  • Exam Essentials
  • Review Questions

Chapter 6: Create and Manage Batch Processing and Pipelines
  • Design and Develop a Batch Processing Solution
  • Design a Batch Processing Solution
  • Develop Batch Processing Solutions
  • Create Data Pipelines
  • Handle Duplicate Data
  • Handle Missing Data
  • Handle Late-Arriving Data
  • Upsert Data
  • Configure the Batch Size
  • Configure Batch Retention
  • Design and Develop Slowly Changing Dimensions
  • Design and Implement Incremental Data Loads
  • Integrate Jupyter/IPython Notebooks into a Data Pipeline
  • Revert Data to a Previous State
  • Handle Security and Compliance Requirements
  • Design and Create Tests for Data Pipelines
  • Scale Resources
  • Design and Configure Exception Handling
  • Debug Spark Jobs Using the Spark UI
  • Implement Azure Synapse Link and Query the Replicated Data
  • Use PolyBase to Load Data to a SQL Pool
  • Read from and Write to a Delta Table
  • Manage Batches and Pipelines
  • Trigger Batches
  • Schedule Data Pipelines
  • Validate Batch Loads
  • Implement Version Control for Pipeline Artifacts
  • Manage Data Pipelines
  • Manage Spark Jobs in a Pipeline
  • Handle Failed Batch Loads
  • Summary
  • Exam Essentials
  • Review Questions

Chapter 7: Design and Implement a Data Stream Processing Solution
  • Develop a Stream Processing Solution
  • Design a Stream Processing Solution
  • Create a Stream Processing Solution
  • Process Time Series Data
  • Design and Create Windowed Aggregates
  • Process Data Within One Partition
  • Process Data Across Partitions
  • Upsert Data
  • Handle Schema Drift
  • Configure Checkpoints/Watermarking During Processing
  • Replay Archived Stream Data
  • Design and Create Tests for Data Pipelines
  • Monitor for Performance and Functional Regressions
  • Optimize Pipelines for Analytical or Transactional Purposes
  • Scale Resources
  • Design and Configure Exception Handling
  • Handle Interruptions
  • Ingest and Transform Data
  • Transform Data Using Azure Stream Analytics
  • Monitor Data Storage and Data Processing
  • Monitor Stream Processing
  • Summary
  • Exam Essentials
  • Review Questions

Part IV: Secure, Monitor, and Optimize Data Storage and Data Processing

Chapter 8: Keeping Data Safe and Secure
  • Design Security for Data Policies and Standards
  • Design a Data Auditing Strategy
  • Design a Data Retention Policy
  • Design for Data Privacy
  • Design to Purge Data Based on Business Requirements
  • Design Data Encryption for Data at Rest and in Transit
  • Design Row-Level and Column-Level Security
  • Design a Data Masking Strategy
  • Design Access Control for Azure Data Lake Storage Gen2
  • Implement Data Security
  • Implement a Data Auditing Strategy
  • Manage Sensitive Information
  • Implement a Data Retention Policy
  • Encrypt Data at Rest and in Motion
  • Implement Row-Level and Column-Level Security
  • Implement Data Masking
  • Manage Identities, Keys, and Secrets Across Different Data Platform Technologies
  • Implement Access Control for Azure Data Lake Storage Gen2
  • Implement Secure Endpoints (Private and Public)
  • Implement Resource Tokens in Azure Databricks
  • Load a DataFrame with Sensitive Information
  • Write Encrypted Data to Tables or Parquet Files
  • Develop a Batch Processing Solution
  • Handle Security and Compliance Requirements
  • Design and Implement the Data Exploration Layer
  • Browse and Search Metadata in Microsoft Purview Data Catalog
  • Push New or Updated Data Lineage to Microsoft Purview
  • Summary
  • Exam Essentials
  • Review Questions

Chapter 9: Monitoring Azure Data Storage and Processing
  • Monitoring Data Storage and Data Processing
  • Implement Logging Used by Azure Monitor
  • Configure Monitoring Services
  • Understand Custom Logging Options
  • Measure Query Performance
  • Monitor Data Pipeline Performance
  • Monitor Cluster Performance
  • Measure Performance of Data Movement
  • Interpret Azure Monitor Metrics and Logs
  • Monitor and Update Statistics about Data Across a System
  • Schedule and Monitor Pipeline Tests
  • Interpret a Spark Directed Acyclic Graph
  • Monitor Stream Processing
  • Implement a Pipeline Alert Strategy
  • Develop a Batch Processing Solution
  • Design and Create Tests for Data Pipelines
  • Develop a Stream Processing Solution
  • Monitor for Performance and Functional Regressions
  • Design and Create Tests for Data Pipelines
  • Azure Monitoring Overview
  • Azure Batch
  • Azure Key Vault
  • Azure SQL
  • Summary
  • Exam Essentials
  • Review Questions

Chapter 10: Troubleshoot Data Storage Processing
  • Optimize and Troubleshoot Data Storage and Data Processing
  • Optimize Resource Management
  • Compact Small Files
  • Handle Skew in Data
  • Handle Data Spill
  • Find Shuffling in a Pipeline
  • Tune Shuffle Partitions
  • Tune Queries by Using Indexers
  • Tune Queries by Using Cache
  • Optimize Pipelines for Analytical or Transactional Purposes
  • Optimize Pipeline for Descriptive versus Analytical Workloads
  • Troubleshoot a Failed Spark Job
  • Troubleshoot a Failed Pipeline Run
  • Rewrite User-Defined Functions
  • Design and Develop a Batch Processing Solution
  • Design and Configure Exception Handling
  • Debug Spark Jobs by Using the Spark UI
  • Scale Resources
  • Monitor Batches and Pipelines
  • Handle Failed Batch Loads
  • Design and Develop a Stream Processing Solution
  • Optimize Pipelines for Analytical or Transactional Purposes
  • Handle Interruptions
  • Scale Resources
  • Summary
  • Exam Essentials
  • Review Questions

Appendix: Answers to Review Questions
  • Chapter 1: Gaining the Azure Data Engineer Associate Certification
  • Chapter 2: CREATE DATABASE dbName; GO
  • Chapter 3: Data Sources and Ingestion
  • Chapter 4: The Storage of Data
  • Chapter 5: Transform, Manage, and Prepare Data
  • Chapter 6: Create and Manage Batch Processing and Pipelines
  • Chapter 7: Design and Implement a Data Stream Processing Solution
  • Chapter 8: Keeping Data Safe and Secure
  • Chapter 9: Monitoring Azure Data Storage and Processing
  • Chapter 10: Troubleshoot Data Storage Processing

Index
  • Education
  • Mathematics
  • Electronics & communications engineering
  • Professional & Vocational
Height: 239 mm
Width: 193 mm
Spine: 66 mm
Weight: 1383 g
List Price: £47.50