Dataplex catalog entity groups

In Google Cloud’s Dataplex Catalog, entity groups (called entry groups in the Catalog API, with entries as their members) are logical collections of metadata records that represent datasets, tables, or other resources stored in data repositories. They are typically organized to mirror the structure of the data lake and the relationships between data assets. Below are common ways to define entity groups in a Dataplex catalog:


1. Data Domains or Subject Areas


Entity groups can be organized by business domains or subject areas, such as:

• Sales
    • Entities: Customer Transactions, Revenue, Sales Targets

• Marketing
    • Entities: Campaign Data, Leads, Engagement Metrics

• Finance
    • Entities: General Ledger, Expense Reports, Budget Data

• Operations
    • Entities: Inventory, Supply Chain, Workforce Data
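
As a rough illustration, the domain groups listed above can be modeled as plain data before anything is registered in a catalog. The sketch below uses only the Python standard library; the `EntityGroup` class and the `group_id` and `find_group` helpers are hypothetical names made up for this example, not Dataplex API types, and `group_id` does not check any naming rules the real service may enforce.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntityGroup:
    """A named collection of catalog entities (hypothetical model, not a Dataplex type)."""
    name: str
    entities: List[str] = field(default_factory=list)


# Domain groups and their members, copied from the list above.
DOMAIN_GROUPS = [
    EntityGroup("Sales", ["Customer Transactions", "Revenue", "Sales Targets"]),
    EntityGroup("Marketing", ["Campaign Data", "Leads", "Engagement Metrics"]),
    EntityGroup("Finance", ["General Ledger", "Expense Reports", "Budget Data"]),
    EntityGroup("Operations", ["Inventory", "Supply Chain", "Workforce Data"]),
]


def group_id(name: str) -> str:
    """Turn a display name into a lowercase, hyphen-separated identifier."""
    return "-".join(name.lower().split())


def find_group(entity: str, groups: List[EntityGroup]) -> Optional[str]:
    """Return the name of the first group that contains the entity, or None."""
    for group in groups:
        if entity in group.entities:
            return group.name
    return None


if __name__ == "__main__":
    print(find_group("Revenue", DOMAIN_GROUPS))
    print(group_id("Expense Reports"))
```

Keeping the grouping in one structure like this makes it easy to review with domain owners before creating the corresponding groups in the catalog.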


2. Data Types


Entity groups can also be classified based on the type of data:

• Master Data
    • Entities: Customer Master, Product Master, Vendor Master

• Transactional Data
    • Entities: Order Details, Payment Transactions, Shipment Records

• Reference Data
    • Entities: Currency Codes, Country Codes, Tax Codes

• Log Data
    • Entities: System Logs, Application Logs, Audit Trails


3. Data Sources


Entity groups can also follow the system the data originally came from:

• Operational Databases
    • Entities: Oracle ERP, MySQL, and PostgreSQL Tables

• Third-Party APIs
    • Entities: Weather Data, Market Prices, Social Media Metrics

• Cloud Storage
    • Entities: GCS Buckets, Data Files (Parquet, CSV, JSON)


4. Analytical Layers


Entity groups based on data processing layers:

• Raw Data
    • Entities: Unprocessed Logs, Raw IoT Data, Ingested Files

• Processed Data
    • Entities: Cleaned Data, Transformed Tables, Aggregated Metrics

• Curated Data
    • Entities: BI Dashboards, Reporting Tables, Machine Learning Features


5. Data Governance Classifications


Entity groups defined by data governance requirements:

• Sensitive Data
    • Entities: PII Data, Payment Information, Health Records

• Non-Sensitive Data
    • Entities: Open-Access Datasets, Publicly Shared Data
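
One way to picture the governance split is as a rule over tags attached to each entity. The following is a minimal sketch, assuming each entity carries free-form tags; the tag names in `SENSITIVE_TAGS` and the `governance_group` function are illustrative assumptions, not part of Dataplex.

```python
from typing import Iterable

# Assumed tag vocabulary for this example; a real catalog would define its own.
SENSITIVE_TAGS = {"pii", "payment", "health"}


def governance_group(tags: Iterable[str]) -> str:
    """Return 'Sensitive Data' if any tag is in SENSITIVE_TAGS, else 'Non-Sensitive Data'."""
    normalized = {tag.strip().lower() for tag in tags}
    if normalized & SENSITIVE_TAGS:
        return "Sensitive Data"
    return "Non-Sensitive Data"


if __name__ == "__main__":
    print(governance_group(["PII", "marketing"]))
    print(governance_group(["public"]))
```

Note that this sketch treats untagged entities as non-sensitive. A stricter policy might default untagged entities to the sensitive group until someone has reviewed them.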


6. Storage Systems


Entity groups reflecting the storage technology or systems:

• BigQuery Tables
    • Entities: Fact Tables, Dimension Tables, Aggregates

• Google Cloud Storage
    • Entities: Bucket Contents (e.g., Yearly Financial Reports, Logs)

• Databases
    • Entities: Tables and Views from MySQL, PostgreSQL, or other DBs
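
If each asset is identified by a URI, the storage-system group can often be derived from the URI scheme alone. The sketch below assumes that convention; the scheme-to-group mapping and the example URIs are hypothetical, and the `bq` scheme in particular is an assumption about how BigQuery tables are referenced in your environment.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the storage groups listed above.
SCHEME_TO_GROUP = {
    "bq": "BigQuery Tables",
    "gs": "Google Cloud Storage",
    "mysql": "Databases",
    "postgresql": "Databases",
}


def storage_group(uri: str) -> str:
    """Map an asset URI to a storage-system group based on its scheme."""
    scheme = urlparse(uri).scheme  # urlparse returns the scheme in lowercase
    return SCHEME_TO_GROUP.get(scheme, "Unknown")


if __name__ == "__main__":
    for uri in (
        "gs://example-finance-reports/2023/report.parquet",
        "bq://example-project.sales.fact_orders",
        "postgresql://example-host/inventory",
    ):
        print(uri, "->", storage_group(uri))
```

Anything with an unrecognized scheme falls into "Unknown", which is a useful queue for assets that still need a manual decision.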


7. Project-Specific Groupings


Entity groups aligned with specific projects or initiatives:

• Customer 360 Initiative
    • Entities: Customer Profile, Interaction History, Behavioral Data

• Supply Chain Optimization
    • Entities: Supplier Performance, Delivery Times, Inventory Levels


8. Lineage-Based Grouping


Entity groups representing the flow of data:

• Source Data
    • Entities: Raw Ingested Files

• Intermediate Data
    • Entities: Transformation Results, Staging Tables

• Final Outputs
    • Entities: Analytical Reports, Machine Learning Outputs
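
The three lineage groups above can be derived mechanically when lineage is available as a set of upstream-to-downstream edges: an entity with nothing upstream is a source, one with nothing downstream is a final output, and everything else is intermediate. This is a minimal sketch under that assumption; the `lineage_groups` function and the entity names are hypothetical, and it does not read lineage from any Google Cloud API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def lineage_groups(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Classify entities by position in a lineage graph of (upstream, downstream) edges."""
    upstream = defaultdict(set)    # entity -> entities it is derived from
    downstream = defaultdict(set)  # entity -> entities derived from it
    nodes = set()
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
        nodes.update((src, dst))

    groups = {"Source Data": [], "Intermediate Data": [], "Final Outputs": []}
    for node in sorted(nodes):
        if not upstream[node]:
            groups["Source Data"].append(node)
        elif not downstream[node]:
            groups["Final Outputs"].append(node)
        else:
            groups["Intermediate Data"].append(node)
    return groups


if __name__ == "__main__":
    example_edges = [
        ("raw_ingested_files", "staging_table"),
        ("staging_table", "analytical_report"),
        ("staging_table", "ml_output"),
    ]
    for group, members in lineage_groups(example_edges).items():
        print(group, members)
```

An entity that appears in no edge at all is not classified by this sketch, so isolated assets would need to be handled separately.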


9. Industry-Specific Groups


For example, in an airline business:

• Passenger Data
    • Entities: PNR Records, Ticket Sales, Loyalty Program Data

• Flight Operations
    • Entities: Flight Schedules, Crew Rosters, Maintenance Logs

• Revenue Management
    • Entities: Fare Classes, Load Factors, Revenue Forecasts


These entity groups help maintain an organized and governed catalog, enabling efficient discovery, management, and usage of data assets.


