Data Discovery

Early in the life of a product or feature, we often know we'll need to acquire data from an existing source. We need to know what data exists, where it lives, and how to get to it.

Artifacts

See artifact examples for how you might capture this information in a markdown file.

Data Flows (optional)

Depending on the complexity of the system and the number of sources, this can be a visual diagram or simply a few text bullets. When there are only a few sources, skipping data flows is a valid option.

Data Mapping

List each data element and where it comes from. If there are many data elements, organize them as one table per conceptual entity.

Information to capture:

  • Element: label we use in the context of our system.
  • Data Type: the format of the data (Text/Number/Date/PDF/Image/etc). Be as precise as makes sense: Number may be sufficient for many use cases, but float or double may be important in others.
  • Source System: where the external data lives using a meaningful system name. Could be a system, file store location, etc.
  • Source Table/File/Field: where in the system does the element live? Could be a database table and column, or a CSV file and column number, etc.
  • Notes: any useful additional info such as calculation logic or any interesting features.
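
One way to keep a mapping consistent is to hold each row as structured data and render the table from it. The following is a minimal Python sketch under that assumption; `MappingRow` and `to_markdown` are hypothetical names introduced for illustration, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class MappingRow:
    """One data element; the fields mirror the list above (hypothetical class)."""
    element: str
    data_type: str
    source_system: str
    source_field: str
    notes: str = ""


def to_markdown(rows):
    """Render mapping rows as a markdown table, one line per data element."""
    header = ["Element", "Data Type", "Source System",
              "Source Table/File/Field", "Notes"]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for r in rows:
        cells = [r.element, r.data_type, r.source_system, r.source_field, r.notes]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


rows = [
    MappingRow("First Name", "Text", "CustomerProfile", "customer.fname"),
    MappingRow("Last Name", "Text", "CustomerProfile", "customer.lname"),
]
table = to_markdown(rows)
print(table)
```

Keeping the rows as data makes it easy to regenerate the table when a source field changes, though a hand-edited markdown table works just as well for small mappings.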

Example Data Mapping

| Element | Data Type | Source System | Source Table/File/Field | Notes |
| --- | --- | --- | --- | --- |
| First Name | Text | CustomerProfile | customer.fname | |
| Last Name | Text | CustomerProfile | customer.lname | |
| Account Balance | Number | Orders, Payments | (calculated) | Total of all order values minus all payment values |
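
Calculation logic recorded in the Notes column can be made unambiguous with a short sketch. The example below illustrates the Account Balance note, assuming order and payment values arrive as decimal amounts; `account_balance` is a hypothetical helper and the sample figures are made up.

```python
from decimal import Decimal


def account_balance(order_values, payment_values):
    """Total of all order values minus total of all payment values."""
    total_orders = sum(order_values, Decimal("0"))
    total_payments = sum(payment_values, Decimal("0"))
    return total_orders - total_payments


# Made-up sample values, for illustration only.
orders = [Decimal("120.00"), Decimal("35.50")]
payments = [Decimal("100.00")]

balance = account_balance(orders, payments)
print(balance)  # 55.50
```

`Decimal` is used here rather than `float` to avoid rounding surprises with currency, which is the kind of data type precision worth noting in the mapping.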

Data Source Profile

The following basic table is filled out for each data source. This helps us capture how to get access to data.

| Data Source Name | Descriptive name of the data source, recognizable to stakeholders |
| --- | --- |
| Interface | (database, REST API, SOAP/WSDL API, flat file, etc.) |
| Development | Notes on how to develop against this interface: how to mock, use a sandbox version, etc. |
| Authentication | High-level notes on how system-to-system authentication works |
| Points of Contact | System owner, technical/troubleshooting, data SMEs, other PoCs |
| Data Quality | Notes on timeliness, quality, integrity of data |
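
If profiles are kept as structured data, a small check can flag fields that are still blank. This is a sketch only: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the partially filled profile is illustrative.

```python
# The six rows of the profile table above.
REQUIRED_FIELDS = [
    "Data Source Name",
    "Interface",
    "Development",
    "Authentication",
    "Points of Contact",
    "Data Quality",
]


def missing_fields(profile):
    """Return the profile fields that are absent or blank, in table order."""
    return [f for f in REQUIRED_FIELDS if not profile.get(f, "").strip()]


# A partially completed profile (illustrative values).
profile = {
    "Data Source Name": "CustomerProfile System (CPS)",
    "Interface": "database/SQL in AWS Aurora (PGSQL)",
    "Authentication": "",
}

missing = missing_fields(profile)
print(missing)
```

A list of gaps like this can double as a set of questions to take to the points of contact.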

Example Artifact: Customer Balances

Data Flows

  • Customer Balance App <- CustomerProfile
  • Customer Balance App <- Data Warehouse <- ShoppingCart
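
Text flows like these are simple enough to parse, which can help confirm that every originating system has a data source profile. The sketch below assumes the `A <- B <- C` notation used above, where data moves from right to left; `parse_flow` is a hypothetical helper.

```python
def parse_flow(line):
    """Split 'A <- B <- C' into (source, destination) pairs."""
    systems = [s.strip() for s in line.split("<-")]
    # Each system receives data from the system written to its right.
    return [(src, dst) for dst, src in zip(systems, systems[1:])]


flows = [
    "Customer Balance App <- CustomerProfile",
    "Customer Balance App <- Data Warehouse <- ShoppingCart",
]

edges = [edge for line in flows for edge in parse_flow(line)]

# Originating systems send data but never receive any.
destinations = {dst for _, dst in edges}
origins = sorted({src for src, _ in edges} - destinations)
print(origins)  # ['CustomerProfile', 'ShoppingCart']
```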

Data Mapping

| Element | Data Type | Source System | Source Table/File/Field | Notes |
| --- | --- | --- | --- | --- |
| First Name | Text | CustomerProfile | customer.fname | |
| Last Name | Text | CustomerProfile | customer.lname | |
| Account Balance | Number | ShoppingCart | (calculated) | Total of all order values minus all payment values |

Data Source Profiles

| Data Source Name | CustomerProfile System (CPS) |
| --- | --- |
| Interface | database/SQL in AWS Aurora (PGSQL) |
| Development | Have to create custom mock DB |
| Authentication | System-generated username/password credentials injected into the app in the pipeline |
| Points of Contact | Joe Doe, System Owner; Sara Sims, Data Architect/DBA |
| Data Quality | Name fields are required and generally 100% present |

| Data Source Name | Data Warehouse/ShoppingCart |
| --- | --- |
| Interface | Elasticsearch web JSON API |
| Development | Sandbox exists at dev.enterprisedw.thisorg.gov |
| Authentication | Requires a manually issued API key |
| Points of Contact | Jane Jackson, System Architect |
| Data Quality | Hit or miss depending on which legacy ordering system it comes from |
