Data Discovery
Early in a product or feature, we often know we'll need to acquire data from some existing source. We need to know what data exists, where, and how to get to it.
Artifacts
See the example artifact below for how you might capture this information in a Markdown file.
Data Flows (optional)
Depending on the complexity of the system and the number of sources, this can be a visual diagram or simply a few text bullets. With only a few sources, skipping data flows is a valid option.
Data Mapping
List each data element and where it comes from. If there are many data elements, organize them as one table per conceptual entity.
Information to capture:
- Element: label we use in the context of our system.
- Data Type: the format of the data (Text/Number/Date/PDF/Image/etc.). Be as precise as makes sense: `number` may be sufficient for many use cases, but `float` or `double` may be important in others.
- Source System: where the external data lives, using a meaningful system name. Could be a system, file store location, etc.
- Source Table/File/Field: where in the system does the element live? Could be a database table and column, or a CSV file and column number, etc.
- Notes: any useful additional info such as calculation logic or any interesting features.
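If the mapping is kept alongside code, the five fields above can also be captured as structured records and rendered into the Markdown table format. A minimal sketch; `MappingEntry` and `to_markdown` are hypothetical names, not part of any existing tool:

```python
from dataclasses import dataclass


@dataclass
class MappingEntry:
    """One row of a data mapping table (hypothetical structure)."""
    element: str
    data_type: str
    source_system: str
    source_location: str  # Source Table/File/Field
    notes: str = ""


COLUMNS = ("Element", "Data Type", "Source System", "Source Table/File/Field", "Notes")


def to_markdown(entries: list[MappingEntry]) -> str:
    """Render entries as a Markdown table with the columns listed above.

    Cell values are not escaped, so a '|' inside a value would break the table.
    """
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "---|" * len(COLUMNS),
    ]
    for e in entries:
        cells = (e.element, e.data_type, e.source_system, e.source_location, e.notes)
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        MappingEntry("First Name", "Text", "CustomerProfile", "customer.fname"),
        MappingEntry("Last Name", "Text", "CustomerProfile", "customer.lname"),
    ]
    print(to_markdown(rows))
```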
Example Data Mapping
| Element | Data Type | Source System | Source Table/File/Field | Notes |
|---|---|---|---|---|
| First Name | Text | CustomerProfile | customer.fname | |
| Last Name | Text | CustomerProfile | customer.lname | |
| Account Balance | Number | Orders, Payments | (calculated) | Total of all order values minus all payment values |
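Calculated elements such as Account Balance are worth pinning down precisely, since the Notes column is the only place the logic lives. A minimal sketch of the rule "total of all order values minus all payment values", assuming hypothetical `(customer_id, amount)` extracts from the Orders and Payments systems (the real record shapes will differ):

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical extracts: (customer_id, amount) rows pulled from the Orders
# and Payments source systems. Real shapes depend on those systems.
orders = [("c1", Decimal("120.00")), ("c1", Decimal("30.50")), ("c2", Decimal("75.00"))]
payments = [("c1", Decimal("100.00")), ("c2", Decimal("75.00"))]


def account_balances(orders, payments):
    """Per customer: total of all order values minus all payment values."""
    balances = defaultdict(lambda: Decimal("0"))
    for customer_id, amount in orders:
        balances[customer_id] += amount
    for customer_id, amount in payments:
        balances[customer_id] -= amount
    return dict(balances)


if __name__ == "__main__":
    print(account_balances(orders, payments))
```

`Decimal` is used instead of `float` to avoid binary rounding on currency values, which is the kind of precision choice the Data Type column is meant to record.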
Data Source Profile
Fill out the following basic table for each data source. It captures how to get access to the data.
| Data Source Name | Descriptive Name of Data Source recognizable to stakeholders |
|---|---|
| Interface | (database, REST API, SOAP/WSDL API, flat file, etc.) |
| Development | Notes on how to develop against this interface - how to mock, use a sandbox version, etc. |
| Authentication | High-level notes on how system-system authentication works |
| Points of Contact | System owner, technical/troubleshooting, data SMEs, other PoCs |
| Data Quality | Notes on timeliness, quality, integrity of data |
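For a database interface, the Development row often comes down to building a local mock. A minimal sketch using Python's built-in `sqlite3` in-memory database, seeded with placeholder rows in a hypothetical `customer` table whose `fname`/`lname` columns mirror the example mapping. SQLite is only a stand-in here: its SQL dialect differs from PostgreSQL and other production engines, so queries should still be verified against the real source or its sandbox.

```python
import sqlite3


def make_mock_customer_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for a customer database.

    Table and column names are assumptions modeled on the example mapping.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY, fname TEXT NOT NULL, lname TEXT NOT NULL)"
    )
    # Placeholder rows, not real customer data.
    conn.executemany(
        "INSERT INTO customer (fname, lname) VALUES (?, ?)",
        [("Pat", "Example"), ("Sam", "Sample")],
    )
    conn.commit()
    return conn


def fetch_names(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Read first and last names in insertion (id) order."""
    return conn.execute("SELECT fname, lname FROM customer ORDER BY id").fetchall()


if __name__ == "__main__":
    print(fetch_names(make_mock_customer_db()))
```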
Example Artifact: Customer Balances
Data Flows
- Customer Balance App <- CustomerProfile
- Customer Balance App <- Data Warehouse <- ShoppingCart
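Text-bullet data flows can double as machine-readable input. A small sketch that reads each `<-` as "is fed by" and answers which systems ultimately feed a given consumer; `parse_flows` and `sources_of` are illustrative helpers, and the system names are taken from the bullets above:

```python
def parse_flows(lines: list[str]) -> list[tuple[str, str]]:
    """Turn bullets like 'App <- A <- B' into (consumer, provider) edges.

    Each system receives data from the system to its right. Leading '-' and
    spaces are stripped, so a system name must not itself start with '-'.
    """
    edges = []
    for line in lines:
        nodes = [part.strip() for part in line.lstrip("- ").split("<-")]
        edges.extend(zip(nodes, nodes[1:]))
    return edges


def sources_of(edges: list[tuple[str, str]], system: str) -> list[str]:
    """All systems that feed `system`, directly or transitively."""
    found, stack = [], [system]
    while stack:
        current = stack.pop()
        for consumer, provider in edges:
            if consumer == current and provider not in found:
                found.append(provider)
                stack.append(provider)
    return found


if __name__ == "__main__":
    flows = [
        "- Customer Balance App <- CustomerProfile",
        "- Customer Balance App <- Data Warehouse <- ShoppingCart",
    ]
    print(sources_of(parse_flows(flows), "Customer Balance App"))
```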
Data Mapping
| Element | Data Type | Source System | Source Table/File/Field | Notes |
|---|---|---|---|---|
| First Name | Text | CustomerProfile | customer.fname | |
| Last Name | Text | CustomerProfile | customer.lname | |
| Account Balance | Number | ShoppingCart | (calculated) | Total of all order values minus all payment values |
Data Source Profiles
| Data Source Name | CustomerProfile System (CPS) |
|---|---|
| Interface | database/SQL in AWS Aurora (PostgreSQL) |
| Development | Have to create custom mock DB |
| Authentication | System-generated username/password credentials injected into the app by the pipeline |
| Points of Contact | Joe Doe, System Owner; Sara Sims, Data Architect/DBA |
| Data Quality | name fields are required and generally 100% present |

| Data Source Name | Data Warehouse/ShoppingCart |
|---|---|
| Interface | Elasticsearch web JSON API |
| Development | sandbox exists at dev.enterprisedw.thisorg.gov |
| Authentication | requires a manually issued API key |
| Points of Contact | Jane Jackson, System Architect |
| Data Quality | hit or miss depending on which legacy ordering system it comes from |
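For the Data Warehouse sandbox, a request can be prepared without sending it, which is handy when checking how credentials are wired in. A sketch assuming HTTPS on the default port, a hypothetical `orders` index, and an API key supplied as an id/secret pair. Elasticsearch accepts an `Authorization: ApiKey <base64(id:api_key)>` header, but confirm the exact scheme and URL with the system's point of contact:

```python
import base64
import json
import urllib.request

# Host from the profile above; the https:// scheme is an assumption.
SANDBOX_URL = "https://dev.enterprisedw.thisorg.gov"


def build_search_request(index, query, api_key_id, api_key_secret):
    """Prepare (but do not send) a search request against the sandbox.

    The index name and query are hypothetical placeholders.
    """
    token = base64.b64encode(f"{api_key_id}:{api_key_secret}".encode()).decode()
    body = json.dumps({"query": query}).encode()
    return urllib.request.Request(
        f"{SANDBOX_URL}/{index}/_search",
        data=body,
        headers={"Authorization": f"ApiKey {token}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("orders", {"match_all": {}}, "my-id", "my-secret")
    print(req.get_method(), req.full_url)
    # To actually call the sandbox: urllib.request.urlopen(req)
```

Keeping the key out of source code and injecting it at runtime, as with the CPS credentials, avoids leaking a manually issued key.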