UX AI Intern
Informatica
12 Weeks (JUN - AUG 2022)
Ranjeet Tayi, Jill Blue Lin
Data lineage is the process of understanding, recording, and visualizing data as it flows from data sources to consumption. It shows where the data originated, how it has changed, and its ultimate destination within the data pipeline.
Apart from helping organizations keep a clear record of data’s movements and transformations, data lineage is also used:
To make sure the data elements in your report are trustworthy.
To check data stores for personal user information and ensure compliance with data privacy laws.
To analyze the impact of changes you make upstream or downstream in your data system.
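As a rough illustration of the impact analysis use case, lineage can be treated as a directed graph and walked downstream from the asset being changed. The sketch below is a minimal example under that assumption; the asset names and the `downstream_impact` helper are hypothetical and are not taken from any Informatica product.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to its direct downstream consumers.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["reports.churn", "reports.revenue"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["reports.revenue"],
}


def downstream_impact(graph, start):
    """Return every asset reachable downstream of `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already recorded via another path
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


# Which assets could be affected by a change to the CRM customer table?
print(downstream_impact(LINEAGE, "crm.customers"))
```

A missing edge in a graph like this is exactly the kind of gap that makes an impact analysis silently incomplete.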
The automated data lineage process relies on creating connection assignments by parsing code logic. However, when the code logic or a parser is unavailable, the data flow diagram is left with a gap, especially in legacy systems without custom data lineage scanners.
In-app project creation and assignment reduce the hassle of documenting and assigning work through third-party apps.
A clear job summary with recommendations and its breakdown for easier project planning and management.
Drill down from data set level to data column level recommendations to make decisions at different lineage levels.
The recommendation card provides prediction parameters and a confidence score to supplement decision making and build trust with the user. Acting on a recommendation refines the ML model.
Toggling the lineage view from map to list enables quick search and comparison of recommendations for accurate decision making.
The comments panel provides a space to discuss and consult with other data stewards about a recommendation decision within context.
The Inferred lineage toggle indicates that some of the lineage documentation is ML-based inferred lineage curated by the data steward.
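To make the recommendation card idea more concrete, here is a minimal sketch of the data it might carry and of how a steward's decision could be captured as feedback. The `Recommendation` fields, the `review` helper, and the feedback format are all assumptions made for illustration; the actual model and its retraining mechanism are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    """Hypothetical inferred lineage link shown on a recommendation card."""
    source: str
    target: str
    confidence: float  # assumed to range from 0.0 to 1.0
    parameters: dict = field(default_factory=dict)  # prediction details shown to the user
    status: str = "pending"


def review(rec, accept):
    """Record the steward's decision and return a labeled example for later model refinement."""
    rec.status = "accepted" if accept else "rejected"
    return {"source": rec.source, "target": rec.target, "label": int(accept)}


rec = Recommendation(
    source="legacy.cust_tbl",
    target="warehouse.dim_customer",
    confidence=0.87,
    parameters={"name_similarity": 0.91, "type_match": True},
)
feedback = review(rec, accept=True)
print(rec.status, feedback)
```

Keeping the prediction parameters next to the confidence score mirrors the design goal of sharing prediction details with users to build trust.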
This project used the 4D Design Process, a diverging and converging approach consisting of the following phases:
At the start of the project, I met with stakeholders to understand the business goals of the project. The following business goals allowed me to define my own design goals:
Due to time and budget constraints, direct user engagement was not possible during the concept development phase. Instead, I relied on the project manager's and fellow designers' expertise to understand the user. I conducted three stakeholder interviews with the following research goals:
Machine learning approach is hard to debug
Share prediction details with users to build trust
Data Lineage is used for critical business decisions
Distinguish derived vs. inferred to show information reliability
Data systems too vast for one person to know entirely
Inferred lineage needs to be a collaborative tool
Findings from the stakeholder interviews and secondary research were distilled into three role-based user personas: Data Catalog Administrator, Data Steward, and Business Analyst.
To validate my understanding of the scenario, I used a storyboard as a tangible representation of it, which facilitated communication and shaped the project's direction.
With the scenario and use cases in place, I then explored ideas for live collaboration, recommendation display, CLAIRE representation, and differentiation of derived and inferred lineage.
Due to time and budget constraints, user interviews were not possible at the start of the project. Therefore, I used the customer validation calls to gain more insight into the lineage systems and their users. I spoke with two clients. In the first half of each call, I examined their existing data systems and lineage in their real-time environment; in the second half, I shared my ideas and prototype concepts and collected feedback.
To gather feedback on my concepts, I met with data stewards from two clients: Elevance Health (an insurance provider) and Thrivent (a not-for-profit financial organization). They gave the following feedback: