A key data pipeline capability is to track data lineage, including methodologies and tools that expose data’s life cycle and help answer questions about who, when, where, why, and how data changes. Data pipelines transform data, which is part of the data lineage’s scope, and tracking data changes is crucial in regulated industries or when human safety is a consideration. Platforms that have data lineage capabilities include Alex Solutions, Alation, Atlan, Boomi, Collibra, Erwin, IBM, Informatica, Manta, Microsoft, Octopai, Oracle, Precisely, Secoda, Solidatus, SAP, SAS, and Talend. Other data catalog, data governance, and AI governance platforms may also have data lineage capabilities.
“Business and technical stakeholders must equally understand how data flows, transforms, and is used across sources with end-to-end lineage for deeper impact analysis, improved regulatory compliance, and more trusted analytics,” says Felix Van de Maele, CEO of Collibra.
The data ops behind data pipelines
When you deploy pipelines, how do you know whether they receive, transform, and send data accurately? Are data errors captured, and do single-record data issues halt the pipeline? Are the pipelines performing consistently, especially under heavy load? Are transformations idempotent, or are they streaming duplicate records when data sources have transmission errors?