Because data is critical to AI algorithms, it must be handled with the necessary rigor and transparency.
The metadata describing the data used to train an algorithm needs to be carefully documented in a manifest. This includes the time of data collection, the collection method and device (if applicable), the location of the collection, whether and how the data has been cleaned and validated, its intended uses, its maintenance, and its life cycle.
The article "Datasheets for Datasets" by Timnit Gebru et al. motivates and introduces this approach.
Such a manifest provides auditability and transparency and lays the foundation for responsibility and accountability.
It is important to incorporate the notion of the metadata manifest into machine learning tools and workbenches, and to automate metadata collection and maintenance wherever possible.
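As an illustration of what such a manifest might look like inside a tool, the following is a minimal Python sketch. The class name `DatasetManifest`, the helper `build_manifest`, and all field names are assumptions made for this example; they are not taken from "Datasheets for Datasets" or from any existing library. The documented fields mirror the items listed above, while the record count and checksum stand in for the kind of metadata a workbench could collect automatically.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DatasetManifest:
    """Hypothetical metadata manifest for a training dataset."""
    # Fields documented by the people who collected the data.
    name: str
    collected_from: str            # start of collection, ISO 8601 date
    collected_until: str           # end of collection, ISO 8601 date
    collection_method: str
    collection_location: str
    collection_device: Optional[str] = None   # None if not applicable
    cleaning_steps: List[str] = field(default_factory=list)
    validation_steps: List[str] = field(default_factory=list)
    intended_uses: List[str] = field(default_factory=list)
    maintainer: str = ""
    lifecycle_stage: str = "active"
    # Fields a tool can fill in automatically.
    record_count: int = 0
    sha256: str = ""


def build_manifest(records: List[dict], **documented_fields) -> DatasetManifest:
    """Combine hand-documented fields with automatically derived ones."""
    # Serialize deterministically so the same data always yields the same hash.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return DatasetManifest(
        record_count=len(records),
        sha256=hashlib.sha256(payload).hexdigest(),
        **documented_fields,
    )


# Example usage with made-up values.
records = [{"id": 1, "temp_c": 21.5}, {"id": 2, "temp_c": 19.0}]
manifest = build_manifest(
    records,
    name="office-temperature",
    collected_from="2023-01-01",
    collected_until="2023-01-31",
    collection_method="hourly sensor readout",
    collection_location="Building A, floor 2",
    collection_device="example thermometer",
    cleaning_steps=["dropped readings with missing timestamps"],
    intended_uses=["forecasting heating demand"],
)
manifest_json = json.dumps(asdict(manifest), indent=2)
```

Storing the serialized manifest next to the dataset and recomputing the checksum at training time is one way a tool could detect that the data has changed without the documentation being updated.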