The need for provenance in artificial intelligence

There is no doubt that AI, or whatever people mean when they recklessly toss around words like machine learning, will only rise in prominence in both the near and long term. More and more domains are being taken over by intelligent software that first automates basic (read: slow and boring) operations and then replaces the previous approach entirely, performing the new skill the machine learned simply by observing training data.
These are amazing times for deep learning researchers, who more and more often take the credit for creating an AI that reads X-rays better than medical doctors, when it was in fact a bunch of linear algebra operations.
Amidst the frenzied enthusiasm for AI, or what some prefer to call machine learning, several issues need to be addressed, and addressed promptly. I'll briefly touch on those in dire need of intervention by organizations (usually the data owners) and the community of data scientists.

AI without identity

If it's true that an AI will soon automate the critical parts of a business (and we are all making sure it is), then an AI must have an identity, much like a medical doctor, professor, banker, or office clerk does. Identifying an AI allows its users to refer to it uniquely and to invalidate it as soon as it engages in nasty behavior. After all, AI systems (or machine learning models) are known to have gone wrong numerous times.
Machine learning mistakes can be caused by several factors, such as low-quality data or poor training (for example, overfitting the data). They can also be forged deliberately by data scientists with malicious intentions.
We have yet to face the era of "attacking machine learning for fun and profit", and we are definitely not prepared for it.
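As a minimal sketch of what giving a model an identity could look like: derive a stable fingerprint from the model's serialized parameters together with metadata about its owner and training data. The function names, the JSON serialization, and the choice of SHA-256 here are illustrative assumptions, not a proposed standard.

```python
import hashlib
import json

def model_fingerprint(weights, metadata):
    """Derive a stable identifier from serialized model parameters.

    `weights` stands in for real model tensors (here, a list of floats);
    `metadata` records who owns the model and which dataset trained it.
    """
    payload = json.dumps({"weights": weights, "metadata": metadata},
                         sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

meta = {"owner": "example-hospital", "dataset": "xray-v1"}

# Two identical models yield the same identity...
a = model_fingerprint([0.12, -0.98, 0.33], meta)
b = model_fingerprint([0.12, -0.98, 0.33], meta)
assert a == b

# ...while any tampering with the weights changes it, so a fingerprint
# that has been invalidated cannot be silently reused.
c = model_fingerprint([0.12, -0.98, 0.34], meta)
assert a != c
```

Such a fingerprint could be published alongside the model, letting users check that the AI they are interacting with is the one that was audited, and letting a registry revoke it the moment it misbehaves.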

Reputation of data owners and scientists

Another issue to deal with in the immediate future is the reputation of data owners and data scientists. A misbehaving AI, even one with an identity, cannot and should not be sent to court or in front of the House and Senate to testify about why a few thousand artificial neurons decided to disclose the private data of 100 million users to data brokers, and then activated the microphones of their mobile devices during their most intimate moments.
And let's face it: can we really expect politicians to cut through all that technical jargon?

Reputation of data

The third issue is the reputation of data itself.
Several efforts have already been made in this direction. Needless to say, any machine learning model is only as good as the data it has been trained on. Garbage in, garbage out is the most succinct summary of how important data is to machine learning. The need for a reputation system for data is twofold: it can prove how good or bad an AI (or machine learning model) is, and it makes it possible to reward data owners whenever their data has been used to build that AI.
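One way the second half of that claim could work, as a toy sketch under stated assumptions: content-address each dataset by hashing its records, and keep a ledger linking every model to the dataset hashes it was trained on. The class and function names here are hypothetical, and a real system would need a tamper-resistant ledger rather than an in-memory dictionary.

```python
import hashlib

def dataset_id(records):
    """Content-address a dataset so its use can be tracked and credited."""
    h = hashlib.sha256()
    for record in sorted(records):  # sort so record order does not change the id
        h.update(record.encode("utf-8"))
    return h.hexdigest()

class ProvenanceLedger:
    """Toy in-memory ledger linking models to the datasets that trained them."""

    def __init__(self):
        self.entries = {}  # model id -> list of dataset ids

    def record_training(self, model_id, datasets):
        self.entries[model_id] = [dataset_id(d) for d in datasets]

    def used_dataset(self, model_id, records):
        """Was this dataset among the ones the model was trained on?"""
        return dataset_id(records) in self.entries.get(model_id, [])

ledger = ProvenanceLedger()
xrays = ["scan-001,label=normal", "scan-002,label=abnormal"]
ledger.record_training("model-v1", [xrays])

assert ledger.used_dataset("model-v1", xrays)              # owner can claim credit
assert not ledger.used_dataset("model-v1", ["other-data"]) # unrelated data cannot
```

With such a ledger, a data owner can prove their data went into a given model and be compensated for it, while auditors can trace a bad model back to the low-quality data that produced it.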

While a global AI regulator does not exist, and hopefully never will, we strongly believe AI should be traceable by its users.
AI belongs to the people who created it and who use it every day to improve their lives or their businesses; traceability and regulation will help bring safe AI closer to reality and prevent the spread of malicious or misbehaving AI.