The dangerous slippery slope of Artificial Intelligence and Machine Learning

As someone who works in the Artificial Intelligence and Machine Learning fields as part of our efforts for Online Shop, I find that the expression "the road to hell is paved with good intentions" rings true.

We have seen dozens of new startups sprout that focus on specific areas of AI and/or ML. Some focus purely on voice, others purely on art; companies such as ours focus on more commercially led efforts, as explained in my previous blog entries and, of course, our White Paper, available to read here.

Even though our effort is not focused solely on audio and media, we thought it would be a novel experiment to provide our 'engine' with enough data and see whether we could compete with services such as OpenAI and others.

To avoid duplication, I will not explain in detail which models we created for this use case or how they work. However, the outcome is perhaps reminiscent of laying the first building blocks for a Skynet-like entity to prosper. Coupled with enough knowledge, the results are beyond 'uncanny': to the untrained eye and ear, they are as real as the air they breathe.

Our 'engine' is primarily focused on learning from user behavior and adjusting code in real time to provide the most intuitive user experience, which in turn helps those who use our services to increase their overall conversion rate. This can only be achieved through an exemplary understanding of behavior and psychology, in pursuit of that perfect Net Promoter Score: what makes your customers add a certain product to their basket? What are your customers most interested in? Couple this with myriad other data sets, and we have a successful model that can learn, adapt, change and improve not only a user's deployed shop instance but also itself. I provided a layman's explanation of how it all works in my previous entries, which you can find here and here.

Now, what happens when we use learnings from the same model in expressionist mediums that are based solely on feeling, emotion and experience rather than cold sets of data?

Look at the below painting:

And now take a gander at another one below:

One can be forgiven for mistaking the above two 'paintings' for the work of Van Gogh and Pablo Picasso.

Both pieces were produced by our AI engine in less than five seconds, with no prompt other than the artist's name. Using imagery data and collections of existing works from both artists, coupled with reinforcement in which members of our team ranked the most successful outputs, it took just over a day to clone the artistic prowess of both artists and output completely original artwork.

Now let's take another famous artist. Have you ever wondered how Rembrandt might paint a popular character, such as Batman? Now you can find out.

Rembrandt is famous for his attention to detail, his dedication and his timeless artworks; this one was generated in just under ten seconds.

It is very novel. However, those who work in the field will soon start to realize that Murphy's Law is being enacted before our very eyes. What starts as a novel concept for good can quickly turn bad.

Art is a beautiful expression of oneself, the environment, experience and so forth, but what about voice? It is, after all, something spy movies taught us to be 'hard to crack'. Not anymore.

As fun and novel as services such as 'FakeYou' and 'UberDuck' are, they have their limits, and their output is easily recognized as computer synthesized. Now, what if there were an engine powerful enough to replicate a voice so accurately that it sounds 99.9% as good as the source material and audio data it was provided, from pitch all the way to the speaker's mannerisms?

We put this to the test, and the results are scarily accurate, to the point where both my co-founder Siraaj and I are facing an ethical dilemma over whether Artificial Intelligence and Machine Learning need to be reined in and heavily controlled.

Watch the below video of what we have achieved:

Now, this isn't our forte but rather a novel research attempt to see what is possible in other fields for our overall engine. I can only guess that there are dozens, if not hundreds, of other startups focused solely on voice synthesis and replication that have already achieved similar, if not better, results than ours.

Any artist, voice actor, celebrity or historical figure can now be replicated with a high degree of accuracy. Besides the legal implications that will arise from studios, developers and others using a voice actor's voice for a project without their consent, a more sinister future looms. I assume that within the year we will have services that allow anyone from the general public to easily recreate the voice of anyone else. Hiding behind 'Terms and Conditions' to avoid legal liability will not be a good enough defense when thousands of new songs using the voices of artists such as 2Pac, Eminem, 50 Cent or Lil Jon start flooding YouTube and other services.

Providing access to such tools and services to the general public is a real concern, not only for security but also because it will open the floodgates for scammers, fraudsters, stalkers and others. All one would need is a thirty-second clip of your voice to have it replicated with a high degree of accuracy, and one can only guess what they could do with it.

If you're in the public eye, like Jeff Bezos or Elon Musk, there is perhaps no way to protect yourself. Imagine someone who works paycheck to paycheck, has little common sense and is employed at a heavily customer service focused company, such as, let's say, Four Seasons, receiving a call from John Davison. Would they question the voicemail or the call at hand? Give it another several years and real-time voice synthesis will be possible. Scammers will be able to call unsuspecting victims using a familiar voice to improve their success rate, and disinformation will spread like wildfire.

One only needs to send 'purported' voice clips to a tabloid newspaper to have someone easily discredited and their reputation left in ruins, should a reporter fail to do their due diligence and chase clickbait under journalistic protection.

What is there to stop a stalker from sending 'recordings' to the police as purported evidence? Not only can this be used to incriminate someone who is otherwise innocent, but it will also grind our existing legal processes to a halt. Investigators will have to find new ways and tools to distinguish the real from the synthesized, and at this moment it looks as though that will be impossible.

However, not all is bad, I suppose. Creators will be able to use their own voices for easier content production, or to create original voices for their projects, such as video games by indie developers who cannot afford voice actors. Educational institutions could re-create historical figures and events for teaching purposes. However, such use cases should not be used to justify making the technology readily available for anyone to use and misuse. Not only will this destroy jobs, but it will also heavily affect the economy itself.

It is a very dangerous precedent which cannot be guarded against. While researching the best ways to prevent your voice from being captured and synthesized, we discovered that it is a fruitless endeavor. However many resources we throw at adding invisible white noise, pitch modulation and even working with manufacturers of cameras and microphones to implement such technology, it can only prevent your voice from being captured in 'real time'.

With advancements in robotics, ever more accessible 'deep fake' technology, and voice and visual replication, this is not only a national security threat but also something that is certain to be misused by corrupt governments run by despotic leaders to imprison political opponents. We can already see the pitchfork mob mentality evoked through hive-mind thinking on social media channels, of acting first and asking questions later, where many jump to conclusions before doing any due diligence or research.

Protestors who stand up against unfair treatment and injustice can be arrested and, using such technology, have replicated voice notes played back to incriminate and jail them. No one is safe.

With the amount of content and data being uploaded and shared, replicating one's personality is becoming easier. The only way to protect oneself is to use AI to change one's voice at any public appearance or engagement, and to don a mask. However, for many this is already too late.

A Blade Runner 2049-like future is coming sooner than one might think. Perhaps a T-1000 is already in the works.

A line needs to be drawn, and such things either outlawed or heavily controlled and used by research institutions only, with no public access, since public access will only lead to abuse and nefarious use for fraud, incrimination and scams.

Laws are too slow to catch up with technological advancement and its misuse. Right now, it is up to the founders and developers of such technologies to draw the line and stop. However, the monetary incentives are too great to ignore for the many who are chasing a better life and fame. One should ask oneself whether what one is doing is right, or whether it is just an excuse to win market share at any cost. Perhaps only when it happens to them will they rethink their work and what they have done.