Machine learning is a method of data analysis that automates analytical model building. Algorithms are used to iteratively learn from data, allowing computers to find patterns without being programmed exactly where to look.
The iterative aspect of machine learning means that these analytical models can adapt over time, arriving at more and more refined and reliable answers. The field isn’t new, but it’s gaining new momentum as scalable, cost-effective computing power has made it possible to apply ever more complex algorithms to ever larger data sets.
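To make the “iteratively learn from data” part concrete, here is a minimal sketch, assuming nothing beyond NumPy and toy data: plain gradient descent fitting a line, where each pass over the data nudges the model toward a slightly more refined answer. The numbers and setup are illustrative, not from any particular system.

```python
import numpy as np

# Toy data generated from y = 2x + 1 with a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05, 100)

w, b = 0.0, 0.0  # the model starts knowing nothing about the pattern
lr = 0.1         # learning rate: how big each refinement step is

for step in range(500):
    pred = w * x + b
    err = pred - y
    w -= lr * (err * x).mean()  # gradient of squared error w.r.t. w
    b -= lr * err.mean()        # gradient of squared error w.r.t. b

# After many small iterative corrections, the model has recovered
# something close to the true slope (2) and intercept (1).
print(round(w, 1), round(b, 1))
```

No one told the program where the pattern was; the repeated error-correction loop found it.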
And as you’ll see in this edition, the data sets can be almost anything…
Editor’s Note: If you want a refresher on how the different technologies we’ve covered compare and intersect, this primer by journalist Michael Copeland for Nvidia is an excellent starting point: What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?
In a very ‘meta’ analysis, the MonkeyLearn blog applied machine learning to coverage of startups in hundreds of thousands of articles from three of the most popular news outlets that cover them: TechCrunch, VentureBeat and Recode. The goal was to discover cool trends and insights. They asked interesting questions like:
- What are the hottest industries for startups right now?
- Do machine learning startups get more press than fintech startups?
- What is the startup segment with the most acquisitions?
While this series of articles is most valuable for showing you, step-by-step, how you can apply machine learning to almost any data set, it does bring to mind a future scenario in which machine learning systems generate startups, other AIs evaluate them, and a third group of AIs invests in them. The potential future of human obsolescence knows no bounds.
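To give a flavor of the general approach, here is a minimal sketch of tagging article headlines by startup segment. This is not MonkeyLearn’s actual pipeline; the headlines, labels, and model choice (scikit-learn’s bag-of-words features plus Naive Bayes) are all illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled training headlines, one label per segment.
headlines = [
    "Payments startup raises Series A to disrupt banking",
    "Mobile lending platform expands to new markets",
    "Neural network startup automates image tagging",
    "Deep learning company launches speech recognition API",
]
labels = ["fintech", "fintech", "machine learning", "machine learning"]

# Bag-of-words counts + multinomial Naive Bayes: a standard text
# classification baseline.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(headlines, labels)

# Once trained, every incoming article can be tagged automatically,
# which is what lets coverage be counted by segment across hundreds
# of thousands of articles.
print(model.predict(["Deep learning startup launches image recognition API"])[0])
```

With a real labeled corpus instead of four toy headlines, the same pipeline scales to the kind of segment-by-segment coverage counts the MonkeyLearn series reports.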
When I raised concerns about problems I would often hear the phrase “that’s just an engineering problem; let’s move on.” I later realized that’s code speech for “I don’t think a paper mentioning this would get through the peer review process.”
– Denny Britz
Author Denny Britz makes a personal and impassioned observation that the reason deep learning is not moving forward fast enough is…a lack of engineering rigor. Britz makes the case that too often, academic deep learning researchers build on top of their own previous research rather than on shared, reproducible knowledge. This protects their exclusivity, but it often means that reproducing anything outside your respective bubble takes too much work recreating code to be worthwhile. As a result, deep learning researchers recursively rely on their own previous mistakes without enough outside rigor, or their attempts to re-engineer others’ work introduce unknown new mistakes.
Ironically, the consequences of this unintentional recursion show up even in the bread-and-butter of artificial intelligence research. The fact that Google’s AI sees dog faces everywhere should be a warning to humans doing research in the exact same field.
At its core, the web is trust-based: the details are in how computers can trust other computers, and increasingly, how people can trust other people. While we hear a lot about the many malicious actors that are working on spoofing both kinds of trust, this research from the University of Chicago was refreshing. It shows that while the economics of the internet and machine learning can enable bad actors, those same economics can lead to solutions that stop them.
The researchers looked at a new class of attacks that leverage deep learning language models (Recurrent Neural Networks, or RNNs) to automate the generation of fake online reviews for products and services. These attacks are cheap and therefore scalable. They also add tricks to prevent detection, such as modulating the rate of output and using a multi-stage review-and-editing process, making the reviews essentially indistinguishable from genuine ones to both state-of-the-art statistical detectors and human readers.
Happily, they go on to develop novel automated defenses against these attacks by leveraging the lossy transformation introduced by the RNN training and generation cycle. They even examine the countermeasures an attacker might take against these defenses, showing that they produce unattractive cost-benefit tradeoffs for attackers and can be further curtailed by simple constraints imposed by online service providers.
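The intuition behind that defense can be sketched loosely: text that has passed through a lossy generation process drifts, statistically, from human writing. The toy example below is far simpler than the paper’s actual detector; it just scores how far a review’s character frequencies diverge from a reference corpus, with every string here invented for illustration.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def char_dist(text):
    """Laplace-smoothed character-frequency distribution over ALPHABET."""
    counts = Counter(text.lower())
    total = sum(counts.get(c, 0) for c in ALPHABET) + len(ALPHABET)
    return {c: (counts.get(c, 0) + 1) / total for c in ALPHABET}

def kl_divergence(p, q):
    """KL(p || q): how surprising text with distribution p is under q."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in ALPHABET)

# Reference distribution built from known-human reviews (a toy corpus).
human_corpus = (
    "great food and friendly staff will come again "
    "the service was slow but the pasta was worth the wait"
)
reference = char_dist(human_corpus)

def divergence_score(review):
    return kl_divergence(char_dist(review), reference)

# A review whose character statistics match the reference scores low;
# one whose statistics drift away from it scores high.
normal = divergence_score("the food was great and the staff friendly")
weird = divergence_score("zzq xv zzq xv zzq xv zzq xv")
print(normal, weird)
```

A real detector would use far richer statistics than raw character frequencies, but the core idea is the same: the generation pipeline leaves a measurable statistical fingerprint.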
If you’ve come this far, you’ve earned this. In the last few days, news coverage has centered on the existence of FakeApp, a piece of deep learning software that makes it very easy to map anyone’s face onto almost any video, with startlingly convincing results. The developer of the software claims he’s not a professional researcher, just a programmer with an interest in machine learning.
“I just found a clever way to do face-swap,” said the creator of the software, referring to his algorithm. “With hundreds of face images, I can easily generate millions of distorted images to train the network. After that if I feed the network someone else’s face, the network will think it’s just another distorted image and try to make it look like the training face.” He may be using an algorithm similar to one developed by Nvidia researchers that uses deep learning to, for example, instantly turn a video of a summer scene into a winter one.
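The mechanism the creator describes can be illustrated with a deliberately tiny analogy: train a model to map distorted copies of one “face” back to the clean original, then feed it a different, unseen “face.” The sketch below stands in for a real convolutional autoencoder with a linear map on toy vectors, so every vector and number here is an illustrative assumption, not his actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for face images: "face A" is the identity the network is
# trained to reconstruct; "face B" is a different identity it never sees.
face_a = np.array([1., 0., 1., 0., 1., 0., 1., 0.])
face_b = np.array([1., 1., 0., 0., 1., 1., 0., 0.])
d = len(face_a)

# Training set: hundreds of distorted (noisy) copies of face A,
# with the clean face A as the target every time.
X = face_a + rng.normal(0.0, 0.3, (500, d))
Y = np.tile(face_a, (500, 1))

# A linear "autoencoder": one weight matrix trained to undo the distortion.
W = np.zeros((d, d))
lr = 0.1
for _ in range(2000):
    grad = (X @ W.T - Y).T @ X / len(X)  # gradient of mean squared error
    W -= lr * grad

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# A distorted copy of A is cleaned up, as intended...
recon_a = (face_a + rng.normal(0.0, 0.3, d)) @ W.T
# ...but the unseen face B is treated as "just another distorted image"
# and its output is pulled toward face A as well.
recon_b = face_b @ W.T
print(cosine(recon_a, face_a), cosine(recon_b, face_a))
```

Both outputs end up pointing toward face A, which is exactly the behavior the quote describes: the network assumes everything it sees is a distorted version of the face it was trained on.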
Like many new technologies, it’s unclear whether the results will be useful, nefarious, or simply hilarious. If we’ve learned anything so far, it’s probably all three.