Peter Ludwig is the Co-Founder and CTO of Applied Intuition, a Silicon Valley-based AI and vehicle software supplier.
Imagine a world where smart machines zip around our cities without anyone behind the wheel. Traffic jams, accidents and fatalities are things of the past. These self-driving vehicles would not only safely transport people and goods, but they would also handle heavy tasks like farming, mining and building homes.
This future has been a dream since even before the famous DARPA Grand Challenge that jump-started the race for autonomous vehicles in 2004. Thanks to the latest breakthroughs in machine learning (ML) and artificial intelligence (AI), this dream is becoming a reality.
Then And Now
If machine learning has existed since the 1950s, why is today any different? The change comes from new ways of designing AI models, better techniques for handling data and a huge increase in computing power.
In the past, adding more data to a machine learning model only helped up to a certain point. But in 2017, a new kind of AI model called the transformer was introduced, removing previous limitations on how much a model could learn.
Now, the more data you feed these models, the better they become. Instead of training on millions of data points—the “big data” of the 2010s—researchers can now use trillions of data points collected from across the internet.
However, bigger models and more data require more computing power. To meet this need, companies have built massive data centers filled with thousands of specialized chips designed for AI tasks. These advancements have ushered in a new era for machine learning: the age of the “foundation model.”
The Foundation Model Era
Previously, if you wanted to train a machine learning model to do a specific task—like recognizing pedestrians in car camera images—you had to collect and manually label thousands or even millions of real-world examples. The model would learn by being shown pictures with and without pedestrians and adjusting itself to make correct classifications. Once trained, the model was fixed in its behavior; if you asked it to identify a bus in an image, it couldn’t do it.
Foundation models change this by training on simpler, more fundamental tasks using much larger amounts of data. In our pedestrian example, the task might be filling in a missing part of an image. For language models, it’s predicting the next word in a sentence.
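The next-word objective can be shown in miniature. The toy below is only an illustration of the idea, not a real foundation model: it replaces a transformer trained on trillions of tokens with simple bigram counts over a tiny made-up corpus, but the task is the same — predict what comes next.

```python
from collections import Counter, defaultdict

# Toy illustration of the next-word-prediction objective behind
# language foundation models. A real system trains a transformer on
# trillions of tokens; a bigram count model over a tiny corpus shows
# the same task in miniature.
corpus = "the car stopped . the car turned . the bus stopped .".split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its probability."""
    counts = following[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(predict_next("the"))  # "the" is most often followed by "car"
```

Scale this humble recipe up by many orders of magnitude — in data, parameters and compute — and the predictions become good enough to be broadly useful.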
Training models this way makes them highly adaptable. A foundation model trained on all the images on the internet could identify not only pedestrians but also traffic cones, strollers, dogs, cats, coffee mugs—you name it. Many of these models can also handle multiple types of data at once, like images and text.
This newfound versatility means that creating models for a particular task has become much cheaper. Instead of needing to see thousands of training examples, a large foundation model can learn from only a few examples of what you want. With just a few “shots,” the model can perform specialized tasks like translating a language or identifying images of buses.
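The translation example above might look like the sketch below. The model call itself is omitted — any large language model API would accept a prompt of this shape — so what's shown is just how a "few-shot" prompt is assembled from a handful of examples rather than thousands of labeled data points.

```python
# Minimal sketch of few-shot prompting: instead of retraining a model,
# you show it a few input/output pairs inline and let it infer the task.
# The actual model call is omitted; this only builds the prompt.
examples = [
    ("cheese", "fromage"),
    ("bread", "pain"),
    ("car", "voiture"),
]
query = "house"

prompt = "Translate English to French.\n"
for english, french in examples:
    prompt += f"English: {english} -> French: {french}\n"
prompt += f"English: {query} -> French:"

print(prompt)
```

Each `(english, french)` pair is one “shot”; three shots are typically enough for a capable model to continue the pattern.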
These new capabilities, especially in handling different types of data simultaneously, may lead to a future where we can interact directly with machines. Instead of programming them with code, we’ll simply ask for help in plain language, just as we would with another person.
Even better, these machines may be able to respond and explain their thoughts and actions. For example, soon we may be able to tell our self-driving cars our preferences for driving style, temperature, music and more, just like we would with a human driver.
The Next Generation Of Autonomous Vehicles
Recent advancements in AI models, data and computing power have also brought significant changes to the development of self-driving cars, leading to what’s being called AV 2.0. For most autonomous vehicles, there are four main components:
1. Perception: What’s around me?
2. Localization: Where am I, based on what I see?
3. Planning: Given where I am and what’s happening around me, how do I get to my destination?
4. Controls: How do I operate the car’s accelerator, brakes and steering to follow that path?
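The four components above form a loop that runs continuously while the vehicle drives. The sketch below is schematic only — the function names, signatures and placeholder values are hypothetical, and each stub stands in for what would be a hand-written rule system (AV 1.0) or a learned model (AV 2.0).

```python
# Schematic of the four-stage autonomy pipeline: perception,
# localization, planning, controls. All names and values here are
# illustrative placeholders, not a real AV stack.

def perceive(sensor_data):
    """Perception: detect what's around the vehicle."""
    return {"obstacles": sensor_data.get("detections", [])}

def localize(perception, map_data):
    """Localization: estimate where the vehicle is on the map."""
    return {"position": map_data.get("matched_position", (0.0, 0.0))}

def plan(pose, perception, destination):
    """Planning: choose a path from the current pose to the destination."""
    return {"waypoints": [pose["position"], destination]}

def control(path):
    """Controls: turn the planned path into actuator commands."""
    return {"steering": 0.0, "throttle": 0.2, "brake": 0.0}

# One tick of the loop: sensors -> perception -> localization -> planning -> controls.
sensors = {"detections": ["pedestrian"]}
world = perceive(sensors)
pose = localize(world, {"matched_position": (10.0, 5.0)})
path = plan(pose, world, destination=(50.0, 5.0))
commands = control(path)
print(commands)
```

The AV 1.0 vs. AV 2.0 distinction is about what lives inside these boxes, not the boxes themselves: the pipeline structure stays, while hand-written rules are replaced by learned models.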
In the earlier “AV 1.0” systems, only the perception part used machine learning, while the other parts relied on manually written rules. In “AV 2.0,” every component uses machine learning. This not only enhances the car’s ability to see but also allows it to behave more naturally in situations where strict rules might fail—for example, crossing a solid yellow line to go around a double-parked car.
However, adding more machine learning models to the self-driving system brings challenges in testing and verifying these vehicles.
Companies currently rely heavily on simulations to ensure that new versions meet a wide range of requirements. AV 2.0 systems are more sensitive to differences between real-world data and simulated data, so simulations need to be as realistic as possible. Instead of using hand-built 3D environments and pre-programmed vehicle behaviors, future testing will need to use advanced machine learning techniques to create highly realistic and scalable simulations.
The Future Of AI In Vehicles
It’s crucial for car manufacturers and the wider vehicle industry to adopt AI technologies in their development processes and products. There’s enormous potential for improved autonomous driving capabilities, better interaction between humans and machines and increased productivity for developers.
Just as software revolutionized many industries, AI is set to do the same—but even faster. Companies that quickly embrace these technologies may gain a first-mover advantage and the chance to set industry standards. Those that delay may quickly fall behind, as their products will lack the features competitors offer.
