Li Liyun from Xiaopeng Intelligent Driving: We are on the eve of unmanned driving, and there is still a great dividend in the Intelligent Driving Scaling Law | An Exclusive Interview with 36Kr.
Written by Li Anqi
Edited by Li Qin
After test-driving Tesla FSD in the United States, Li Liyun, the head of intelligent driving at Xiaopeng, came away with two main impressions. One is that Tesla FSD is indeed outstanding in North America; the other is that the myth surrounding Tesla needs to be broken.
"The road conditions in China and the United States are very different. We have a better understanding of China's road conditions and the driving habits of Chinese people, while Tesla may have a better understanding of the driving habits of Americans. It's hard to say who is stronger," Li Liyun said.
Competition in intelligent driving is nothing new. Over the past year, carmakers such as Xiaopeng have been quickly drawn into the races over "able to drive nationwide" and "end-to-end".
At the end of this December, Xiaopeng also plans to deliver its "parking space to parking space" intelligent driving function. The underlying model has been upgraded to "one-stage end-to-end", which can achieve full-scene coverage, including low-speed driving in the park. In the future, it will cover parking, highway scenarios, and even overseas intelligent driving.
"Parking space to parking space" is the technical high ground in the current intelligent driving race among carmakers. As the name implies, carmakers hope that users can start the intelligent driving function in a parking space, after which the vehicle cruises at low speed, passes through the gate on its own, enters the highway, drives on urban roads, and finally parks in a space at the destination.
In early December, Tesla pushed FSD v13.2 to some test users. The update allows FSD to start from a parked state, that is, the "parking space to parking space" function. Not long ago, Li Auto also announced this function, and similar functions from Huawei and Xiaomi are about to launch.
The move from intelligent driving to autonomous driving does not happen overnight. Instead, carmakers complete the puzzle of highway, urban, parking, and park scenarios piece by piece through technical breakthroughs. Now, that puzzle is about to be completed.
Li Liyun also said that Xiaopeng is now on the "eve of unmanned driving", and there will be a greater breakthrough in capabilities next year. "It may be on the Max model first, and Xiaopeng's intelligent driving will gradually provide capabilities similar to unmanned driving."
This makes Xiaopeng almost the first domestic carmaker to give a timeline for unmanned driving.
Xiaopeng is indeed one of the earliest domestic carmakers to be associated with "intelligent driving". In 2019, Xiaopeng developed the highway NGP function on its P7 coupe, benchmarking against Tesla NOA.
Because of its early start, however, Xiaopeng has gone through almost every technical route, from relying on high-precision maps, to a map-free solution, to end-to-end. In August 2023, Xiaopeng's intelligent driving team also went through the departure of key figure Wu Xinzhou and the turmoil that followed. Xiaopeng's intelligent driving then entered a defensive stage.
During the same period, some players quickly brought high-level intelligent driving into mass production. Huawei, which invests heavily, took the lead in launching an intelligent driving function "able to drive nationwide"; Li Auto, carrying little legacy burden, decisively moved to "end-to-end" technology and even switched directly to the aggressive "one-stage end-to-end" solution.
Since the middle of this year, Xiaopeng's intelligent driving has been trying to stage a comeback.
Recently, Li Liyun spoke with 36Kr. He shared how Xiaopeng integrates AI into intelligent driving, and how it plans to democratize the technology and close the commercial loop now that pure-vision intelligent driving comes as standard.
Li Liyun told 36Kr that Xiaopeng's research and development of end-to-end began in April 2023. At that time, the main purpose was to use the road recognition ability of AI to get rid of the reliance on high-precision maps.
As a result, Xiaopeng found that after the "one-stage end-to-end" large model was launched, problems that originally required a long time to polish, such as special diversions, right turns or U-turns, were quickly solved.
In May this year, Xiaopeng officially launched its "end-to-end" intelligent driving large model, which consists of three parts: the neural network XNet, the planning and control large model XPlanner, and the large language model XBrain.
One of the core resources that end-to-end intelligent driving relies on is data. Xiaopeng claims that the training data volume of its end-to-end large model has reached 20 million clips. Li Auto has also publicly disclosed its end-to-end training data, which is currently about 8 million clips.
As for how it reached a starting point of 20 million clips, Li Liyun told 36Kr that this comes from Xiaopeng's past accumulation of rule-based intelligent driving experience. Xiaopeng's data collection and training are now highly efficient. For example, by labeling data in real time on the vehicle using that rule-based experience, it collects exactly the segments needed for training and can then train precisely and intensively on the target scenario.
He also believes that, with massive data and steadily growing computing power both in the cloud and on the vehicle, there is still a great dividend in the Scaling Law of autonomous driving.
Scaling Law is an empirical rule in the large model industry. It usually means that the more parameters a model has, the larger its dataset, and the more computing resources it is trained with, the better it performs.
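As a rough illustration of the idea, scaling laws in the large model literature are often written as power laws, in which loss falls as a resource (parameters, data, or compute) grows. The sketch below assumes that form. The function name `power_law_loss` and the constants `SCALE_C` and `ALPHA` are hypothetical, chosen only for illustration; they do not come from Xiaopeng or from any measured system.

```python
def power_law_loss(scale: float, scale_c: float, alpha: float) -> float:
    """Illustrative power law: loss = (scale_c / scale) ** alpha.

    `scale` stands for any one resource (parameters, data, or compute).
    `scale_c` and `alpha` are fitted constants in real studies; here they
    are made-up values used only to show the shape of the curve.
    """
    if scale <= 0 or scale_c <= 0:
        raise ValueError("scale and scale_c must be positive")
    return (scale_c / scale) ** alpha


# Hypothetical constants, not measurements.
SCALE_C = 1.0e6
ALPHA = 0.1

if __name__ == "__main__":
    # Each 10x increase in the resource lowers the loss by the same factor.
    for scale in (1e6, 1e7, 1e8):
        loss = power_law_loss(scale, SCALE_C, ALPHA)
        print(f"scale={scale:.0e} -> loss={loss:.3f}")
```

Under this assumed form, the gain from each extra unit of data or compute shrinks but never drops to zero, which is the sense in which a scaling law can still have a "dividend" left.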
The landmark result of Xiaopeng's "end-to-end" large model intelligent driving solution is a more decisive break from lidar, with pure vision as the technical route. For example, the new P7+ comes standard with the pure-vision AI Eagle Eye intelligent driving solution.
In the industry, however, the mainstream choice for the next few years is still vision plus lidar as the main sensors. On this point, Li Liyun told 36Kr Auto that the pure vision route is based on first principles.
In his view, the traffic environment people live in, including road signs, traffic lights, the roads themselves, and even the shape of cars, is all designed for people. "The most important sensor for people is the eyes, so pure vision must be the most direct and efficient solution."
Xiaopeng also claims that its AI Eagle Eye pure vision solution can outperform human vision and cope well with strong lighting contrast and backlit scenes.
On the road of pure vision intelligent driving, Xiaopeng Automobile has decided to go all the way. Li Liyun told 36Kr that starting from the P7+ model, there will no longer be a version distinction between Max and Pro, but "all models are equipped with Max, that is, Xiaopeng AI Turing Intelligent Driving (NGP), and all will use the pure vision solution."
This reflects a clear commercialization strategy for intelligent driving. "We will use one-stage end-to-end to amplify the advantages of our vehicle models. Not only is it standard on all models, but we also hope to cover all functions and achieve point-to-point, including enabling intelligent driving overseas." The pure vision intelligent driving solution undoubtedly has an inherent cost advantage.
In fact, by competing on "intelligent differentiation" against same-class models, Xiaopeng has already reaped rewards with its two new cars, the MONA M03 and the P7+.
Since its launch in September, the M03 has sold more than 10,000 units a month for three consecutive months, and the P7+ received more than 30,000 pre-orders within 3 hours of its launch. Both cars are priced below 200,000 yuan, a segment dominated by established carmakers such as BYD.
"Intelligent driving is the top reason for Xiaopeng P7+ users to buy a car," Li Liyun said. The parking experience of the current hot-selling M03 is also at the same level as the Max version of intelligent driving.
Intelligent driving technology is becoming ever more deeply embedded in carmakers' business systems. At the same time, it is benefiting from the dividends of AI development as a whole and iterating rapidly. It is the product of both resources and efficiency.
Li Liyun said that if the end-to-end trend is regarded as an "industrial revolution" in intelligent driving, then only a few companies will truly manage to transform and upgrade, and most players will face even more brutal competition.
That is because end-to-end does not make things simpler; it makes the whole iteration chain longer and harder to control. To some extent, it demands more resources, including greater computing power and more elite AI researchers.
"I prefer to believe that it is the car companies, rather than the suppliers, that can take the lead in breaking through from assisted driving to unmanned driving."
The following is a conversation between 36Kr Auto and Li Liyun, the person in charge of Xiaopeng's intelligent driving. The content has been slightly edited:
「On Experience: The Competition of "Parking Space to Parking Space", Users Will Pay for a Better Experience」
36Kr Auto: I heard that you just came back from the United States. Have you experienced Tesla's FSD v13?
Li Liyun: Unfortunately, since v13 has not been mass-produced and pushed to users, I didn't have the opportunity to test FSD v13, but I drove v12.5 for a week. There are two points to summarize. The first is to break the myth, and the second is that Tesla is outstanding in North America, with many aspects worth learning, including activating in the parking lot and driving into the parking lot, which are functions that users really like.
In terms of breaking the myth, I think the road conditions in China and the United States are very different. Whether it is our XNGP 5.4.0 or the upcoming 5.5.0 version, our intelligent driving is on par with Tesla's. We have a better understanding of China's road conditions and the driving habits of Chinese people, while Tesla may have a better understanding of the driving habits of Americans. It's hard to say who is stronger, and I'm very much looking forward to seeing how FSD performs after it enters China.
36Kr Auto: The industry says that Tesla's technology is half a year to a year ahead of domestic technology. How large do you think that lead is now?
Li Liyun: In terms of technical methodology, Xiaopeng is aligned with global AI companies such as Tesla and OpenAI. We emphasize the cloud-end large model, whose parameter count is more than 80 times that of the vehicle-end model. The training data for our cloud-end large model exceeds 20 million clips. Each clip can be understood as a short movie of about one minute. Each model is trained in the cloud on a large amount of data. We have a saying that "one day in the cloud is equivalent to three to five years on the ground".
We hope to distill the cloud-end capabilities to different chip platforms, and even after the chip is replaced in the future, the cloud-end model can be deployed.
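For a sense of scale, the figures quoted in this answer can be turned into a back-of-the-envelope estimate. The sketch below assumes every clip lasts exactly one minute (the interview says "about one minute") and uses a 365-day year; the helper names `footage_hours` and `footage_years` are hypothetical. Under those assumptions, 20 million clips come to roughly 333,000 hours, or about 38 years of continuous footage.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # simplifying assumption: no leap years


def footage_hours(num_clips: int, minutes_per_clip: float = 1.0) -> float:
    """Total footage in hours, assuming a fixed clip length in minutes."""
    return num_clips * minutes_per_clip / MINUTES_PER_HOUR


def footage_years(num_clips: int, minutes_per_clip: float = 1.0) -> float:
    """Total footage expressed as years of continuous playback."""
    hours = footage_hours(num_clips, minutes_per_clip)
    return hours / (HOURS_PER_DAY * DAYS_PER_YEAR)


if __name__ == "__main__":
    clips = 20_000_000  # clip count quoted in the interview
    print(f"{footage_hours(clips):,.0f} hours")
    print(f"{footage_years(clips):.1f} years of continuous footage")
```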
36Kr Auto: Do you think there is a difference between the "parking space to parking space" on Tesla FSD v13 and the "parking space to parking space" in China?
Li Liyun: On the upcoming XOS 5.5.0 version that we will push to users, we use the "one-stage end-to-end" to provide users with a complete "parking space to parking space" experience. In fact, in the parking lot and underground garage, we launched the VPA (Memory Parking) function in 2021, but the user experience was not good, and the user penetration rate was not as high as that of urban intelligent driving.
Therefore, now that "one-stage end-to-end" lets us make this standard on all vehicle models, we are realizing full-scene coverage, including low-speed "parking space to parking space" driving and driving in the park. In the future, it will also cover parking, highways, and even overseas intelligent driving. We hope to bring users a more coherent and smooth experience, rather than a fragmented one like the combination of VPA + NOA.
36Kr Auto: When users drive to a new underground garage for the first time, do they need to remember it once to achieve point-to-point?
Li Liyun: For the first time, there is definitely no map, which is very difficult. It would be better if there is navigation guidance in the park. Just like when you go to an underground garage for the first time, you also need some guidance or the memory of others. We can achieve seamless learning and memory. The second time you go, you can have a smooth "parking space to parking space" capability.
36Kr Auto: Xiaopeng previously had memory driving, such as remembering 10 routes. What is the essential difference between this and "parking space to parking space"?
Li Liyun: The biggest difference is still at the experience level. There were two major changes in Xiaopeng's intelligent driving in 2024. The first is the shift to AI: organizational change for AI and stronger AI capabilities. The second is our view that users will not pay for better technology, but will definitely pay for a better experience.
Once "parking space to parking space" is powered by the end-to-end large model, the user experience will improve greatly. Although the system may sometimes need to learn and refer to prior information, users will barely notice the learning.
「On AI: The Scaling Law of Intelligent Driving Has Not Reached Its End」
36Kr Auto: The intelligent driving industry has changed a lot this year. At the beginning of the year, it was "able to drive nationwide", then it became end-to-end, and now it is "parking space to parking space". How do you view this competition? What will be the focus of competition next year?
Li Liyun: Originally, Xiaopeng's intelligent driving was a lone warrior, but now it is a fierce competition. The competition next year will definitely be more intense and interesting.
This year, as the industry moved from "able to drive nationwide" toward end-to-end, the competition has been about the ability to negotiate with other road users and about how human-like the driving is. Having reached "parking space to parking space", I think we are at the "eve of unmanned driving" stage, and Xiaopeng's intelligent driving will move firmly towards unmanned driving.
We hope to make a breakthrough in capabilities next year. It may come to the Max model first, and Xiaopeng's intelligent driving will gradually provide capabilities similar to unmanned driving, with the number of takeovers reduced ever further. As regulations advance or new models launch, we will definitely develop the ability to move towards unmanned driving. Xiaopeng's ultimate goal is to free the driver's hands and energy.
36Kr Auto: How to achieve this? Is Scaling Law effective in the field of intelligent driving?
Li Liyun: Xiaopeng is a loyal believer in Scaling Law. Many recent frontier AI works have said that Scaling Law and pre-training have hit bottlenecks, but in terms of computing power and data volume, I think there is still a great dividend in the Scaling Law of autonomous driving.
In terms of data, there will be a bottleneck when the high-quality data on the Internet reaches about 600 - 700T. However, most intelligent driving enterprises have only gradually shifted to end-to-end this year, and the accumulation of a large amount of high-quality driving data for autonomous driving has not reached the end.
In addition, computing power on both the vehicle end and the cloud end will grow significantly, and model parameters will expand further. I think the Scaling Law of autonomous driving has not reached its end.
36Kr Auto: There are some practices in the industry, such as integrating the end-to-end large model with the visual language model. Do you agree with this view?
Li Liyun: I don't object. These views describe end-to-end from different perspectives, and in my opinion, they may lead to the same result. Whether it is a visual language large model or a visual action large model, the essence is to achieve a human-like driving model through a large amount of data input and a certain reasoning ability.
Xiaopeng focuses on vision. We have achieved "photon in, control out". Of course, from the perspective of multi-modal input, vision is only one sensor; there are also lanes, GPS, and other multi-modal inputs.
The massive amount of data on the Internet will give the model a stronger cognitive ability to recognize characters and roads. However, we believe that having high-quality and rich driving data can better solve the cerebellum problem of vehicle driving, which does not conflict with the large model.
Many driving actions are relatively instinctive, just like the human cerebellum, but of course