
After raising hundreds of millions in financing, Mei Tao wants to talk about the solutions available to this generation of AI entrepreneurs.

于丽丽 2024-12-25 17:15
When the waves come, jump along with them.

"Waves of the Undercurrent" has exclusively learned that Zhixiang Weilai, an AI video generation startup, has closed an A round led by a state-owned capital fund backed mainly by Hefei Industrial Investment, following a Pre-A round led by Dunhong Capital. The total financing amounts to several hundred million RMB. Other participants include the Anhui Provincial Artificial Intelligence Mother Fund and Hubei Changjiang Film Group Co., Ltd. Previously, the company had received two rounds of investment led by Alpha Startups and iFlytek.

Zhixiang Weilai is the world's first AI company to launch text-to-video generation. At the outset, founder and CEO Mei Tao did the math carefully: in terms of dependence on computing power and resources, multimodal models are far less demanding than large language models; and in terms of commercialization, they can get to market earlier and faster. It looked like a rational, pragmatic kind of romanticism, but reality has clearly proved crueler than imagined.

From Sora at the beginning of the year and Keling in the middle of the year to Google Veo 2, video generation in 2024 became a fiercely competitive track, no less heated than large language models.

Even so, entrepreneurship is still a temptation that this generation of AI researchers like Mei Tao find hard to resist - AI has never been so close to business and reality.

A graduate of the University of Science and Technology of China (USTC), Mei Tao reached an academic peak during his 12 years at Microsoft: he published more than 300 papers in multimedia analysis and computer vision and won international best paper awards 15 times. He became an IEEE Fellow and a foreign academician of the Canadian Academy of Engineering, and also served as chief scientist of the Major Project of Artificial Intelligence in the Science and Technology Innovation 2030 Initiative of the Ministry of Science and Technology.

This experience also showed him the gap between technology and product, and he ultimately decided to bridge it. The five years at JD.com from 2018 onward marked Mei Tao's entry into industry. As vice president of JD.com and vice president of the JD.com Exploration Research Institute, he began exploring the path from technology to commercialization. Zhixiang Weilai, which he founded later, tied all of these threads more closely together.

Mei Tao's situation is a cross-section of what AI entrepreneurs face in this era: when embracing products, one cannot give up the model, or one is likely to be swallowed; when testing the domestic market, one cannot give up going overseas, because the domestic consumer market poses many dilemmas that startups cannot solve. As for financing, in the current cold capital cycle, entrepreneurs often have to be the ones giving confidence back to investors.

All this made Mei Tao realize the real difference between being a senior executive at a large company and running his own business: in the former, there is always someone behind you; but now "there is no one behind you", "all problems will come to you, and you must handle them all."

The following are Mei Tao's insights on financing, commercialization, and other topics after more than a year of entrepreneurship:

1. The Video Generation Track Is Indeed Closer to Commercialization

1. Sora was officially released some time ago, and its overall functionality is in line with our expectations. Objectively speaking, OpenAI no longer has a great advantage in video generation. When Sora first appeared, it was only a demo, yet it changed the entire methodology; but today, in terms of shipped products, other offerings both overseas and in China are roughly on par.

2. Since the beginning of this year, the video generation track has become very crowded. Keling and Luma AI launched in June, and we officially announced a new model at the World Artificial Intelligence Conference in Shanghai in July. August brought MiniMax Conch, followed recently by World Labs and Google Veo 2. Some of these have even moved from images into 3D. The competition is intense because this track has a shorter path to commercialization and faster product implementation than large language models.

3. Global AIGC generated approximately 20 billion US dollars in real revenue last year. Of that, 50-60% came from video and image generation or from tools related to images and video, and 30% came from large language models, such as chatbot-type revenue. As a result, many companies have begun shifting to this track, and it has become must-win territory for large model companies.

4. As a startup, we will not compete head-on with big companies like OpenAI and ByteDance. First, we need innovative algorithms of our own. Second, we need to solve the last-mile problem in specific industry segments and win user mindshare with products and closed-loop value. Big companies have the advantage in computing power and especially in C-end traffic, but they answer to their financial statements, so they will inevitably focus on mainstream business, and their new products must serve their existing mainstream products. For example, both ByteDance's Dream and Kuaishou's Keling must serve their existing creator ecosystems.

5. We will never repeat what the big companies are already doing; we have our own specialized, segmented fields. Our earlier strategy was 1 + 3 + N: one large model, three core products, and many scenario ecosystems. Next, we will release a new multimodal understanding model benchmarked against GPT-4o to make this "1" thicker and wider.

6. In terms of the model, we independently developed the world's first commercial large-scale video generation model with tens of billions of parameters, benchmarked against OpenAI Sora. We have the most comprehensive multimodal copyrighted corpus in China: hundreds of thousands of hours of copyrighted video and tens of thousands of authorized IPs. It covers 70% of domestic film and television data and has yielded hundreds of millions of AIGC secondary-creation materials, which are now widely used in film and television, cultural tourism, marketing, and other scenarios. As of the end of November, we have cumulatively served more than 10 million users and more than 40,000 enterprises in more than 100 countries and regions, and monthly recurring revenue has also grown substantially.

7. At the same time, we are about to release a new mixture-of-experts (MoE) model. In training, it combines a DiT (Diffusion Transformer) architecture with an AR (Auto-Regressive) architecture and draws on the strengths of both: it retains the visual generation quality of DiT while addressing the token discretization problem of AR. So far, we have verified this on images.

Looking at the model as a whole, we did generation first and understanding second. In the future, we will integrate the understanding model and the generation model into a single unified architecture, which is still at the experimental stage. Later, we also hope to turn the copyrighted video materials we have accumulated, the most comprehensive in China, into an AI video search service.

8. Besides the big companies, the foundation model companies that have moved onto this track have their own advantages, such as experience in architecting ten-thousand-card clusters. However, in the technical route for video generation and in understanding the data, multimodal-native startups like us are more vertical and specialized.

Also, the video generation market is very large. Some companies are good at animation styles, some at realistic styles, and some at film-grade styles and 3D. No single vendor can do everything well, and the companies and their users do not fully overlap. So a crowded track will not stop us from moving forward at our own pace.

2. The Tuition Fees Paid in the Commercialization Process

9. It is said that this generation of AI entrepreneurs must combine lofty ideals with a down-to-earth attitude from day one. From the first day of our venture, we have had a strong sense of crisis and have kept thinking about how to find PMF. We moved relatively early and fast on commercialization. Although we did not raise the most money, every penny we spent and every person we hired was carefully considered.

10. This is also related to the training I received at JD.com. JD.com is a retail enterprise with relatively low gross margins, so its culture emphasizes refined operations. The boss often applies extreme thinking: how to run a business with the fewest possible resources. In addition, the three elements of a product (cost, efficiency, and experience) are repeatedly emphasized as indispensable. This holds for any company and any product. Our company has made many attempts at commercialization, paid some tuition along the way, and gradually found its footing.

11. When building C-end products, we must consider how to solve the "two less-than-100%" problem. Current AIGC products fall short of 100% in two ways: first, users cannot use the products 100% well; second, the model cannot generate 100% of the effect users expect. AIGC products therefore need to cross two gaps: from technical early adopters to professional users, and from professional users to ordinary users. Our C-end products have strong growth momentum and recently made the Potential Award for Export Products list in the 2024 China AI Product List.

12. As for the enterprise side, when I did supply chain analysis at JD.com, I learned that although China has a large number of enterprises, few of them are truly large. Under these conditions, it is still relatively difficult to get enterprises to "buy things". SaaS in China has also struggled to break through, but the emergence of AIGC technology may change this.

13. In enterprise services, our KA customers are mainly central state-owned enterprises and leading Internet companies. Last year, we built PixMaker, a commercial photography product that helps brand merchants list their goods. After this year's strategic upgrade, we began producing marketing materials, in particular tools for producing short marketing videos, because we believe the largest industry related to AIGC is content production, and the largest part of content production is marketing. We currently work with more than 40,000 small and medium-sized enterprises and more than 100 large enterprises. For example, the AI video ringtone we launched with telecom operators can turn our AIGC product into a truly national-level product.

14. In addition, we focus more on tools and SaaS services. We think China offers one advantage: products can be polished by serving large customers first and then taken overseas to serve SMBs (small and medium-sized businesses). The product logic for SMBs and for large C-end or professional individual users is basically the same, and neither needs point-to-point service. Several of our products are doing well. In the end, our commercialization comes down to two things: providing a good creative platform and content ecosystem for creators, and producing good advertising content for brands that need marketing. In the future, we will also explore the full path from production to ad placement.

3. The Financing Solution for This Generation of Entrepreneurs

15. Not long ago, we closed two financings, one from a market-oriented fund and one from state-owned capital, combining the Pre-A round and the A round. The former is Dunhong Capital, a well-known leading fund in culture and technology; the latter is a state-owned capital fund led mainly by Hefei Industrial Investment, along with the Anhui Provincial Artificial Intelligence Mother Fund, Hubei Changjiang Film Group Co., Ltd., and others. It is an indisputable fact that AI startups now find it very hard to raise money from US dollar funds. So we are walking on two legs: negotiating with state-owned capital, and also with market-oriented and industrial capital.

16. When raising from state-owned capital, I think you have to consider whether the industrial direction that the government behind it wants to develop matches the company's direction, and whether the company can be built into a leading or chain-leading enterprise. Today, state-owned investors such as Hefei Industrial Investment bring a very professional perspective, rigorous due diligence, and market-based judgment. State-owned capital also represents the industrial priorities of the local government, and startups can draw on that momentum.

17. Last year, our first round of financing came from a USTC alumni group called "Zhongheda". The group has about 100 members, mostly entrepreneurs and scholars from USTC, who often organize alumni activities and exchange ideas about entrepreneurship. Fifteen USTC alumni from this group formed an LLP to back our first round.

For a long time, USTC's training model has been to cultivate scientists oriented toward mathematics, physics, and chemistry, the so-called "one academician out of a thousand students". It has been less prominent in engineering and business. So they wanted to back someone to change that, and I happened to want to start a business. This money is called "Zhongheda Seed No. 1", and Seed No. 2, Seed No. 3, and so on may follow soon.

18. When we first started raising money, some US dollar funds came in. They like big stories, the more ambitious the better. But after the US legal provisions came out, many US dollar funds no longer dared to invest, and we switched to an RMB structure. As for US dollars versus RMB, I think it depends on where your business and your customers are. If our business truly becomes global in the future, we can also raise from US dollar funds, and the structure can be adjusted.

19. Three years ago, a startup could easily raise 100 yuan. Now, the 70% that came from US dollar funds is essentially out of reach, and the remaining 30% in RMB is scattered everywhere. Perhaps only a small part of it is industrial funds, and industrial capital has also become cautious. At present, few multimodal startups can raise outside funding. Ten years ago, there would have been at least a dozen or so. But the logic is the same: without commercialization data, who will take over in the end? My industry experience tells me that a company must create real business value and create value for shareholders; otherwise the company is meaningless.

20. I often tell my investors when they can exit. I don't know how high our ceiling is, because that often depends on the general trend and on chance; but I will tell them how high our floor is. That is, I will make sure our company operates healthily and stably.

4. When the Wave Comes, Jump with It

21. Entrepreneurship makes me feel that my life has never been so complete. As a senior executive at a large company, you only need to manage the technology or the team well; as for strategy, there is still a boss behind you. Being an entrepreneur is different. There is no one behind you, every problem eventually lands on you, and you must resolve it.

22. Everyone who joins a startup must first build psychological strength and think it through clearly. Otherwise, at the first difficulty, they will ask themselves why they came here to suffer. I have been through the transition from technology to product and spent a period on commercialization along the way, but when I actually started a company, I found that much more is required.

23. Around 2015, when the Four Little Dragons rose, I was still at Microsoft. Many people urged me to start a business then, but I didn't. One reason was that I felt my wings were not yet fully fledged and I could still advance academically; the other was that I felt the business model of that wave was relatively thin. I chose to leave in 2018 because I felt I had built up enough academically and wanted to go all in on a product.

24. At the Microsoft Research Institute, we often said that turning a technology into a product may take a hundred engineers, and selling the product well may take another hundred solution experts or BDs. That shows how wide the gap in between is. I thought then that I had to find a place to bridge this chain. Later, at JD.com, every technology I worked on was applied to a product. The process can be seen as going from technology to product, to a business line, and then to a company.

25. Choosing the video track was also the result of rational thinking. Last year, we judged that competition in large language models was too fierce, while the gap between domestic and foreign video generation was not large. In terms of the business model, large language models are used for human-computer interaction and understanding, where accuracy matters and hallucinations are a serious problem. Video generation, by contrast, is a digital creative industry, and users are less concerned about hallucination. Our company was established in March last year, received its first funding in May, and launched the first version of the Zhixiang model on the HiDream.ai website in August. At that time, we were the world's first AI company to launch text-to-video generation.

26. We do both models and applications. If we only do applications without independent research and development of models, it would be too thin and
