#11 Unique challenges and opportunities in China's professional data and information industry (FULL)
This is a consolidation of a collection of essays I ran here at Data Currency, the only newsletter on Asia’s Professional Data and Information (PD&I) industry.
In the collection, I review China’s PD&I industry, one of the very last industries that remains at around 1/100 the size of its Western peers. I explore why it is still so small. But as an entrepreneur, I have always believed in the simple truth that with every challenge comes an opportunity, so when I describe these challenges, I will talk not just about the problems but also about the opportunities they represent.
Specifically, I have identified seven “pairs” of challenges and opportunities. Jump to whichever one appeals to you.
A highly digitalized society, but data is under-monetized
Data walled gardens
China’s “low-trust” society
Clients’ lack of willingness to pay for data and research
Capital market mismatch
Not enough time to develop benchmark-level brands, but new opportunities also arise from new technologies
Too clear a separation between data and media
Here we go.
Part 1: A highly digitalized society, but data is under-monetized
China’s market economy was really kick-started in the 1990s and grew up alongside the Internet and digital technologies. Many industries were built with digitalization from the very beginning. This created a unique situation in which China could be more digitalized than many advanced economies.
For instance, when I lived in Hong Kong, an advanced economy, I still regularly used cash to make payments. When trading stocks, the physical stock certificate was still a thing. Once, I carried with me an unnamed stock certificate worth north of $100 million in my bag for my then-boss to move from one stockbroker to another. (What a thrill to feel richer than a drug dealer!)
Yet, by that time, mobile payment was already widely adopted in the Mainland, and very few people there knew that stocks used to exist in physical form.
This phenomenon is prevalent in every industry, be it marketing, transportation, retail, catering, logistics, automobiles, factory equipment, or cross-border trade.
Data is everywhere.
This early proliferation of digitalization and big data should, in theory, provide ample raw material for the market intelligence and DaaS industry, and we should have a much bigger opportunity than in the West.
In the West, this industry is highly advanced, and each data vertical has already been well monetized. For instance, there are always several players for each type of data, such as satellite data (SpaceKnow, Orbital Insight…) and credit card data (YipitData, M Science, Second Measure, Earnest Research…). On the other hand, society-wide digitalization in the West is actually not as advanced as in China, which limits further growth (I have heard some people still write checks in the US?)
China’s situation is exactly the opposite. On the one hand, there is much more data to work with, yet on the other hand, most of it remains untapped.
So why is it untapped? A major hurdle is that all of these data are kept in hundreds of thousands (even millions) of data silos of large and small sizes, and those silos aren’t talking to each other.
In fact, when China established the National Data Administration in 2023, its main mission was to break down the walls of those silos and let data really flow. Personally, I am not so hopeful for such state-led efforts, but I will write about this in the future.
Opportunity #1: the chance to become a “super-connector” for data.
The unique opportunity, though, is precisely that those who manage to open these silos can become “super-connectors” through which the underlying data flows out.
One of the first such partnerships we set up at BigOne Lab (which I always use as an example with my clients and investors) was the one behind our high-frequency data monitor for China’s beer industry.
The biggest channel for beer consumption is restaurants, bars, and karaoke outlets. In China’s highly digitalized society, when consumers go to a restaurant or bar, there will very likely be no human waiter or waitress to take the order. Instead, we scan a QR code on the table and place orders through an e-menu.

Most of these “QR-codes-on-tables” are operated by third-party software providers, which theoretically have data about each item on each order for each consumer and each restaurant outlet. So we knocked on the doors of a few of these software providers and made an offer: Please sell us your data. Feel free to anonymize and de-sensitize it so that no information about any specific individual or merchant is revealed to us; we are only interested in the product item-level details.
For instance, Mr. ABC bought two bottles of Budweiser Magnum at the XYZ restaurant for a grand total of ¥35.5 at 8:20 pm on November 11, 2024. This event will be captured and stored in our database (without knowing who Mr. ABC or restaurant XYZ actually is, of course.)
We clean that data (and there is a lot of super dirty data to clean) and turn it into a solid product through which investors and corporate decision-makers alike can monitor, in real time, how consumers are actually consuming their own and their competitors’ products.
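To make the de-sensitizing step concrete, here is a minimal Python sketch of how order records like the one above might be stripped of identities and rolled up to item level. It is an illustration under assumptions of my own, not BigOne Lab’s actual pipeline: the field names, the salted-hash approach, the `SALT` value, and the second sample record are all hypothetical.

```python
import hashlib
from collections import defaultdict

# Hypothetical secret held by the e-menu provider; it never leaves their side.
SALT = "provider-side-secret"


def pseudonymize(value: str) -> str:
    """Replace a name with a salted hash so the buyer cannot read it back."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:12]


def desensitize(order: dict) -> dict:
    """Keep item-level details; mask the consumer and the outlet."""
    return {
        "consumer": pseudonymize(order["consumer"]),
        "outlet": pseudonymize(order["outlet"]),
        "date": order["timestamp"][:10],  # keep the day, drop the exact time
        "item": order["item"],
        "quantity": order["quantity"],
        "amount": order["amount"],
    }


def daily_item_sales(orders: list) -> dict:
    """Aggregate de-sensitized orders into (date, item) totals."""
    totals = defaultdict(lambda: {"quantity": 0, "amount": 0.0})
    for o in orders:
        key = (o["date"], o["item"])
        totals[key]["quantity"] += o["quantity"]
        totals[key]["amount"] += o["amount"]
    return dict(totals)


# Made-up records in the shape of the example above.
raw_orders = [
    {"consumer": "Mr. ABC", "outlet": "XYZ restaurant",
     "timestamp": "2024-11-11T20:20:00", "item": "Budweiser Magnum",
     "quantity": 2, "amount": 35.5},
    {"consumer": "Ms. DEF", "outlet": "XYZ restaurant",
     "timestamp": "2024-11-11T21:05:00", "item": "Budweiser Magnum",
     "quantity": 1, "amount": 18.0},
]

clean_orders = [desensitize(o) for o in raw_orders]
report = daily_item_sales(clean_orders)
```

Note that salted hashing is pseudonymization rather than true anonymization: the same outlet still maps to the same token, which is what makes outlet-level counts possible, but it also means a real arrangement would need further safeguards that this sketch does not cover.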
By partnering with us, the QR-code-on-table providers gain an extra profit for their low-margin software business, which they could not gain if we hadn’t existed. In the meantime, our customers gain valuable intelligence about something they didn’t previously know about. Thus, a small pocket of incremental value was achieved.
Just imagine, there are hundreds of thousands of these untapped value pockets waiting for us to unlock in logistics, transportation, sportswear, luxury, advanced manufacturing, new energy, tourism, video games, export and import, and basically every corner of our economy.
In time, our industry will witness the rise of “data super-connectors,” whose databases are plugged into all of the previous “silos,” and the data will flow through these super-connectors to the people who need to gain intelligence from that data.
And it’s likely that when this happens, our market will not be as fragmented as it is in the West.
Part 2: Data “walled gardens”
There is a special type of data silo in China.
China’s early adoption of digitalization also gave birth to gigantic data organizations, the most prominent among which are China’s world-famous internet companies, including Tencent, Alibaba, ByteDance, Meituan, RedNote, JD, Pinduoduo, and so on. Each of those players controls a massive internal database of highly detailed user and business behaviors. They are so big that perhaps the term “data silos” or “data islands” is no longer accurate in describing their databases. They are “data continents,” or, as many people put it, “data walled gardens.”
And it’s not just internet players. Commercial banks with nationwide networks, the three state-owned telecom operators, and many other corporate giants also stand vigilant on their own garden walls against prying eyes.
Compounding the problem for our industry is market structure. In the West, when you think of e-commerce, you think of Amazon; search = Google, social media = Facebook, video = YouTube. Here in China, most of the players are vertically integrated, so each industry vertical usually has several of them competing in it. The e-commerce space is crowded with Alibaba, JD, Pinduoduo, Douyin, Kuaishou, RedNote, and even Tencent’s WeChat. Although Tencent’s WeChat dominates instant messaging, in the broader social media space, it has to share the field with Douyin and RedNote. In video, there are Douyin, Kuaishou, Bilibili, and WeChat. In “search”, Tencent, RedNote, and Douyin have been eating away at Baidu’s share for a very long time.

Within its own vertically integrated garden, each player knows everything, everywhere, all at once. Every detail about each click of each user is just a few clicks away.
But by contrast, each walled garden knows next to nothing about what is going on in other walled gardens. There are no channels connecting those big walled gardens to each other.
Why aren’t those walled gardens sharing more data with each other? Cutthroat competition is one thing. Lack of trust in business culture (which I will elaborate a bit more on later) is another. For now, it’s sufficient to just remember that they are not connected with each other, which also makes it extremely difficult for third-party players to access that data.
Opportunity #2: Whoever is able to break open those walls will profit handsomely
But the opportunity here is also obvious: whoever can break open the walls of some of these walled gardens, by gaining the garden owners’ trust, can unlock a vast body of previously unrealized value.
An example is ChanMaMa, a leading provider of data analytics tools for merchants in the ecosystem of Douyin, the Chinese precursor to TikTok, also under ByteDance Group. Rumor has it that Douyin has opened at least some data APIs to ChanMaMa, which offer relatively smooth access to data, so ChanMaMa does not need to web-scrape Douyin’s data like most other players. This access gives ChanMaMa’s data a great advantage in accuracy as well as speed. It has become the go-to analytical tool for anyone who wants to sell on Douyin’s platform, and it is rumored to boast an ARR of ~$30m, which is very sizable in China’s fledgling professional intelligence industry.
A bigger potential breakthrough is not just by working closely with one of the walled gardens, but by trying to link them together.
This is similar to what Travis May of Datavant and LiveRamp fame dubbed the “Give-to-Get Model”. As defined by Travis, in this model:
companies have to share data, and in return, they get access to data that has been shared with them.
NielsenIQ is a great example. It is the leading provider of FMCG data. It gets its data from supermarket chains that voluntarily share their sales data. In return, Nielsen gives aggregated data back to the contributing supermarket chains. Then, Nielsen monetizes the data it has collected for free by selling it to consumer brand companies, which need reliable intelligence on their competitors in order to make strategic adjustments. It’s a beautiful business model, where supply is essentially free while demand is inelastic, creating an incredible moat. Many intelligence firms, such as Visible Alpha, Tegus, and Beauté Research, run on this model.
Just imagine what will happen if such a model applies to China’s big data walled gardens. For instance, if someone could persuade China’s e-commerce giants to trustingly share data with them in exchange for returning aggregated industry data, how much value could this generate? Garden walls will be torn down. Everyone—the contributing e-commerce platforms, merchants, and even government agencies—will gain insights about the complete picture of consumer journeys, unencumbered by each platform’s sample biases.
I think we are far, far away from this future, but the optimist in me believes that, as our industry grows and our industry leaders become more widely known, this future will come sooner or later.
Part 3: China’s relatively “low-trust” society handicaps this industry, while also leaving unique opportunities
China has a “low-trust” society. I recently covered this in my personal newsletter, with Part 1 laying out the basic framework and Part 2 discussing how this concept applies in business and investment. My core idea is that contemporary Chinese culture runs on a “not-trustworthy until proven innocent” basis, as opposed to the “trustworthy until proven otherwise” mode that is more prevalent in Western societies.
The “low-trust” nature is especially pronounced in the highly sensitive data business because, as I explained before, this industry is all about trust, especially with all of our intangible and high-touch products.
Of course, we can use contracts to safeguard trust. But legal proceedings are always the last resort, and no contracts can work if there is not a basic level of trust there.
The issue of “trust deficit” handicaps this industry in all directions: demand, supply, and distribution. It is a major reason why this industry in China is still at ~1/100 the size of its global peers.
Demand-side trust deficit
On the demand side, the trust issue is similar to the “hardware-over-software bias” that I wrote about for the software industry:
Chinese people love hard things. We don’t like it to be soft. Hardware is hard. It’s more real. Software? I can’t see it. I can’t smell it. It could be anything you describe to me, or it could be just a pile of garbage. How can I trust it’s real?
This is the fundamental reason why the SaaS industry in China is still tiny compared with its Western peers. In the B2B setting, it’s much easier to sell a piece of hardware, with flashy screens and huge servers, than to sell a piece of formless software.
To be sure, data is not software. But in terms of this hardware-over-software bias, the problems with data are even more critical. Software tools can at least be tried out and used in daily operations. But data? How do you evaluate whether this data is better than that one? What if the providers fake it? In a low-trust society, those questions are hard to answer.
Supply-side trust deficit
The trust deficit issue also creates huge headaches for the supply side and plays a big role in the longevity of “data silos”.
It’s always sensitive to ask for someone’s data. Where is my data going to be stored? How should data be transferred? How to prevent data breaches? How to make sure my data is not misused?
In Post #2 of this newsletter, I described a data product that my team has successfully built, turning data sourced from on-table e-menu services into high-frequency tracking of China’s beer industry. To enable such innovation, we have to source data from our partners, the operators of these e-menu services. What I didn’t mention at the time was that, for each successful collaboration like this, we were usually rejected in nine other cases.
We did a lot of cold calling and literally knocked on doors to make these projects work.
Distribution-side trust deficit
In a more typical industry involving physical products, it’s common to establish distribution networks. You can see the products, touch them, smell them, and, most importantly, count them. Walmart is ordering 100,000 pieces of clothing from my factory? Done deal.
But the data industry has no such luxury, for the simple reason that data is not only non-physical but also easily replicable. A simple “Ctrl-C” + “Ctrl-V” can make a copy of a product worth millions of dollars.
This is why establishing distribution networks in our industry is hard and why most of us prefer direct sales. Even in direct sales, we constantly worry that our clients may redistribute our products.
However, not having distribution networks is equal to leaving big money on the table. For instance, a vendor may not have certain products like ours, but they have an excellent client network in a market segment that we have never known how to work with. In an ideal world, we two should come together and collaborate, unlocking untapped value for both of us. But most of the time, we can’t do that. Instead, we either leave money on the table or, worse, try to copy from other people’s products, which leads to over-capacity in the industry and makes everyone poorer in the process.
In China’s “low-trust” environment, you can probably guess that this problem is even worse. The trust deficit here is so bad that I know some vendors who do not even allow their own salespeople to access their live data (only historical samples), fearing that their own salespeople will resell the data for their personal benefit.
Can you imagine how inefficient that is?
Opportunity #3: Whoever gains trust in a low-trust setting enjoys an incredible moat
In Part 2 of “China’s low-trust society”, I describe the best strategy for operating in a low-trust environment:
One major corollary of a “low-trust” society is that once trust is obtained, this trust can be extremely strong, and even stronger than the bonds formed in high-trust societies.
Since trust is so scarce, it becomes a kind of coveted resource. Your best strategy is to try to form high-trust bonds in a sea of low trust. It will be your secret weapon and can help you build an incredible moat. The payoff can be huge.
…
If you are not into this “high trust, low trust” stuff, it’s actually totally fine. Sometimes, the most radical and somewhat counter-intuitive strategies can also win you Guanxi. For instance, being 100% transparent about yourselves and being blunt at the right moment can give you special power since few other people do it this way. If you manage to build a personal brand around these qualities, you will also become a master at quickly obtaining other people’s trust.
This is exactly what we find to have incredible power in our own industry. When dealing with both customers and suppliers, we seek to uphold the highest standard of integrity and honesty. For instance, we always tell our clients that we cannot guarantee perfect accuracy of our data, and we will never do that. But what we will guarantee is always to be transparent about our methodology, our limitations, and the potential risks of using our data.
In terms of distribution relationships, we have managed to build a small network of like-minded partners. There are distributors who help us expand in certain markets that we can’t reach, while promising us they will never resell the data without our permission, and we trust their promise. We “ship” data products to them as if we were shipping a physical product, and they pay us for each new client.
We also distribute on this kind of “back-to-back” basis for vendors whose products, in their own niches, will always be better than ours, but who want to make use of our client network.
So instead of neijuan (excessive competition), instead of everybody doing everything at the same time, we choose specialization and collaboration. Instead of destroying value, we create value.
I hope we can be successful with this model, which will be yet another proof of my belief that forming high-trust bonds can build an incredible moat in a low-trust society.
Part 4: Lack of customer willingness to pay for data and research
It’s common knowledge in our industry that, compared with Chinese domestic clients, international clients pay more (sometimes a whole order of magnitude more) for data and research products.
For instance, in the FMCG sector, it is common for foreign brands like Coca-Cola or Mars to pay millions of dollars for offline sales tracking data from Nielsen, but it’s hard to persuade many domestic brands to pay even a million RMB for the same thing.
In the private equity space, it is commonplace for Western funds like KKR or Carlyle to dole out hundreds of thousands of dollars for third parties to do due diligence on investment targets, while many local RMB funds don’t even hire third-party due diligence teams and prefer to use their in-house teams.
Such a wide gap in willingness to pay is the fundamental reason why China’s professional data and research industry is still so small.
What contributes to this phenomenon? Will domestic clients become more willing in the future? I can think of 2 big reasons.
#1 Trust deficit (again)
One obvious problem is still about trust, as I mentioned in the previous part. I will give one more live example here.
My firm, BigOne Lab, produces data products tracking KPIs of many listed companies, and it is customary to prepare a backtest sheet, stacking our historical data and estimates against the publicly disclosed quarterly results by the companies themselves. This kind of backtesting is necessary to evaluate the quality of our data, and it’s quite common when we deal with foreign hedge funds. Most clients just ask for it, assuming we are honest.
However, some Chinese domestic fund managers who are new to data products tend not to trust our backtests. Their skepticism, although misplaced (in our eyes), is not without reason in a low-trust business environment. Given that the companies have already disclosed the relevant KPIs, what if we just made up numbers to match? Many sales cycles have broken down at this point. We can only imagine how much aggregate business value could have been realized but was not.
But again, societal trust levels in China are improving, and we are seeing more and more people converging to the “trustworthy until proven otherwise” mindset, so I am pretty confident this won’t be too much of an issue in the future.
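For readers who have never seen one, the backtest sheet described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions of my own, not BigOne Lab’s actual format: the KPI figures and dates are invented, and the rule of only counting estimates delivered before the company’s disclosure date is one hypothetical way to address the “what if the numbers were made up afterwards” worry.

```python
def backtest(estimates: dict, disclosed: dict) -> list:
    """Stack vendor estimates against company-disclosed results, per quarter.

    Both inputs map a quarter label to a (value, ISO date) pair. For estimates
    the date is when the number was delivered to clients; for disclosed results
    it is the announcement date.
    """
    rows = []
    for quarter in sorted(disclosed):
        if quarter not in estimates:
            continue
        est_value, delivered_on = estimates[quarter]
        actual_value, disclosed_on = disclosed[quarter]
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if delivered_on >= disclosed_on:
            continue  # delivered after the announcement: proves nothing
        rows.append({
            "quarter": quarter,
            "estimate": est_value,
            "disclosed": actual_value,
            "error_pct": (est_value - actual_value) * 100 / actual_value,
        })
    return rows


def mean_abs_pct_error(rows: list) -> float:
    """Average size of the miss, ignoring its direction."""
    if not rows:
        raise ValueError("no quarters to evaluate")
    return sum(abs(r["error_pct"]) for r in rows) / len(rows)


# Invented numbers for a single KPI of a single listed company.
estimates = {
    "2024Q1": (105.0, "2024-04-10"),
    "2024Q2": (190.0, "2024-07-12"),
    "2024Q3": (300.0, "2024-10-30"),  # delivered late, so it is excluded
}
disclosed = {
    "2024Q1": (100.0, "2024-04-25"),
    "2024Q2": (200.0, "2024-07-26"),
    "2024Q3": (300.0, "2024-10-24"),
}

sheet = backtest(estimates, disclosed)
average_miss = mean_abs_pct_error(sheet)
```

The sheet itself is trivial to produce; the hard part in a low-trust setting is convincing the buyer that the delivery dates are genuine.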
#2 Professionalization of corporate and investment activities
Trust is not the only problem here. It is probably not even the most important problem. The bigger problem, I believe, is a lack of professionalization. I will analyze this issue separately for the two biggest types of data intelligence buyers: corporates and institutional investors.
Corporates and their first-generation founders
For the corporates, if we just look at our private economy, which is our most vibrant sector, a crucial fact to remember is that most of the businesses are still run by first-generation founders.
Every founder is a legend. They got where they are now because of their fortitude, business acumen, and luck. They are, by definition of being a successful founder, confident in themselves. They also have a special kind of authority within their firms. If they say we are going this direction or that, nobody will dare challenge them, which means they do not need to persuade anyone. They may also make mistakes, but they will only have themselves to blame, and they are fine with that.
Many of them look at data when making a decision. But most of them think their own judgment is way more important than data. And I think they are absolutely right about this. After all, it’s always better to be roughly correct than to be precisely wrong. Relying on data alone to make decisions without referring to good intuition distilled from decades of experience can often be a recipe for disaster.
Within such a context, third-party data, research, and insights can only play a minor role in corporate decision-making. It is difficult to persuade those founders to pay big bucks for something they think they are better at.
BUT, things will change when they eventually pass down their businesses to either second generations or professional managers, who will never have the same level of authority and confidence as first-generation founders. This means that those future leaders will have to persuade other people before making a decision, which means there will be a substantial increase in the need for data and research.
Another secular trend that is also converging here is that, because of China’s highly digitalized economy, many newer businesses are already highly digitalized and data-driven when they are founded. Many of the newer founders have a background in the Internet sector, so reading data is also more of a habit for them.
One of our largest corporate clients is a major Chinese food and beverage company, which not only has a sophisticated data infrastructure but also is one of the rare companies in which the data analysis team could actually dictate what other departments should do.
This company has not even existed for a decade and was founded by a group of Internet veterans, so it had great data infrastructure for a consumer company from day one. Even more importantly, the founder was kicked out by the investors because of fraud, and today’s management is professionally hired. Essentially, the company was prematurely passed down to professionals.
Now, put yourself in the shoes of the new CEO: this company has great data infrastructure, and it is a company I did not found. How do I make decisions that my team and my shareholders will be happy with? Of course, I will need to collect a massive amount of data to make my decisions as scientific as possible.
As first-generation founders eventually fade away, I see this client as an embryo of what the future holds for Corporate China over the next few decades.
Institutional investors
Investment runs on information. Across the world, institutional investors are huge buyers of data and informational products.
While Chinese investors also pay for information, it often feels like the Wild West. I will just talk about one aspect here: inside information and the exchange of “tips” are rampant and generally under-regulated compared with Western markets. This is not because Chinese regulators don’t want to regulate; the practice is just too commonplace, and therefore too difficult, to regulate properly.
To make matters worse, much of the purported inside information either can only tell one part of the story or is just outright fake. In this cacophony of either illicit or fake noises, it is difficult for a fund manager to attach a premium to quality data products and solid research. He or she may not even have the time to evaluate properly which data is good.
This cacophony of noises is reflected in stock prices. In the US, it’s common to see big price movements after a quarterly result announcement, showing that the information had been kept well away from the investing public until then. In China, this type of post-result price movement almost never happens. Most of the information (as well as misinformation) has already been reflected in prices many times over.
But I also believe this is not going to be a permanent feature of our market. I guess in the early days of Western capital markets, it was just as chaotic. But as markets progress and mature, professionalism will take hold.
And I already see changes unfolding. China’s expert network industry is a centerpiece of this “cacophony of noises” and will be the first to be reformed should the market become more “professionalized”. A few years ago, Tencent won a landmark case against expert network company Third Bridge for breach of commercial secrets and was awarded ~$4m in the lawsuit. Two years ago, Capvision, which was the absolute industry leader in China, was crushed in a highly publicized national security case. The industry is rapidly heading towards a more regulated status.
Part 5: Capital market mismatch
In many industries in China, venture capital helps fund startups and expedite growth. Startups use VC funding to invest in R&D and/or create economies of scale by supporting massive user expansion. Most of China’s successful new companies of the last two decades in the technology, internet, and consumer sectors were heavily funded by VCs.
However, when it comes to our professional data industry, there is a clear mismatch with the traditional VC model.
Professional data companies, once they have reached a certain critical mass, have incredible scalability and operational leverage. Consider this: when your data product becomes some form of trusted industry standard that everyone comes to buy, you actually incur minimal marginal cost when selling to 100 more or 1000 more clients or even 10000 more clients. Each new client you win gives you extra pure profit. After all, it’s just data and information, and it costs next to nothing to reproduce.
I call this the Magical Stage for professional data people, and to reach this stage, what is needed is expertise, focus, and most importantly:
Time, as in “a lot of time.” (I will talk more about it in Part 6)
Standard & Poor's traces its roots to 1860. Arthur C. Nielsen founded his eponymous company back in 1923. In our industry, the longer you exist, the more valuable you automatically become. It’s a gift from the Goddess of Time.
This is also why most of this industry’s value creation is “backend-loaded”. In the beginning, for a few years, you are nobody, and you just quietly build your products. The longer you stay, the more likely you become some kind of “data currency” (explained by Travis May in the outstanding piece The Six Moats of Data Businesses). That’s when you really shine and reap most of the value.
Noticeably absent from the list of requirements to reach the Magical Stage is money. Professional data companies are asset-light and don’t need to burn through a lot of cash to build products, which inherently limits the need for additional capital.
On the other hand, being “small-ticket” targets also limits the appetite of traditional venture capitalists. These people typically aim to deploy massive amounts of capital, betting on high-risk opportunities that have even the slightest chance of bringing home a 1,000x return in 5-8 years. Our industry, however, does not offer that kind of outcome. Therefore, it is not common for VCs to look seriously at our industry.
But it’s not as if most VCs are capable of doing investments in our industry either. They are not capable for two main reasons.
On the one hand, for all the money that venture capitalists can provide, they don’t have the kind of patience our industry requires. A typical investment cycle for a VC fund in China is at most 5-7 years, and that is not long enough to win the favor of the Goddess of Time.
On the other hand, because this industry is rather niche, and not many entrepreneurs have successfully exited, there aren’t many good professionals who actually understand the nature of this business. For instance, in many of my conversations with Chinese VC investors, I find that most of them did not even know the difference between our industry and the software/SaaS industry.
Opportunity #5: Vast room for truly long-term capital to profit at the expense of traditional PE/VC investors.
With the challenge comes the opportunity. An industry with limited capital needs and huge long-term growth potential (remember, we are still 1/100 the size of our Western peers), and which is also insulated from the traditional venture capital ecosystem, creates a unique opportunity for smart money that understands its nature. With some nimble deployment of capital, this smart money can take substantial and even controlling stakes in the best niche players and ultimately become an industry consolidator in the style of S&P or some “data version” of Constellation Software.
Moreover, many of these niche players have already been funded by VCs in the 2016-2021 period, when there was a glut of capital, and when, comically, both startups and VCs didn’t realize they were actually not meant for each other.
Now, a few years later, those VC funds are nearing maturity and are looking to exit from businesses whose value, they have just found out, heck, they don’t understand. This may well be a golden opportunity for the smart money that does.
Part 6: Time, data benchmark, and AI
Time is a double-layered tailwind for this industry in China.
Our industry, globally, is a very old one. As I wrote earlier, Henry Varnum Poor’s History of Railroads and Canals in the United States was published in 1860 as an investor’s guide to the railroad industry. That was 165 years ago.
In 1906, Luther Lee Blake founded the Standard Statistics Bureau, which employed a similar business model but focused on non-railroad companies.
As more investors turned to the statistical journals of Poor and the Standard Statistics Bureau, these research publications became instrumental voices in evaluating the creditworthiness of target companies. As more investors trusted these credit ratings, the research providers’ output became a standard or benchmark, and bond issuers eventually came to these rating agencies and paid them to conduct a credit “health check” of the issuers themselves. This is essentially how the credit ratings you commonly hear about today, such as “triple A” or “Baa”, were born.
Standard Statistics and Poor’s would later merge and become today’s S&P Global Ratings, part of S&P Global. Today, for a bond issuer to sell a bond to international investors, it effectively has to have a credit rating from one of the Big Three: S&P, Moody’s, or Fitch. This is the power of becoming a benchmark.
Travis May calls it “data currency”:
A data currency is used by two (or more) parties that rely on a particular data set to complete a transaction; meanwhile the data company that controls the currency takes a tax on its use.
Such dynamics play out in other industry verticals, including commodity data.
One of the first global commodities data providers was established in 1909 by Warren C. Platt, who founded the monthly magazine National Petroleum News in Cleveland, Ohio. In 1923, he started Platts Oilgram, a daily newsletter reporting on prices and market information.
In 1928, Standard Oil, Royal Dutch Shell, and Anglo-Persian Oil based an oil transaction on the Platts-published US Gulf Coast prices plus freight. Thereafter, Platts’ oil prices became the benchmark for oil transactions worldwide, and every participant in these transactions had to subscribe to Platts’ data to keep track of their own deals. After 19 years of existence, Platts evolved from a nice-to-have to a must-have, and its profits grew exponentially.
In 1953, Platts was acquired by The McGraw-Hill Companies, which would later become S&P Global, and Platts would in turn become the foundational asset of S&P Commodity Insights.
What about China? Unfortunately, I don’t think any Chinese data provider has yet become a globally recognized data currency or benchmark. But fortunately, some seeds were planted a while ago, and some domestic players are poised to reach the “global benchmark” status.
Take the credit rating agency as an example. There are several credit rating agencies in China, but I am sure you have not heard of any of them.
To be honest, a real Chinese credit rating industry does not yet exist. Copying the Western model, the government simply handed credit rating licenses to a select group of companies, which now have the sole legal authority to conduct bond ratings. Thus, the natural path of market development was skipped from the very beginning, and none of the existing players, sitting on virtually free money, has had the need or desire to build a globally recognized name.
Nor will they ever be able to. The rating business, stripped of brand recognition, is a highly commoditized industry. Anyone can rate; there is no magic in the algorithm itself. So the issuer just looks for the best rating the street can offer, and that’s why you end up with so many AAA ratings on Chinese bonds, leaving the rating with only symbolic value. No rating agency can build up a brand in such an ecosystem.
But someone other than a licensed rating agency might be able to do that.
With all the questionable “AAA” ratings, it’s natural for bond investors to demand a more objective rating to base their investment decisions on.
A company I know does just that: By building a proprietary and incorruptible bond rating framework, they have become a new, grassroots standard that bond investors look to. Although it’s currently only bond investors who subscribe, you can well imagine how powerful a business they will become over time, as more and more investors look to their ratings and even require bond issuers to have them. By that time, the bond rating industry in China will truly emerge.
The same thing is also happening with commodities. China has no lack of good commodities data businesses. The two largest ones, MySteel and SCI (卓创), are even publicly listed companies. In fact, this is a rare part of the professional data industry in China that is large enough to hit capital markets. However, if you examine their financial reports, you will find that their revenue and profit figures pale in comparison to those of global giants such as Platts (S&P Commodity Insights) and Argus Media, as they sell their data for the same commodity at only 1/100 to 1/10 of the price Western incumbents charge.
The reason is still the lack of “benchmarking power.” When Saudi Aramco and Glencore sign a contract for future delivery, they are likely to benchmark the delivery prices on S&P/Platts, but it is hard to imagine them basing those prices on something published by MySteel.
But will this never change? As China, the largest consumer of global commodities, takes center stage, wouldn’t it be natural for some benchmark- or standard-level local commodities data providers to emerge?
I believe so. That’s why I always say that an S&P-sized opportunity is still waiting for Chinese companies.
But it’s not just about copying the path of Western giants. What’s even more interesting is that as we trudge our way forward, we are armed with exciting new technologies that Mr. Poor or Mr. Platt couldn’t even have imagined.
One obvious thing is Big Data. The mass explosion of data in the age of the internet, and most recently, the mobile internet, gives us far more efficient ways to collect data than Mr. Poor had when he went out in the field to manually inspect railroads.
And then we can finally talk about the role of AI, the hot topic of the day that I have not yet addressed in this series.
AI will have a profound impact on the professional data and information industry worldwide, including in China, in two important ways.
First, it will make the “analytics layer” much “thinner” than before, and the distance from raw data to conclusions and insights much shorter. Gone will be fancy dashboards and complicated data portals. Instead, it will simply be a chat box. One simple instruction will give you all the information you need.
But not exactly. Whether you will get the information you need will depend on the quality of the underlying data. The second-order effect of AI is precisely this. Because the “analytics layer” ceases to be relevant, the much more easily accessible “data layer” becomes much more critical.
The proliferation of data, the fast maturation of AI, and the increasingly critical nature of data mean one thing: when Chinese companies set out to become China’s S&P or China’s Platts or China’s Nielsen, the process probably won’t take several more decades. Vastly better technologies will help expedite this process.
Once, I wrote at Baiguan the following paragraphs:
Over the last several decades, Chinese companies have gradually progressed through different stages. According to Mr. Liu Chuanzhi, founder of Lenovo, it is called “贸工技 trade-manufacturing-technology”. Initially, they were just traders of foreign goods. Then they became manufacturers for foreign brands. Afterwards, they became the master of many key technologies. At each stage, profit margin improves significantly from the previous stage.
Now, we may have reached the final stage: brands, culture, and stories, where most of the value is generated.
POP Mart and Laopu Gold are just the beginning of this decade-long trend. The emergence of more such brands with global branding power will serve as a strong, structural boost to China’s business profits…
The PD&I industry will be exactly the kind of industry where this “final stage” will take place.
We are now finally scaling the ultimate mountain peak of our industry: turbocharged by Big Data and AI, Chinese PD&I companies are poised to win exponentially growing profits by establishing globally recognized brands, standards, and benchmarks.
Part 7: Are data and media really two different things?
In China, there are two seemingly unrelated business models: one is professional data & information, the other is media. For many people, these are two distinct categories. Data companies say they’re in the data business (e.g., Wind), while media companies say they’re media companies (e.g., 36Kr, South China Morning Post). Data companies are populated by STEM engineers and data analysts, while media companies recruit people from a humanities background.
What many people don’t know is that many of the world’s top data companies actually started out as media companies. As I just explained, the origins of S&P lie in two companies: Standard Statistics Bureau and Poor’s Publishing. Poor’s Publishing was the earlier one, founded in 1860. In its early days, it was a publisher of books and journals about the railroad industry (which, at the time, was like the AI industry today—a red-hot sector). It introduced Americans to the railways: where they were located, how well they operated, and so on.
Later, Standard Statistics and Poor’s Publishing merged to become Standard & Poor’s (S&P). Later still, publishing giant McGraw-Hill acquired S&P. Then, in 2016, McGraw-Hill even changed its name to S&P Global. Why would a publisher buy a so-called financial information company? And why rename itself after the company it acquired?
This touches on the proper way to understand the relationship between data and media. They are not two separate things. Data is a way to describe the objective world. Its purpose is to enhance people’s understanding of reality so they can make better decisions. But the objective world can never be captured by just one angle. A hundred people have a hundred views. The same “fact,” seen from different angles, can look very different. For instance, this cup on my table—if someone buys just one, maybe it costs $10. But if someone buys 10,000 cups, maybe it’s only $0.5 per cup. So what’s the “objective” price of the cup? Which data should you refer to? Which one represents the “fact”?
The core of a data product’s ultimate value lies in whether it can become what I described in Part 6: a standard or a benchmark. If a trusted benchmark data provider says the price of the cup is now $2, then everyone will treat that as the benchmark.
Facts do exist, but the version of the fact that gets adopted depends on trust and consensus.
Data that becomes consensus or a benchmark carries immense commercial value. Just like the S&P and Platts examples I mentioned in Part 6.
In recent decades, although Asia has produced a number of excellent commodity data firms, they’re still 1–2 orders of magnitude behind Platts in terms of revenue, profit, or customer value. The fundamental reason? In international commodity trade, it’s still rare for Chinese firms’ price data to be used as the benchmark.
If something becomes a benchmark, it shows it’s influential. And influence stems from a consensus of trust: I know you refer to this data; he does too, so I need it as well.
Where does this “consensus of trust” come from? Of course, the data itself must meet quality standards. It can’t be made up. It needs to be based on a scientific, fair, and transparent methodology. That’s the foundation of everything.
But what many data companies often overlook is the importance of storytelling, communication, and the shaping of narratives. Good data without a good story is just a string of 0s and 1s. And 0s and 1s have no moat: they eventually get commoditized and lose any premium value. Only through storytelling can data have a shot at becoming a benchmark and generating maximum value. On this point, data and media are not clearly separated; they’re organically unified and inseparable. That’s why I like to say: “Data = Media.”
I wasn’t the first to say “Data = Media”, at least not in China. The first was Mr. He, the founder of the WeChat blog Data Iceberg and a successful serial entrepreneur in China’s data space. That blog is a great example of telling compelling stories with data. In recent years, we’ve also layered an English-language media brand, Baiguan, on top of our data business. It’s been a step in this direction, and by now our global influence is strong enough that we can publish op-eds in The New York Times.
A strong media product plays an irreplaceable role in building data credibility and gaining narrative power. That’s why, around 1990, even though Michael Bloomberg’s data terminal business was already printing money, he still decided to build a news operation from scratch.
This is what most of the PD&I companies in China do not understand. Most of the news organizations in China also do not get this. Is there a chance to merge these two types, the STEM and the humanities, the hard data and storytelling, into a globally recognized, benchmark-level industry leader?
This is the question I wish to answer for this newsletter, as well as for my own career.


