Bright Data’s Approach to Make Web Data Available with AI-Powered Data Scraping: Insights from Chief Product Officer, Ariel Shulman | Podcast Ep. 29


ExtraMile by HiTechNectar is a renowned and top-notch interview series featuring voices from industry leaders shaping the future of technology. Our episodes are here to unlock innovations, strategies and practices driving rapid digital transformation.

For today’s session, we’re joined by Ariel Shulman, the Chief Product Officer at Bright Data, the leader in web data infrastructure, empowering over 20,000 companies by enabling them to access, integrate, and utilize critical web data at scale.

Ariel, a visionary professional with rich expertise in technology management, marketing, and strategy, walks us through his journey at Bright Data and shares the unique challenges of product-led growth. He also highlights why businesses need structured, real-time web data, how AI is transforming data retrieval, and the company’s relentless culture of agility and innovation.

Join us in this insightful conversation and explore how Bright Data is shaping the future of web data accessibility and powering the next era of AI-driven business intelligence.

Key Takeaways:

  • Primary Challenge in Product Management: Challenge number one is probably just to know what to do.
  • Importance of Strategic Focus: Challenge number two is probably the opposite: to know what not to do.
  • True Value of AI Agents: The real utility or the real usefulness of these AI agents is when they become aware of their surroundings, when they can communicate with the outside world.
  • Function of Deep Lookup: What Deep Lookup does is it allows users to enter a query that will look at all this information that we have collected in these huge datasets, as well as have real-time access to different websites and then to synthesize a response.
  • Bright Data’s Unique Company Culture: Bright Data DNA is something that we are obsessed with and it really talks about moving fast.
  • Bright Data’s Strategic Role in the AI Era: Bright Data is the glue that allows the new internet to have uninterrupted access to data that’s available on the old internet.
  • Converting Data for AI Consumption: We take a website that was built for humans, with images and all sorts of stars and animations, and convert it into very nicely structured data, which is something that these AI agents or AI models know how to work with.

About Our Guest


Ariel Shulman

Ariel Shulman is an accomplished executive with extensive experience in technology management, business development, marketing, and strategy. Since joining Bright Data in 2021, Ariel has leveraged his networking, security, and Internet expertise to drive innovation in accessing high-quality public web data at scale. He now serves as CPO, responsible for Bright Data’s AI-integrated product suite.

Fluent in English, French, Spanish, and Hebrew, and with professional experience across multiple countries, Ariel plays a pivotal role in shaping Bright Data’s global positioning.

About Company


Bright Data

Bright Data is the leading web data infrastructure company, trusted by over 20,000 organizations to ethically access and collect public web data at scale. Bright Data’s platform supports teams across AI, finance, e-commerce, travel, marketing and other markets with the performance, reliability, and compliance standards needed for large-scale data operations.

Built for large-scale use, the platform integrates seamlessly into existing systems and data stacks, offering full control over collection workflows. With built-in governance, flexible delivery, and alignment to global compliance standards, it supports secure, efficient, and auditable operations.

Transcript


Host: Hello everyone, and welcome back to another episode of ExtraMile by HiTechNectar, where we discuss the latest technologies and strategies that are driving transformation across industries with our experts and leaders.

I’m your host, Sayali, and today I’m really excited to welcome Ariel Shulman, Chief Product Officer at Bright Data. Bright Data is a global leader in web data solutions, helping over 20,000 businesses worldwide. Ariel brings years of experience in product management, technology, strategy and more, and has played a big role in pushing Bright Data forward with innovation.

Today, we’ll talk about Ariel’s career journey, Bright Data’s mission to make web data accessible, AI-powered web scraping, and what’s next in the world of data.

Ariel: Thank you, it’s my pleasure.

Host: So, Ariel, your background covers product management, technology, marketing, sales and strategy. Could you share some key highlights from your journey with us?

Ariel: Yeah, well, I’m an industrial engineer by education and industrial engineering is something that I chose because I didn’t really know what I wanted to be, but I knew that it wasn’t going to be a doctor or a lawyer, so I went in the industrial engineering kind of direction and the nice thing about industrial engineering is that you learn a little bit of everything and you do know how to speak and how to interact with finance people, programmers, other engineers, developers, et cetera, et cetera.

And this has really helped me a lot in my career. I’ve always been at the intersection of technology and business. It’s kind of a good combination. In fact, when I think about it, in most jobs, even when I was in a sales capacity, I usually knew more than the sales engineers that came to kind of, quote unquote, support me. So, I’ve been more drawn to the technical side of things, although I’m not like a real developer. I joined Bright Data in 2014.

It was still called Hola at the time. We went through many, many changes and updates along the years and a few years ago, I became the chief product officer. I think that really has kind of the biggest impact on a company, especially a product-led company.

Everything that we do in Bright Data is product-led. We don’t have any professional services. People sign up, they use the platform.

So, this has a really big impact on the company’s success.

Host: Thank you so much for sharing this amazing journey with us. So, moving forward, as a product management leader, what are some of the main challenges you face in product development and engineering?

Ariel: Yeah, it’s funny you ask that because I was just interviewing a candidate for product management here and I told him that one of the most difficult things that we have is that we don’t have anyone telling us what to do. When you are doing some kind of a job and your boss tells you, you need to do X, Y, Z, that in itself makes the work easier because you know what you need to do. And in product management, you don’t necessarily know what you need to do.

And in fact, if you ask people, sometimes they give you the wrong answers. Even they don’t know. So, there’s a famous quote by Henry Ford, who said that if I had asked people what they wanted, they would have said faster horses, right?

Because people didn’t know that a car was possible. So, challenge number one is probably just to know what to do. Challenge number two is probably the opposite: to know what not to do.

Because there’s so many things to do and so many people coming, people from sales or people from other product departments or marketing saying we must do this, we must do this. And it’s really important to be able to say no to certain things because there’s something called opportunity cost. When you’re doing one thing, you’re not doing the other thing.

So, choosing the right thing to do is the important thing. Probably, and I think this is a common theme for anyone in technology, it’s very hard to estimate timelines. You think that something could take a week, it can end up taking a month because it’s more complicated or because there are some additional complications you didn’t see or some other compliance reasons.

But probably the most difficult one is actually to know what to do.

Host: Now, speaking of Bright Data, Bright Data recently launched a free version of the web MCP. How does it help AI agents deal with the challenge of getting reliable web content?

Ariel: Yeah, so MCP was a technology that was released by Anthropic and it effectively allows an AI agent to interact with third party tools very easily. Instead of having to teach your AI agent to do all sorts of things with the tools, this MCP layer is something that kind of educates the AI agent how to use the tools. In the world of web data, the important thing with MCP is that it allows an AI agent, which is kind of an isolated entity, to communicate outwards with the real world in real time.

In many cases, AI agents, when they’re trying to contact websites to get maybe search results or weather or traffic or prices or anything like that, get blocked by these websites because websites treat AI agents as bots. The MCP server that we have released, which, by the way, even has a free tier, so anyone can sign up for free web access, gives these AI agents uninterrupted access to the web in general: to websites, to search engines. Effectively, it means that the AI agent can send a request to the internet through our MCP server and it is guaranteed to get a response.

It will never get blocked.
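
To make the MCP idea concrete, here is a minimal sketch in Python of the discover-then-call pattern Ariel describes. It is not Bright Data's server or an official SDK: it is a toy, in-process handler that only mimics the JSON-RPC message shapes MCP uses for tools (`tools/list` and `tools/call`). The tool name `fetch_page`, its schema, and the stubbed fetch are assumptions made up for illustration.

```python
import json

# Hypothetical tool registry; the tool name and schema are illustrative only.
TOOLS = {
    "fetch_page": {
        "description": "Return the text content of a public web page.",
        "inputSchema": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    }
}


def fetch_page(url: str) -> str:
    # Stand-in for a real fetch; a real server would go out to the web here.
    return f"(stub) contents of {url}"


def handle_request(raw: str) -> str:
    """Handle one JSON-RPC 2.0 request and return the JSON response."""
    request = json.loads(raw)
    method = request["method"]
    if method == "tools/list":
        # The agent learns which tools exist and what arguments they take.
        result = {"tools": [{"name": name, **spec} for name, spec in TOOLS.items()]}
    elif method == "tools/call":
        params = request["params"]
        if params["name"] != "fetch_page":
            raise ValueError(f"unknown tool: {params['name']}")
        text = fetch_page(**params["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        raise ValueError(f"unsupported method: {method}")
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


def agent_step(url: str) -> str:
    """What an agent does: discover the tools, then call one of them."""
    listing = json.loads(handle_request(json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
    tool_name = listing["result"]["tools"][0]["name"]
    reply = json.loads(handle_request(json.dumps({
        "jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": tool_name, "arguments": {"url": url}},
    })))
    return reply["result"]["content"][0]["text"]
```

The point of the sketch is that the agent never hard-codes how the tool works. It asks the server what is available, reads the schema, and sends a call, which is the "educating the agent" layer described above.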

Host: Since we’re already speaking of AI, what exactly does AI-powered web scraping mean and why is it so important to collect real-time web data for better decisions?

Ariel: OK, so if you think about an AI agent, that’s really the big trend these days, AI agents, and we are seeing thousands and thousands of customers signing up with different AI agents. AI agents are connected to an LLM, so it can be OpenAI, Anthropic, DeepSeek or some other kind of engine. And these engines have been trained on, you know, tons of information, but they have kind of a cutoff date.

So, they have been trained up to, let’s say, January of 2025. Now, if you have an AI agent that is trying to do something useful for the user, usually it involves getting information from the real world. So, in real time, if you ask an AI agent that doesn’t have web access, you know, what happened yesterday in the news?

It’s going to come back and tell you, hey, I don’t know. I’ve been trained up to so and so date. So effectively, it’s kind of like a book or encyclopedia.

It’s not connected. The real utility or the real usefulness of these AI agents is when they become aware of their surroundings, when they can communicate with the outside world. And that’s when they need to communicate.

And that’s when you need to have the scraping ability to get information like travel, OK, or news or weather. That’s all real time information. This is for maybe a kind of consumer facing AI agents.

But even enterprise AI agents sometimes need external access. Let’s say you’re working in a power company and you are using an AI agent as a person in the power company to plan your day and to see which customers you’re going to visit and to plan your route. All of that information, for example, the addresses of people and the roads and everything could be stored inside the databases of the power company.

But you may want to know what the weather is, OK, to plan ahead. Now, weather is something that’s dynamic. So even an AI agent that is enterprise focused will always need to go out and get some kind of real time information.

And that’s where you need this kind of AI scraping capability.
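
The cutoff-date point can be shown with a small Python sketch. Everything here is an assumption for illustration: the cutoff is taken from the "January of 2025" example above, and the two answer functions are stubs standing in for a model's built-in knowledge and a live web lookup.

```python
from datetime import date

# Assumed cutoff, based on the "January of 2025" example in the conversation.
TRAINING_CUTOFF = date(2025, 1, 31)


def answer_from_training(question: str) -> str:
    # Stand-in for what the model already knows from its training data.
    return f"[model memory] {question}"


def answer_from_live_web(question: str) -> str:
    # Stand-in for a real-time web lookup.
    return f"[live web] {question}"


def route(question: str, about: date) -> str:
    """Use the live web for anything dated after the training cutoff."""
    if about > TRAINING_CUTOFF:
        return answer_from_live_web(question)
    return answer_from_training(question)
```

For example, `route("What happened in the news?", date(2025, 6, 1))` goes to the live-web stub, while a question about 2024 is answered from the model's memory. A real agent would decide this in a far less rigid way, but the split is the same.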

Host: This is very interesting. Now, let’s talk about one of your tools. Could you give us a quick overview of Bright Data’s Deep Lookup?

Like how does it help businesses handle complex questions with efficient web scraping?

Ariel: So Deep Lookup is something that we released a couple of months ago in beta and is now live for everyone. So, here’s the story. We have, I’m going to put the do not disturb here.

We have, you know, over 20,000 customers in Bright Data, enterprise customers, and they are scraping the web for millions of different reasons from different sites, etc. All the information that is collected from the Internet is what we call public web data. It is freely available to anyone without a username and password.

And we store this information. So, you can think of this as having a copy of the Internet in Bright Data. What Deep Lookup does is it allows users to enter a query that will look at all this information that we have collected in these huge datasets, as well as have real-time access to different websites and then to synthesize a response.

So maybe I’ll give you an example. It can help illustrate the value, maybe the difference between this and a Google search. So, let’s say that you are a battery manufacturer.

You make batteries, like, you know, AA batteries, the small batteries, and you want to sell them in hundreds of thousands. You’re not selling individual batteries. So, you need to find your customers.

That’s typically companies that make toys, for example. You want to sell them the batteries that go inside the toys. So, when the toy gets sold, your batteries are already inside.

So, you can do a deal for a large quantity. So, you can enter a query in Deep Lookup that will be something like this. I’m a battery manufacturer.

I make AA batteries. I want to sell them in large quantities. Find me manufacturers of toys or gadgets that need AA batteries.

For each of those companies, find me the purchasing manager and their contact details. Estimate the volume in terms of units shipped per year and give me a table of all of this information. Now, if you put this into Google, you’re going to get nothing.

I mean, you’re going to get something meaningless. With Deep Lookup, you enter this query and this process can take 10 or 20 minutes. OK, so there’s good reason for that, because Deep Lookup is going to go through all the datasets that we talked about, for example, all the Amazon products, all the Walmart products, LinkedIn profiles of people.

It’s going to go through social media to find products that are hot. It’s going to go through forums to find reviews, all sorts of things like that. And again, it can take 10, 20 minutes and it will end up giving you a table of the companies that you need to talk to, how many products they’re selling every year and who you need to talk to in that company.

So that’s like a very valuable lead table for your salespeople to go after. This can take, you know, weeks for people to put together. So, this is a good example of, you know, Deep Lookup kind of blending datasets that we have kept, web scraping and AI to give a good question to a really tricky business.

Sorry, a good answer to a really tricky business question.

Host: That’s definitely a breakthrough. Moving on, as we all know, Bright Data serves a wide range of industries. So, what are the key things you keep in mind to meet such different customer needs, especially when it comes to data services?

Ariel: Yeah, it’s true that there’s a really very big variety of customers in Bright Data. So, we have sometimes consultants who are doing one-off projects. And at the other end of the spectrum, we have Fortune 500 companies that are sometimes regulated by the Securities and Exchange Commission, so big banks and things like that.

So, there’s three main things I can think of. So, first is compliance. So Bright Data in general is very strict on compliance and privacy.

So we follow GDPR and CCPA and all sorts of privacy regulations. And we are also certified with ISO and SOC 2 and SOC 3 and all sorts of certifications that basically say Bright Data is running a very organized, compliant operation. Some customers need this assurance. Big banks, for example, really need to know that the company they’re working with, we in this case, has the corporate governance and everything that’s needed.

Other customers don’t really care. They just need the data that they need. So it really depends.

So the industries can dictate the level of compliance that they’re interested in. But two other things that are maybe easy to understand are size and whether it’s real time or not. So, some customers are really small.

Maybe they want to get 5,000 profiles off of LinkedIn and they can get that in an email, in a CSV file or into their Google Drive. Other companies sometimes need to collect information from something that we call the Web Archive and they need to download petabytes of information. A petabyte is a million gigabytes.

OK, so there’s huge amounts of information. It’s not something that you can email or put into a Google Drive. Sometimes these things need to be pushed directly into Amazon Storage or Azure Storage or Snowflake.

There are very, very big differences there in terms of scale. And maybe the last thing in terms of diversity is that some operations require real-time answers and some don’t. They can be done offline.

So whenever, for example, you book a hotel or you book a flight or you do some shopping, many of the brands that you are using as a consumer actually use Bright Data under the hood to search the web for the best deals or the best prices for you. For example, when you go to some travel websites, you know, sometimes it takes the table some time to fill up with the flights, right? It doesn’t show up immediately.

The reason for that is that, through Bright Data, those companies actually go out and look for the best prices. And that’s all real time. But there are other types of operations that don’t require real-time information.

Let’s say that you are some kind of an investment firm looking for private companies and you’re looking at the web for some signals of those private companies. It’s OK to collect that overnight and maybe even over the weekend and then come back and look at all this information. So, these are two very different things technologically also: real time and not real time.

So different industries have different characteristics.

Host: As we’re aware, planning is one thing, but execution is key. So how do you make sure every new initiative at Bright Data is executed effectively and efficiently?

Ariel: You know, what do we do in Bright Data? We collect publicly available information from the web. The web is built for humans.

Websites change all the time. They change all the time, sometimes because, you know, people want them to have more features or more categories or just to be designed differently. And sometimes they even change because the website owners, for some reason, want to make it more difficult for bots or for scraping companies to collect this publicly available web information.

So, we are in a world that’s constantly changing and it’s changing very fast. Because of that, we have really kind of developed what we call the Bright Data DNA. Bright Data DNA is something that we are obsessed with and it really talks about moving fast.

We are a little bit strange in that, for example, we have no meetings here. OK, there’s no meetings in Bright Data. There’s no such thing as a monthly or weekly meeting.

Everything is done quickly, ad hoc. And we don’t have a meeting room in the company. There’s no such thing.

Actually, there is one room, but it’s reserved only for outside guests. So, we’re not allowed to use that.

In the product department, we don’t write the big, long documents that we call PRDs. We just do things very quickly, with short iterations on Slack.

We do a lot of MVPs, so minimum viable products, just to see if we get, you know, some kind of a product-market fit, and we move really quickly because the web also moves quickly. If our customers depend on us to scrape a specific website and that website has changed, we need to adjust to that very quickly. So, we are very agile.

We do hundreds of releases a day and we move fast. Sometimes things break. I always tell the guys here that it’s a good thing that we are not in the medical device industry because people would be dying left and right.

But in this case, no, it’s just web. You can always revert. And we move fast and we hire people from all over the world.

Effectively, we have kind of 24-7 agile development. You have to do that if you want to serve customers in this, you know, ever-moving kind of marketplace, which is web data.

Host: Now, one last thing. Since we already spoke about real-time data access, how do you see data collection methods changing in the future?

Ariel: I think, first of all, there will be more. There will be more data collection because, you know, people are saying that, kind of, there will be more agents than people browsing the web soon. This is kind of the cliche that people are saying now.

And I think it’s true because people will be using these things more and more instead of going on five different websites and trying to collect information in their head to see, hey, what is the best, you know, kind of, I don’t know, Android phone to buy. They’re going to ask an agent to do that. Say, you know, go to these five sites that I like and choose the best Android phone for me.

So, there will be more of that. And all those agents will always need this uninterrupted access. And I think that you can kind of divide the web into the old web and the new web. The old internet is built for humans.

It’s just people with mice and keyboards and brains who are trying to get information and process it inside their heads. The new internet, which we are all starting to use more and more, is built around agents, which take actions on your behalf, or AI models that you ask questions and they will give you some answers. And many of those answers are also related to real-time information.

The thing is that if you look at this kind of old internet and the new internet, there is a gap between them, because when the new internet, so agents and models, tries to connect to the old internet, which is built for humans, there’s either blocking or there are compatibility issues. It’s very hard for those layers to communicate. And I think that Bright Data is kind of the glue that allows the new internet to have uninterrupted access to data that’s available on the old internet.

So, for example, we take a website which was built for humans, with images and all sorts of stars and animations, and convert that into very nicely structured data. For example, something called Markdown or a JSON file or a CSV file, which is something that these AI agents or AI models know how to work with. Okay.

So that’s kind of our part in this market as it is evolving right now.
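
As a rough illustration of that conversion step, the Python sketch below takes a small HTML fragment "built for humans" (images, star ratings) and keeps only the fields a model would want, emitting JSON and CSV. It uses only the standard library. The page fragment, the CSS class names, and the choice of fields are all made up for this example and say nothing about how Bright Data actually parses pages.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Made-up page fragment; the class names are assumptions for this sketch.
PAGE = """
<div class="product"><img src="a.png"><h2 class="name">AA Battery 4-pack</h2>
<span class="price">3.99</span><span class="stars">★★★★☆</span></div>
<div class="product"><img src="b.png"><h2 class="name">Toy Robot</h2>
<span class="price">24.50</span><span class="stars">★★★☆☆</span></div>
"""


class ProductParser(HTMLParser):
    """Collect the name and price of each product, ignoring decoration."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # field whose text we are about to read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "product":
            self.products.append({})  # start a new record
        elif css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()
            self._field = None


def extract(html: str) -> list[dict]:
    parser = ProductParser()
    parser.feed(html)
    return parser.products


def to_csv(rows: list[dict]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract(PAGE)
as_json = json.dumps(rows)  # structured records an agent can consume
as_csv = to_csv(rows)       # the same records as a flat table
```

Real pages are far messier than this fragment, and they change often, which is why the extraction logic has to be maintained continuously rather than written once.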

Host: All right, Ariel. This has been such a clear, interesting and insightful conversation. Thank you so much for walking us through your journey, Bright Data’s vision and the future of web data.

Once again, thank you so much.

Ariel: Thank you for the good questions. I had fun.

Host: And to our viewers, thank you for joining us for this episode of ExtraMile by HiTechNectar. I’m your host, Sayali. We’ll be back soon with another leader, another story and more insights.

Until then, stay tuned.


Explore Our Other Insightful Interviews:

Is Agentic AI Search Technology Driving a Shift from Traditional Search? Ft. Tim Resnik, Global VP of Professional Services at Botify | Podcast Ep. 28

How Intelligent Automation is Rewriting Rules of Enterprise Efficiency? Ft. Kavitha Chennupati, Senior Director at SS&C Technologies | Podcast Ep. 27

Bright Data Reviews & Recognitions