Ali Alvi with Jason Barnard at The Bing Series
Ali Alvi talks to Jason Barnard about the search algorithm for featured snippets.
First thing we learn is that this feels a lot like a soccer interview.
Then Ali confirms what Gary Illyes said in 2019 – the different candidate sets use the core algo in a modular fashion. Ali is team lead for the Q&A candidate set (Q&A is Bing’s name for featured snippets).
But also that all of the algos are end-to-end neural networks. We know what goes in, we see what comes out… but nobody knows what goes on in between 🙂
And a nice clarification – Q&A answers are pulled from the blue links below them. Other rich elements such as video and images don’t rely on the pages the ten blue links provide – they have a separate selection process. Now that is interesting.
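To make the modular architecture concrete, here is a toy sketch of the idea: one core ranker produces the blue links, the Q&A module draws its candidates from those results, and rich elements like video run their own separate selection. All function names and data here are illustrative, not Bing's actual implementation.

```python
# Illustrative sketch of modular candidate sets built on one core algorithm.
# Q&A reuses the core ranking's output; video has its own selection process.

def core_rank(query, documents):
    """Core algorithm: score and order documents for the blue links."""
    return sorted(documents, key=lambda d: d["relevance"], reverse=True)

def qna_candidates(ranked_docs, top_n=10):
    """Q&A candidate set: reuses the core ranking's top results."""
    return ranked_docs[:top_n]

def video_candidates(query, video_index):
    """Video candidate set: a separate selection, independent of blue links."""
    return [v for v in video_index if query in v["title"]]

docs = [
    {"url": "a.com", "relevance": 0.9},
    {"url": "b.com", "relevance": 0.7},
]
videos = [{"title": "how tall is the eiffel tower", "url": "v.com"}]

ranked = core_rank("how tall is the eiffel tower", docs)
print(qna_candidates(ranked)[0]["url"])   # Q&A pulls from the blue links
print(video_candidates("how tall is the eiffel tower", videos))
```

The point of the sketch is only the shape: the Q&A module consumes the core ranking, while other candidate sets do not.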
Even more interesting – Ali answers the intriguing question “where do the descriptions for the blue link / core results come from?” (spoiler alert – it isn’t from the core algo!)
We talk a great deal about trust – Bing must trust the website providing the answer, so building trust over time has to be key. And the main factors / features that affect ranking for Q&A are accuracy, trust, authoritativeness, freshness… and not being offensive (aka safeguarding Microsoft’s reputation).
We also discuss Google’s decision to remove the result from the main results when its content is used as a featured snippet (Ali doesn’t agree with Google here).
And finally, dependence on annotations by the crawling and indexing team, as discussed with Fabrice Canel in the previous episode. It all fits together so nicely!
Catch the rest of the Bing Series:
- How Ranking Works at Bing – Frédéric Dubut, Senior Program Manager Lead, Bing
- Discovering, Crawling, Extracting and Indexing at Bing – Fabrice Canel, Principal Program Manager, Bing
- How the Q&A / Featured Snippet Algorithm Works – (this episode) Ali Alvi, Principal Lead Program Manager AI Products, Bing
- How the Image and Video Algorithm Works – Meenaz Merchant, Principal Program Manager Lead, AI and Research, Bing
- How the Whole Page Algorithm Works – Nathan Chalmers, Program Manager, Search Relevance Team, Bing
Full Corrected Transcript for How the Q&A / Featured Snippet Algorithm Works (Ali Alvi with Jason Barnard)
Jason Barnard: The camera is kind of far away. They usually have cameras right in people’s faces. Anyway, welcome to the show, Ali Alvi.
Ali Alvi: Thank you. Great to be here.
Jason Barnard: That’s the best name I’ve heard all day. I love your name.
Ali Alvi: Thank you. Ali Alvi, rolls off the tongue, doesn’t it?
Jason Barnard: Yes, brilliant.
Ali Alvi: Like the boxer. People debate how to pronounce it. I say it like the boxer.
Jason Barnard: All right. So here we are, looking out over Seattle from the Bing offices. You’re the team lead for Q&A?
Ali Alvi: Yes, I’m the lead PM for the team that handles Q&A in Bing, including the captions and the snippets you see under the URLs in the search results.
Jason Barnard: The blue link descriptions. Better descriptions that get pulled dynamically. That’s part of Q&A, so they can be generated as well?
Ali Alvi: Yes. The algorithms we use to generate the snippets are essentially the same algorithms we use for Q&A. Google calls it “featured snippets” because a snippet is just a feature. We use a slightly different framing at Bing: we’re saying this is an answer to a question, which is more explicit. And when you look at the architecture, we’re not just taking a snippet and featuring it. We actually do a lot more than that in many cases.
So broadly, it falls into this category: when a user comes and asks a question, or a query that looks like a question we can answer directly, that’s the domain my team handles. In addition to that, I’m also the lead PM for a high-ambition AI initiative called Project Turing.
Jason Barnard: Project Turing. That’s a Microsoft initiative, particularly within Bing?
Ali Alvi: There’s a team of scientists and applied researchers working on high-ambition natural language processing algorithms. We’re kind of the hub for those algorithms across all of Microsoft. That same team provides some of the models we use in Q&A. Think of it as a horizontal team that provides the brains for a lot of these scenarios, and Q&A is one of them.
Jason Barnard: So you’re the brains behind Bing?
Ali Alvi: I wouldn’t go that far. I represent the brilliant minds behind Bing.
Jason Barnard: So for Q&A, my journey to this conversation started when I asked Gary Illyes from Google whether there’s a separate algorithm for featured snippets. He said, very dryly, “No,” and then explained how it works. The idea is that you’ve got the basic algorithm for the blue links, and then there’s a module alongside it that uses either different features, or the same features with different weightings?
Ali Alvi: Maybe I should take a step back.
Jason Barnard: Absolutely. I went too fast.
Ali Alvi: You jumped straight to what makes results rank at the top. Let me back up. Search engines historically have been just ten blue links, and that’s how it was for around fifteen years. Q&A, or featured snippets, started appearing about three or four years ago. The idea is: we have a query, we narrow it down to the ten, fifteen, or twenty most relevant documents, and then Q&A asks, “Can we have the machines read through those documents, do some comprehension on top, and extract the specific part that actually answers this question right on the spot?”
Jason Barnard: So Q&A is actually based on the results underneath it. And video, images, and knowledge panels are based on completely separate processes? I’m beginning to understand. You’re working vertically from the blue links and asking: what can we pull out that gives a definitive answer?
Ali Alvi: Well, it depends on the question. If you ask “How tall is the Eiffel Tower?” that’s very definitive. But sometimes you ask things like “What is the average salary for a computer scientist?” and there’s no single definitive answer, so sometimes we give a range. You can also ask subjective questions like “What vegetables take the shortest amount of time to grow?” and there’s no one right answer there either.
Jason Barnard: I’ve been saying this for a while: if you want Bing to put you at number one, you’re asking Bing to recommend your content as the solution or the answer. But the featured snippet is different. It’s not recommending, it’s saying, “This is the answer we’ve found to be the best.”
Ali Alvi: Yes. When you have a featured snippet or a Q&A, it becomes tricky from a user’s perspective. They think Bing is telling them “This is definitively it.” But the reality is, we’re saying: given the context, this is the best answer we found. We’re not declaring it the absolute truth.
Jason Barnard: But isn’t that a sign that people trust you and Google? We’ve got to the point where we just accept it as the answer, and when it’s wrong, we get really upset.
Ali Alvi: Absolutely. Part of my job is to channel that sentiment from users and drive that empathy through the whole product. Even when we picked the best answer we could find, sometimes it’s not correct, or it’s off-topic, or it’s hurting people’s sensibilities. When that happens, users perceive it as something Bing did. So we have to own the message.
Jason Barnard: You have a feedback button on the SERP, so you get a lot of direct feedback from people?
Ali Alvi: We get it right a lot of the time, but we do get it wrong. We make sure we’re as close to the customer as possible. We call it “zero distance to our customers.” That means doing user surveys, bringing people in-house, asking questions. And any feedback we get, we respond to internally, at least to direct it to the right people.
Jason Barnard: Do you read everything?
Ali Alvi: Yes. Everything.
Jason Barnard: Back to the algorithm. It’s based on the blue links. What are the most important features you feed into the machine learning algorithm? I’d immediately think expertise, authority, and trust.
Ali Alvi: I was actually going to flip the question: as a user, when you come and ask Bing a question, what would you say makes a good answer?
Jason Barnard: I’d imagine you’re looking at expertise, authority, and trust. And it also needs to be accurate.
Ali Alvi: Accuracy is a hard thing to judge.
Jason Barnard: Good point. But I keep hearing that accuracy is based on accepted opinion.
Ali Alvi: You need to figure out what accepted opinion is, and that’s the biggest part of my job: defining the right metric. As a product manager for an AI team, we don’t write code. We define what the algorithm needs to do and how to measure whether it’s doing it correctly. That responsibility is 100% on the product manager. I’m the one who defines that metric and holds the entire team accountable for meeting it.
Jason Barnard: So the metric is the secret sauce you’d never tell me.
Ali Alvi: You already said what it was. It has to be relevant, authoritative, trustworthy, not offensive, and fresh. What I would add is that we’re using almost entirely neural networks and deep learning-based solutions. And by definition, with deep learning, we don’t know exactly what the features are. The machine takes text, gets the query, gets the passages we give it, and comes back and tells us which one to show.
Jason Barnard: That reminds me of conversations where people say there’s no point asking what the ranking factors are. So what I understand is: your team labels this as a question and this as a correct answer, you build a dataset, and you feed it into the machine with the metrics you’re looking for?
Ali Alvi: Exactly. And you can see that if the metric is wrong, the machine will latch on to whatever the metric says. With deep learning and end-to-end neural networks, all you get is data in and a labelled outcome.
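Ali's "data in and a labelled outcome" point can be sketched in miniature. This toy uses a bag-of-words perceptron, nothing like the transformer models he describes, but it shows the same contract: labelled (query, passage) pairs go in, learned weights come out, and no ranking feature is hand-written in between. All data here is made up for illustration.

```python
# Toy illustration of end-to-end supervised learning: labelled pairs in,
# weights out. A real system uses transformers, not this perceptron.

def featurize(query, passage):
    """Crude overlap features: words shared by query and passage."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return {w: 1.0 for w in q & p}

def train(examples, epochs=10, lr=0.5):
    """Perceptron-style updates from labelled (query, passage, label) data."""
    weights = {}
    for _ in range(epochs):
        for query, passage, label in examples:
            feats = featurize(query, passage)
            score = sum(weights.get(w, 0.0) * v for w, v in feats.items())
            pred = 1 if score > 0 else 0
            if pred != label:
                for w, v in feats.items():
                    weights[w] = weights.get(w, 0.0) + lr * (label - pred) * v
    return weights

labelled = [
    ("how tall is the eiffel tower", "the eiffel tower is 330 metres tall", 1),
    ("how tall is the eiffel tower", "paris has many famous cafes", 0),
]
weights = train(labelled)
score = sum(weights.get(w, 0.0) for w in featurize(
    "how tall is the eiffel tower", "the eiffel tower is 330 metres tall"))
print(score > 0)  # the learned weights prefer the answering passage
```

As Ali notes, if the labels (the metric) are wrong, this loop will faithfully learn the wrong thing: the only control is the data and the labels.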
Jason Barnard: That’s elegant in its simplicity, even if you can’t see inside it.
Ali Alvi: Yes. These are transformer-based deep neural networks for natural language. People who know these algorithms understand that you can’t really define what the model will do, except by controlling what data goes into it.
Jason Barnard: That’s why I say to people: imagine Bing and Google as black boxes. Don’t try to understand what’s inside. The art, as marketers, is to say: if I can predict what comes out based on what I put in, I don’t need to understand everything in the middle. That was already true when it was if-then-else logic. Now it’s even more true.
Ali Alvi: The models we have in production have hundreds of millions of parameters. There’s no way to go in and understand what’s happening. It’s just millions of floating-point numbers. The only way to measure it is: give it input and measure the output.
Jason Barnard: That’s exactly what we should be doing as marketers. Forget the bits in the middle and focus on what we can measure.
Ali Alvi: Yes. And with machine learning, every time it runs, it runs slightly differently because it’s learned from the past run. There is predictability to some extent, but the more data the models see, the more they evolve. And since we build on top of the ranking stack, if the ranking changes, everything downstream changes. Change the query slightly and, as far as the machine is concerned, it looks completely different.
Jason Barnard: What I find interesting is that it’s actually very human at the end of the day. We’ve got these machines, but the responsibility is human. And you’re the ones who take the blame every time it goes wrong.
Ali Alvi: AI is built for humans, by humans. For humans means all of our users. By humans means that even though we’re not writing code, we’re the humans who define the guardrails within which the AI has to operate. We decide what data goes into the system, and we decide what we label as good or bad output.
Jason Barnard: That weighs heavily on your shoulders.
Ali Alvi: Yes. Especially when you take it and amplify it. When it’s at the top position and you consider voice scenarios for smart assistants, if you ask Cortana something and Cortana speaks the answer back, that could be powered by the same engine underneath. And now the user doesn’t see a URL or any text. They just know: I asked Cortana a question and Cortana answered me.
Jason Barnard: I have a question about E-A-T: expertise, authority, and trust. To judge authority or trust, you have to have understood the entity. So in order to apply E-A-T to me or my company, you’d need my name and my company in your knowledge graph. Without that, E-A-T means nothing. Is that a correct statement?
Ali Alvi: We don’t rely heavily on the knowledge graph. Some of our answers do come from the knowledge graph, but the AI I’ve been describing works primarily on unstructured data, basically all hundred-billion-plus URLs and documents on the web. The way we build authority is by building on top of the ranking stack. Authority and trust are something we hold the ranking team very accountable for. On top of that, ranking signals come back attached to all those documents, and we take those into account. And these neural networks are very good at figuring out: is the entity the query is asking about actually present in this answer, and are they a good match?
Jason Barnard: So my question was really saying: to apply E-A-T, you need to understand the entity that’s giving the answer. And you’re saying, actually, not really?
Ali Alvi: It’s implicit within the content. What’s more important is understanding the entities mentioned in the text that correspond to the answer I’m looking to provide. There are things we do explicitly, but those are mostly to prevent bad outcomes: filters for hate speech, fake content, offensive material, adult content. It’s very hard for a single machine learning system to know all of those things, so you have to have some kind of guardrails.
Jason Barnard: So the filter does two things: one is corporate responsibility, serving users the way they’d want to be served; the other is an algorithm built from user feedback, from when people complain.
Ali Alvi: Most of the filters are also internally trained algorithms. It starts with flagging something as bad, you accumulate that data, you expand it into a big enough dataset that’s representative of that class of issues, you train a model, and that model can then apply the filter on the fly.
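The filter lifecycle Ali describes (flag bad output, accumulate a dataset, train a model, apply it on the fly) can be sketched as follows. The "model" here is a trivial blocked-word list learned from flagged examples, purely illustrative, and all the data is invented for the sketch.

```python
# Hedged sketch of the filter lifecycle: flagged examples accumulate into a
# dataset, a simple "model" is trained on it, and that model then filters
# candidate answers on the fly.

from collections import Counter

def train_filter(flagged, clean, threshold=1):
    """Learn words that appear in flagged content but never in clean content."""
    flagged_counts = Counter(w for text in flagged for w in text.lower().split())
    clean_words = {w for text in clean for w in text.lower().split()}
    return {w for w, c in flagged_counts.items()
            if c >= threshold and w not in clean_words}

def apply_filter(candidate, blocked_words):
    """On-the-fly check: suppress a candidate answer if it trips the filter."""
    return not (set(candidate.lower().split()) & blocked_words)

blocked = train_filter(
    flagged=["offensive slur here", "another slur example"],
    clean=["the eiffel tower is 330 metres tall"],
)
print(apply_filter("the eiffel tower is 330 metres tall", blocked))  # passes
print(apply_filter("a slur in an answer", blocked))                  # suppressed
```

Real guardrail models are trained classifiers, as Ali says; the point of the sketch is the pipeline shape, not the classifier.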
Jason Barnard: People in digital marketing are getting worried that users aren’t clicking through to their sites because Bing is answering the question directly. What would you say to them?
Ali Alvi: I understand why that sentiment exists, but the reality is that everyone is trying so hard to be number one in the ranking, and the Q&A block is actually putting you above everything else. If a site has good, reliable content that’s very relevant to the user’s question, it’s almost impossible for us to answer every single question fully in 200 characters, so a lot of the time users do click through. We actually have a very high click-through rate on the Q&A block.
Jason Barnard: And if I can get my brand name into that featured snippet, I’m getting branding even without the click. Because Bing is saying “I trust the answer from this brand.” By association, you’re giving me credibility.
Ali Alvi: Yes. And statistically, traffic does go to those websites. We have very high click-through rates.
Jason Barnard: Google has just removed the second link in the page when a site is featured, and people are complaining about that.
Ali Alvi: We don’t do that at Bing. If you’re appearing in the blue links, you’re still there.
Jason Barnard: A lot of people argue you have to be on page one. But you’re saying it could be on page two, or not even in the current results, and you’d still be the featured answer, because you’re building on top of the blue links as they were at a given point?
Ali Alvi: We build on top of the blue links, and most of the time it’s done on the live set. But sometimes, if we’ve shown an answer that performed well, we memorise it. When we show it again, that URL might not be ranking where it was. It could have changed over time. So there’s a chance the URL won’t be in the top ten blue links at that moment. It’ll be somewhere in the ranking, but I can’t guarantee it’ll be in the top results.
Jason Barnard: I was talking to Fabrice about the way he annotates the data they put in the database. If a blue link comes up but the answer isn’t in the meta title or description, you go down, pull the annotation, and use that?
Ali Alvi: Absolutely. That’s the high-ambition AI I mentioned: what we call machine comprehension. We can give the algorithm an entire webpage and say, “Here are a thousand sentences. Here’s a question. Find the best part of this page that answers it.” The applied scientists and researchers we have are building those algorithms using natural language techniques. We’re genuinely teaching machines how to read, comprehend, and understand.
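The extractive step Ali describes (give the machine a page's sentences and a question, find the part that answers it) can be illustrated very crudely. Real machine comprehension uses neural models; this sketch scores sentences by word overlap with the question, and the page content is invented.

```python
# Illustrative sketch of extractive Q&A: given a page's sentences and a
# question, return the sentence that best answers it (by word overlap).

def best_answer(question, sentences):
    """Score each sentence by overlap with the question; return the best."""
    q_words = set(question.lower().split())
    def score(sentence):
        return len(q_words & set(sentence.lower().split()))
    return max(sentences, key=score)

page = [
    "Paris is the capital of France.",
    "The Eiffel Tower is 330 metres tall.",
    "It was completed in 1889.",
]
print(best_answer("how tall is the eiffel tower", page))
```

A neural reader replaces the overlap score with a model that actually comprehends the passage, but the input/output contract is the same: a question, a set of candidate spans, and one selected answer.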
Jason Barnard: But initially you depend on Fabrice’s annotations to create the blocks and extract the relevant segments, and then on top of that you layer your own natural language processing?
Ali Alvi: We absolutely rely on Fabrice and his team. We can’t build these algorithms on raw HTML. We need to extract what we call the document body: just the clean text of the page. Once we have that clean text, we can go in and extract anything out of it.
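A minimal sketch of the "document body" extraction Ali mentions: stripping markup from raw HTML to get clean text for downstream models. Real pipelines also drop navigation, ads, and boilerplate; this version only removes tags and skips script/style blocks, using Python's standard library parser.

```python
# Minimal document-body extraction: raw HTML in, clean text out.
# Skips <script>/<style> content; keeps visible text only.

from html.parser import HTMLParser

class BodyExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_skip = 0          # depth inside <script>/<style> blocks
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.in_skip:
            self.in_skip -= 1

    def handle_data(self, data):
        if not self.in_skip and data.strip():
            self.chunks.append(data.strip())

def document_body(raw_html):
    parser = BodyExtractor()
    parser.feed(raw_html)
    return " ".join(parser.chunks)

html_page = ("<html><head><style>p{}</style></head>"
             "<body><p>The Eiffel Tower is 330 metres tall.</p></body></html>")
print(document_body(html_page))  # The Eiffel Tower is 330 metres tall.
```

Once the clean text exists, the extraction step sketched earlier can operate on it directly, which is the dependency on Fabrice's team that Ali is describing.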
Jason Barnard: Which explains why, when I’ve got a poor meta description or one that doesn’t contain the keyword, Bing just digs down and pulls something else out.
Ali Alvi: Absolutely.
Jason Barnard: That’s the best conclusion of the lot. Thank you so much. An amazing insight right at the end.
Ali Alvi: Thanks everyone. Thank you so much.
