Frédéric Dubut with Jason Barnard - The Bing Series

Frédéric Dubut talks to Jason Barnard about Bing’s search algorithm.

Episode #1 in a series about how ranking functions at Bing

This conversation confirms that the overall ranking system at Bing functions in the same way as Google’s (as explained by Gary Illyes): Darwinism in Search.

But things rapidly become more interesting still…

Frédéric is the team lead for the 10 blue links / core algorithm, but he goes on to explain a little about how ranking works for featured snippets, images, videos… and, intriguingly, how the whole page algorithm works.

After this interview, he generously organised a series of interviews with other team leads at Bing.

The Bing Series

Listen to (or watch) this episode with Frédéric to whet your appetite for the stunning revelations that the Bing team leads give me in the other four. Frédéric shared a lot of interesting information with me in this conversation. But that is nothing compared to what I learned in the other four.

  1. How Ranking Works at Bing – (this episode) Frédéric Dubut, Senior Program Manager Lead, Bing
  2. Discovering, Crawling, Extracting and Indexing at Bing – Fabrice Canel, Principal Program Manager, Bing
  3. How the Q&A / Featured Snippet Algorithm Works – Ali Alvi, Principal Lead Program Manager AI Products, Bing
  4. How the Image and Video Algorithm Works – Meenaz Merchant, Principal Program Manager Lead, AI and Research, Bing
  5. How the Whole Page Algorithm Works – Nathan Chalmers, Program Manager, Search Relevance Team, Bing

The Bing Series (Part 2)

Assuming these 5 episodes are well received, we’ll do another series of interview-cum-conversations later in 2020, hopefully with more detailed insights into other things Frédéric mentions. I’m hoping for ads (which work on much the same principle as other rich elements), knowledge panels, local results and more.

Full Corrected Transcript for How Ranking Works at Bing (Frédéric Dubut with Jason Barnard)

Jason Barnard: Welcome to the show, Frédéric Dubut.

Frédéric Dubut: Thank you for having me.

Jason Barnard: Absolute pleasure. We’ve met a couple of times.

Frédéric Dubut: That’s right.

Jason Barnard: I’ve talked to people about the ranking algorithm and nobody will tell me any secrets. I suppose you won’t either, but I wanted to talk to you about the candidate sets — the idea Gary Illyes described, where the blue links form the basis and the candidate sets are bidding for a place. They need to outbid the top blue link, and if they do, they win the position.

Frédéric Dubut: Sounds about right. In the end, what we want is to really serve our users. The ten blue links are the basis of everything. And then, if the query is a question or something we can answer with an intelligent answer — what Google calls a featured snippet, and what we call Q&A internally — that comes on top. And then there are all the other answer types. So there’s a different team for each. I’m the ten blue links. I have a colleague who handles Q&A. And then there’s a team called Whole Page.

Jason Barnard: [Laughs]

Frédéric Dubut: The Whole Page team, as the name suggests, runs the entire page as an end-to-end product. They arrange ads when the ads team tells them there are ads to show. They look at the ten blue links. They look at potential answers — video answers, image answers, news answers — and if they think those answers are going to satisfy users more than some of the blue links, that’s where they start inserting them.

Jason Barnard: So my Darwinism framing is that these candidate sets bid for a place, and they live or die by whether they can convince the algorithm they have more value than a blue link. But you’re saying it’s actually teams deciding that their specific element is more interesting and inserting it manually?

Frédéric Dubut: No, no, no. It’s not manual. If, say, I’m working on videos, I generate the best video answer I can for a given query. But my team is not the one deciding whether it shows up on the page. That’s the role of the Whole Page team.

Jason Barnard: And it’s all working on the same algorithm but with different weightings?

Frédéric Dubut: Yes. For featured snippets, being accurate, being fresh, and being authoritative is going to matter much more than having links, for example. It’s the same central algorithm working with different weightings, and each team is tweaking it for their specific rich element and their specific need. And what you show really depends on the query. Take a query like “Beyoncé.” It’s very important to show videos and news — that’s what users want. In that case, the ten blue links matter less. The master algorithm makes the call. But for a simpler, more general query — “what is one plus one,” say — the calculation is completely different.
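
To make the “same algorithm, different weightings” idea concrete, here is a minimal Python sketch. The feature names, the weight values and the `score` function are all hypothetical, chosen only to mirror the examples Frédéric gives (accuracy, freshness and authority mattering more than links for Q&A). Nothing here reflects Bing’s actual features or weights.

```python
# Hypothetical feature values for one candidate document (0.0 to 1.0).
document = {"accuracy": 0.9, "freshness": 0.8, "authority": 0.7, "links": 0.1}

# Hypothetical per-team weightings over the same set of features.
WEIGHTS = {
    "blue_links": {"accuracy": 0.25, "freshness": 0.25, "authority": 0.25, "links": 0.25},
    "qna": {"accuracy": 0.40, "freshness": 0.30, "authority": 0.25, "links": 0.05},
}


def score(features, weights):
    """Weighted sum of feature values: one formula, different weightings."""
    return sum(weights[name] * value for name, value in features.items())


scores = {team: score(document, w) for team, w in WEIGHTS.items()}
for team, value in scores.items():
    print(f"{team}: {value:.3f}")
```

With these made-up numbers, the same document scores higher under the Q&A weighting than under the blue-links weighting, because it is accurate and fresh but has few links.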

Jason Barnard: “What is one plus one” — I’d imagine that gets a direct answer, a featured snippet. But if you just type “one plus one” without the question framing, you might get a hosting company called 1&1, or a song. It’s not as simple as it looks.

Frédéric Dubut: Probably, yes. And each team also signals how confident they are in their own answers. If the video team can pull strong results for “1+1” — whether it’s a video explaining the maths or a Beyoncé song — that confidence level feeds into the Whole Page decision.

Jason Barnard: Coming back to ads: is the same principle at work there? Is there an algorithm calculating whether an ad is valuable to the user, with a team behind it similar to the video team?

Frédéric Dubut: Yes. The key principle for ads is that we still want to satisfy users. We want ads to be relevant, and we want users who click on an ad to find that what’s on the other side actually satisfies their query. It’s a sponsored auction, but the same principle of user satisfaction applies. Same system of candidate sets bidding for a place. The difference is transparency: we don’t interleave ads within the organic results. Ads sit on top and at the bottom. But other answers, like video, can be inserted between positions three and four.

Jason Barnard: And how do you decide whether a video appears at position one or between three and four? Is it saying: “This video satisfies the user enough to get on the page, but not enough to be the very first answer”?

Frédéric Dubut: Yes, that’s the right intuition. Whatever the algorithm places at position one is probably the best result for users. Showing the video between three and four says: the top three results are probably more satisfactory overall.
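
The placement logic discussed above can be sketched roughly as follows. This is an illustration under stated assumptions, not Bing’s Whole Page algorithm: the `Candidate` class, the single `score` value standing in for estimated user satisfaction, and the `min_score` threshold are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    kind: str     # "link", "video", "news", ...
    label: str
    score: float  # hypothetical estimate of user satisfaction, higher is better


def compose_page(blue_links, rich_answers, min_score=0.5):
    """Merge rich answers into the blue links by estimated satisfaction."""
    # A rich answer only competes if its score clears the threshold.
    eligible = [a for a in rich_answers if a.score >= min_score]
    # sorted() is stable, so on a tie a blue link stays ahead of a rich answer.
    return sorted(blue_links + eligible, key=lambda c: c.score, reverse=True)


links = [
    Candidate("link", f"result {i}", s)
    for i, s in enumerate([0.9, 0.8, 0.7, 0.6, 0.5], start=1)
]
answers = [
    Candidate("video", "video block", 0.65),
    Candidate("news", "news block", 0.2),
]

page = compose_page(links, answers)
print([c.label for c in page])
```

In this toy run the video block lands between results three and four, because three blue links are estimated to be more satisfying than it is, and the low-confidence news block does not appear at all.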

Jason Barnard: But you said “intuition,” not algorithm. The algorithm is built by humans, right?

Frédéric Dubut: It’s a machine learning model. The definition of the algorithm is built by a human — you put your intuition into it. What signals are important? What signals are not? And then, using machine learning, you train the model to balance all those signals.

Jason Barnard: Machine learning is like cooking. You’ve got ingredients, utensils, and a chef. The data is the ingredients, the algebraic models are the utensils, and the intuition of the person building it is the chef. Is that right?

Frédéric Dubut: Yes. For traditional machine learning, you still need to tell the machine what signals — or features, as we call them — you think are important.

Jason Barnard: “Ranking features,” not “ranking signals”?

Frédéric Dubut: In machine learning, every kind of input is called a feature. You can have hundreds of them in any algorithm, not just in search. The classic example: if you’re Zillow — a US real estate company — and you want to predict the price a house will sell for, your features might be square metres, number of bedrooms, location. Those are all different features that you, as a human, define as inputs.

Jason Barnard: So when we in SEO say “ranking factors,” we should really be saying “ranking features” in machine learning terms?

Frédéric Dubut: That’s right. You, the human, tell the machine: these are the things I think matter. Then you give it a lot of examples, and the machine assigns different weights. To carry your cooking analogy further: you tell the machine you probably need eggs, milk, and butter. Then the machine determines how many eggs, how much milk, and how much butter to make a good pancake.

Jason Barnard: And the intuition of the person programming it is to get the machine off to a good start — pointing it in the right direction so it doesn’t produce the world’s worst omelette on the first attempt?

Frédéric Dubut: Yes. And one of the nice things you can do is cook ten different pancakes with different proportions and have a human taster judge each one: this is very good, this is very bad, this is okay. The machine then adjusts the formula automatically. That’s exactly what we do for web ranking. We have a set of queries and URLs, and our human labellers — sometimes called judges or raters — say: this is good, this is bad.

Jason Barnard: The machine then looks at which features are the most predictive of something being good or bad?

Frédéric Dubut: Exactly.
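
The loop described here (humans pick the features, raters label examples, the machine sets the weights) can be illustrated with a very small logistic regression trained by gradient descent. This is a generic textbook technique swapped in for illustration, not the model Bing uses. The two features (`on_topic`, `title_length`), the six labelled examples and the training settings are all hypothetical.

```python
import math

# Hypothetical labelled examples: (on_topic, title_length) -> rater label,
# where 1 means "good" and 0 means "bad". Both features are scaled to 0..1.
examples = [
    ((0.9, 0.2), 1),
    ((0.8, 0.9), 1),
    ((0.7, 0.5), 1),
    ((0.2, 0.8), 0),
    ((0.1, 0.3), 0),
    ((0.3, 0.6), 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=10000, lr=0.5):
    """Fit logistic-regression weights with batch gradient descent."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in examples:
            pred = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = pred - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Step against the average gradient over all labelled examples.
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(examples)
    return weights, bias


def predict(features, weights, bias):
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(z) >= 0.5 else 0


weights, bias = train(examples)
predictions = [predict(features, weights, bias) for features, _ in examples]
print("learned weights:", [round(w, 2) for w in weights])
print("predictions:", predictions)
```

In this made-up data the raters’ labels track `on_topic` and barely depend on `title_length`, so the training step ends up giving `on_topic` by far the larger weight. That is the sense in which the machine works out which features are most predictive of a good or bad rating.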

Jason Barnard: That’s the same principle as Google’s quality raters?

Frédéric Dubut: Yes. When we write the guidelines — which are essentially the product specification — we want them to produce universally good results, applicable across all the markets we operate in. General principles: what it means to be on-topic for a query, what determines a quality website. We want the guidelines to be objective enough that two different raters who understand them and judge the same query and URL will arrive at the same rating.

Jason Barnard: But two people will never give exactly the same rating — there’s always an element of judgement.

Frédéric Dubut: True, which is why you need a large enough scale. And the guidelines are fairly objective. Our goal is to reduce the variance as much as possible, even if you can never eliminate it entirely.
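
One simple way to picture “reducing the variance” between raters is to measure how often two raters agree on the same query and URL, and to fall back on the majority label when they do not. The sketch below assumes three raters per item and invented labels. The rating data, the label names and both helper functions are hypothetical and say nothing about how Bing actually aggregates judgements.

```python
from collections import Counter
from itertools import combinations

# Hypothetical ratings: (query, URL) -> labels from three different raters.
ratings = {
    ("query A", "url 1"): ["good", "good", "good"],
    ("query A", "url 2"): ["good", "bad", "good"],
    ("query B", "url 3"): ["bad", "bad", "okay"],
}


def pairwise_agreement(ratings):
    """Share of rater pairs that gave the same label to the same item."""
    agree = total = 0
    for labels in ratings.values():
        for a, b in combinations(labels, 2):
            agree += a == b
            total += 1
    return agree / total


def majority_labels(ratings):
    """Most common label per item, used as the consensus rating."""
    return {
        item: Counter(labels).most_common(1)[0][0]
        for item, labels in ratings.items()
    }


agreement = pairwise_agreement(ratings)
consensus = majority_labels(ratings)
print(f"pairwise agreement: {agreement:.2f}")
print(consensus)
```

Clearer guidelines should push the agreement figure up, and rating at a larger scale makes the consensus label less sensitive to any one rater’s judgement.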

Jason Barnard: Are the guidelines pushing Bing towards being more multimedia-rich? Is the ten blue links format dying out?

Frédéric Dubut: No. The ten blue links are not going anywhere. They’re the bread and butter of search results. The guidelines I work with only address the ten blue links. I don’t know exactly how the Whole Page team’s guidelines work, but when my judges are evaluating things, they’re only looking at the blue links in context.

Jason Barnard: So you have judges evaluating each different element separately?

Frédéric Dubut: For me, yes: only the blue links. The Whole Page team has people looking at other elements. It’s the bread and butter of machine learning — you need people rating the elements they’re responsible for so the data coming in is clean and accurate.

Jason Barnard: Are the ten blue links being reduced over time? We used to get ten systematically, and that seems to be declining. What’s your sense of the average? Seven or eight per page?

Frédéric Dubut: I don’t know exactly. That might be roughly right. In the end, you want the page to stay roughly the same size, so any time you add something, you probably need to remove something else. Though sometimes you do see a page with rich elements and still nine blue links — the page just gets longer.

Jason Barnard: My instinct was that adding a video block or image block means one blue link drops off the bottom. But that’s not always true?

Frédéric Dubut: It’s the Whole Page team’s decision. Sometimes you want to be comprehensive with the blue links because the query has several different intents, and you want good diversity and coverage — several results per intent — alongside the rich elements.

Jason Barnard: I’d call that hedging your bets. When it’s ambiguous — take contrôle technique in French, which is the MOT check in the UK — someone might want to know what it is, in which case you show Wikipedia; or they might want to book one, in which case you show local results. That ambiguity means you’re not really aiming at ten opportunities. If the page is showing a local pack and a news block, a sales page might only realistically be competing for three spots.

Frédéric Dubut: That’s fair. If there are three different intents, you want good coverage across all of them. Though there’s a nice feature for more specific queries — if you search “download Office 365,” for example, you can add keywords at the top of the results page to really nail the intent, whether that’s 32-bit or 64-bit. In that case you can still have ten results targeted to that specific intent.

Jason Barnard: But only if the user actually engages with that functionality — and most people don’t.

Frédéric Dubut: People do use it. It appears at the top of the page before the results. And if you really know what you want, getting ten highly targeted results is better than getting three for your intent and seven for an intent that isn’t yours.

Jason Barnard: Fair point. One last question, entirely off-topic: there’s a perception that Bing is used primarily by people who haven’t changed their default browser settings — less technically engaged users. Is that a real weighting in how Bing approaches its results?

Frédéric Dubut: Not at all. When we write the guidelines, we don’t target any particular demographic — no concept of specific age groups or gender. We want results that are universally good and applicable to all markets. General principles: what it means to be on-topic, what determines quality. No subsection for “good results for men” or “good results for women.” That would introduce biases we certainly don’t want in the model.

Jason Barnard: Good. And the thirty percent US market share figure that gets thrown around for Bing — is that reliable?

Frédéric Dubut: The numbers we use come from comScore. They measure what they call “Explicit Core Search” — searches that were made intentionally, not searches that were triggered incidentally without the user intending to search. The exact methodology and definition are published on the Bing Ads website, if you want to check. There’s a five-line footnote explaining exactly how the numbers are computed.

Jason Barnard: “Intentional searching.” When would someone make an unintentional search?

Frédéric Dubut: I’m not entirely sure. It sounds strange. I suppose it was enough of an issue for comScore to create a specific definition for it.

Jason Barnard: You’ve answered everything with great clarity. Thank you very much, Frédéric.

Frédéric Dubut: Thank you.


[Cut to UnGagged Conference, Los Angeles, 2019]

Jason Barnard: Hi, Jason Barnard here. We’re at UnGagged 2019 in Los Angeles. I’m with Frédéric Dubut.

Frédéric Dubut: Perfect pronunciation.

Jason Barnard: Thank you — I’m French. One question: why did you come to UnGagged?

Frédéric Dubut: I came because Mitchell Robbins, who I know from SMX, thought it was a good idea to have some Bing people talk about the evolution of search and ranking.

Jason Barnard: And you had a great time?

Frédéric Dubut: It was great. A very engaged audience, a lot of questions. We had to run over time because I had 45 minutes of questions after my session.

Jason Barnard: The best thing about UnGagged: 45 minutes of questions because people were genuinely interested in what you had to say.

Frédéric Dubut: You nailed it.

Jason Barnard: Brilliant. Thanks a lot.
