Meenaz Merchant with Jason Barnard at The Bing Series

Meenaz Merchant talks to Jason Barnard about the video and image algorithms.

Meenaz is the team lead for multimedia, which includes image search, video search and camera search.

The single most important factor for multimedia is relevance, i.e. whether a result is fit for purpose and answers the intent. Authority and trust (which are built over time by proving oneself) and engagement also matter.

To understand images, they use alt tags and surrounding content, but also analysis of the image itself using machine learning.

They process every single image and recognise objects in the images. In terms of image analysis / object recognition, they started with faces of celebrities, and developed their machine learning from there.

For video, quality is vital… but so are engagement, the platform and the player. The sources that dominate also depend on the type of query: news would tend to pull the BBC, CNN et al, entertainment would be more YouTube, and industry-specific how-tos more industry sites…

So there is not a ‘one rule fits all’. As usual 😉

Catch the rest of the Bing Series:

  1. How Ranking Works at Bing – Frédéric Dubut, Senior Program Manager Lead, Bing
  2. Discovering, Crawling, Extracting and Indexing at Bing – Fabrice Canel, Principal Program Manager, Bing
  3. How the Q&A / Featured Snippet Algorithm Works – Ali Alvi, Principal Lead Program Manager AI Products, Bing
  4. How the Image and Video Algorithm Works – Meenaz Merchant, Principal Program Manager Lead, AI and Research, Bing
  5. How the Whole Page Algorithm Works – Nathan Chalmers, Program Manager, Search Relevance Team, Bing

Full Corrected Transcript for How the Image and Video Algorithm Works at Bing (Meenaz Merchant with Jason Barnard)

Jason Barnard: Welcome to the show, Meenaz.

Meenaz Merchant: Lovely to meet you, Jason.

Jason Barnard: I spent so long trying to remember “Meenaz” — it’s a completely unique name. I take it nobody else in the world has it?

Meenaz Merchant: Well, I’d say I’m the only male Meenaz in the world.

Jason Barnard: Brilliant. I was concentrating so hard on “Meenaz” that I forgot “Merchant,” so I had to write it on the board over there to cheat.

Meenaz Merchant: That’s very kind of you.

Jason Barnard: We’re at the Bing offices looking out over Seattle, which is absolutely beautiful.

Meenaz Merchant: Yes, it’s a glorious day today. This is the view you have every day here. I think it inspires us — the Evergreen State of Washington.

Jason Barnard: Stunning. But we’re not here to talk about Seattle. We’re here to talk about video, images, and multimedia. What exactly is your remit?

Meenaz Merchant: We handle image search, video search, and also camera search, which is a new area. With image search, you type a text query and get a set of image results back. With video search, you type a text query and get videos back. And the new area is visual search: you take a picture of something and search using that image.

Jason Barnard: I heard Microsoft is really good at that — identifying specific areas and objects within an image?

Meenaz Merchant: Yes, that’s another area we’ve been working on as well.

Jason Barnard: So you handle image search, video search, and camera search. I’m particularly interested in how these results get into the main SERPs. I’ve been looking at Brand SERPs and trying to figure out why Bing sometimes shows images, sometimes videos. With my own brand SERP, I’ve noticed I tend to get one or the other, rarely both.

Meenaz Merchant: That’s not entirely right. The way it works is: we look at what the intent of the query is. If someone types “flower images,” the image intent is very clear — they want flowers and they want images, so we’ll show a big image block on the SERP. In that case, it’s quite likely that “flowers” doesn’t carry video intent, so we won’t show video results. But for around ten percent of queries there’s an overlap, where both image and video intent are present.

Jason Barnard: And in those cases, you’d show both?

Meenaz Merchant: Yes, it’s quite possible. One might appear at the bottom of the page, or even at the top of page two, but one will always have higher intent and higher engagement over time and will naturally rank higher.

Jason Barnard: Let’s start with images. With image boxes on the SERP, it appears that source diversity is very important?

Meenaz Merchant: We don’t show image boxes if all the results come from just one site. We monitor diversity, and diversity is important. But the most important thing is relevance. If someone is looking for images and only one web page had six really good images of that thing, we’d select all six from that same page, because relevance trumps everything.

Jason Barnard: And relevance, in this context, means matching what the user actually intended to find?

Meenaz Merchant: Exactly. If someone asks for pictures of San Francisco from Alcatraz Island, they’re being very specific. If one photographer has a beautiful set of images taken from Alcatraz, we might select several from that page, because they’re the best match for what the person asked for — even ahead of someone who has taken a similar-looking picture from the Golden Gate Bridge. Relevance means accurately matching the user’s intent.

Jason Barnard: And relevance depends on you understanding what’s actually in the image — not just the alt text and filename, but also the title, caption, and surrounding content?

Meenaz Merchant: All of that’s true, but in the last three or four years there’s been a lot of advancement in deep learning. We now not only understand the text around the image, but we understand the image itself. We know if the image is of San Francisco City. So even if the surrounding text is imprecise — it might just say “pictures taken from Alcatraz” — if the image itself shows San Francisco, we can combine both signals and determine that this is likely the best result, because the image is of San Francisco and the text says it was taken from Alcatraz.

Jason Barnard: I’d always assumed you didn’t analyse images systematically because it’s computationally expensive.

Meenaz Merchant: We process every image. We understand all the objects in it. It’s a relatively new field and we’re making progress. We started with faces — recognising celebrities, for example. From there we expanded. We can identify landmarks like the Space Needle or the Empire State Building. We can identify a German Shepherd, or a specific variety of rose. All of that object recognition feeds into improving our relevance when someone queries.

Jason Barnard: That means there’s no point in gaming your alt tags, because you’re going to analyse the image anyway and spot the discrepancy.

Meenaz Merchant: Yes. It’s a machine learning algorithm that takes everything into consideration. Over time it can learn to trust or distrust alt tags from specific domains. Authority is another important factor: we trust authoritative sites more, and we know they’re less likely to make such errors.

Jason Barnard: Trust is something I find genuinely interesting. My site has been around a long time, and when I publish something, it appears in Bing almost immediately. I assume that’s related to the trust my site has built up.

Meenaz Merchant: Yes. Authority is an ongoing thing. We look at many signals: inbound links from other sites, organic clicks, the diversity of those clicks and referrals. All of that tells us that a site is authoritative. People trust it, they refer to it, and that trust accumulates over time.

Jason Barnard: And for images specifically, is the trust more about whether people like your images and click on them?

Meenaz Merchant: It’s both. Authority and trust, yes, but also the quality of the images themselves — high resolution, high quality. That’s also an important factor.

Jason Barnard: Do you hate stock images?

Meenaz Merchant: We don’t hate any kind of images. It depends on context. If the user is specifically looking for stock images, we’ll show stock images. But if someone is looking for a red rose and there’s a stock image next to a beautiful photograph taken by a real photographer, we’ll show the real photograph.

Jason Barnard: The small image boxes on the SERP — the three or four thumbnail images — are they simply the top results from the image search on the equivalent query?

Meenaz Merchant: Yes. We have what we call the image vertical, where you can scroll through image after image. On the main SERP, we have a lot of content to show — images, videos, news, local — and limited space. So we decide what’s the best user experience given the query, and then we show the top results from the image service.

Jason Barnard: And the same logic applies for video boxes?

Meenaz Merchant: Correct. There are two different aspects. One is the ranked set of results in the video vertical. The other is what we surface on the main Bing page. For the main page, we need to determine that the query has video intent. If it’s about news or entertainment, there’s a high likelihood the user is interested in videos, and in that case we’ll show what we call a video answer or video caption.

Jason Barnard: I’ve managed to trigger video boxes for my name and for my company name. I’ve been building a video strategy this year and last year. Is it true that a strong video strategy will get you video boxes in your Brand SERP?

Meenaz Merchant: What do you mean by a great video strategy exactly?

Jason Barnard: I’ve been producing a reasonable volume of videos that my audience engages with, making sure the quality is good, and distributing them across YouTube, Twitter, LinkedIn, and other platforms. My point is: to trigger the video boxes, you need quality video that your audience actively engages with.

Meenaz Merchant: Yes. Engagement is very important — how popular your video is matters. The title of the video matters. The quality matters, as does the player it lives in, and how well that player performs. All of these things affect whether a video gets shown.

Jason Barnard: And you pull heavily from YouTube simply because YouTube is the biggest platform?

Meenaz Merchant: YouTube is very strong for entertainment and do-it-yourself content. If you want to know how to fix your refrigerator, YouTube is probably the best place for that. But we’re a search engine: we get the best content from wherever it’s available. If you search for news, you’ll see BBC, CNN, Fox News — not YouTube. We depend on the query. Our goal is always to show the most relevant, diverse, authoritative, and popular results. We don’t try to favour any single site. We just look at what’s popular and engaging for that query.

Jason Barnard: So Bing doesn’t have a bias towards Microsoft properties, or against Google properties. You’re purely trying to satisfy the user’s intent?

Meenaz Merchant: Yes. The best result for the query they’ve just given. Exactly.

Jason Barnard: Wonderful. Thank you very much, Meenaz.

Meenaz Merchant: Thank you very much. That was great.
