Exercise 68 – You are the PM for a streaming video service. You come into the office and see that one key metric has dropped by 80%. What will you do?

Post and review answers and feedback to answers in the comments section of this post.

See also:

List of problem solving questions for product manager job interviews

Scott Lin
Guest

80% is a huge drop in any key metric.

I would first narrow down exactly which metric this is, so I would ask the interviewer whether it is new user retention, churn, monetization, etc.

Second, I would try to understand whether this was a sudden or a gradual drop. 80% definitely sounds like a sudden drop, or else somebody would’ve said something already.

If it’s a sudden drop, I would try to pinpoint around what time the drop occurred and figure out whether any internal or external factors could have caused it.

Internal factors include: a new feature was released, a server went down, or a new bug became prevalent. For the last two, you can segment the metric by region, browser/device type, and OS type. The issue could also be that the metric we are collecting is itself incorrect.
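As a rough illustration of the segmentation step, here is a minimal Python sketch. The row layout, the `drop_by_segment` helper, and all of the numbers are made up for the example; a real dashboard or query would look different.

```python
from collections import defaultdict

def drop_by_segment(before, after, key):
    """Return {segment: fractional drop} in event counts, grouped by `key`."""
    totals_before = defaultdict(int)
    totals_after = defaultdict(int)
    for row in before:
        totals_before[row[key]] += row["count"]
    for row in after:
        totals_after[row[key]] += row["count"]
    drops = {}
    for segment, base in totals_before.items():
        if base == 0:
            continue  # no baseline traffic, so a drop is undefined
        drops[segment] = (base - totals_after.get(segment, 0)) / base
    return drops

# Hypothetical daily counts of the key metric before and after the drop.
before = [
    {"region": "US", "os": "iOS", "count": 200},
    {"region": "US", "os": "Android", "count": 800},
    {"region": "UK", "os": "iOS", "count": 100},
    {"region": "UK", "os": "Android", "count": 400},
]
after = [
    {"region": "US", "os": "iOS", "count": 190},
    {"region": "US", "os": "Android", "count": 40},
    {"region": "UK", "os": "iOS", "count": 95},
    {"region": "UK", "os": "Android", "count": 20},
]

by_region = drop_by_segment(before, after, "region")
by_os = drop_by_segment(before, after, "os")
print(by_region)
print(by_os)
```

In this made-up data the drop looks the same in every region but is concentrated almost entirely on one OS, which would point toward a platform-specific bug rather than a regional outage.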

External factors include: a new competitor has entered the market, bad PR, or a firmware update that was pushed outside of your control. It could also be due to seasonality or a major temporary event. If it’s a major temporary event, you should see KPIs begin to return to their normal state shortly.

Third, I would check whether any related KPIs dropped as well. It’s easier if we already know which KPI it is, but either way we can walk along the user journey and see whether any KPI earlier in it dropped.

E.g.: A user signs up for the service -> enters a credit card for payment (optional) -> clicks on a video to watch -> watches the video -> chooses another video to watch

This is important in narrowing down exactly where the problem first starts. For example, if the key KPI is number of videos watched, perhaps sign-in is where most people are failing.
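Walking the user journey can be sketched like this. It is only an illustration: the step names, the 50% threshold, and the counts are assumptions, and the optional payment step is left out to keep the funnel linear.

```python
# Hypothetical funnel steps, in the order a user goes through them.
STEPS = ["sign_up", "click_video", "watch_video", "watch_another"]

def first_dropped_step(baseline, current, steps, threshold=0.5):
    """Return (step, fractional drop) for the earliest step whose count
    fell by more than `threshold` versus baseline, or None if none did."""
    for step in steps:
        base = baseline[step]
        if base == 0:
            continue  # nothing to compare against
        drop = (base - current[step]) / base
        if drop > threshold:
            return step, drop
    return None

# Made-up counts of users reaching each step on a normal day vs. today.
baseline = {"sign_up": 1000, "click_video": 800, "watch_video": 700, "watch_another": 400}
current = {"sign_up": 980, "click_video": 790, "watch_video": 140, "watch_another": 80}

result = first_dropped_step(baseline, current, STEPS)
print(result)
```

Here sign-up and click counts are nearly unchanged, so the problem starts at the point where playback begins, and the drop in later steps is just a downstream effect.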

If the issue is a feature, I would try to clarify what the goal of the feature is. It is possible that we started doing targeted ads and conversion dropped, but first-time purchases after clickthrough increased. It would be important to understand whether the goal of the feature change was met even with a KPI drop this big.

If I can ascertain the exact issue, I would work with Sys Ops, Engineering, and other people on my team to address it. If the issue is a bug, we would have to issue a hotfix. If the issue is a server, then Sys Ops can look into it. If the issue was due to a feature release, we should probably look into either fixing it or reversing it quickly.

If the issue is external, it would be harder to solve immediately and would often require going through the normal cycle of product development to address it.

So in summary, I would first ascertain whether the drop was temporary or permanent, gradual or sudden, and whether the KPI drop may have occurred elsewhere in the user funnel. Second, I would look at internal and external factors to try to pinpoint the issue. Third, if the issue can be fixed immediately, I would contact my team to put out a hotfix or roll back a change that we may have made. If not, we should understand the issue thoroughly before acting and let people in the company know of our findings.

Bijan
Guest

Great answer. On the external side, seasonality is also something to consider.

Anil Punjabi
Guest

Nice response Scott.

Let me know what you think about informing the larger team immediately.
An 80% drop is something people would notice and I went the route of informing the larger audience before it created a panic.

Anil Punjabi
Guest

Any drop of 80% is a big drop.

Start by alerting your teammates about this issue: customer support, leadership, IT, security, and the dev team.

Set up a quick call so that people can jump onto a teleconference line to discuss the issue.
Send a reminder to the customer service, IT, security, and dev team leads to join immediately.

Start the call & inform everyone about the issue and ask them to start looking into their respective dashboards for any additional clues.

Meanwhile, you should start looking at user feedback and other dashboards that might indicate an issue.

If the impacted conversion is something like Checkout, then try to purchase an item and see if it goes through.

Try to access the site from inside & outside the network.

If folks on the call haven’t figured out the issue within 30 minutes, drop a note to leadership telling them who is on the call and investigating. Let them know that you will provide the next update within 30 minutes.

Ongoing updates do not need to be every 30 mins, but you do need to keep digging in to figure out the problem.

That covers the main areas in a crisis:
1) Informing quickly
2) Escalating
3) Problem solving

Such a big drop usually means one of the following:
> DoS attack
> Problems with a new rollout
> Problem with the backend system or machine

Your team should be able to help you narrow it down to a solution.

Once the team has figured out a solution, continue to keep everyone informed and then address the fix with the right team and see it to completion.

Once the fix is implemented, thank everyone, especially the team that figured out the problem and fixed it quickly.

anonoymous
Guest

1) Did this key metric drop for a particular market (US/UK)?
2) Did this key metric drop on a specific platform (PC, mobile (Android, iOS, ...))?
3) If this metric is usage related, did the flow from sources/referrers (SEO, SEM, partners, FB, ...) to our streaming service change? If so, which one?
4) Is this fall in the key metric seasonal? Has it happened before? Over what time frame has this key metric fallen?
5) If this metric is usage related, did streaming video usage go down for ALL services, or was it just ours?
6) Was this drop in the metric related to internet connectivity problems? DNS failure? Data center down? If it is related to a specific market, did the ISPs go down?
7) Was there a natural calamity that caused this metric to go down?
8) Is the drop in the metric a telemetry-related issue?
9) If this metric is usage related, is the drop a result of a recent app redesign? (user error)
10) Is the drop a result of the app misbehaving? (app error)
11) Have the users gone to our competition?
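For the "is it seasonal, has it happened before?" question, one simple check is to compare today's value with the same weekday in previous weeks. This is only a sketch: the `is_unusual` helper, the z-score threshold of 3, and the numbers are all assumptions for illustration.

```python
from statistics import mean, stdev

def is_unusual(history, today, z_threshold=3.0):
    """True if `today` is more than `z_threshold` standard deviations
    away from the mean of `history` (needs at least two past values)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu  # flat history: any change is unusual
    return abs(today - mu) / sigma > z_threshold

# Made-up values of the metric on the same weekday over the last five weeks.
normal_weeks = [1000, 1040, 980, 1010, 990]
print(is_unusual(normal_weeks, 200))   # today is far below every past week

# If the metric is always low on this weekday, the same value is unremarkable.
low_weeks = [210, 190, 205, 220, 195]
print(is_unusual(low_weeks, 200))
```

If the value is in line with the same weekday in earlier weeks, the "drop" is probably a recurring pattern and not an incident.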

Adarsh Mohapatra
Guest

We need to have a complete understanding of the key metric before determining next steps. In this example, the interviewer is asking about a streaming service. The best way to start is to think about a popular streaming service like Netflix and its relevant metrics.

More information is needed:
What is the frequency of measurement (daily/weekly/monthly/quarterly/yearly)?
How is data collected and processed for this key metric?
When is the data presented?
Is this metric related to acquisition, activation, retention, or revenue?
Is this metric specific to a device, show, region, language, or genre?

Depending on the answers, we can take specific steps.
A few scenarios:
1. New user signup in the USA using a mobile device dropped 80% compared to the previous day. This metric is measured daily, data collection is fully automated, and data is presented daily with zero latency.
I will first rule out a few causes, because product managers are supposed to know what is going on with the product.
• I am ruling out any issues with cloud services (AWS or own data centers), because everybody would have heard about them even before checking the key metric.
• Everybody would have heard if the entire streaming service had been down for a significant amount of time during the previous day, whether due to a new deployment or a DoS attack.
• Likewise, everybody would have heard if somebody had hacked the service and stolen user data.

Now, we can start identifying probable root causes.
• Is this normal? Did it happen before?
80% is a big drop, but who knows whether this has happened before. Let’s assume not many new users sign up for Netflix on Super Bowl Sunday or during some other big event.
• We are comparing 2 days of data. Let’s say we are on day 4 of the week, comparing the data for day 2 and day 3. Let’s analyze the data for day 1 to check for any significant increase in user signups on day 2 compared to day 1. This may happen if Netflix has released a new season of a popular show.
• Was there any disruption in the signup service or the third-party credit card system?
Even if the streaming service was working fine, there may be issues with specific system components related to signup.
• Check the news: did anything special happen yesterday?
A few examples: a winter storm across most of the country, a terrorist attack, the stock market tanking 3000 points.
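The day 1 / day 2 / day 3 comparison above can be sketched in a few lines. The `classify_drop` helper, its 50% threshold, and the signup numbers are hypothetical and only meant to show the idea.

```python
def pct_change(old, new):
    """Fractional change from `old` to `new` (negative means a drop)."""
    return (new - old) / old

def classify_drop(day1, day2, day3, threshold=0.5):
    """Decide whether day 3 really dropped, or day 2 was just a spike."""
    vs_day2 = pct_change(day2, day3)
    vs_day1 = pct_change(day1, day3)
    if vs_day2 <= -threshold and vs_day1 > -threshold:
        return "day 2 was a spike"   # day 3 is back near the day 1 level
    if vs_day2 <= -threshold:
        return "real drop"           # day 3 is low against both earlier days
    return "no significant drop"

# Made-up daily signups. A new season released on day 2 inflates that day.
print(classify_drop(day1=1000, day2=5000, day3=1000))
# Signups were steady and then fell on day 3.
print(classify_drop(day1=1000, day2=1000, day3=200))
```

In the first case the 80% "drop" is just signups returning to normal after a release-day peak, so there is nothing to fix.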

2. Revenue related to independent shows for last quarter dropped 80% in the USA.
This metric is measured on a quarterly basis, data collection is not fully automated, and data is presented a month after the quarter ends. This is tricky, because Netflix produces the movies and shows and makes them available for free. They should have a way to calculate the revenue tied to the following:
• New users who signed up because of the new show
• Drop in user churn rate because of the new show
This is a key but complex metric that requires extensive analysis of the cost of the shows, the number of shows, and user growth for the quarter. As data collection is neither automated nor straightforward, there is a need to check the assumptions and the calculation model.

Anonymous
Guest

I think this is a great, well-thought-through reply. The approach is really good. You might want to lay out the customer journey for a video streaming service at the beginning of the solution so that you can then break it down step by step. But overall, great answer!

krishnendu
Guest

1. Check the latest build together with QA and the devs
2. Analyze in detail the changes made in the latest build and check whether any change in the update could have affected the metric
3. Verify the same metric in the older version of the product
Steps 1, 2, and 3 can point to possible bugs
4. Ask QA to test the latest build with a focus on the affected metric
5. Ask the devs to check for any backend issue, check error logs (if any are generated from users), and narrow down the list of devices on which the bugs are possible
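Narrowing down devices from error logs could look something like the sketch below. The log entry fields, the version numbers, and the error names are invented for the example; real logs would need to be parsed into a similar shape first.

```python
from collections import Counter

def top_error_sources(log_entries, latest_version, n=3):
    """Count (device, error) pairs seen on `latest_version` and
    return the `n` most frequent ones."""
    counts = Counter(
        (entry["device"], entry["error"])
        for entry in log_entries
        if entry["app_version"] == latest_version
    )
    return counts.most_common(n)

# Hypothetical, already-parsed error log entries.
entries = [
    {"app_version": "2.4.0", "device": "Android TV", "error": "PlaybackInitError"},
    {"app_version": "2.4.0", "device": "Android TV", "error": "PlaybackInitError"},
    {"app_version": "2.4.0", "device": "Android TV", "error": "PlaybackInitError"},
    {"app_version": "2.4.0", "device": "iPhone", "error": "Timeout"},
    {"app_version": "2.3.9", "device": "Android TV", "error": "Timeout"},
    {"app_version": "2.3.9", "device": "Android TV", "error": "Timeout"},
]

top = top_error_sources(entries, "2.4.0")
print(top)
```

A single device and error type dominating the latest build, while the older version shows something different, would support the idea that the new build introduced the bug.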