0 votes
asked in Estimation by (22 points) | 1.7k views
What does "queries per second" mean in Gmail's context?

1 Answer

+1 vote

Gmail is the number 2 Email Client after iPhone with 27% market share.

I would define what query means:

The user opens Gmail, which makes a query to the server. If the Emails are already in the cache, they are loaded directly; otherwise the server sends back the top X Emails (depending on how many Emails are shown on the first page).

The user does five main things in Gmail:

  1. Read: When the user clicks on an Email, it is loaded from the cache if it is already there; otherwise it comes from the server. Let's assume it comes from the server.
  2. Write: The user either replies to an Email or writes a new one, then sends it.
  3. Draft: After writing an Email, the user does not send it but saves it as a draft. In this case a write query is sent to the server to save the draft in the database.
  4. Search: The user searches his Emails.
  5. Delete: The user deletes Emails, either one by one (multiple requests to the server) or by selecting several and deleting them with one click (one query to the server).

There are two ways to group the users:

  1. Based on age: I assume people aged 20-50 receive more Emails because they have more responsibilities in life, e.g. being parents or university students, submitting university exercises, booking hotels for the family, etc. Users under 20 also tend to use FB or messenger apps rather than Email.
  2. Based on their use case for Gmail: people who use Gmail only for private life, and people who use Gmail for both work and private life. I assume the second group makes many more read, write and search queries, as they receive more Emails and need to react to several of them.

We also have three groups of Emails in Gmail:

  1. Inbox
  2. Promotion
  3. Social

I assume most people read and answer Emails in Inbox. Promotion Emails are often ignored (not even opened), and a lower percentage of Social Emails are opened; they mostly don't need to be answered.

Facts and Assumptions:

Gmail has 1.2 bio users.

I assume 80% of users are active on a given day (daily active users, DAUs) ---> 80% X 1.2 bio = 0.96 bio ~ 1 bio DAUs

  • Group 1: I assume 20% of DAUs use Gmail for both work and private life. 20%X1bio = 200mio
  • Group 2: I assume 80% of DAUs use Gmail just for private life: 80%X1bio = 800mio

I assume 80% of group 2 are aged 20-50 (80% X 800 mio = 640 mio). I assume the other 20% (age <20 or >50, 160 mio) make 30% fewer queries: as mentioned, users under 20 prefer FB or messenger apps to Email, and users over 50 have less reason to use Email because they are not used to it.
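The segment sizes implied by these assumptions can be checked with a few lines of Python. This is only a sketch: the variable and key names are made up for illustration, and every percentage is one of the assumptions stated above, not a measured value.

```python
# All inputs are the assumptions stated in the answer, not real Gmail data.
TOTAL_USERS = 1.2e9          # assumed number of Gmail users
DAILY_ACTIVE_SHARE = 0.80    # assumed share of users active on a given day

dau_exact = TOTAL_USERS * DAILY_ACTIVE_SHARE   # 0.96 bio
dau = 1e9                                      # rounded to 1 bio for easy arithmetic

# Hypothetical segment names; shares follow the text.
segments = {
    "work_and_private": dau * 0.20,           # Group 1
    "private_age_20_50": dau * 0.80 * 0.80,   # 80% of Group 2
    "private_other_ages": dau * 0.80 * 0.20,  # remaining 20% of Group 2
}

for name, size in segments.items():
    print(f"{name}: {size / 1e6:.0f} mio users")
```

Rounding the DAUs from 0.96 bio to 1 bio inflates every later number by about 4%, which is well within the precision of an estimate like this.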

Group1: private + work

  1. Search query: 1 time a day ==> 1 req/day
  2. Read
    • Inbox: 20 work Emails, 10 private Emails. The user opens 95% of work Emails and 90% of private Emails. ==> 20 * 95% + 10 * 90% = 28 ~ 30 req/day
    • Promotion: User receives 20 Emails a week, opens 10 Emails every week. ==> 20 * 50% / 7 ~ 1.5 req/day
    • Social: User receives 21 Emails a week, opens 7 every week. ==> 21 * 33% / 7 = 1 req/day
  3. Write: User initiates 5 writes a day, and answers 5 Emails a day => 10 req/day
  4. Draft: User saves 2 drafts a week => 2 / 7 ~ 0 req/day
  5. Delete: User deletes 30% of total Emails received each day.  => 5 req/day

Total req/day = 1 + 30 + 1.5 + 1 + 10 + 0 + 5 = 48.5 ~ 50 req/day
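As a rough cross-check of the Group 1 total, the per-user figures can be summed without rounding each term first. This is a sketch: the dictionary keys are hypothetical names, and the delete figure is simply the 5 req/day assumed above.

```python
# Per-user daily requests for Group 1 (work + private), using the assumptions above.
group1_requests = {
    "search": 1,
    "read_inbox": 20 * 0.95 + 10 * 0.90,  # 19 work + 9 private opens
    "read_promotion": 10 / 7,             # 10 opens per week
    "read_social": 7 / 7,                 # 7 opens per week
    "write": 5 + 5,                       # 5 new Emails + 5 replies
    "draft": 2 / 7,                       # 2 drafts per week
    "delete": 5,                          # figure assumed in the answer
}

group1_total = sum(group1_requests.values())
print(f"Group 1: {group1_total:.1f} req/day per user")
```

The unrounded sum comes out a little below 50, so rounding up to 50 req/day is a slightly generous but reasonable simplification.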

Group2: private 20<age<50

  1. Search query: 1 time a day ==> 1 req/day
  2. Read
    • Inbox: 10 private Emails. The user opens 90% of private Emails. ==> 10 * 90% = 9 req/day
    • Promotion: User receives 20 Emails a week, opens 10 Emails every week. ==> 20 * 50% / 7 ~ 1.5 req/day
    • Social: User receives 21 Emails a week, opens 7 every week. ==> 21 * 33% / 7 = 1 req/day
  3. Write: User initiates 3 writes a day, and answers 3 Emails a day => 6 req/day
  4. Draft: User saves 2 drafts a week => 2/7 ~ 0 req/day
  5. Delete: User deletes 30% of total Emails received each day.  => 3 req/day

Total req/day 20<age<50 (80% of group 2) = 1 + 9 + 1.5 + 1 + 6 + 0 + 3 = 21.5 ~ 20 req/day

Total req/day age<20 or age>50 (20% of group2)= 20 * 70%= 14 req/day
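The same kind of check works for Group 2. The sketch below sums the per-user figures for the 20-50 age band and then applies the 70% factor to the rounded value, as the text does; the key names are hypothetical.

```python
# Per-user daily requests for Group 2 (private only, age 20-50), using the assumptions above.
group2_requests = {
    "search": 1,
    "read_inbox": 10 * 0.90,   # 9 private opens
    "read_promotion": 10 / 7,  # 10 opens per week
    "read_social": 7 / 7,      # 7 opens per week
    "write": 3 + 3,            # 3 new Emails + 3 replies
    "draft": 2 / 7,            # 2 drafts per week
    "delete": 3,               # figure assumed in the answer
}

group2_total = sum(group2_requests.values())
group2_rounded = 20                         # value used in the answer
other_ages_total = group2_rounded * 0.70    # age <20 or >50: 70% of the 20-50 rate

print(f"Group 2 (20-50): {group2_total:.1f} req/day, rounded to {group2_rounded}")
print(f"Group 2 (other ages): {other_ages_total:.0f} req/day")
```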

 

Total queries = Group 1 + Group 2 (20<age<50) + Group 2 (age<20 or age>50)

Total queries = 200 mio X 50 req/day + 800 mio X 80% X 20 req/day + 800 mio X 20% X 14 req/day = (10k + 12.8k + 2.2k) mio req/day = 25k mio req/day = 25 bio req/day
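The weighted sum is easy to get wrong by a factor of a thousand, so here is a small sketch that recomputes it. The tuple layout and names are illustrative only; the user counts and per-user rates are the rounded assumptions from above.

```python
# (segment, users, requests per user per day), all assumed values from the answer
segments = [
    ("group 1: work + private", 200e6, 50),
    ("group 2: private, age 20-50", 800e6 * 0.80, 20),
    ("group 2: private, other ages", 800e6 * 0.20, 14),
]

total_per_day = 0.0
for name, users, req_per_user in segments:
    daily = users * req_per_user
    total_per_day += daily
    print(f"{name}: {daily / 1e9:.2f} bio req/day")

print(f"total: {total_per_day / 1e9:.2f} bio req/day")
```

The exact total is 25.04 bio req/day, which rounds to the 25 bio used in the rest of the estimate.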

To convert the number of req/day to req/sec:

  • calculate the number of req/month (M) = req/day * 30
  • number of req/sec = M * 400 / 1 bio, because a month has about 2.5 mio seconds and 1 / 2.5 mio = 400 / 1 bio

number of req/sec = 25 bio req/day * 30 * 400 / 1 bio = 300k req/sec
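To see how close the "times 400 per billion" shortcut is to a direct conversion, the sketch below compares it with dividing by the 86,400 seconds in a day. Both results are averages over the day; peak load would be higher, which this estimate does not model.

```python
REQ_PER_DAY = 25e9          # total from the estimate above
SECONDS_PER_DAY = 24 * 60 * 60

# Shortcut from the text: monthly requests * 400 / 1 bio
req_per_month = REQ_PER_DAY * 30
qps_shortcut = req_per_month * 400 / 1e9

# Direct conversion for comparison
qps_direct = REQ_PER_DAY / SECONDS_PER_DAY

relative_gap = (qps_shortcut - qps_direct) / qps_direct
print(f"shortcut: {qps_shortcut:,.0f} req/sec")
print(f"direct:   {qps_direct:,.0f} req/sec")
print(f"gap:      {relative_gap:.1%}")
```

The shortcut overstates the direct figure by a few percent because it treats a month as 2.5 mio seconds instead of about 2.59 mio, which is negligible next to the uncertainty in the usage assumptions.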

---------------------------------

Conclusion:

  • We divided Gmail users based on their usage of Gmail (work + private vs. private only) and based on age.
  • We divided the queries into five types: Search, Read, Write, Draft, Delete.
  • We divided the Emails into three types: Inbox, Promotion, Social.
  • We concluded that Gmail receives about 300k req/sec.

answered by (82 points)
0
Hi Pegah,
Thanks for the structured and well-crafted answer. I think it's easy to follow your thinking structure. I really like how you took into consideration the different behaviors of users in different age groups. A couple of small tweaks you can make are:
- Consider edge cases that require more queries (syncing between multiple devices)
- Add a sanity check in the end to make sure your estimate is reasonable
0
how would you sanity check this?
