Overview
Today we’ll look at 6 topics related to web services:
- how the internet works
- DNS
- Web server
- Music Player
- MP3 file
- Flash sale (秒杀)
Question 0
How to solve radio-play failures
> Failure rate = % of users who can't listen to music properly
> = # of users who fail to play a song / # of total users
Mission: reduce the failure rate.
How does server identify a user?
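If a server uses cookies to identify unique users, the count might be higher than the number of real users, because one person can show up under several cookies (after clearing cookies or switching browsers, for example).
However, if the server uses IP addresses, the count might be lower than the number of real users, because several people behind one shared address are counted as one.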
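The over-count and under-count can be seen on a tiny made-up example. The records, cookie IDs, addresses and names below are all hypothetical and only serve to illustrate the two counting methods:

```python
# Hypothetical request records: (cookie_id, ip_address, real_person).
# The real_person column is something a server never sees; it is here
# only so the two estimates can be compared with the truth.
REQUESTS = [
    ("c1", "10.0.0.1", "alice"),
    ("c2", "10.0.0.1", "alice"),  # alice cleared her cookies
    ("c3", "10.0.0.1", "bob"),    # bob shares alice's office IP
    ("c4", "10.0.0.2", "carol"),
]


def count_unique(records, column):
    """Count distinct values in one column of the records."""
    return len({record[column] for record in records})


users_by_cookie = count_unique(REQUESTS, 0)
users_by_ip = count_unique(REQUESTS, 1)
real_users = count_unique(REQUESTS, 2)

print(users_by_cookie, users_by_ip, real_users)
```

Here the cookie count is above the real number of users and the IP count is below it.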
How to collect data for failure rate
Version 1
Log:
- the client sends a log entry to the server when the user visits
- the client sends another log entry after it plays a song
- from these two logs we can identify users who failed to play a song
In fact, everything should be logged, including play, pause, switch song, refresh, etc.
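A minimal sketch of the Version 1 idea, assuming each log is simply a list of user IDs (the function name and the sample IDs are made up for illustration): users who appear in the visit log but never in the play log are counted as failures.

```python
def failure_rate(visit_log, play_log):
    """Return (rate, failed_users) from two lists of user IDs.

    A user counts as failed if they visited but never logged a play.
    """
    visitors = set(visit_log)
    played = set(play_log)
    failed = visitors - played
    rate = len(failed) / len(visitors) if visitors else 0.0
    return rate, failed


# Hypothetical logs: u1 visited twice, u3 and u5 never managed to play.
visit_log = ["u1", "u2", "u3", "u4", "u5", "u1"]
play_log = ["u1", "u2", "u4"]

rate, failed = failure_rate(visit_log, play_log)
print(rate, sorted(failed))
```

Using sets means repeat visits by the same user do not inflate the denominator, which matches the per-user definition of failure rate above.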
Version 2
User logins are, in fact, logged automatically when the user visits. Thus the client ONLY has to send a log entry after it plays music.
Summary
- define failure rate
- use cookies to identify users
- use logs to collect failure data
- analyze patterns of failure against date/time
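The last point can be sketched as a per-hour breakdown. This assumes each log entry has been reduced to a timestamp plus a success flag; the function name and the sample events are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime


def failure_rate_by_hour(events):
    """events: iterable of (timestamp, succeeded) pairs.

    Returns {hour_of_day: failure_rate} for every hour that has events.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for timestamp, succeeded in events:
        totals[timestamp.hour] += 1
        if not succeeded:
            failures[timestamp.hour] += 1
    return {hour: failures[hour] / totals[hour] for hour in sorted(totals)}


# Made-up play attempts on a single day.
events = [
    (datetime(2024, 1, 1, 10, 5), True),
    (datetime(2024, 1, 1, 15, 0), False),
    (datetime(2024, 1, 1, 15, 30), True),
    (datetime(2024, 1, 1, 21, 10), False),
    (datetime(2024, 1, 1, 21, 40), False),
]

hourly = failure_rate_by_hour(events)
print(hourly)
```

Hours with an unusually high rate are the ones worth investigating first.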
Question 1
The process of playing music
Prepare
Send DNS request
Prepare DNS reply
Send DNS reply
Process DNS reply
-
Send webpage request
Prepare webpage reply
Send webpage reply
Process webpage
-
Request music player
Prepare music player
Send music player
Process music player
-
Request MP3
Prepare MP3
Send MP3
Play MP3
What is "Process music player"?
The local browser does rendering, Flash decoding, etc. If any of these 17 steps goes wrong, music playback fails.
Is there a system/browser default music player?
The HTML player is, but the Flash player is not, so the Flash module has to be requested every time.
Real data: failure rate 20%
In practice, the real failure rate is 20%, which breaks down as:
- 8% DNS
- 5% Web
- 5% MP3
- 2% Player
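A breakdown like this can be produced by attributing each failed session to the first stage that went wrong. The sketch below assumes every session is recorded as a dict of stage name to success flag, with stages that were never reached left out; the stage order follows the request sequence above, and the sample sessions are made up.

```python
from collections import Counter

# Stages in the order the client goes through them.
STAGES = ["DNS", "Web", "Player", "MP3"]


def first_failed_stage(session):
    """Return the first stage that failed or was never reached, else None."""
    for stage in STAGES:
        if not session.get(stage, False):
            return stage
    return None


def failure_breakdown(sessions):
    """Return {stage: share of all sessions that first failed at that stage}."""
    total = len(sessions)
    if total == 0:
        return {stage: 0.0 for stage in STAGES}
    counts = Counter()
    for session in sessions:
        stage = first_failed_stage(session)
        if stage is not None:
            counts[stage] += 1
    return {stage: counts[stage] / total for stage in STAGES}


# Hypothetical sessions, not real data.
sessions = [
    {"DNS": True, "Web": True, "Player": True, "MP3": True},
    {"DNS": False},
    {"DNS": True, "Web": False},
    {"DNS": True, "Web": True, "Player": True, "MP3": False},
]

breakdown = failure_breakdown(sessions)
print(breakdown)
```

Because each failed session is counted once, the per-stage shares add up to the overall failure rate.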
Question 2
Fix the DNS problem
First of all, how do we detect DNS failures? There are 2 ways. The first is to let the help desk do it. The second is to use the desktop app to help detect the host address.
Step 1. Hosts file hijack
Some users’ hosts files can be modified by competitors.
- ping the website URL
- modify the hosts file manually or via the desktop app
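The check a desktop app might run can be sketched as follows. This is only an illustration: the domain `music.example.com`, the addresses, and the function name are placeholders, and the hosts file is passed in as text rather than read from disk.

```python
def find_hijacked_entries(hosts_text, domain, expected_ips):
    """Return IPs that the hosts file maps `domain` to but that are not expected."""
    suspicious = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        if domain in names and ip not in expected_ips:
            suspicious.append(ip)
    return suspicious


# Hypothetical hosts file content with one rogue entry.
SAMPLE_HOSTS = """
127.0.0.1 localhost
# 203.0.113.9 music.example.com  (commented out, ignored)
198.51.100.66 music.example.com www.music.example.com
203.0.113.10 music.example.com
"""

hijacked = find_hijacked_entries(
    SAMPLE_HOSTS, "music.example.com", {"203.0.113.10"}
)
print(hijacked)
```

Any address returned here overrides DNS for that user, so the entry should be removed or corrected.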
Step 2. ISP
Each ISP has its own DNS service. E.g. CSTNET failed to pick up the latest DNS records after a server change.
After this step, the DNS failure rate fell from 8% to 1%. Why still 1%? Some companies ban music playback on their company networks.
Question 3
Fix the web problem
The failure rate is highest at:
- 3pm: office hours
- 9pm: peak bandwidth usage nation-wide
Solution 1, reverse proxy
Add a reverse proxy in front of more servers. The reverse proxy acts like a load balancer.
A reverse proxy is a type of proxy server that retrieves resources on behalf of a client from one or more servers. These resources are then returned to the client as though they originated from the proxy server itself.
Common uses for a reverse proxy server include:
- Load balancing
It acts as a “traffic cop,” sitting in front of your back-end servers and distributing client requests across them. It tries to maximize speed and capacity utilization while ensuring no one server is overloaded.
If a server goes down, the load balancer redirects traffic to the remaining online servers.
- Web acceleration
It can compress inbound and outbound data, as well as __cache commonly requested content__.
It can also perform additional tasks such as SSL encryption to take __load off of your web servers__.
- Security and anonymity
By intercepting requests headed for your back-end servers, a reverse proxy server protects their identities and acts as an additional defense against security attacks.
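The load-balancing behaviour described above can be sketched with a simple round-robin picker that skips servers marked as down. Round robin is just one possible policy, chosen here for brevity; the class and server names are made up.

```python
class RoundRobinBalancer:
    """Hand out back-end servers in turn, skipping any marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation point.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy back-end servers")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
before_outage = [balancer.pick() for _ in range(4)]
balancer.mark_down("web2")
during_outage = [balancer.pick() for _ in range(3)]
print(before_outage, during_outage)
```

While `web2` is down, requests alternate between the two remaining servers; calling `mark_up("web2")` puts it back into rotation.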
Solution 2, reduce size of web page
- simplify JavaScript files
- compress images (lower resolution)
- merge multiple images into 1 image (fewer requests)
- lazy loading (Pinterest uses it a lot)
Solution 3, more cacheable pages
Change dynamic webpages to static pages. The advantages of this are:
- more search engine friendly.
- more cache friendly.
Summary on caching
Caching can happen at places Number 1, 2 and 3:
At Number 4, we can add more servers. Number 3 is the reverse proxy. Number 2 is caching within the ISP network, which avoids requesting the data again from the backend. Number 1 is the front-end browser cache.
After this step, the Web failure rate fell from 7% to 4%. Why still 4%? Well, these failures come mainly from the junk users created by marketing.
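The layering can be sketched as a lookup that tries the cache nearest the user first and only reaches the backend when every layer misses. Plain dicts stand in for the real caches here, and the layer names, URL, and `backend` function are placeholders.

```python
def fetch(url, layers, backend):
    """Look up `url` through ordered cache layers, nearest to the user first.

    layers: list of (name, dict) pairs. Returns (content, served_by).
    """
    for index, (name, cache) in enumerate(layers):
        if url in cache:
            content = cache[url]
            # Fill the layers closer to the user for next time.
            for _, nearer_cache in layers[:index]:
                nearer_cache[url] = content
            return content, name
    content = backend(url)
    for _, cache in layers:
        cache[url] = content
    return content, "backend"


backend_calls = []


def backend(url):
    """Stand-in for the origin servers (Number 4)."""
    backend_calls.append(url)
    return "<html>page for " + url + "</html>"


# Number 1, 2 and 3 from the text, modelled as empty dict caches.
layers = [("browser", {}), ("isp", {}), ("reverse_proxy", {})]

first = fetch("/index.html", layers, backend)
second = fetch("/index.html", layers, backend)
layers[0][1].clear()  # the user clears the browser cache
third = fetch("/index.html", layers, backend)
print(first[1], second[1], third[1], len(backend_calls))
```

Only the first request reaches the backend; later requests are answered by whichever layer still holds the page.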