On most WoW Classic servers, the opening of Ahn'Qiraj should be over by now. As expected, the big event caused a lot of drama on various realms: after all, the associated quest chain awards an exclusive mount, and ringing the gong was followed by a lag-fest in Silithus.
In a new, very extensive "Behind the scenes" article on the official WoW site, Blizzard has now revealed in detail why the event was still a big challenge in 2020 and how the team tried to prepare the servers technically for the player rush.
War is upon us. Earlier this month, one of the most hotly anticipated events of World of Warcraft Classic was released: the war effort for Ahn'Qiraj. Entire Classic realms, the combined might of the Horde and Alliance, banded together to provide resources to open the gates and unlock the raids of Ahn'Qiraj. When the War of the Shifting Sands took place for the first (and, until now, only) time in 2006, thousands of players made their way to Silithus to witness or participate in the chaos. The turnout exceeded even the development team's wildest expectations many times over. They were simply not prepared for these masses. The servers were overloaded in no time, and many players were trapped for twelve hours in a seemingly endless cycle of logging in, losing connection, and trying to reconnect. Through it all, our engineers were scrambling to fix bugs and get players logged back in.
While we managed to stabilize the servers during the event and learned a lot from it, there was still plenty of room for improvement. Then, 15 years later, we were ready to revive one of the most epic moments in WoW history in WoW Classic. To do this, we had to optimize our servers to prevent lag and server crashes, and we did it with twice as many players in Silithus as during the original event in 2006.
In this article, we'll take a closer look at how we were able to recreate this eagerly anticipated event. We'll cover how we used automated players and stress tests to identify weak points and develop tailored optimizations, how we used software to solve problems that hardware alone could not, and how we were able to run a worldwide event that kept server crashes in check while preserving the WoW Classic game experience.
In thinking about how we wanted to put on this event, we had three specific goals: we wanted to prevent a chain of crashes, increase the number of players who could be in a zone at the same time, and find out how much lag was tolerable before players had to be teleported out of Silithus. Before we can look in detail at how we maximized server performance, we need to describe the framework we were dealing with: the limitations imposed by WoW Classic's code, how population caps work, and how it all affects gameplay.
WoW Classic: Opening of Ahn'Qiraj - internal insights from Blizzard. Source: Blizzard
The modern version of World of Warcraft is built on the original code that was released 15 years ago. Since the game's release, we have developed several modern methods for dealing with high player numbers in Battle for Azeroth, most notably sharding. Shards allow WoW servers to accommodate many more players simultaneously than was possible in 2006. In Battle for Azeroth, we use shards to reduce the load on a server by creating a copy of a zone (e.g. Zuldazar) as soon as the number of players there reaches a certain threshold. Distributing players across different copies of the zone avoids potential lag, because player interactions take up a lot of processing power: the server constantly has to send and receive large numbers of data packets to accurately reflect movement and spell casts. In addition, sharding mitigates the lag that can occur when players move into a zone whose player count already exceeds the threshold. This all sounds pretty straightforward, but there's a catch: WoW Classic was created to reflect the original data from patch 1.12 as faithfully as possible, and that includes the associated gameplay quirks.
In rare cases, sharding can cause your target (such as an enemy player or NPC) to disappear when you move to another zone.
Keeping sharding would have meant losing some of the nostalgic gameplay moments in which players chase NPCs or other players across zones. So we needed a solution that wouldn't affect the original gameplay but would still allow us to have more players on a server without lag making the game unplayable.
To solve this problem, we decided to use layers, which are copies of entire regions (such as the Eastern Kingdoms), to keep population density and lag under control without losing the unforgettable charm of the original game. With this approach, players can still kite world bosses into other zones and chase opposing players across zone borders within a region without risking being assigned to a different shard. But layers were never intended to be a permanent solution. Since the original 1.12 release used neither sharding nor layering, we promised players that we would only use layers for the launch of WoW Classic and deactivate them over time as players became more evenly distributed across the world.
There are a few realms where we still use layers due to high active player numbers (for example, the North American realm Faerlina), but we've greatly reduced the number of active layers on those realms since the game's release. After 15 years of waiting, the Ahn'Qiraj event was one of the most eagerly anticipated events of WoW Classic. We expected it to bring the largest number of players into a single area to date, apart from the starting zones on launch day, and this time on many realms without layers to mitigate the impact. Without the technological help of layering or sharding, we needed a creative solution, and fast.
We started our search for a solution to population density without layering or sharding by creating so-called "headless clients": automated player characters that mimic the behavior of real players. We had them cast spells, fight NPCs, and run around. This gave us an idea of what server performance might look like with thousands of players active in a single zone. After these simulations, we organized stress tests with volunteers to observe realistic player behavior and compare that data with our earlier results. This showed us where the weak points were and which parts of our server code struggled most when player numbers were high.
The server frame time data was analyzed in detail to see how close it came to the state where a server stops responding (also called deadlock).
The next step was to analyze what exactly was affecting server performance so we could break this daunting task down into manageable goals. We were facing a polynomial problem: the work grows roughly with the square of the number of players, so faster hardware alone could not solve it, because hardware does not get faster at anything like that rate. Instead, we had to optimize by hand, deliberately choosing what data was transmitted to players and how often. Let's illustrate the problem: say we have 20 player characters jumping in circles. The server transmits each character's actions to the other 19 players using data packets. With this group of 20 players, the server processes 380 data packets (20 players * 19 recipients each = 380 packets).
The whole thing gets even trickier when a larger number of player characters in a zone perform the same action. If we extend our example to 500 players, the server sends 249,500 packets. With 1,500 players, it is already 2,248,500 packets. Depending on what the players are doing, several data packets are transmitted per second, and remember that the examples above cover only one action each. The more packets the server has to handle, the longer it takes to process the actions of a single player before it can tackle those of the others. As this problem worsens, the servers approach a deadlock. In WoW Classic, we have significantly more players on each realm than we did back in 2006, so we expected more players than ever to gather near the gates.
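The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the counting rule described above (every action is relayed to every other player); the function name `packets_per_action` is made up for illustration and does not come from Blizzard's code.

```python
def packets_per_action(players: int) -> int:
    """Packets sent when every player performs one action and each
    action is relayed to every *other* player: n * (n - 1)."""
    return players * (players - 1)


if __name__ == "__main__":
    for n in (20, 500, 1500):
        print(f"{n:>5} players -> {packets_per_action(n):>9,} packets")
```

Tripling the crowd from 500 to 1,500 players multiplies the packet count by roughly nine, which is why the growth cannot be absorbed by hardware alone.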
Our servers are designed to crash and then restart in the event of a deadlock. So we knew we had to do everything in our power to minimize processing time. After some testing, it was clear that movement accounted for the largest share of the processing load on our servers. First, we stopped sending character orientation updates (which indicate which direction each character model is facing) and only sent player updates when a player started moving, stopped moving, or moved their character using the keyboard. Since latency already suffers with a large number of players, spending processing power on minor orientation updates only made things worse.
Therefore, it was better to stop them. We decided we would rather fit more player characters into one zone and send orientation updates less frequently. Remember that our goal was to find the limit just short of the servers collapsing while letting as many players as possible into Silithus. After all, it's better to receive a few fewer movement updates than to be unable to log in with your character at all. We also started throttling low-priority data. Actions deemed "less important" should not be sent at the same rate as "more important" ones. Previously, many messages were sent all at once regardless of importance. So we optimized the code so that less important information was collected and sent less frequently.
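The idea of throttling low-priority data can be sketched as follows. This is a minimal illustration, not Blizzard's implementation: the class `ThrottledOutbox`, the two priority levels, and the `low_every` interval are all assumptions made for the example. Important messages go out on every server tick, while less important ones are collected and sent as one batch every few ticks.

```python
class ThrottledOutbox:
    """Hypothetical per-recipient outbox: important messages are sent every
    tick, low-priority ones are held and flushed every `low_every` ticks."""

    def __init__(self, low_every: int = 5):
        self.low_every = low_every
        self.tick_count = 0
        self.high = []  # messages that must go out on the next tick
        self.low = []   # messages that may wait for the next batch

    def queue(self, message: str, important: bool) -> None:
        (self.high if important else self.low).append(message)

    def tick(self) -> list:
        """Return the packets sent this tick; each packet is a list of messages."""
        self.tick_count += 1
        packets = []
        if self.high:
            packets.append(self.high)
            self.high = []
        if self.low and self.tick_count % self.low_every == 0:
            packets.append(self.low)
            self.low = []
        return packets
```

With `low_every=3`, a spell cast queued as important is sent on the very next tick, while two orientation changes queued as unimportant leave together in a single packet on the third tick.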
Buffs and debuffs also hurt server performance. They are applied all the time, everywhere in the world, especially in combat. This may sound trivial, but with so many player characters in a small space, every one of these changes has to be communicated to everyone nearby. Similar to the throttling of low-priority data, we bundled buff and debuff updates to avoid sending players multiple data packets one after another.
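Bundling of this kind might look roughly like the sketch below. The class `AuraBatcher` and its methods are hypothetical, and keeping only the latest state per character and effect is an extra assumption of this example, not something the article states. Changes are recorded as they happen and leave the server as one packet per flush instead of one packet per change.

```python
class AuraBatcher:
    """Hypothetical collector for buff/debuff changes between two flushes."""

    def __init__(self):
        # (character, aura) -> latest state; later changes overwrite earlier ones
        self._pending = {}

    def record(self, character: str, aura: str, active: bool) -> None:
        self._pending[(character, aura)] = active

    def flush(self):
        """Return one bundled packet with all pending changes, or None if idle."""
        if not self._pending:
            return None
        packet = [(char, aura, state) for (char, aura), state in self._pending.items()]
        self._pending = {}
        return packet
```

Three recorded changes, two of which concern the same effect on the same character, would leave as a single packet with two entries.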
Although we optimized our servers to handle more players in each zone, we still realized that we couldn't possibly accommodate the population of an entire realm (more than double the number of players on an original WoW realm in version 1.12) in Silithus at the same time. We had to make the hard decision to restrict access to the zone by determining who could enter and how many players could be there at a time. We decided that only level 60 characters would be allowed into Silithus, and that even those characters would be denied access once the maximum number of players was reached. This restriction was the right decision, since the event in Silithus was designed for maximum-level characters.
Furthermore, low-level characters could still join the war effort in other zones, for example by fighting the Anubisath, who roamed the Barrens and were designed for level 20-30 characters. The second open question was this: we knew the maximum number of players that could be in an area before the server went down, but how far should we lower that number to ensure the best server performance for everyone? Through testing, we found that the optimum was around 1,500 players if the characters were all in one spot. But since the event took place across the entire zone, it turned out that there were only minimal problems when players were spread out.
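The two access rules described here (level 60 only, and a hard cap on players in the zone) can be expressed as a small admission check. The sketch below is an illustration under assumptions: the class `ZoneGate` and its methods are invented for this example, and the real cap was not simply fixed at 1,500.

```python
MAX_LEVEL = 60  # only maximum-level characters were admitted to Silithus


class ZoneGate:
    """Hypothetical admission check for a zone with a population cap."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.inside = set()

    def try_enter(self, name: str, level: int) -> bool:
        if level < MAX_LEVEL:
            return False              # wrong level: never admitted
        if name in self.inside:
            return True               # already in the zone
        if len(self.inside) >= self.capacity:
            return False              # zone is full
        self.inside.add(name)
        return True

    def leave(self, name: str) -> None:
        self.inside.discard(name)     # frees a slot for the next player
```

A character turned away because the zone is full is admitted on a later attempt once someone else has left.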
The event was supposed to take place in all regions, so we had to make sure that everything worked across multiple layers. This means that when the bearer of the scepter rings the gong on one layer, the event should also be triggered on all other layers of that realm. Since the trigger of the event was linked to a player action, we also wanted to make sure that the bearer of the scepter was visible to all players of the realm on every layer. This created an interesting problem, because servers now had to exchange information that they would not normally share with each other. Many complications can arise when we bundle updates and route them through the servers to make sure the data reaches potentially thousands of players on each layer.
We started developing this technology when we introduced the fishing contest in Stranglethorn Valley. Later, it was used in the global buffs associated with Onyxia, Nefarian, Zul'Gurub, and Rend. Finally, when we were sure that everything was working as intended, we were ready to test all of our technology for the event around Ahn'Qiraj.
Now that we had solved the biggest technical problems and found several ways to optimize server performance, it was time to put our work to the test. We created a shortened version of the ten-hour war that would last only one hour.
During the first stress test, we let almost all players into the zone to see what would happen.
At one point, we were at almost 150% of the capacity of an entire realm from version 1.12. And at that exact point, our test realm crashed. We knew that we had set a very high player limit for the zone, and the actual number of players exceeded that limit many times over. When we investigated, we found that the code that handled players entering and leaving a zone was a queue that couldn't cope with many players at once, so players weren't being teleported out of the zone and were stuck on flight paths for an unusually long time. We brought the server back up and continued the stress test, adjusting a few things along the way. We reduced the limit to a point where the lag was still noticeable but bearable, and still kept a much larger number of players in the zone than ever before. The event should have lasted only an hour and a half. In fact, it took us up to four hours due to crashes.
The second stress test was conducted a week later. This allowed us to see if our optimizations were having an effect. Immediately upon logging in for the stress test, the improvements were noticeable - no players were stuck on the flight paths to Silithus anymore! We were able to collect enough data to show how many players Silithus could handle without major problems. After both tests, we were then able to settle on a number that we felt was best for handling lags and server stability. These tests allowed us to determine if our optimizations were working, and since we were able to figure out the optimal number of players per zone, they were a complete success.
Originally, the optimizations were only supposed to be active in Silithus during the War of the Shifting Sands. After ensuring that they could be applied globally without any problems, we rolled these changes out across the entire game world with patch 1.13.5. With the start of the war effort, players began to turn in resources and collect bug carapaces en masse. Player numbers skyrocketed not only in Silithus, but also in the capital cities and the open world. These tweaks helped make the game experience feel smoother and enabled huge PvP battles across Azeroth. Some players even went so far as to summon the world boss Thunderaan to help them drive the other faction away from a hive.
Although the event to open the gates had not yet taken place, some servers ran into strange bugs that prevented them from advancing the war effort. Some servers progressed through the war effort so quickly that a race condition occurred in the logic that handles each turn-in, which prevented the five-day timer from starting. Because the likelihood of this happening was so low, we were able to fix the bug manually for those servers and then make sure it could not occur in the future war efforts of other servers.
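The article does not say what the race in the turn-in logic looked like, so the following is only a generic illustration of the usual remedy: doing the "add contribution, check completion, start timer" steps under a single lock so that concurrent turn-ins cannot skip or duplicate the start. The class `WarEffort`, the counter `timer_starts` (a stand-in for starting the five-day timer), and the numbers are all assumptions made for the example.

```python
import threading


class WarEffort:
    """Hypothetical war effort tracker with an atomic completion check."""

    def __init__(self, goal: int):
        self.goal = goal
        self.collected = 0
        self.timer_starts = 0          # stand-in for starting the five-day timer
        self._lock = threading.Lock()

    def turn_in(self, amount: int) -> None:
        # Update and completion check happen under one lock, so exactly one
        # turn-in observes the transition from "incomplete" to "complete".
        with self._lock:
            was_complete = self.collected >= self.goal
            self.collected += amount
            if not was_complete and self.collected >= self.goal:
                self.timer_starts += 1


def simulate(players: int = 200, amount: int = 10, goal: int = 1000) -> WarEffort:
    """Let many players turn in resources concurrently and return the result."""
    effort = WarEffort(goal)
    threads = [threading.Thread(target=effort.turn_in, args=(amount,))
               for _ in range(players)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return effort
```

However the threads interleave, the contributions add up to the same total and the timer stand-in fires exactly once.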
After the war effort was completed and the five-day timer expired, we kept an eye on the Chinese realms, which would be the first to open the gates. The first realm in China on which the gong became active was Ouro. We noticed that on every layer, most players were in Silithus. The event would therefore start on multiple layers at maximum occupancy, for several thousand players at once. We had never tried anything like this before. Although there was significant lag, our servers did not crash when the gates first opened in China.
On August 4, it became clear that several realms in North America would be ready to ring the gong shortly after the server restarts. Using Game Master accounts and our monitoring tools, we kept a watchful eye on these realms to troubleshoot any potential issues. All realms came up and began the event smoothly. The bearers of the scepter received their prestigious Black Qiraji Battle Tank as a mount, players were able to take on even larger bugs, and we rejoiced in the stability. While waiting for our first server to complete its five-day wait after the server restart, we noticed a serious problem: the event did not persist across the server restart.
This bug had been around since the beginning of WoW Classic development, but there just hadn't been many events that needed to persist past a server restart. Our team was able to quickly fix the bug, but we had to make sure that no further server restarts occurred until we had applied the hotfix and saved the status of all war efforts to date in our database, all without disrupting players.
Some would argue that the server crashes are what made the original War for Ahn'Qiraj so chaotic and therefore so memorable. We wanted to evoke that same passion with a much more stable game experience that players could share with 1,500 other players in Silithus. We wanted the War for Ahn'Qiraj in WoW Classic to be remembered as an event in which as many players as possible could take part in the ten-hour war without interruption. There were some server crashes, but the affected realms fully recovered and were back online within minutes, with no further crashes.
Over 4,000 players worldwide have become Scarab Lords, and that number continues to grow as each server advances in the war effort. The enthusiasm and eagerness players have shown in WoW Classic since the war effort for Ahn'Qiraj began is indescribable to us. Thank you to everyone who has joined us for the second War of the Shifting Sands!