Archive for the 'Service Metrics' Category
We wanted to remind everyone again about the minor maintenance planned for our support system this afternoon, Tuesday, between 3:15 PM PST and 4:15 PM PST.
During this time, Live Chat for Billing and Support issues will be unavailable. Telephone support for billing issues will also be offline.
Sorry for the loss of access but we’re hoping it will be a quick update and then the system will be back online again.
Best Wishes to you
Hello everyone!
I have updated our Service Quality Metrics page with data through the end of January 2008, and the downloadable data and charts are also available in Excel, OpenDocument, and Google Docs formats. The charts and datasets have been truncated to a “rolling 7 months” format to remain concise and to increase the legibility of the charts. The full historical detail of this data to the daily level is still available in the end-of-2007 data files from my last SQM post.
January had a larger volume of Planned Outages than prior months due to the planned system outage on January 9th to expand our asset servers, and a larger number of Rolling Restarts than usual as we rolled out a very aggressive number of bugfixes. The rolling restarts, as announced, were on January 2nd, 17th and 31st. These outages, while regrettable, are a necessary part of our process of stabilizing Second Life and making it more available. Even with this large number of planned outages (our most aggressive deploy schedule to date), increased quality of testing and better automation of deploy processes kept our planned outage total below 1.3%, which is significantly less than the 1.8% we were experiencing in March and April 2007. I am proud to point out that despite those planned outages, and the small number of intermittent login problems and website 503 errors, in January we had the lowest percentage of Unplanned Outages to date - around 0.15%!
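To put those percentages in perspective, here is a minimal sketch that converts a monthly outage percentage into hours. It assumes the percentages are measured against total clock time in the month, which this post does not state explicitly; the function name `outage_hours` is hypothetical and only for illustration.

```python
import calendar


def outage_hours(year, month, outage_percent):
    """Convert a monthly outage percentage into hours of downtime.

    Assumption: the percentage is a share of all clock hours in the month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    total_hours = days_in_month * 24
    return total_hours * outage_percent / 100.0


# January 2008 has 31 days, i.e. 744 clock hours.
planned = outage_hours(2008, 1, 1.3)     # upper bound quoted in the post
unplanned = outage_hours(2008, 1, 0.15)  # approximate figure from the post

print(f"Planned outages: under {planned:.1f} hours")
print(f"Unplanned outages: about {unplanned:.1f} hours")
```

Under that assumption, a planned outage total below 1.3% works out to under roughly 9.7 hours for the month, and 0.15% unplanned to roughly 1.1 hours.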
We had a slight increase in overall viewer crashrates for January, from ~20% to nearly 21%, but testing of the newer RC and Firstlook (Windlight) viewers is showing some very encouraging trends in both improved crashrates and FPS performance, and we look forward to those statistics becoming the norm when we roll these features and fixes into the main production release client. If you have time, please download and help test these RC and Firstlook test clients - the more customer data we have on how these clients perform on your hardware and network configuration, the better we can fix and tune them before they hit prime-time. And please don’t forget to log any bugs you find in our Issue Tracker.
Average Viewer performance as measured by Frames Per Second increased slightly in January, and we look forward to the performance boosts built into the Firstlook viewer pushing that even higher once it is released to mainstream as well. I’m curious to hear your perspective - are you seeing better or worse client performance with Windlight? From an aggregate statistical standpoint, it appears that your experience is better, so let’s hear your story.
The Percentage of time with regions below FPS thresholds also decreased nicely during January - not quite reaching our all-time low of October 2007, but coming close. I look forward to the work on Havok4 and Mono completing and helping drive this even lower in the near future.
If you have questions on these metrics, please join me at my Office Hours this Friday at 10 AM PST. This week I am also experimenting with holding a second Office Hours at a time more suitable for our European customers, on Thursday morning at 8 AM PST (i.e. 4 PM GMT). As always, you can also IM me offline with specific questions. Thanks, and I hope to see you soon!
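For anyone double-checking the time conversion, here is a small sketch using fixed UTC offsets (PST as UTC-8, GMT as UTC+0). The calendar date used is a hypothetical Thursday chosen only for illustration, since the post does not give one, and the helper name `pst_to_gmt` is likewise an assumption.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: PST is 8 hours behind UTC, and GMT is treated as UTC.
PST = timezone(timedelta(hours=-8), "PST")
GMT = timezone.utc


def pst_to_gmt(dt_pst):
    """Convert a timezone-aware PST datetime to GMT."""
    return dt_pst.astimezone(GMT)


# Hypothetical date (a Thursday); only the 8 AM PST time comes from the post.
session = datetime(2008, 2, 14, 8, 0, tzinfo=PST)
print(pst_to_gmt(session).strftime("%H:%M GMT"))
```

Fixed offsets are enough here because both times are quoted in standard time; a conversion spanning daylight saving changes would need real time zone data instead.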
Hello Everyone!
The Service Quality Metrics page has been updated with stats through the end of December, 2007. Charts are directly visible on that page and downloadable versions of the detailed daily data for the full year are available in Excel, OpenDocument, and Google Doc formats.
December had its share of small challenges, with our unplanned and planned outages growing slightly over our November stats, though as Philip noted overall for Q4 we are staying over 98% availability - a great improvement over Q3.
I am pleased to announce that for the first time all other quality metrics - Viewer crashrates, Viewer performance in FPS, and Region performance in FPS - improved steadily in December, and we look forward to further changes and fixes continuing this trend. I especially appreciate everyone’s enthusiastic participation in our Beta grid testing, which is bringing Havok4 closer to fruition, and in testing the beta Release Candidate and First Look viewer versions, so that our production releases of these changes will bring a smoother and more reliable user experience. Please keep up the participation, and keep that feedback and JIRA bug reporting coming!
If you like, we can discuss these Service Quality metrics and our Economic Metrics inworld this Friday morning 10 AM PST at my Office Hours in Beaumont. See you soon!
Hello everyone, please note that our Service Quality Metrics through November have been posted at our Secondlifegrid.net site, along with the detailed data in Excel, OpenDocument, and Google Docs formats.
Overview: The Service Outage chart for November reflects the problems that we suffered in the latter two weeks of the month, with over 1.2% unplanned outage, though the Planned outage metric is still excellent, at around 0.3%.
Otherwise good: Viewer crashrates are slightly down (good), viewer framerates are slightly up (also good), and for the region FPS data, while the percentage of regions impacted is up slightly (not so good) for the moderately busy regions, it remains steadily below 0.5% for the heavily overloaded servers (good).
Fine Detail: there is a sourcing issue with the details of viewer crash data past Nov 20th specifically with the counts of sessions “disconnected from a region”, but the high-level average on the chart is valid across the month and we do have the history of the actual viewer crashes (an area of strong focus for us in Engineering) during this time.
Related: Yes, I will be posting the updated economic stats under a separate blog post today.
The Second Life 1.18.5 Server release included updates for several systems, including new python libraries, backbones (a piece of infrastructure which handles a variety of services, such as agent presence and capabilities, and proxies data between systems), and simulators. The deploy as planned for November 6th did not require any downtime – all components could be updated live. We planned to perform the rollout per our patch deploy sequences: updating central systems one by one, then simulators.
Read on for the day-by-day, blow-by-blow sequence of events which followed…
(more…)
Hello everyone!
We have updated and expanded the Second Life Service Quality Metrics page through the end of September 2007, and as promised, we’re including the Viewer and Region crashrates and performance statistics on that site instead of in the Economy metrics where we temporarily posted them in July.
Highlights of the new data include 3 monthly charts as well as detailed data at a Daily level from Jan through September 2007 in Excel, OpenDocument, and Google Doc formats. One interesting addition to these data file formats is metadata in the headers - click on the report titles or field titles to be hyperlinked to the data’s business definition on our Metrics Glossary wiki page.
The overall Service Availability chart shows a serious improvement (decrease) in Unplanned Outages for September. We are also working to reduce the impact of our Planned Outages and look forward to keeping the level of Unplanned outages down consistently over time, as Ian discussed in his blog post on availability.
Each of the 3 new charts represents a subject area that Linden Lab is focusing on improving. Viewer Session crashes are being improved first by focusing on Region Sessions terminated, which will be reduced along with Region crashes. The Havok4 project as well as several other projects should help significantly with those. The client-side crashes are also being addressed, and recent updates to our Viewer and Region code are improving the error logging so that we can focus on the most significant sources of those crashes.
Viewer performance in average Frames Per Second (FPS) benefited most this year from our Pipeline changes in April, and residents who are experiencing poor client FPS can most benefit from ensuring they are using at least Minimum Supported hardware. More information will be forthcoming about the average FPS for the graphics cards you are using, which will clearly demonstrate that the unsupported cards are the source of the poorest user experience and highest “lag”.
Region performance in FPS is one area we have been focusing on closely. The Havok4 engine should help improve and stabilize this environment, and other Linden Lab developers have found other system improvements to bring these figures up before the Havok change, so you should expect to see some improvements in the October figures. If you are a private estate owner interested in managing your own region’s FPS, you can get more information at our new Region Performance Improvement Guide wiki page.
Thanks for everyone’s attention, and we will be back with more Service Quality Metrics improvements next month.
Last month, Ian Linden described Linden Lab’s efforts to improve grid stability and there has already been an improvement in September’s unplanned outages, although it’s much too early to declare victory. However, one thing that was not discussed in detail in that blog post was what we are doing about Resident inventory loss. Residents have shown their frustration with inventory loss via numerous emails, calls, support requests, Office Hours discussions, Town Hall Meetings, and Project Open Letter. In response, we have begun an Inventory Loss Reduction Initiative within Linden Lab. There are currently a number of projects under this initiative, which I’ll describe in this post.
Currently, the Second Life inventory consists of over 1 billion unique Resident assets whose size is 98 terabytes on disk. Each month over 15 terabytes of new data on disk is created by millions of inventory transactions. The Second Life Grid is large, with over 900,000 unique Resident logins each month across over 14,000 Regions. As a percent of all inventory transactions, the rate of inventory loss is low; however, when it happens to a Resident, it can be devastating. The primary challenge with Resident reported inventory loss is that we often cannot verify precisely where in the complex inventory system it occurred. As part of our ongoing efforts to focus on stability and performance, we have begun a Reduce Inventory Loss Initiative, which includes the following internal projects:
* Metrics Instrumentation
* Region Crash Reduction
* Asset Collection Improvement
* Bug Fixing
* Resident Reported Inventory Loss Analysis
* Architectural Enhancements
* Perceived Inventory Loss
(more…)