Second Life Grid availability in April

Friday, May 9th, 2008 at 6:11 PM by: Ian Linden

Concurrency graph during outage

We’ve just updated the Quality Metrics page, and the numbers show what you already know: April was not a good month for Second Life Grid availability. Our internal outage tracking tool estimates that about 630,000 usage hours were lost to global system failures over the course of the month, which is about 1.9% of the total (up from 0.06% in February and 0.22% in March), and Resident surveys clearly indicate great unhappiness coinciding with these failures. (We define lost usage as how much time Residents would have spent logged in but did not, due to Grid failures; it is meant as a global availability metric and does not cover local failures like sim crashes, inventory problems, and the like. See the actual [black] vs. predicted [blue] concurrency graph excerpt, right.) I’d like to address, in general terms, the causes of this and what we are doing about it.
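As a rough illustration of the lost-usage definition above, the sketch below sums the gap between predicted and actual concurrency over a short outage window. It is not the internal outage tracking tool: the function name, the 30-minute sampling interval, and the sample concurrency values are all assumptions made up for the example. The last lines only back out the monthly total implied by the two published figures (630,000 hours and 1.9%).

```python
from typing import Sequence


def lost_usage_hours(predicted: Sequence[float],
                     actual: Sequence[float],
                     interval_hours: float) -> float:
    """Estimate lost usage as the shortfall of actual vs. predicted concurrency.

    Each sample is a count of logged-in Residents; each sample is assumed
    to cover `interval_hours` of wall-clock time (hypothetical sampling scheme).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # Only count shortfalls; samples where actual exceeds the prediction add nothing.
    shortfall = sum(max(p - a, 0.0) for p, a in zip(predicted, actual))
    return shortfall * interval_hours


# Hypothetical two-hour window sampled every 30 minutes (values are invented).
predicted = [50_000, 52_000, 54_000, 55_000]
actual = [50_000, 10_000, 5_000, 55_000]
window_loss = lost_usage_hours(predicted, actual, interval_hours=0.5)
print(f"Lost usage in window: {window_loss:,.0f} hours")

# Published figures from the post: 630,000 hours lost, about 1.9% of the total.
implied_total = 630_000 / 0.019
print(f"Implied monthly total: roughly {implied_total:,.0f} usage hours")
```

On these invented samples the two outage intervals fall short by 42,000 and 49,000 Residents, which at half an hour each comes to 45,500 lost hours. The published percentages imply a monthly total of roughly 33 million usage hours, though the post does not say whether "the total" means actual or predicted usage.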

What happened, and Why?
So, on to why April was so tough: virtually every piece of mission-critical Second Life Grid infrastructure failed catastrophically at least once during the month. Here are the biggest sources of downtime; I list these not to shirk responsibility, but to illustrate the near-perfect storm:

  • Intra-Grid Network: By far, the largest event was the near-total loss of intra-grid network connectivity on April 4/5, caused by a line failure within the backbone of our primary network provider - who delayed, botched, and again delayed the fix, extending what should have been a relatively brief outage for many hours… and then the failure was repeated on a smaller scale on April 25. We’re working with the provider at the executive level to address the risks that allowed this to happen.
  • Central Database: We encountered a new crash bug in our central database, resulting in a series of four database outages. We have subsequently identified a work-around, and should be able to avoid this particular crash in the future. In general, eliminating this cluster as a scalability bottleneck and failure point is a very high priority for Linden Lab.
  • Asset Storage Cluster: The asset storage cluster crashed during routine maintenance; this was a repeat of the crashes that afflicted us in December and January. The vendor had previously assured us that this problem was fixed, and so we continue to work with them to investigate the cause and potential solutions.
  • Data Center/Transient Data Services: Our data center in San Francisco cut power to many of our most critical servers over the course of two days. We had advance notice of this one, but working around the power cuts did cause some disruptions. This was a learning process as well, and we’ve refined our processes for dealing with events like this.

For more information on the systems that make up the Second Life Grid and how their failures impact the platform, please check the Service Disruptions page. Because component failures are never wholly avoidable, our goal (and the goal of online services everywhere) is to reduce the extent to which the Second Life Grid is affected by the loss of key systems. With a platform this technologically complex, adding the necessary redundancy and failure management is a long process, but we have not been standing still: on average, a database crash in April cost about 14,000 lost usage hours, vs. 53,000 hours for a similar crash last August (at a time when substantially fewer people were using Second Life). We will continue to reduce the impact (on both logins and in-world functionality) of these crashes until they cease to be significant, at which point they can come off the Service Disruptions page.

Clearly, though, there is still a great deal of work left to do. Our long-term strategy includes specific plans to eliminate the risks associated with all of the failures I listed above. We make progress every day, and continue to hire the best and brightest technical staff at our offices in San Francisco, Mountain View, Seattle, Boston, and Brighton. (See our hiring page.) In the meantime, our service record is not perfect, but we are confident that we have identified some key areas to improve and will continue to move forward.

Finally, I’ll mention our new coalesced status reports page, which replaces system status updates on our primary blog. Look there for all Grid-status information, including information about upcoming scheduled outages: http://status.secondlifegrid.net/

149 Responses to “Second Life Grid availability in April”

  1. U M Says:

    Woohooooooooo Someone spoke truth!!!!!!!!!!!

  2. KMeist Hax Says:

    best blog post ever. it explains everything.

  3. Tired Sim Owner Says:

    And I get compensation how? Don’t offer open sims, which just add to long-term costs. I want cold hard cash adjustments for lost sim access and sales revenue.

  4. U M Says:

    Shocking yet pleasing to see truth on these matters. Then again, Ian Linden is that type of Linden. Let’s make this a habit, LL, ok :)

  5. Smokey Newman Says:

    Erm, we already knew the stats would be down.

    So what are you going to do to compensate customers? Yes, we are customers, and we have customers; we can’t just show them a graph and explain technical issues. Please listen to us all and give us some compensation.

  6. urewshmycmd Says:

    You know, no matter how good or bad the news is, there are always those who have to complain about what could be. If those of you really wanted to make SL better, you would let the Lindens do their work and stop complaining every time something does not go your way. It amazes me that of all the people who post here in the blogs, not one of them can do better than the ones working on the issue to begin with. I figure if anyone can do better, I’m sure that LL would love to have you, and if you feel you can do better, feel free to take your attempt at an application.. who knows.. maybe just maybe you can do better.. if so *Clap Clap Clap*..

    ~Jamie~

  7. Smokey Newman Says:

    Me again, but after the “What happened, and Why?” there should be a “How can we help you?” for those who lost out on sales, those who lost tenants, those who still pay a full tier for a service that did not work.

    The fact is people have lost trust and LL are not compensating.

  8. Ann Otoole Says:

    And when, exactly, will you implement a true transaction manager ala Tuxedo or equivalent mature proven scalable technology that will provide rollback and logging of failed transactions that, in turn, will automatically close the grid to any transactions when the system exceeds a threshold of failures and notifying/paging staff on the side while it is at it?

  9. Malachi Petunia Says:

    “We define lost usage as how much time Residents would have spent logged in but did not, due to Grid failures; it is meant as a global availability metric and does not cover local failures like sim crashes, inventory problems, and the [teleport, search, and asset failures]”

    So in other words this is much worse than LL is actually admitting to, whilst blaming their data center, their database, the power supply. Given that Second Life Grid is being sold as an internet service, by their most charitable possible appraisal they are up only 23.5 hours of each day. And people are dazzled by this “honesty”.

    It is one thing when others use statistics to lie to you; it is another thing when you use statistics to lie to yourself.

  10. U M Says:

    But how many times does LL Omit they had issues resulting in this bad of a downturn……….LL is going to have one ***** of a trick coming out of their shirt if to make the following month look real and upbeat….. Population stats are always fixed and lied to show resents from around the world who is logging in……That set of facts and stats will always be lied about

  11. JB Kraft Says:

    Nice spelunking! :) I can live with not being able to log in. My ISP goes into the aether from time to time too. The real trouble however is when the grid *is* available. Perhaps superimposing a lag, inventory loss, failed transaction, no tp, client crash graph, just to round out the full experience? It would make that chart so much more informative; like a cross-section of the san andreas fault laid over the marianas trench with the himalayas for a background. Now that’d be purdy! :)

  12. Aminom Marvin Says:

    This is a step in the right direction, but there needs to be more. Don’t just give us generals. Give us specifics: describe actual points of weakness in the SL system, their causes, and plans of action to fix them.

    The grid has gotten so bad that flying vehicle use is almost completely broken. Using a flying vehicle with nobody sitting upon it except the pilot, there is a region cross failure 1 out of 20 sim crossings, resulting in a need to relog or teleport.

  13. Jacek Antonelli Says:

    Thank you for the honest and informative post, Ian.

    It’s great to learn of the causes for the major problems, and the actions being taken to prevent similar events in the future. Without this sort of information, we tend to cynically (and rather ignorantly) assume that some Linden spilled their coffee on a server rack, or unplugged the asset cluster so they could plug in their Xbox and play GTA IV. ;) I kid, but there’s often an air of “Ugh, the Lindens screwed up again!” when something goes wrong and isn’t explained clearly.

    It’s also very nice to hear responsibility being taken where it should. I would have been rather disappointed if this post had been honey-coated, head-in-the-clouds, “Everything’s groovy, you guys need to mellow out” PR fluff.

    So in summary, I thank you and salute you for your honest and informative post! As #4 said, let’s hope this sort of good communication becomes a habit. :)

  14. Katt Linden Says:

    Folks, please be mindful about what you say here. Wild accusations are not appropriate.

    I hope that you can all appreciate this post, transparent and open about what was causing some big issues.

    Thanks.

  15. Smokey Newman Says:

    Katt sorry

    But nobody is helping here we have not even heard the word sorry.

    We Pay Lindens and we are getting no customer satisfaction at the moment.

  16. Damona Rau Says:

    First, thanks for this information, Ian.

    There are still some open questions:

    Why have you moved the Status Reports to a new website with comments disabled? It smells like you don’t want any comments about the outages / failures.

    What does LL think about compensation for all the money we (the business owners) have lost?

    Last, but not least, I’m missing February in the quality metrics chart (the first chart here: http://secondlifegrid.net/resources/service_metrics).

    I would love to see a chart for the transaction / TP / Group chat and Notices outages, the primary things we need fully functional in-world for the economy. If I count all the hours where transactions failed, you will have lost more than 2.2% in April.

    You’re on the right track, but there is a very long way to go for LL, with some changes in strategy. You can’t hope that we have more patience or that we can accept more excuses; the same failures occur too often. The symptoms affect the residents, but LL has to work on the causes. All the “Resolved” posts are nothing more than “we have fixed the symptom, but we don’t know the cause”, aren’t they?

    I like SL and there is still a little bit of hope that things will get better. Since February there has been more anger than fun with SL.

  17. Damona Rau Says:

    @14
    I’m sorry Katt, but why should I appreciate normal behaviour like the post from Ian? As a “Service Provider” you’re responsible for keeping your customers up to date about failures and outages.

    The residents pay your salaries, please keep this in mind. So keeping your customers happy should be one of the important points.

    As an example, my Internet Service Provider offers compensation when there is an outage of more than 4 hours. Ok, it’s a German company, and in Germany it’s normal behaviour for companies to give customers compensation when failures occur (and, btw, they are faster with explanations of what kind of problems occurred). And the next thing: when these companies write “Resolved”, it means resolved, and not that the same failure appears again some days later.

    Maybe it’s better if you write “Fixed”, because resolved means nothing really, does it?

  18. Darek Deluca Says:

    People;
    Read your TOS. You do not have any right to compensation!

    Katt;
    Give it a break.
    Frustrated customers who have no recourse except to quit need some outlet.

  19. Almadi Masala Says:

    I can live with lag, login problems, and transaction, TP, and IM failures that come and go — but permanently losing a big chunk of my inventory is much harder to accept. Since being hit with such a loss, my use of SL has fallen off to almost nothing. Many of my favorite items are no longer usable, and I don’t trust the system enough to go to all the trouble of replacing them — knowing that they all could disappear again at any time. A month ago I was a very enthusiastic user of SL. Now the fun is gone, and I am seriously wondering whether it is worth staying. Will the inventory database ever be reliable? That is the main question that will determine whether I continue to use SL.

  20. Katt Linden Says:

    I’m sorry any of you have experienced downtime and problems.

    Seriously.

  21. DISGUSTED IN LINDEN LABS™® *!@$!! Says:

    Lindens could at least recover our classified payments for the month of April. Many people pay 1000s of real life dollars a month for these classifieds; at least extend them out for us for a few weeks. It’s the least you can do, and I’m sure it would make people a little more happy. Lindens did this last year: they extended our classifieds when grid and search was down. It is the least you can do to take a step in the right direction. It may not make everyone happy, but it would appease the crowds to see that at least you care something about your customers, because at this point in time we are feeling a tad more than TICKED OFF

  22. toybodacho Ireland Says:

    I have always wondered why Linden Labs won’t test their so-called “upgrades” (I personally call them “downgrades” ;) on a few special sims for 3 to 4 months before they apply them to the entire grid. That way they could detect most of the bugs before making us suffer through them.

    @16 Damona: My guess is they moved the Status Reports page to keep the “dirty laundry” out of plain view.

  23. Netsuko Yoshikawa Says:

    Thanks for the heads up LL. I can understand that some residents are getting angry, but I also understand that it is not always possible to maintain a 100% uptime in a system that has grown so large and diverse like the SL Grid.

  24. Kahni Says:

    Thanks for the breakdown on the information.

    There’s a lot of work to do yet, but being honest about numbers like this, and laying it out like this is refreshing. Much more so than “we’re banging on things”.

    I still think there should be tier refunds based on down-time though.

  25. Paulo Dielli Says:

    Yes, we are complaining a lot. But to be honest, there are enough reasons to complain. I have the utmost appreciation for the concept of Second Life and the efforts that it takes to keep it afloat. But it really needs to get more reliable.

    It’s very good to be open about metrics. In fact, it’s great. But from now on: please just focus on the grid’s stability. At the moment we have all the basic things we desired, just keep it with that. And communicate with your customers what you are doing to improve stability. That’s all, nothing more.

  26. Ash Meersand Says:

    Three weeks ago lots of library items from the sign-up outfits like Nightclub Female went missing. I spent 12 hours researching the “missing items” phenomenon and became very worried about the safety of my items. Then I decided to make a “missing from database folder” and went on with my Second Life.

    Two weeks ago I was unable to log in. It said it couldn’t find the domain a week ago. I used an alternate, four times slower, Internet service after an initial six hours of fruitless research. Ten more hours of JIRA scanning led me to do the Vista host file changing trick, which worked.

    Less than one week ago, I installed Release Candidate 6. I visited Torley’s Watermelinden Land, played around with Windlight for the first time for a few days, then teleported away; all the terrain on the mainland was the same as Torley’s. I tried researching if I could bake terrain textures like avatar clothing textures, while being harassed by griefers pelting me with couch-sized bullets and pushing me into the next sim.

    I put up with the faulty terrain textures and crossed a simulator boundary in a heavily populated area of the grid for a trivia contest event. Then my character kept walking out of the building on air and getting disconnected from the sim. This repeated three times before Release Candidate 6 said my Internet connection was faulty and offered to teleport me home, which didn’t work, and thereafter I was told “despite its best efforts” I could not connect to the Grid.

    I uninstalled RC6 and re-opened 1.19(4) on my computer, through which I could connect to the Grid and see textures just fine. I played around in the Plum sandbox happily building intricate connected tubes to send spheres through. Then today I looked at the Grid status page and it said they were having severe networking issues.

    I tried connecting today and was told despite “their best efforts” I couldn’t. I used “my best effort” by researching for about eight hours today and yesterday how to get around this. Going directly to login.agni.lindenlab.com with a browser suggested something related to not being able to be redirected, with the fault being in the webmaster’s hands.

    I went to JIRA and filed a detailed bug report, but my connection kept timing out for about two minutes when visiting the website every time after I pushed “submit” and my report was lost. At least this time I’ll have saved my “log” in Notepad.

    I have given up and uninstalled SecondLife. I will reinstall it and try again after I have seen it go three days without asset server, grid, or log-in issues. That sounds more than reasonable to me.

  27. Verdana Klaar Says:

    A good number of comments here can be summed up as follows: “I want my money back”.

    This means that the main facet of Second-Life has now shifted from an environment that was a mix of fun and creativity (together with some small business) to an economical system at the detriment of nearly all the rest, where residents log into to buy land, sell parcels or open shops filled with items of any kind (which they have not created themselves for a majority).

    The point is that - just like anything in SL - this new facet of the metaverse has been created by… the resident (your world, your economy).

    Also, if I am not wrong, no one in LL has ever forced anybody to get a premium account, or to invest some hundreds or some thousands of dollars in land, a shop, or the like.

    In the end it looks somehow as if a category of residents had willingly placed their savings on a stock exchange place, then had lost all because they choose wrong options in investing, and… afterwards call for a refund.

    Some residents are funny.

  28. Jayme Llewellyn Says:

    Nice to finally see some honesty from the ‘labs. Maybe now they will realise that what needs to be done is a sever upgrade of the grid for stability and the installation of a police program to detect any bugs and viruses. I’d suggest AVG and Adaware …. but what are the chances of Linden Labs listening to a customer?

  29. Cocoanut Koala Says:

    I do appreciate the straight-forward and informative post! Let’s hope these days are behind us!

    Nonetheless, I agree - reimburse us.

    coco

  30. Partington Gould Says:

    A nice Honest post. Thank You.

    Grid Status reports on another Blog, not an issue for me. I see 3 Status reports now at the top left of this page. I see the status reports when I log in (RC6). I see no reason to comment on status reports.

    This is just me, I don’t claim to speak for everyone.

    Thanks again for the honesty, it really is a step in the right direction towards what others posting here are asking for.

  31. Darien Caldwell Says:

    Ouch. That does look like a nasty graph. There were also a number of issues which revolved around several failed attempts to deploy new versions of the server software. I didn’t see that mentioned. Perhaps the contribution was small, but it did have an impact on overall grid conditions. Otherwise, a good summary of April.

  32. T N Says:

    “The asset storage cluster crashed during routine maintenance; this was a repeat of the crashes that afflicted us in December and January. The vendor had previously assured us that this problem was fixed, and so we continue to work with them to investigate the cause and potential solutions.”

    Is that your Isilon cluster? As a potential customer, I’d like to know that.

  33. Shai Khalifa Says:

    Katt, if you can’t say anything constructive, it’s probably better not to say anything at all.

    There are some very very legitimate concerns expressed here - and a trite “I’m sorry any of you have experienced downtime and problems.” with the word “seriously” added as an afterthought - is more insulting than anything else.

    The metrics displayed here should be cause for deep concern at LL, and if a professional attitude is to be displayed, some compensation for lost time should be offered. In fact it should become practice.

    When the grid is either down or is in some way disabled (particularly stale transactions) a pro rata compensation should be offered in terms of delays in the resident payment dates of Classifieds, and premium memberships - and tier payments should be provided by LL.

    That way, LL is forced to ensure that there are emergency processes in place, and scaling projections resourced to minimise the financial cost to LL.

    We pay for service, you should repay for lack of service provision.

    Down time should not be classified only as time the grid is not available for logins - but should include time when essential aspects of the software are unavailable - such as transactions, tp, group IM, Classifieds, Search etc.

    And these stats should be readily available on a daily basis to those who pay the exceedingly high rates LL charges for the ability to create the exceptional content everyone enjoys.

    If, according to the TOS, LL own everything we create - then for us to create it, there has to be some form of recognition, and if that is simply by compensating people when they can’t create or manage their creations it would go some way to ameliorating some of the issues we have to constantly deal with - it should be a simple solution.

  34. Jayme Llewellyn Says:

    Just thought you should know - though most everyone here probably knows this anyway (everyone bar the Lindens, that is) - May is starting as yet another great month for stability. I was on last week with my other avi and I experienced five crashes in my three hours of SL-time. Today, I have been on with my main avi and have had three crashes in an hour - I cannot seem to get past 18 minutes in real time without crashing. There are some serious problems with the grid stability and Lindens really need to fix it. The above graphs are telling us that.

  35. Massively.com SL news Says:

    Linden Lab loses 630,000 user-hours in April…

    Linden Lab’s published their Second Life service quality metrics for April and the results are about as poor as you’d expect — April was a poor month for the virtual world service. 630,000 user hours were lost to global failures, and that doesn’t c…

  36. Katt Linden Says:

    @22 toybodacho: As @30 Partington notes, the top three Status Reports show up in a feed at the top left side of the page. You can also mouse over the feeds there (just checked it with both Firefox and Safari) to read the initial text in the update.

    You can also subscribe to the RSS feed http://status.secondlifegrid.net/feed/ or to the Twitter https://twitter.com/SLGridStatus.

    They are also always in the Login screen in the Second Life viewer, so each time you log in you see the latest.

    @12, Aminom, and @26 Ash, and the rest of you, I do hear how frustrating it is.

    @23 Netsuko, and Paulo, Kahni, Verdana, Jayme, Cocoanut, Partington, and the rest of you - thank you. I’m glad you’re here.

  37. Umbra Lunardi Says:

    While this kind of transparency is a welcome thing, the level of service we experienced over the past month was abysmal. No amount of explanation and “We’re sorry about the problems” compensates the citizens who pay your salary. I pay a mere $95 USD a month to participate in SL and I can safely say that I did NOT get good value for the money in April. I can’t imagine how some of the people who pay many times more than that feel. Sure, the TOS says we’re not entitled to anything, but I agree with many others that a refund of some kind would go a LONG way to creating some customer satisfaction.

    I’m at the point of abandoning SL because Linden Lab is no longer delivering the service I believe I should be getting for what I am paying. I just don’t care about SL anymore. What a shame.

  38. Sindy Tsure Says:

    Nice write-up, Ian. TY! As annoying as problems are, it’s always good to see some kind of post-mortem afterwards (and thumbs up to the other Lindens who do it, too!). :)

    Also, I’m getting a File Not Found from WordPress when I click that graph excerpt… :(

  39. Dahlia Trimble Says:

    Thanks for the detailed blog post, it helps me feel less frustrated when things aren’t working and I wonder if the issue is being addressed or not. I understand that things fail and that it’s hard to maintain 100% uptime 24/7, especially when you are dependent on so many things that are outside of your realm of control. Glad to see that you are working to improve the reliability of the grid.

  40. Taft Worsley Says:

    Who are you hosting with and using as a backbone service to have such mission-critical outages with no warning and no resolution - sounds like it’s time to get off the value host and get a real data hosting facility with a backbone provider with some redundancy.

    Funny, the last time I recall, your data facility had total power loss and had a longer failure due to an issue with the generators - real backbone providers and data facilities first test their failover systems before failover happens, and second, have more resources to reroute and move traffic at a whim. The only time you run into the issues your company has is when you’re cheap! That means you’re raking in the dough for the investors but you’re spending very little or not enough on mission-critical services. Answer me this question: how come other companies both large and small don’t run into host and bandwidth issues like Second Life has blamed on these vendors?

    Funny how it’s always the next guy - seems that the IT business is full of people who love to blame others - just seems funny to me that companies large and small can have 99% uptime with millions of transactions, yet your host has an outage and it takes out your core business - DO YOU REALLY THINK PEOPLE ARE THAT STUPID!!! Give us a little credit. If your host has had issues, it’s not the first time, and since it’s your company’s life to make cash, maybe it’s time to tell the investors to wait for a return on their money and to spend some time getting this system working consistently. If that means changing data providers or leaving a facility not able to handle your business, so be it - but I have never seen a company as troubled as yours that then turns around and blames the very vendors it keeps paying for the shabby service - BET YOU GOT A REFUND FOR YOUR OUTAGE - SEEMS WHAT’S GOOD FOR THE GOOSE IS NOT GOOD FOR THE GANDER - as always with this piss poorly run company!

  41. Happi Homewood Says:

    Fact: SL is still the most stable platform of its kind.

    SL is pioneering technology. We should expect these bumps in the road. I, for one, will not leave, I look at the long term possibilities, instead of focusing on the current problems, which are really not that bad, compared to problems in the past.

    @the ppl always telling the Lindens how to run their business:

    Why not apply for a job at LL, if you are as good as you project yourselves, I’m sure LL would hire you right away.

    @LL: Excellent, full transparency will serve you well in the long run :)

  42. Sekonda Huet Says:

    Stop faffing around sponsoring dynamic control input devices and focus on fixing the thing.

    Ever since the 1.2 release client was deployed things have been horrendous; not only is the client horrific in every way, but there are no easy options to revert to the previous function which many people for many reasons have expressed a great distaste for.

    I get that you can’t prevent “Acts of God” if you will. If the power cuts you can’t stop it. It’s not like Linden is randomly unplugging the servers for the giggles. It hurts them too and costs them money not just us.

    More money desperately needs to be poured into hardware sustainability and less into new mainland areas that servers won’t be able to cope with anyway.

    I do think, though, that Linden should always consider reimbursing customers who pay your company. When Blizzard Europe messes up it compensates players with game time.

    Why not toss a few hundred linden towards customers here? What’s a dollar to a company making millions daily…

  43. Seth Ock Says:

    Thanks for the info, Katt. I honestly don’t know how you folks can resist telling some of these people who post here to kiss your shiny Linden asset server, but then I suppose that might be perceived as bad public relations or something. My experience with SL is a bit more minimal than many others, however. I make things largely for my own amusement and curiosity and sell very little. To some degree, this is the result of the volume of complaints that decry the dependability of things like the inventory database, or the transaction exchange system. Both of which have cut out on me even with my part-time usage. Add to that a viewer that feels clumsy and leaks memory like a fat man sweating in the desert, and my opinion is that SL is a cool idea that almost succeeds.

    But like Virtual Reality back in the late 1980s, it’s very much in danger of alienating its supporters as being an idea that’s ahead of its time. The technology just doesn’t seem to be there for this sort of idea just yet. With home computers that just barely have the power to support a virtual world, bandwidth that can’t handle the data loads for more than cartoonish representations, networking that grows increasingly unreliable as it scales, and databases that have to cope with creativity and experimentation on an unprecedented volume, the number of failures that can occur between my eyeballs and your vision add up to being almost insurmountable today.

    Still, I hold out hope. I hope the viewer eventually stops leaking memory and gets cleaned up. I hope the database issues get resolved. I hope asset transactions become more reliable. I wish you luck, because I think virtual worlds are a useful idea that will one day be exactly the kind of cultural backbone Philip and Mitch envision them to be. I hope, even with the minimal investments I have made, that SL will be the recognized leader of virtual worlds.

  44. 44 Solomon Canning Says:

    Ian & Kat — Thanks for the informative post.

    May I please suggest that network delay and loss stats also be tracked and included in the future? The region viewer network link is an important part of the user experience, too.

    Also, I see that region crashes are way down (presumably because of Havok 4), but the total of region crash + viewer crash seems to be unchanged. Did SL viewers really get that much worse, or is there something else happening there?

  45. 45 Sling Trebuchet Says:

    @14 Katt!!

    Ian is making wild allegations
    “our primary network provider - who delayed, botched, and again delayed the fix,”

    Tell him to stop.
    Are we there yet? ;)

    Seriously though, the “brown-outs” in functionality when the grid is at near-failure are probably more corrosive of satisfaction than is down-time of the graph.

    “We define lost usage as how much time Residents would have spent logged in but did not, due to Grid failures”
    That does not encompass stuff not rezzing, failed TPs and transactions, inventory going poof. This must be causing ‘time Residents would have spent logged in, but did not, due to leaving SL.’
    I’m not leaving of course. I love SL - although sometimes it’s like trying to love a particularly difficult hormonal teenager.
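
The lost-usage definition quoted in the comment above lends itself to a small worked example. The sketch below is only an illustration, not Linden Lab’s outage tracking tool: it assumes evenly spaced samples of predicted and actual concurrency, and the function names and sample figures are hypothetical.

```python
def lost_usage_hours(predicted, actual, interval_hours=1.0):
    """Sum the shortfall of actual vs. predicted concurrency, in user-hours.

    Both sequences are assumed to hold one concurrency sample per interval.
    Intervals where actual meets or exceeds the prediction count as zero loss.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(max(p - a, 0) * interval_hours for p, a in zip(predicted, actual))


def lost_fraction(predicted, actual, interval_hours=1.0):
    """Lost usage as a fraction of the predicted usage (an assumed denominator)."""
    total = sum(predicted) * interval_hours
    if total == 0:
        return 0.0
    return lost_usage_hours(predicted, actual, interval_hours) / total


# Hypothetical hourly samples around a two-hour outage.
predicted = [50000, 52000, 55000, 53000]
actual = [50000, 10000, 20000, 53000]

print(lost_usage_hours(predicted, actual))
print(round(lost_fraction(predicted, actual), 3))
```

With these made-up samples the shortfall is 77,000 user-hours, roughly 37% of the predicted total for the window. Brown-outs of the kind described above (failed teleports, failed transactions) would not show up in this number as long as people stay logged in.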

  46. 46 Insky Jedburgh Says:

    I want to commend a change in procedure I witnessed today, and feel is reflected with the information laid out in this metric today. When asset servers started to choke, rather than wait until it was crisis level and dump everyone out, or hide behind silence, you made an announcement detailing the problem.

    This allowed us to make informed decisions regarding our actions during the mini crisis today. Please, do that more often. It’s far less insulting than being left to struggle on our own.

    The report in this metric is a good step in the communication direction. There was no way you could have candy coated April though, so it was your only choice. Still, I would rather hear bad news than no news so I can assess the situation.

    This platform is not what it needs to be, to gain trust among individual and corporate participants. SL is still far too unstable, and prone to unpredictable changes that often lead to failures, to be faithfully used as the resource it could be. Please keep the majority of your focus on functionality for now, and perhaps get the latest browser fully functional and leave it that way for a while. People are getting tired of the constant adjustment to what seem like arbitrary additions of new innovations, that don’t address real issues.

    It would be nice to have a system that we could get familiar with for a while.

  47. 47 Maya Bogdanovich Says:

    While I can agree with some of you that Linden Lab (and please, not the Lindens themselves) made its mistakes, of course, I find it hard to consider some customers’ accusations appropriate.
    A fast-growing system (with massive usage) requires fast-evolving company policies, but it also requires more partners. The more partners (network providers, support teams, outsourced data centers, etc.), the greater the risk that something goes wrong.
    So of course LL has to take full responsibility for the service they offer (even if the service provider is found guilty), but this doesn’t allow anyone to humiliate these guys (and it happens here more and more often), who work hard just like anyone else and certainly are under a lot of pressure.

  48. 48 Jerry Says:

    Sidewinder can fix anything. You folks oughta give the man a raise and let him take charge. I don’t see anyone else who can step up to the plate like that man does; he should be running SL.

  49. 49 Very Frustrated Owner Says:

    Katt - PLEASE!

    Stop saying how sorry you are and take action!

    People paid Linden Lab thousands of US dollars in April for a service they didn’t receive. The stats above only mention downtime. I am sorry, but if you are going to promote and service an economic platform, having no transactions for vast amounts of every day doesn’t amount to a service being provided.

    If my ISP network was down for these amounts of time, I would get compensated, why will Linden Lab not compensate people for the lack of service in April?

    We don’t want apologies anymore (nice at the start); we want compensation and a grid that works. You have lost the confidence of thousands of people and it’s time to change that. It’s been 6 - 8 weeks of problems now, not an acceptable timeframe by any account.

    Also in that time you have slashed the cost of islands, meaning hundreds of people have lost money on their land and now you are planning to change the core principle that the search system is working on, i.e. traffic. Did you intend to press the self destruct button? Either way you have scores of customers that are not just unhappy, they are hopping mad with Linden Lab.

    I would like you to come up with a compensation package for April for everyone, with a varying % rebate of tier. After all, what other business is allowed by law to charge for a service that isn’t received?

    I think the “sorry” card has been overplayed. It’s easy to say sorry; it’s not easy to go to your CEO and say we have thousands of unhappy customers who feel we are breaking the law by charging for a service that wasn’t provided, we need to look at putting together a compensation package and communicate an action plan to restore confidence in our abilities to provide this service.

    A tough call Katt - but the future success and confidence in Linden Lab rests with your abilities to fix this.

  50. 50 Dinohunden Paine Says:

    About time LL found out that we’re pretty pissed over the downtime and instability. Now we only wait for more server power, preferably NOT just in the US but also in Europe, so we don’t get those ridiculous ping times that we have. Drop the new features for 3 months and work only on the stability problem; it’s pretty urgent.

  51. 51 Chavi Skomoroklov Says:

    Click on the graphic and the answer is “404 — File not found.”
    And yes, at my company we had a department that was shooting all new kind of ideas into the air, but when it comes to customer satisfaction, it’s all about GET THE BASICS RIGHT. :-)

  52. 52 Flew Says:

    44 wrote: “Also, I see that region crashes are way down (presumably because of Havok 4), but the total of region crash + viewer crash seems to be unchanged. Did SL viewers really get that much worse, or is there something else happening there?”

    With H4, sims may not crash that much anymore, but it does cause mega lag now. I prefer the good old sim crash, to be honest… and just move to another region. The giant lag in some regions makes me more frustrated than being in a sim that crashes.

    Also, since WindLight (or it’s the viewer) the memory usage on my PC has gone up a lot and frame rates have dropped a lot. And if you complain about it to support, they just ignore you. No response whatsoever, even though I gave detailed info about my PC and exactly what problems I am experiencing.

  53. 53 Actingill Igaly Says:

    Ian/Katt

    Thank you for finally posting this long-overdue blog. At last, all those who hijack every other blog with cries of “Don’t tell us that, fix the system” have a legitimate place to voice their feelings.
    I don’t believe anyone who understands how cutting edge SL is will expect 100% reliability, stability and uptime from the system; however, over the past two months I have watched people get more and more bitter and twisted with each incident. This, I suspect, has more to do with Linden Lab’s silence on the subject than the actual failures themselves.
    Hopefully this will calm the vast majority down. Acknowledging that there are/have been problems and providing a place for people to make suggestions or let off steam should have been done a long time before now, however.
    As to refunds, I don’t want or expect one. I would much sooner the money be used to improve the infrastructure than to temporarily placate the hardcore moaners…
    Please keep up this level of communication on stability/grid issues going forward.

  54. 54 U M Says:

    Anyone who’s been around for over 3 years will tell you Ian is indeed one of the most outstanding Lindens when it comes to dealing with the users. So give him a little break; at least he didn’t lie to us about the numbers and try to make things look rosy when in fact they were not… Thank you again, Ian.

  55. 55 Craig Altman Says:

    The biggest problem for me during April was the daily transaction failures causing people to pay for things and not receive them. Often there was no inworld warning this was happening, so I took to closing my shops when IMs started to roll in from people angry that they didn’t get items they paid for. This was complicated by the fact that you cannot actually close shops in SL: people can zoom past ban lines and still buy items, which they then don’t receive.

    What I would like to know is: when you know you have these problems, firstly, can you inform people inworld (few read this blog)? Secondly, if L$ transactions are failing, can you disable the L$ balance? You have done this before (maybe not deliberately): the balance would read “loading” and any attempt to buy items would be met by “insufficient funds”.

    If you disable transactions this way you can globally prevent people losing money during these problems. I think it’s the lesser of the evils here.
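
The suggestion in comment 55, refusing purchases while transactions are known to be failing, resembles what is often called a circuit breaker. A minimal sketch of the idea follows, assuming a hypothetical `deliver_item` callable that reports whether a purchase went through; none of these names come from Second Life itself, and the thresholds are arbitrary.

```python
import time


class TransactionBreaker:
    """Refuse new purchases after repeated failures, then retry after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_seconds=300.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means purchases are currently allowed

    def allow(self):
        """Return True if a purchase attempt should be let through."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            # Cooldown is over: reset and let traffic through again.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self):
        self._failures = 0

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold and self._opened_at is None:
            self._opened_at = self._clock()


def buy(breaker, deliver_item):
    """Attempt a purchase; deliver_item is a hypothetical callable returning True on success."""
    if not breaker.allow():
        # The buyer is told up front instead of paying for an item that never arrives.
        return "purchases temporarily disabled"
    if deliver_item():
        breaker.record_success()
        return "delivered"
    breaker.record_failure()
    return "failed"
```

The point of the design is that the failure is reported before money changes hands, which is the “lesser of the evils” argued for above.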

  56. 56 Elwin Jacobus Says:

    Thanks Ian for an informative post. It’s good to see public recognition of what have been some pretty catastrophic failures. What is better still is that (and this is just my interpretation of your post, clearly as can be seen in the feedback others disagree strongly) there are signs that all of these issues are being seriously worked on.

    I (and pretty much everyone here) look forward to a more stable grid. Let’s hope that the team at LL can deliver this sooner rather than later.

    The honesty and openness that has been a feature of the blog posts lately is very refreshing and encouraging.

    NOW! Back to the salt mines with you all and go make it all work as it is supposed to.

    /me taps toes impatiently

  57. 57 Taft Worsley Says:

    47 Maya Bogdanovich - Obviously you must not have any money sunk into this game that has been wasted on downtime - transaction failures, useless classifieds and lost inventory - Funny how easy it is to talk smack when you don’t waste your hard-earned money!

    53 “Actingill Igaly Says: I dont believe anyone who understands how cutting edge SL is will expect 100% reliability, stability and uptime from the system” You’re right, but even 90% is better than what we have been getting here.

    Funny, I manage a 4-node cluster of 4×16-way SMP systems which does about 273,000 transactions a minute grabbing data from 480 stores nationwide. While it hiccups once in a blue moon, it is never less than 98% a month including patches and update downtime, and never do I see outages due to a service provider issue. As I said, that’s why they make redundant links and have a BGP routing protocol. But I guess the toys they have spent their money on, like I said before, are cheap and must not provide a BGP option for routing.

    Bottom line, TOS or not: the people who have invested thousands of dollars in this platform have every right to complain about this shabby service. It plain out sucks, and yes, you should be entitled to a refund because, as it’s been stated, any way you want to candy coat it in a TOS, this is a service and people have not been getting what they pay for out of this company, some of us longer than others.

    Seems like there are more and more people leaving and yet LL doesn’t seem to care. I personally have been reading these blogs for over two years, and you know what, in that time LL has listened to the complaints all of about zero times. So don’t preach to me about how good this company has been; do your homework and look at the years’ postings and see how many times they have said sorry only to tap the tier till without fail. Funny how they get their money and you get squat. Like I said in my last post, I’ll bet dimes to donuts they got a serious refund from both the colocation facility and the backbone provider if indeed it was their failures that caused the outages. But hey, WTF do I know, I have only been in the IT business 15 years and know nothing about SLAs!

    All it takes is one person to open the legal gates, and this smells of class action lawsuit no matter what the TOS says. BTW, the courts have ruled once already that the TOS was invalid, so it’s not bulletproof.
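
For scale, the monthly availability percentages mentioned in comment 57 can be converted into hours of downtime. This is simple arithmetic assuming a 30-day month; the function name is made up for the illustration.

```python
def allowed_downtime_hours(availability_percent, days_in_month=30):
    """Hours of downtime per month implied by a given availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_hours = days_in_month * 24
    return total_hours * (100 - availability_percent) / 100


for pct in (98, 90):
    print(f"{pct}% availability allows {allowed_downtime_hours(pct):.1f} hours of downtime")
```

Under that assumption, 98% availability leaves room for 14.4 hours of downtime in a month and 90% for 72 hours. These are wall-clock figures and are not directly comparable to the lost-usage percentage in the original post, which is weighted by how many Residents would have been online.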

  58. 58 Vivienne Says:

    Katt, the stats only prove what all of us already knew. At least the reasons for the outages are transparent, which is progress and an important step towards more transparency. And hiring more first-class technical staff for the Lab is certainly nothing anyone can complain about.

    Nevertheless, there are a lot of technical problems in addition to the ones caused by the network. The client software is overloaded with hardware-killing features and very, very buggy. Some server software code needs reconsideration.

    On the communications front: Offered Open Source additions do not really find their way into the LL schedule. Serious bug reporting by residents fails because LL obviously neglects communication on the recommended level, which is the public JIRA (it seems that only a few Lindens find their way there to address reports).

    What SL needs is an open channel for discussing what residents really need, and this is usability. Usability can only be achieved by some kind of effective quality control. Quality control, from the point of view of residents, means that no code should be thrown at the community without being cross-checked by dedicated residents who really WORK with the software. Most of the wild and often justified complaining on the official and non-official channels could be prevented by giving residents a say on what is a sensible and working improvement for their daily second life BEFORE an official or RC release. And by taking them seriously. I am convinced that LL employees want to make this better, and so do residents. But in the end the “felt” quality is the one which matters for the user, not the “tech” quality. An open channel and a restricted, selective beta test program which includes direct communication with the Lab and some authority on further release policy would help LL much more than, for example, any release candidate which, buggy as it is, only contributes to even more nasty complaints and frustration (even if it adds some noteworthy improvements, too).

  59. 59 Carl Temin Says:

    I would like to ask if owners of businesses who had difficulty paying their bills last month could be compensated? Maybe a week’s free classified ad listings?
    It must be realised that business is business, Lindens get our money no matter what!
    Come on chaps… play fair!

    Carl

  60. 60 Close to Cursing in comments... Says:

    Someone spoke the truth.
    Someone showed the facts.

    Now…

    Let’s see someone start paying REFUNDS for april!

  61. 61 RCE Says:

    No one told anyone to invest their money in commercial enterprises in a game. If you chose to put your money into an enterprise, you took the risk, and risks carry the possibility of failure. From what I read, LL is attempting to fix problems. Thats what they should do. As far as refunds for lost whatever, hey, live and learn. I appreciate the honesty of the post and look forward to the improved performance of SL as it matures.

  62. 62 River Ely Says:

    Thank you for an open and frank report on the problems encountered. I have been asking for a year or more why we have not yet invested in global resources rather than simply USA-based resources. Mirroring systems on a second continent would allow all manner of processes to gain invulnerability against a severed connection, server system, or administrative deployment error.

    Second Life is a Global Product, and I feel confident that just as billing can be migrated to establish a financial center in the UK to adapt for VAT payments, so could a remote mirror for asset delivery be set up, or a communications bridge to allow EU residents to log in while a USA-based system was up for scheduled maintenance.

    I am probably wishing for pie in the sky and many readers will propound methodologies supporting the inverse of my proposals, but, such solutions are currently used by global marketing companies to maintain an adaptive global presence riding over outages and systems loss from time to time.

    The product, Second Life, is rather good and seems to bring people back with high interest, even after malfunctions that would normally deter any sane customer from ever returning. Knowing that, push the supportive boat out with some of the customer-based incentives we are all asking for and watch the amazing strength of support you get for improved growth. I mean it, I can’t stay away, and I assure you, when I am treated badly by Business Clients, I vote with my pocketbook. Again, thank you for the intelligent and well-worded simple report. You score well on communications sometimes.

    River :)

  63. 63 Deira Llanfair Says:

    Thank you for giving us the details about this. It does help to mitigate the frustration to know that you guys are working to get things sorted out long term. A stable system is the number one priority for the continued success of Second Life.

  64. 64 Jessica Hultcrantz Says:

    @ kat (20):
    Yes, i have indeed experienced way too much outage at prime time lately… *grumbles showing some fists*

    BUT:
    @ LL ™ ?
    This is the kind of blog postings you need to do more often. The Truth!
    If the lab cares anything about customers it’s time to honestly open up and share this kind of information to explain what the h**k is going on behind the screens.
    Seriously, we, the customers, are fed up with “sorry” and “thank you for your patience”, that is getting ridiculous by now.
    But some decent honest truth blogged with details about what, where, when and why (ever worked in journalism, perhaps?) is vital to get customers to understand that there indeed are problems that can’t be solved in a snap.
    Yet again, but… if there is a lack of communication (which indeed is true for over 90% of the blog posts about things messing up), customers get angry, you get a bad reputation for not caring, and everyone gets in a kind of hostile mood, starting to look for lawyers and digging ditches to fire from. And who benefits from that (besides the lawyers)?
    So keep the straight information flow up and tell the details, even the really bad ones, that is needed.

    Now remember, for every blog post try answering: What, where, why, when, who and how. PLEASE!

  65. 65 U M Says:

    There were a **** of a lot more issues than transaction problems… But somehow everyone made it through, granted that many lost a lot of money. But then, as someone said, nobody forced anyone to invest money here. And then again, how many times have you made even more than in typical times? Doesn’t it balance out? Or do you expect perfect stats all the time? OK, I have said my share of BS, as they say, in the past about the lack of support. But that’s when LL lied to us about events and other problems. When they come straight with us, I can’t hold anything against them. That’s why I feel this is a truthful new era with LL. If they keep this up, at least supporting their effort is the least we can do.

    Usagi M

  66. 66 Damona Rau Says:

    @27 Verdana
    I don’t want my money back; that’s an impossible thing, and the maximum of US$50 won’t cover the loss I have (oh yes folks, read the TOS). But Linden Lab has some more options for compensation. Let me point out some options.

    - For region owners, a half or a full month of no tier (could be passed on to the Residents)
    - some weeks no payments for classified ads
    - some weeks no group liability
    - some weeks no payments for texture uploads

    The last 3 points are more “hidden payments” that we pay every single day. With all the builders and creators, it adds up to a huge bunch of L$ for Linden Lab.

    I’ve read in some posts about the “growing environment”. OK, we have more than 60,000 Residents online, but how many of them are nothing more than camping bots? Let me guess: 20,000 as a minimum. So we don’t have that many more “living” residents than in 2007 before gambling was banned (July and August 2007: roughly 45,000 residents online).

    I won’t count all the problems again (too many, look at the Linden Lab JIRA), and I won’t read “Resolved” when only symptoms are resolved and not the cause (how often do we read “Problems with transactions”, some hours later “[Resolved] Problems with transactions”, and some days later the same again?).

    I would love to work together with Linden Lab to get the Grid and the services more stable, but Linden Lab should offer an incentive. The only thing we get are excuses, “thanks for appreciation”, and begging for more patience, but please accept that these cards are overused right now.

  67. 67 Deira Llanfair Says:

    Apologies for taking two slots in a restricted number response list - but I can’t edit my first post.

    I just wanted to draw attention to Craig Altman’s post @55 - please, please Linden Lab pay attention to this.

    Transaction failures have a huge impact on commerce in SL - it is bad if everything stops, but it is worse having to deal with a stream of customers whose purchases failed to arrive. For SL to succeed we need a stable system where transaction failures are rare, but with a realistic process for dealing with any transaction problems - a contingency process that does not make the situation worse.

  68. 68 cosa nostra Says:

    @36 Katt

    this is the type of information I wanted to see! It shows the performance visually, and you also added the lowlights + possible solutions moving forward!

    but of course, a performance like that affects your fixed customer base, people who own sims or business owners small or large. I am talking here about a small %, because 95% of the grid residents are travellers or visitors (and have no payment info on file).

    At least a professional company would come up with a commercial proposal to show the FIXED base that they take full responsibility for the weak performance in April! Saying sorry, Katt, is not enough, because this is too easy…

    I really want to see a commercial proposal for your sim owners. This could help LL moving forward, because remember that LL is nothing without paying customers; the customers always determine the success of a company, not the company!

    cosa

  69. 69 U M Says:

    @61 - Some would term that as “taking the gamble”. But LL has banneded gambling. So shut up, and join the masses saying we are to be needing many refunds for terrible CRAPPY service that we have been to endure for many many many months now.

    I like using up blog posts.

  70. 70 Kugel Says:

    Wooo… a lot of anger and frustration here… not surprised really. SL has been terrible for a month or more.

    KAT …. from your comment in @14 about wild accusations…fair point :) But I point out the comment in the original posting…

    “….caused by a line failure within the backbone of our primary network provider - who delayed, botched, and again delayed the fix…”

    The word “BOTCHED” is fairly wild in itself. If I were the provider’s legal representative at a corporate level… then I would be examining the public use and definition of the word “botched” too.

    Anyhoo… yeah… off my soapbox. YES… a lot of things seem to be being botched, worked around, band-aided, patched, load restricted, etc, etc… I don’t actually see the word FIXED these days. At least not with any belief or conviction.

    Thank goodness I don’t run a business/sim/whatever here in SL. It’s a minefield. I wonder how many “new” businesses are actually starting up these days.

    Anyhoo… back to SL for me now :) Test the RC6 .. until it crashes..still…again.

  71. 71 U M Says:

    “69 U M Says:

    May 10th, 2008 at 3:01 AM
    @61 - Some would term that as “taking the gamble”. But LL has banneded gambling. So shut up, and join the masses saying we are to be needing many refunds for terrible CRAPPY service that we have been to endure for many many many months now.

    I like using up blog posts.”

    This is not me… I don’t know who is faking posting my name here, but it’s not me.

  72. 72 U M Says:

    @67 Use many blog posts as you wanting. I do it all the time and mostly havent much to say.

    We want refunds. lots of refunds. No tiers for may because april sucked so badly.

    Save another blog post for me for later. I havent used enough yet

  73. 73 U M Says:

    Nice little trick in faking names here kid………But if you really need to do it you are indeed a low life…….

  74. 74 U M Says:

    I are sorry. Do you have a patent on the letters U M ? If you do I change my names. If you dont, then you are the low life and need to stop calling names. stick to the subject. LL is failing to service use properly and should making refunds immediately

  75. 75 Kugel Says:

    U M and U M… stop it. Let me put the kettle on and make you two a nice cup of tea. :) Besides… it’s obvious you two have different IPs because of the pretty icon in your posts.

    LOL.. RC6 just crashed AGAIN … and the crash logger works wonderfully..NOT. :(

  76. 76 U M Says:

    @75 Exactly. I am meaning who he think he gets off being saying i fake my name. If not like, change yours.

  77. 77 U M Says:

    Anyone can tell by the little graphic to the right who is who; they look different. So if you started this, you had better stop before they catch you.

  78. 78 Bobo Decosta Says:

    @ 61 so SL is a gambling game? Investing on second life is like placing a bet on the uptime level of second life?!

    Glad I stopped gambling on SL’s service level. It’s much like a casino. The house always wins!

  79. 79 U M Says:

    look at this post in the blog http://blog.secondlife.com/2008/05/07/new-release-candidate-viewer-120-rc6-available/#comment-605296

    and this one. you can tell who is who

  80. 80 Lorna Volitant Says:

    I read these forum posts for a giggle at the moaners… it astounds me how much passionate anger is directed at the Lindens. My workaround for any inworld dissatisfaction would begin and end with the quit button… and if the problems persist, cancel your account… if you are losing money in here due to outages then consider the wisdom of your investment and perhaps redirect your monies towards RL causes who are crying out for a bit of cash from the wealthy minority who are quite happy to waste cash on megalomaniac follies… I would suggest reevaluating the time you have on your hands and consider a more constructive use of it.
    I see it simply: I choose to pay for a premium account in SL and enjoy the experience whenever I have the time and inclination to log in… if for any reason I can’t log in, well, there is a big world out there and it is definitely more fantastic and awe inspiring than anything I have found in SL…