AskNicely

Is AskNicely Down Right Now? Check if there is a current outage ongoing.

AskNicely is currently Operational

Last checked from AskNicely's official status page

Historical record of incidents for AskNicely

Report: "AskNicely returns Error 500"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Corporate website DNS outage"

Last update
resolved

This incident has been resolved. If you visited our corporate website during the incident and are still experiencing an issue, please try flushing your DNS cache: https://blog.hubspot.com/website/flush-dns

monitoring

An issue with our corporate website (asknicely.com) has been identified and a fix has been initiated. This did not directly affect the AskNicely Application but may impact users who start by clicking the Sign In link from our home page. As a workaround, those users may instead visit https://start.asknice.ly/findlogin/

Report: "Application Unavailable for some US customers"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating unavailability in our application for some US customers.

Report: "Some US hosted customers are currently getting 404 and 500 errors when trying to access AskNicely"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Investigating increased error rates for US clients"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Investigating increased error rates for US clients."

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Investigating increased error rates for US clients."

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Issue reported with the dashboard response widget and workflows page for some users."

Last update
resolved

This incident has been resolved. If you experience any issues, please clear your browser cache. If that does not resolve the issue, please contact AskNicely Support.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Surveys not loading for some users"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have identified a possible cause of surveys not displaying to respondents and are currently deploying a change for remediation.

investigating

We are investigating reports of surveys not loading.

Report: "Tenants hosted in Australia may be experiencing issues with loading their main dashboard page."

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating unavailability in our application.

Report: "Elevated 404 error rates"

Last update
resolved

This incident has been resolved.

Report: "Brief period of unavailability"

Last update
resolved

We think we have a good understanding of what caused the incident, and do not anticipate any more problems. Sorry for any inconvenience.

monitoring

We are currently monitoring a recent period of instability in our storage systems. Some AskNicely accounts were unavailable, but they seem to have now recovered. We continue to monitor the situation.

Report: "Site response performance problem"

Last update
resolved

The issue has been resolved.

monitoring

We have identified an issue causing slow responses from the AskNicely application for our customers based in the US datacenter. We've taken steps to move load off the affected systems, and we expect response times to now be recovering to normal.

Report: "AskNicely Site 500 error"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

We have restarted Nginx and all metrics are now nominal. We will investigate further and continue to monitor. We have an error message that most likely requires a small nginx.config change to prevent this happening in the future. Engineers were alerted within 1 minute of the first 500 errors. The site was restored to health in approximately 10 to 15 minutes. Sorry for this outage. Your friendly engineering team.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are seeing 500 server errors related to NGINX. We are investigating and monitoring.

Report: "502 Error US Datacenter"

Last update
postmortem

We’ve identified endpoints that were not properly rate limited and that caused infrastructure issues when they received a high volume of traffic. We’re working on rolling out better rate limiting coverage to prevent further outages.
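As an illustration of what rate limiting coverage across endpoints can look like, here is a minimal sketch in Python. It is not AskNicely's implementation: the class names, the default limits, and the `/api/export` path are all hypothetical. The idea shown is that any endpoint without an explicit limit still falls back to a default token bucket, so no endpoint is left unlimited.

```python
import time


class TokenBucket:
    """Permits bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full so a fresh endpoint can burst
        self.updated = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class EndpointRateLimiter:
    """Applies a default limit to every endpoint unless an override exists."""

    def __init__(self, default_rate, default_burst, overrides=None):
        self._default = (default_rate, default_burst)
        self._overrides = dict(overrides or {})  # endpoint -> (rate, burst)
        self._buckets = {}

    def allow(self, endpoint, now=None):
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(endpoint)
        if bucket is None:
            # Endpoints with no explicit entry still get the default limit.
            rate, burst = self._overrides.get(endpoint, self._default)
            bucket = TokenBucket(rate, burst, now)
            self._buckets[endpoint] = bucket
        return bucket.allow(now)


# Hypothetical configuration: 1 request/second with a burst of 2 by default,
# and a tighter limit on one expensive endpoint.
limiter = EndpointRateLimiter(
    default_rate=1.0,
    default_burst=2,
    overrides={"/api/export": (0.1, 1)},
)
```

A caller would check `limiter.allow(path)` before handling a request and return an HTTP 429 response when it is `False`.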

resolved

We have now resolved this incident and identified the cause. The engineering team are now doing a postmortem of the event to prevent this happening in the future.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are now monitoring the situation, and all our monitoring tools are reporting that the system is operating within expected parameters.

identified

We have identified the source of the problem that has been causing an exceptionally high load.

identified

We have seen some performance issues that are causing some 502 and 504 errors. We are working hard to find where these are occurring and will update this as we continue to look for the root cause. All alert systems are operating as expected, and we are now going through our platform monitoring tools.

monitoring

We've rolled out changes to try to resolve issues accessing AskNicely, and are monitoring the current status.

investigating

We are currently investigating a 502 error.

Report: "502 Error USA Data Center"

Last update
postmortem

## The 502 error

Today a number of customers may have experienced a 502 error and were not able to access the AskNicely platform. We are super proud of the platform we have built, and when we let our customers down it really hurts; we know we need to do a better job. We are sorry you were not able to access our platform. Very sorry. We have a fantastic engineering team, and over the next week we will be focusing on our infrastructure to help minimise outages like the one you may have seen today.

## What went wrong

AskNicely is built on AWS (Amazon), an amazing platform which allows us to scale our solution very easily. Today we hit an issue with extremely heavy load on our USA database server (RDS). The symptoms we saw:

* 502 error rates
* Load balancer errors: 'unhealthy web server in load balancer pool'
* Database load in RDS going from under 5% to 100% in a matter of seconds. Very abnormal.
* Our 502 error page did not tell our customers what was happening, nor link to our status page. Bad.

## What went right

We have extensive monitoring on AskNicely, and some fantastic services that we love kicked in as soon as they detected something abnormal. The services we use today:

* [PagerDuty.com](http://PagerDuty.com) We love PagerDuty: the mobile app, email, SMS and automated phone calls for alerting, plus auto escalation policies to other team members.
* [Datadog.com](http://Datadog.com) provides us with detailed metrics around our application performance and servers. We send a massive amount of data back to Datadog, and it's a valuable asset that we use for real time monitoring and debugging.
* [Loggly.com](http://Loggly.com) All our log files and error logs are managed in Loggly. We can easily visualise and quantify requests from customers in seconds using their powerful log query tool.
* [NewRelic.com](http://NewRelic.com) can provide incredibly detailed analysis of what parts of our application are being used the most, how well that code is performing and what part of the code is the slowest. It also monitors how long our application is taking to load for our customers. We absolutely love NewRelic, and it is our litmus test to see whether our code changes have resolved our issues or not.
* [Slack.com](http://Slack.com) makes it so easy for our team to stay on the same page and communicate instantly no matter where we are in the world.
* [Statuspage.io](http://Statuspage.io) You can find a link to our status page from the [www.asknicely.com](http://www.asknicely.com) homepage and our 404 pages.

## What we discovered

During this time, we came under a very heavy API load from one customer. Normally our API rate limiter would kick in and prevent any one single customer from causing an outage. But due to the size of this customer's dataset, our API was too slow to respond to all their requests, causing massive congestion. Our API rate limiter is tuned for number of requests, not time to process a request.

## What we did

We have a number of strategies that we use to scale our platform. One strategy allows us to move a single customer from one database host (RDS instance) to another. Once we isolated the issue, this customer was moved to their own database instance. The AskNicely application instantly became responsive and all our server metrics returned to what we would consider normal parameters. We have also worked on several bottlenecks, including:

* Autoscaling our primary USA database server. We have tripled the capacity of this server, in size and dedicated IOPS.
* We have scaled our Redis instance 6x. It provides us with a powerful and fast caching service for parts of the application.
* We have changed several variables on our RDS instance to allow higher loads.
* We have added another application server to the server pool.

## What we are planning to do

* Add detailed API monitoring: time, frequency, tenant and database.
* Improve our API rate limiter.
* Refactor our API code that caused us issues, and most likely refactor a particular query that caused the heavy load on our database.
* Provide a way to gracefully degrade AskNicely so that core/key services are not affected.
* Improve our 502 error page to link to our StatusPage so we can get our customers more timely updates.

Again we are sorry, and we are working hard to rectify these issues.

John // CTO and co-founder AskNicely
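
The postmortem's distinction between limiting by number of requests and limiting by time to process a request can be sketched as follows. This is a minimal illustration under stated assumptions, not the limiter AskNicely runs: the class name `TimeBudgetLimiter`, the tenant names, and the budget and window values are all hypothetical. Each tenant is charged the processing time its requests actually consumed, so a handful of very slow requests can exhaust the budget even when the request count is low.

```python
import time
from collections import defaultdict, deque


class TimeBudgetLimiter:
    """Limits each tenant by processing seconds used within a sliding window."""

    def __init__(self, budget_seconds, window_seconds):
        self.budget = budget_seconds
        self.window = window_seconds
        self._charges = defaultdict(deque)  # tenant -> deque of (timestamp, cost)

    def _spent(self, tenant, now):
        charges = self._charges[tenant]
        # Drop charges that have aged out of the sliding window.
        while charges and charges[0][0] <= now - self.window:
            charges.popleft()
        return sum(cost for _, cost in charges)

    def allow(self, tenant, now=None):
        """Return True if the tenant still has processing time left."""
        now = time.monotonic() if now is None else now
        return self._spent(tenant, now) < self.budget

    def charge(self, tenant, cost_seconds, now=None):
        """Record how long a finished request took to process."""
        now = time.monotonic() if now is None else now
        self._charges[tenant].append((now, cost_seconds))


# Hypothetical policy: at most 2 seconds of processing per tenant per 10 seconds.
limiter = TimeBudgetLimiter(budget_seconds=2.0, window_seconds=10.0)

limiter.charge("big-tenant", 1.5, now=0.0)   # one slow request
limiter.charge("big-tenant", 1.0, now=1.0)   # another slow request
print(limiter.allow("big-tenant", now=2.0))    # False: 2.5s used of a 2.0s budget
print(limiter.allow("small-tenant", now=2.0))  # True: other tenants are unaffected
```

In a request handler, `allow` would be checked before starting work and `charge` called afterwards with the measured duration. A count-based limiter can sit alongside this one, since the two guard against different kinds of overload.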

resolved

This issue is now resolved. We have identified the root cause and made several changes to rectify these issues. We will continue to monitor over the next several days.

monitoring

We are continuing to monitor. We have made a significant change that appears to rectify the issue. Again, we are monitoring this and we will do a debrief today.

monitoring

We have identified an issue and are now monitoring.

investigating

We are investigating a 502 error on the US datacenter. We have several engineers looking into the issue.

Report: "Database Issues"

Last update
resolved

All AskNicely services are back to normal.

monitoring

AskNicely is back to being fully operational. We are monitoring for any continued irregular activity.

investigating

We have noticed irregular database activity and are performing emergency maintenance to resolve the issues. Services will be partially offline.

Report: "Application Unavailability"

Last update
resolved

We experienced an application outage starting at 11:20 AM due to a database issue. This incident was resolved within an hour.

Report: "Application Unavailability"

Last update
resolved

We experienced a brief period of application unavailability due to malicious requests causing high server load. This issue has been resolved.

Report: "404 errors for tenants hosted in Australia and Europe"

Last update
resolved

The incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating the issue.