Historical record of incidents for Ruvna
Report: "Errors sending parent emails in Ruvna Attendance"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating reports of increased errors and issues with sending parent emails through Ruvna Attendance.
Report: "Blackbaud Data Sync Errors"
Last updateThis incident has been resolved.
School data syncs with the Blackbaud SKY API are failing due to a service outage with Blackbaud.
Report: "Blackbaud Data Sync Errors"
Last updateThis incident has been resolved.
School data syncs with the Blackbaud SKY API are failing due to a service outage with Blackbaud.
Report: "Blackbaud Data Sync Errors"
Last updateThis incident has been resolved.
School data syncs with the Blackbaud SKY API are failing due to a service outage with Blackbaud.
Report: "Investigating Issues with Student Data in Attendance"
Last updateThe issue has been resolved.
A fix has been implemented and we are monitoring the results.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate the issue.
We're investigating issues accessing rosters and students in Ruvna.
Report: "Errors accessing Attendance"
Last updateThis incident has been resolved.
A fix has been implemented and schools should now be able to access Attendance again. We are continuing to monitor the situation and will provide further updates as needed.
We are continuing to investigate the issue.
Schools are reporting being unable to access various parts of the Attendance module. We are investigating the issue.
Report: "Data Syncing Failures"
Last updateThis incident has been resolved.
One of Ruvna's upstream providers is having issues with their services. This is resulting in degraded performance and failures when running school data sync processes. We are continuing to monitor the situation and will provide updates as necessary.
School data syncs are failing at an increased rate. We are investigating the issue.
Report: "Elevated Blackbaud Sync Errors"
Last updateThis incident has been resolved. If your school syncs rosters into Ruvna via the Blackbaud API, we recommend checking your Ruvna environment to make sure roster data is synced correctly. If you notice any missing roster data, head to Settings > Data Import and press Trigger Import to initiate a new sync. Please reach out to support@ruvna.com if you have any questions.
We're experiencing an elevated level of errors when syncing rosters with the Blackbaud API and are currently looking into the issue. Schools syncing data via the Blackbaud API may have their imports paused while our team investigates.
Report: "Elevated Blackbaud Sync Errors"
Last updateThis incident has been resolved.
The issue has been identified and our team is working on a fix.
We're experiencing an elevated level of errors when syncing with the Blackbaud API and are currently looking into the issue. Schools syncing data via the Blackbaud API may have their imports paused while our team investigates.
Report: "Elevated Veracross Sync Errors"
Last updateThis incident has been resolved.
The Veracross team has implemented a fix and Ruvna syncs are no longer failing. We will continue to monitor the situation to ensure service is completely restored.
The Veracross API is experiencing an outage (details here: https://status.veracross.com/incidents/dgwqnvg9pvfj) causing syncs with Veracross to fail. Schools syncing data via the Veracross API may see their syncs fail or have their syncs paused until service is restored.
Report: "Errors updating attendance records"
Last updateThis incident has been resolved.
A fix has been implemented and functionality has been restored. We're continuing to closely monitor the platform.
The issue has been identified and a fix is being deployed.
Users are unable to update student attendance records. We are investigating the issue.
Report: "Errors Signing In with Google SSO"
Last updateThis incident has been resolved and Google SSO is functioning properly. This outage was caused by a mistake in an update to the firewall rules within one of our internal networks. Functionality was restored by correcting the rule. We apologize for the inconvenience.
We are currently investigating this issue.
Report: "Announcement Errors"
Last updateA fix has been implemented and Announcements functionality has been restored. Some Announcements which were scheduled to be sent automatically between 5:00PM ET and 7:10PM ET failed to send initially, and are instead being sent now.
There is an issue with editing and sending Announcements affecting both new and scheduled drafts. We are working on a solution and will provide updates as soon as possible.
Report: "Errors loading screening instances"
Last updateThis incident has been resolved.
Schools may be unable to load parts of the web app, including screening instances. We are investigating the issue.
Report: "SMS Delivery Interruption"
Last updateOur SMS Provider has indicated that the issue has been fully resolved.
Our SMS Provider is indicating that while the issue has been resolved, they will continue to monitor the situation to ensure a full recovery. We are continuing to watch closely as well, and will keep this page up to date.
SMS delivery is continuing to work with slightly increased error rates and delays for delivery confirmation; our SMS provider has recently confirmed that they are still working on correcting the issue. We are continuing to closely monitor this issue and will update this page with more information soon.
This morning, our SMS provider indicated an increased error rate when sending SMS messages, particularly to recipients on the Verizon network. We are continuing to monitor the impact and will update this page with more info shortly.
Report: "Push notification failures"
Last updatePush notifications are now being delivered as expected. This incident has been resolved.
We are continuing to work on a fix for this issue.
The issue has been identified and our team is working on a fix.
Some users may be experiencing trouble receiving push notifications about Accountability events and panic signals on the iOS app. Notifications via text and email are unaffected. Our team is currently investigating the issue.
Report: "Increased delays with triggering scheduled actions"
Last updateThis incident has been resolved.
Scheduled actions are no longer experiencing delays. The backlog of previously scheduled actions which were skipped during the interruption is being processed and triggered as quickly as possible. As such, Health screening instances which were scheduled to be created earlier this afternoon may still be in the process of being created. Health screening instances which are scheduled for any point in the future will be created according to their scheduled times.
We are continuing to monitor AWS service availability. Clients may experience delays with some scheduled actions, including creating and sending out scheduled Health screening instances, while we work towards a full recovery.
We are continuing to investigate the issue. Some scheduled actions may be running intermittently and/or with delays.
We are investigating potential issues with scheduled actions including syncing school data, syncing attendance data, and creating scheduled Health screening instances. These issues are likely due to the ongoing AWS service interruption. More updates will be provided as they become available.
Report: "Text message delivery failing for AT&T numbers"
Last updateThis incident has been resolved.
We are observing a recovery with text message delivery to AT&T numbers. We will continue monitoring to ensure full service recovery.
Text messages to AT&T numbers are still reporting as undelivered. We are continuing to investigate and coordinate with our SMS partner to resolve this issue as quickly as possible. We will provide an update in the next 2 hours, or sooner if we receive confirmation from our provider that the issue has been resolved.
Text messages to AT&T numbers are still reporting as undelivered. We are continuing to investigate and coordinate with our upstream provider to resolve this issue as quickly as possible. We will provide an update in the next 60 minutes, or sooner if we receive confirmation from our upstream provider that the issue has been resolved.
Text messages to numbers on AT&T's network are failing to be delivered. We are investigating the issue and will post an update in 30 minutes or when more information is available, whichever comes first.
Report: "Text Message Delivery Failures"
Last updateThis incident has been resolved. We apologize for the inconvenience and appreciate your cooperation.
While systems are behaving normally, we are continuing to monitor the situation and work with our SMS delivery partner to ensure this incident is fully resolved.
A preliminary fix has been implemented and the rate of failed text message deliveries has fallen back within normal limits. We are continuing to monitor the situation and will post more updates as they become available.
Outgoing text messages are failing to be delivered at a higher than normal rate. We are actively investigating the issue.
Report: "Errors Loading Screening Instances"
Last updateThis incident has been resolved.
A fix has been implemented and service has been restored. We are continuing to monitor systems to ensure no further issues arise.
The issue has been identified. A fix is being implemented which should allow users to load screening instances again.
We are currently investigating this issue.
Report: "SFTP Uploads Failing"
Last updateThis incident has been resolved.
File transfers are now completing successfully via SFTP. Our team is continuing to monitor the situation to ensure this issue is fully resolved.
We are currently investigating the issue.
Report: "SMS Delivery Issues"
Last updateThis incident has been resolved. As our service provider began to come back online, their ability to queue text messages for delivery came back online before the communication channel that lets us know when a message is successfully handed off. For a short period, this caused some text messages to be delivered multiple times. We sincerely apologize for the inconvenience and disturbance. Due to the repeated message delivery, we (understandably) saw higher rates of people texting STOP to opt out of receiving texts. You should encourage any parents who opted out due to repeated message delivery to resubscribe by sending START to 36598.
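The duplicate deliveries described in the update above follow a familiar pattern: a sender that retries until it receives a hand-off confirmation will resend a message that was in fact accepted if the confirmation never arrives. The sketch below illustrates that pattern only. It assumes a retry-on-missing-confirmation design, which the update does not confirm, and every name in it (`send_with_retries`, `make_carrier`, `confirmations_working`) is hypothetical rather than part of Ruvna's or its provider's systems.

```python
def send_with_retries(message, deliver, max_attempts=3):
    """Resend until the carrier confirms hand-off; return attempts made."""
    for attempt in range(1, max_attempts + 1):
        if deliver(message):
            return attempt
    return max_attempts


def make_carrier(inbox, confirmations_working):
    """Hypothetical carrier: always accepts the message, but only
    confirms hand-off when its confirmation channel is online."""
    def deliver(message):
        inbox.append(message)          # the recipient receives a copy
        return confirmations_working   # the confirmation may never arrive
    return deliver


# Confirmation channel offline: every retry lands as another copy.
inbox = []
send_with_retries("Screening reminder", make_carrier(inbox, confirmations_working=False))
print(len(inbox))  # 3
```

With the confirmation channel online, the same call stops after the first attempt and the recipient sees a single copy.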
Our upstream provider has confirmed that delivery of text messages is once again operational. We will continue to monitor things closely, and update this page if necessary.
Text messages are continuing to fail to be delivered due to an outage with our upstream provider. Health screenings and Announcements sent via email are unaffected. We will provide an update in the next 60 minutes, or sooner if we receive confirmation from our upstream provider that the issue has been resolved.
Our upstream provider, Twilio, is currently experiencing an issue which is causing SMS delivery delays and failures when sending text messages (including Health screenings and Announcements). See https://status.twilio.com/incidents/zmz5c9prz8mm for more details. We are monitoring closely and will resolve this incident once we have received confirmation that the issue has been resolved from our upstream provider.
Report: "SMS Delivery Delays to T-Mobile"
Last updateOur provider has resolved the issue and there are no longer delivery delays when sending messages to T-Mobile devices.
Our upstream provider, Twilio, is currently experiencing an issue which is causing SMS delivery delays when sending messages to T-Mobile US devices via short codes. Ruvna delivers text messages through a short code, so text messages (including Health screenings and Announcements) may be delayed for users with T-Mobile devices. See https://status.twilio.com/incidents/8961zbmdjfxr for more details. We will resolve this incident once we have received confirmation the issue has been resolved from our upstream provider.
Report: "Issues with Scheduled Screening Instances"
Last updateThis incident has been resolved.
We are continuing to investigate this issue.
We're currently experiencing delays and increased failure rates when creating automatically scheduled screening instances. Some instances are being created up to 4 minutes later than their scheduled time, and others are failing to be created. Our team is investigating the issue which is a result of an ongoing AWS service disruption. We will provide updates here as they become available. This incident does not affect manually-created screening instances. If your instance is not created within 5 minutes after its scheduled creation time, we recommend creating the instance manually (by selecting "Create Off-Schedule Instance").
Report: "Web App Outage"
Last updateThis morning, Ruvna experienced a significant outage. We largely restored platform functionality by 8:24AM ET and service has been operational since. The cause of the disruption was an issue in our high availability strategy which became magnified by an extreme spike in web traffic. This resulted in a backlog of queued requests which eventually brought the platform down for roughly 2 hours.

As an organization, there is nothing more important than our responsibility to assist your communities in safely returning to school, and we know that you are relying on us to do just that. This morning, we let you down. We deeply regret this incident and sincerely apologize. I’m personally disappointed that this happened, and I am incredibly sorry for the confusion and frustration we likely caused.

Ruvna's infrastructure is built for high availability. However, like many education technology platforms right now, we are experiencing an increase in traffic with back-to-school unlike any time in our history. I'd like to share with you some details of what happened this morning as well as explain the steps we are taking to ensure this never happens again.

**What Happened - Technical Details**

Today, the load balancer responsible for relaying requests for static assets to available servers detected a spike in traffic and (correctly) stopped sending new requests to servers for processing. At that point, the load balancer began placing new requests into a queue of unhandled requests. This queue then became saturated as well, causing additional requests to fail or hang. We believe the load balancer should have started sending traffic to the Nginx servers again automatically shortly after the spike in traffic subsided, and we are still investigating what prevented this from happening.

Because this issue was at the load balancer level, and related to the part of our infrastructure normally believed to be the least sensitive to spikes in traffic, our automated fault detection systems did not identify or resolve this issue automatically. This also prevented our team from resolving the issue as quickly as we wanted to, since we needed to take the load balancer completely offline in order to clear the backlog of queued requests. As the saying goes, a chain is only as strong as its weakest link. Today, the weakest link in our chain became abundantly clear.

**Resolution and Moving Forward**

Around 7:15AM ET, our team began preparing to shift web traffic to a fully managed Content Delivery Network (CDN). CDNs offer distributed, load-tolerant web request handling in situations where requested resources are static files like HTML/CSS/JS/images. Configuration completed around 8:24AM ET, at which point we began transferring traffic to the CDN rather than the problematic load balancer. This shift in traffic also allowed the problematic load balancer to clear its backlog of queued requests, so service was restored almost immediately. Even though the load balancer began handling requests normally again once the backlog was cleared, moving forward, these requests will continue to be served by the CDN.

We previously felt the increased control we gained by not running web traffic through a CDN was worth the potential risks, especially since we had never encountered a scenario where traffic resulted in the outage we experienced today. Obviously, today's events highlight the limitations of our old strategy. A CDN will let us handle a virtually unlimited number of concurrent web requests by caching resources in hundreds of servers throughout the country, and will prevent the issue which took us down today from happening in the future.
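The queue saturation described in the post-mortem above can be illustrated with a small model. This is a simplified sketch rather than a description of Ruvna's actual load balancer: the function name `simulate_burst`, the `queue_capacity` parameter, and the numbers in the usage example are all assumptions made for illustration.

```python
from collections import deque


def simulate_burst(arrivals, queue_capacity, backends_accepting):
    """Model a burst of requests hitting a load balancer.

    Returns (forwarded, queued, failed). While backends are not
    accepting traffic, requests wait in a bounded queue; once the
    queue is full, further requests fail.
    """
    queue = deque()
    forwarded = failed = 0
    for request_id in range(arrivals):
        if backends_accepting:
            forwarded += 1
        elif len(queue) < queue_capacity:
            queue.append(request_id)
        else:
            failed += 1
    return forwarded, len(queue), failed


# Backends paused, room for 4 queued requests, 10 arrive.
print(simulate_burst(10, queue_capacity=4, backends_accepting=False))  # (0, 4, 6)
```

In this model nothing drains the queue until the backends accept traffic again, which mirrors why the backlog in the post-mortem persisted until traffic was moved elsewhere.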
The incident should now be resolved and traffic to the web app is no longer failing. We are continuing to monitor the situation closely.
We are continuing to monitor for any further issues.
A solution is being deployed to provide access to the web app. Access may still be spotty and errors may continue over the next 1-2 hours as the situation develops, but most users should be able to now connect.
We are continuing to investigate the major web client outage and work on potential solutions. We will provide updates here as they become available.
We are currently investigating this issue.
Report: "Health Screening Follow Up Links Invalid"
Last updateThis issue should now be resolved for any instances whose follow ups have been or will be sent after 8:04AM ET. Follow up links sent between 12:00AM-8:03AM may still be invalid. The issue did not impact the initial screening links sent by Ruvna Health, so affected users should click on the original link they received rather than the follow up.
The issue has been identified and a fix is being deployed.
We are investigating an issue where some of the links contained in automatic follow up messages from Ruvna Health are not valid. We are working on resolving this issue as quickly as possible. The issue does not appear to impact the initial screening links sent by Ruvna Health, so affected users should click on the original link they received rather than the follow up.
Report: "Screenings Not Sending"
Last updateThis incident has been resolved. We sincerely apologize for the inconvenience. If your school / district was impacted and you would like any unsent screening messages to be delivered, please contact us at support@ruvna.com.
The fix has been pushed live and screenings should now be sending out correctly. We will continue monitoring the situation for any other issues. We sincerely apologize for the inconvenience. If your school / district was impacted and you would like any unsent screening messages to be delivered, please contact us at support@ruvna.com.
Our team has identified the problem and is working on implementing a fix. We will continue to publish updates here as they become available.
We are currently investigating an issue causing some scheduled Health screenings to not automatically send texts or emails to recipients. We will provide updates here as they become available.
Report: "Degraded Email Performance"
Last updateThis incident has been resolved. Email delivery is functioning as expected.
We are currently investigating this issue.
Report: "Service Outage"
Last updateThis incident has been resolved.
Service has been restored to Ruvna's backend infrastructure and we are continuing to monitor the situation.
We are currently investigating the issue.
Report: "SFTP Syncing Issues"
Last updateThe issue has been resolved.
The issue has been identified and a fix is being tested.
Some data updates sent to schooldata.ruvna.com via SFTP are not being referenced when the corresponding import procedures run, causing the updated data to not be reflected in Ruvna. We are investigating the issue.
Report: "Elevated Veracross Sync Failures"
Last updateThis incident has been resolved.
Syncs with Veracross are completing successfully. Our team will continue to monitor the situation closely.
Requests to the Veracross SIS API are failing at higher than expected rates.
Report: "Announcements Outage"
Last updateThis incident has been resolved.
An issue is currently preventing users from sending Announcements.
Report: "Google SSO Outage"
Last updateThis incident has been resolved and all login strategies are operating correctly.
Google is experiencing an outage with multiple GCP components, including Google OAuth 2.0. Some users are able to log in with Google while others are experiencing difficulties. Link to Google Incident: https://status.cloud.google.com/incident/developers-console/19008
Google is experiencing an outage with their authentication services, preventing users from signing into Ruvna with Google. We're investigating the issue.
Report: "Clever Sync Errors"
Last updateThis incident has been resolved.
Error rates have fallen back within expected levels. Ruvna engineers are in communication with Clever regarding the incident and any updates will be posted here.
We're experiencing an elevated level of errors when syncing with the Clever API causing some users and students to be removed from Ruvna. We are currently investigating the issue. Schools/Districts syncing data via the Clever API may have their imports paused while our team investigates.
Report: "Degraded Veracross Sync Performance"
Last updateVeracross API syncs are operational again and data is updating in accordance with school-specific schedules.
We are continuing to monitor the Veracross status page.
We're experiencing an elevated level of errors when syncing with the Veracross API as a result of active maintenance being performed on their system (https://veracross.statuspage.io/incidents/5qs1sktq5k5d). Schools syncing data via the Veracross API may have their imports paused until the Veracross API is operational.
Report: "Degraded Veracross Sync Performance"
Last updateThis incident has been resolved and all imports are running on schedule.
We're experiencing an elevated level of errors and increased latency when syncing with the Veracross API and are currently looking into the issue. Schools syncing data via the Veracross API may have their imports paused while our team investigates.
Report: "Increased SFTP Latency"
Last updateThe latency has been resolved and SFTP-based syncs are running as expected.
SFTP requests are currently experiencing higher latency than normal. If your school syncs data with Ruvna via SFTP at schooldata.ruvna.com, your sync schedules may be running with a delay as well.
Report: "DNS Service Outage"
Last update**Thursday’s DNS Service Outage - Post Mortem**

On Thursday (9/20) evening at 10:04PM ET, Ruvna’s DNS host went down for approximately 19 minutes. During the outage, anyone accessing services located at ruvna.com (such as the Accountability Web Client or SFTP Sync Service) would have received a DNS error or a hung page. Anyone accessing services that rely on the ruvna.com network (such as the Ruvna iOS app) would have received a “No Connection” error.

We apologize for this incident and any challenges it may have caused. We recognize that you rely on Ruvna to be available for you at all times since you never know when a crisis may happen.

**Root Cause**

The cause of the incident was an edge router failure within the network of our DNS hosting provider. DNS, or Domain Name System, is the foundational technology necessary for translating a URL typed into a browser into a physical address of a server to handle and respond to a request. When the infrastructure that hosts DNS records for a given domain (a.k.a. “nameserver”) is unavailable, requests to that domain can’t be mapped to physical servers and therefore fail.

Normally, an edge router failure would be automatically identified so that a failover procedure could step in to assign another router to this task. On Thursday, the failover process didn’t kick off automatically, so the router failure caused an outage for clients of the DNS provider, including Ruvna. Moreover, DNS queries are cached for a period of time, which adds protection against these outages, as cached DNS queries can be resolved without reaching a domain’s nameservers. Thursday’s outage lasted longer than the cache period of most of our DNS records, so once the cache TTL on those records expired, the outage with Ruvna began.

Ruvna uses a separate provider to host our DNS records independently from the rest of Ruvna’s infrastructure and servers so that a major outage of either provider gives our team options for resuming service as quickly as possible. As such, this incident had no impact on Ruvna’s internal networking (whose DNS is hosted with yet another provider), servers, or data, even though such entities were unreachable via the ruvna.com domain during the outage.

**Remediation and Prevention**

Ruvna’s engineers were notified as soon as our network became unreachable at 10:04PM. They quickly identified that the outage was not with Ruvna’s own infrastructure or network, but with our DNS provider’s physical hardware preventing access to Ruvna’s infrastructure and network. We were in touch with the provider by 10:15PM to address the problem as quickly as possible. They confirmed the failover procedure had been completed by 10:26PM, at which point service was restored.

No amount of downtime is acceptable for us, just as it is unacceptable for you. Here’s what we’re doing to prevent incidents like this from occurring in the future:

* We feel this incident could have been prevented by the provider-in-question with better failover and high availability strategies. As such, on September 28, we completed the process of migrating our primary DNS hosting to a provider with significantly stronger infrastructure. Our nameservers are now replicated across the globe to ensure Ruvna’s constant availability.
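The interaction between DNS caching and a nameserver outage described in the post-mortem above can be sketched with a small TTL cache. This is an illustration under stated assumptions, not Ruvna's or any resolver's real implementation: the class `TtlCache`, the 300-second TTL, and the address `203.0.113.10` (a documentation-only IP) are hypothetical, since the post-mortem does not state the actual TTL values.

```python
class TtlCache:
    """Minimal resolver cache: answers are reused until their TTL expires."""

    def __init__(self):
        self._records = {}  # name -> (address, expires_at)

    def put(self, name, address, ttl, now):
        self._records[name] = (address, now + ttl)

    def resolve(self, name, now, nameserver_up):
        entry = self._records.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: no nameserver needed
        if not nameserver_up:
            return None      # expired and nameserver unreachable
        # A real resolver would query the nameserver and re-cache here.
        return entry[0] if entry is not None else None


cache = TtlCache()
cache.put("ruvna.com", "203.0.113.10", ttl=300, now=0)

# One minute into an outage the cached answer still works.
print(cache.resolve("ruvna.com", now=60, nameserver_up=False))   # 203.0.113.10
# Ten minutes in, the TTL has lapsed and resolution fails.
print(cache.resolve("ruvna.com", now=600, nameserver_up=False))  # None
```

Under these assumed numbers, a 19-minute outage outlasts the cached answer, which matches the post-mortem's point that clients only began failing once their cached records expired.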
This incident has been resolved and service is fully restored.
Service has been restored. We are monitoring the situation to ensure the issue is fully resolved.
Ruvna's primary DNS servers are experiencing significant delays in response time, causing many systems to be unreachable.
Report: "Veracross Sync Failures"
Last updateThis incident has been resolved. Schools with no "active" classes relying on the Veracross API for data syncing will have their Student and Roster data frozen until the start of the Fall term. Users (faculty/staff) will continue to sync normally. Veracross schools which still have "active" classes will continue syncing as usual. Contact Ruvna Support (support@ruvna.com) with any questions.
We have identified the issue and our engineers are working to implement a fix.
We're experiencing sync failures for schools utilizing the Veracross API sync if/when no classes have the "active" status. Schools syncing data via the Veracross API may have their imports paused while our team investigates.