Beta Solutions Blog

The Art of the Launch: How Real-Time Diagnostics Can Secure Product Success

Date:  Dec 12, 2023

How to Nail Your Product's Release.


Product reliability issues can escalate into crises that take a heavy toll on customer satisfaction and brand reputation. No product is 100% perfect and issues will inevitably arise, but how do you catch them quickly, before they get a chance to snowball? Take the following hypothetical scenario as an example.



Example Scenario:
Navigating customer issues the outdated way.


Customer Complaint:

Jane recently purchased a smart thermostat from your company and is now facing a persistent issue: her device keeps resetting to a default temperature, undoing her personalized settings. This problem leads to fluctuating indoor temperatures and undermines the energy efficiency benefits Jane expected. Feeling the sting of high utility bills and the annoyance of a non-cooperative device, she contacts your customer support for help.


Initial Troubleshooting Challenges:

With no advanced diagnostic tools at their disposal, the customer support team relies on traditional troubleshooting methods. They query Jane for details surrounding each reset incident. This approach depends on Jane's memory and descriptive abilities and fails to provide the technical granularity needed for proper diagnosis.


Difficulty in Pinpointing the Problem:

The engineering team is left in the dark, without error logs or system snapshots to shed light on the issue. They make educated guesses on the cause of the resets, but without solid data, their efforts could lead to misdiagnoses and misdirected resources. There is also uncertainty about whether this is a widespread firmware issue or a hardware fault unique to Jane's thermostat.


Inefficient Development and Deployment:

The guesswork continues as the firmware team hastily crafts a patch based on speculation. Without facilities for remote updates, Jane must endure a complicated manual process to apply the fix, a procedure that is far from user-friendly and fraught with risk. Moreover, since the engineers cannot replicate the issue, they've effectively turned Jane into an involuntary beta tester.


No Controlled Rollout or Direct Feedback:

The lack of a structured deployment system means all new factory devices receive the untested firmware while existing customers are left with faulty units. As more customers report similar problems, a manageable situation quickly spirals out of control. A full product recall becomes the last resort, a costly and brand-damaging move.


The Impact on Brand Reputation:

The updates are finally implemented across all units, but the inability to confirm their effectiveness breeds more uncertainty. Customers, including Jane, have faced inconvenience and disappointment, prompting public expressions of dissatisfaction and skepticism towards your product and, by extension, your company as a whole.


This hypothetical scenario paints a stark picture of the pitfalls that await a product that doesn’t have the necessary tools to handle such issues. Keep reading to find out about one of the tools we use.





Introduction

Electronic development, or any product development for that matter, can be an elaborate and nuanced endeavor. Success is marked not only by the product launch but also by its sustained performance and reliability in the field. Post-launch, project managers often face the formidable task of managing field issues that, if left unchecked, can lead to detrimental effects on the user experience and the brand's reputation. Moreover, the financial implications associated with recalls, extended support, and compensatory actions further accentuate the need for vigilant post-launch management.


This article introduces the idea of real-time diagnostics and the significant role they play in mitigating post-launch hazards. It then explores Memfault, a tool Beta Solutions has used in previous projects, before revisiting the scenario above, this time with a much happier Jane.


The Value of Real-time Diagnostics:

As mentioned above, maintaining the operational integrity and user satisfaction of a new electronic product post-launch demands vigilant monitoring and rapid response to emerging issues. Real-time diagnostics play a pivotal role in this maintenance process, transforming the traditional reactive methods of problem-solving into a proactive, strategic asset for any organisation.


What Do We Mean When We Say Real-time Diagnostics?

  • Real-time diagnostics refers to the continuous monitoring and analysis of a product's performance data while it operates in its real environment. Issues are detected and logged as they occur, so they can be addressed without delay.
  • The system provides instant access to device data, error codes, and device metrics that would typically only be available if the device were physically connected to a PC.
  • With real-time diagnostics, companies have a live feed of their product's health, allowing them to maintain high standards of performance and reliability.


What Is the Proactive Approach?

  • The proactive approach in diagnostics anticipates potential issues before they become user-facing problems, as opposed to the reactive approach that responds only after a problem has been reported.
  • Proactive diagnostics is characterised by fault prevention, predictive maintenance, and optimization based on ongoing data analysis.
  • By implementing a proactive strategy, companies can ensure greater product uptime, a better customer experience, and a more robust understanding of their product’s real-world usage.


What are the Benefits of Real-Time Data?

  • Real-time data analysis supports continuous improvement of the product by identifying usage trends and customer preferences.
  • It enables data-driven decision-making for feature updates, hardware revisions, and customer support initiatives.
  • Real-time data provides an invaluable feedback loop for development teams, resulting in more agile and responsive product evolution.



What's Memfault?


Memfault is a state-of-the-art real-time diagnostics tool that we at Beta Solutions use to ensure our partners' products perform at their best. It is effectively a comprehensive health monitoring system for electronic devices, offering a suite of tools that work together to detect, diagnose, report on, and resolve issues as they occur. Here are some key points that make Memfault great:


1. Monitoring:

Memfault continuously tracks how a device is functioning in real-time. It collects information on how the device is performing, such as its temperature, battery life, and whether all parts are operating correctly. It's akin to a 24/7 surveillance system that's always watching to ensure everything is working as it should.


2. Diagnosing:

When something goes wrong, Memfault is ready. It gathers all the relevant information so the engineering team has everything it needs to identify the problem. It can spot errors, breakdowns, and other issues using the data it collects.


3. Reporting:

Once an issue is detected, Memfault sends an alert, much like the warning light on your car's dashboard that tells you when it's time for an oil change. It notifies the engineers or the product team about the issue so they can start working on a solution.


4. Updating:

Memfault can send out updates or fixes directly to the device, similar to how a computer receives software updates. These updates can correct errors, introduce new features, or improve existing ones, all without the user having to take the device to a service center.


5. Analyzing:

Beyond fixing immediate problems, Memfault collects data over time to find bigger trends and patterns. It helps companies understand how their devices are used and how they can be improved in future designs. 



Memfault's Backend


Upon logging into your Memfault account via their website (which you can access anywhere with an internet connection), you'll find yourself at the command center for all your devices and the data they generate.

Overview dashboard to visualise and group your entire fleet's data


The dashboard is your primary interface, showcasing an overview of your fleet and the features mentioned above. A project manager might spend just two minutes on this dashboard at the start of the workday, yet that is enough to get a snapshot of their devices' status and spot any issues that need to be handled. You can see all your devices on the devices page, where you can search and filter by attributes such as software version or assigned cohort.


Cohorts categorize devices into organized groups based on intended function or product stage. For example, a "Test" cohort might contain units designated for internal validation and pilot testing; these devices are the first to receive the latest firmware updates. Conversely, a "Production" cohort might contain all devices that have been manufactured but not yet distributed to end users. Grouping devices into cohorts keeps the fleet orderly and structured.


Issue Management page showing relevant details about a device's assert.


All issues can be seen on an issues page, where you can again filter and track issues based on a range of parameters such as:

  • Occurrence time
  • Software version
  • Cohort
  • Hardware version
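To make the filtering idea concrete, here is a minimal Python sketch of how issue occurrences might be filtered on the parameters listed above. It is purely illustrative and is not Memfault's API: the `IssueOccurrence` record, its field names, the example device IDs and versions, and the `filter_occurrences` and `affected_devices` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class IssueOccurrence:
    """Hypothetical record of one occurrence of an issue on one device."""
    device_id: str
    timestamp_s: int  # occurrence time, seconds since the Unix epoch
    software_version: str
    cohort: str
    hardware_version: str


def filter_occurrences(
    occurrences: Iterable[IssueOccurrence],
    since_s: Optional[int] = None,
    until_s: Optional[int] = None,
    software_version: Optional[str] = None,
    cohort: Optional[str] = None,
    hardware_version: Optional[str] = None,
) -> List[IssueOccurrence]:
    """Keep occurrences that match every filter supplied (None means 'any')."""
    matches = []
    for occ in occurrences:
        if since_s is not None and occ.timestamp_s < since_s:
            continue
        if until_s is not None and occ.timestamp_s > until_s:
            continue
        if software_version is not None and occ.software_version != software_version:
            continue
        if cohort is not None and occ.cohort != cohort:
            continue
        if hardware_version is not None and occ.hardware_version != hardware_version:
            continue
        matches.append(occ)
    return matches


def affected_devices(occurrences: Iterable[IssueOccurrence]) -> List[str]:
    """Return the sorted, de-duplicated IDs of devices that hit the issue."""
    return sorted({occ.device_id for occ in occurrences})


log = [
    IssueOccurrence("thermo-001", 1_700_000_000, "1.3.0", "Production", "rev-B"),
    IssueOccurrence("thermo-002", 1_700_003_600, "1.3.0", "Production", "rev-A"),
    IssueOccurrence("thermo-001", 1_700_007_200, "1.3.0", "Production", "rev-B"),
    IssueOccurrence("thermo-003", 1_700_010_800, "1.2.9", "Test", "rev-B"),
]
print(affected_devices(filter_occurrences(log, software_version="1.3.0", hardware_version="rev-B")))
# prints ['thermo-001']
```

Combining filters this way is what helps answer the question from the first scenario: whether a fault is tied to one software version or hardware revision, or is unique to a single unit.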


Clicking into a specific issue gives you all the information an engineer would need to diagnose the problem. It reveals:

  • Which devices have had this issue
  • The times the issue has occurred
  • Device logs
  • The coredump of the device (essentially a snapshot of exactly what the device was doing when the issue occurred)


This capability is nothing short of remarkable. It's akin to having the device right in front of you the very moment the problem arises, which significantly speeds up the diagnostic process. Additionally, you're equipped with a complete history of the issue each time it has occurred.


Metrics page showing battery metric data in graphs


Metrics: Devices check in every hour and report any key metrics you want to track to Memfault. Things like:

  • Battery level
  • Number of times the device was turned on
  • Cellular signal (RSSI)
  • Number of times a feature was used on a device
  • Temperature

Battery discharge metric graph

All this information is graphed in a time-series plot so you can see how any device's metrics have changed over time.
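The pattern behind these hourly check-ins can be sketched in a few lines of Python. This is a hypothetical illustration of the idea (counters and latest readings accumulated over an interval, then packaged into one report per check-in so the reports form a time series); it is not Memfault's SDK, and the `HeartbeatMetrics` and `HeartbeatCollector` names and fields are made up for this example.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class HeartbeatMetrics:
    """Hypothetical metrics a device accumulates between hourly check-ins."""
    power_on_count: int = 0                # counter: times the device was turned on
    feature_use_count: int = 0             # counter: times a feature was used
    battery_pct: Optional[float] = None    # latest battery level, if sampled
    rssi_dbm: Optional[int] = None         # latest cellular signal strength, if sampled
    temperature_c: Optional[float] = None  # latest temperature, if sampled


class HeartbeatCollector:
    """Turns each interval's metrics into one timestamped report."""

    def __init__(self) -> None:
        self.current = HeartbeatMetrics()
        self.reports: List[dict] = []

    def check_in(self, timestamp_s: int) -> None:
        # Package this interval's values, then start a fresh interval:
        # counters restart at 0 and readings go back to None.
        self.reports.append({"timestamp_s": timestamp_s, **asdict(self.current)})
        self.current = HeartbeatMetrics()

    def series(self, metric: str) -> List[tuple]:
        """(timestamp, value) pairs for one metric, ready to plot."""
        return [(r["timestamp_s"], r[metric]) for r in self.reports]


collector = HeartbeatCollector()
collector.current.power_on_count += 1
collector.current.battery_pct = 96.0
collector.check_in(3600)
collector.current.battery_pct = 91.5
collector.check_in(7200)
print(collector.series("battery_pct"))  # prints [(3600, 96.0), (7200, 91.5)]
```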


Alerts can be set up to notify you or the engineering team of potential problems, helping you catch issues before the customer does. For example, an alert could be set up to notify everyone on the team if a device stays below 10% battery for more than 2 hours.
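An alert rule like the battery example boils down to checking for a sustained run of low readings. The Python sketch below is a hypothetical rule evaluator written for illustration, not how Memfault implements alerts; `Sample` and `low_battery_alert` are made-up names, and the 10% and 2-hour defaults are simply the values from the example above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    """One hypothetical battery reading reported at a check-in."""
    timestamp_s: int    # check-in time, seconds since the Unix epoch
    battery_pct: float  # reported battery level, 0-100


def low_battery_alert(
    samples: Iterable[Sample],
    threshold_pct: float = 10.0,
    min_duration_s: int = 2 * 60 * 60,
) -> bool:
    """Return True if the battery stayed below threshold_pct for longer than min_duration_s.

    A low-battery run is measured from its first below-threshold reading to the latest
    consecutive one, so the resolution is limited by how often the device checks in.
    """
    run_start: Optional[int] = None
    for sample in sorted(samples, key=lambda s: s.timestamp_s):
        if sample.battery_pct < threshold_pct:
            if run_start is None:
                run_start = sample.timestamp_s  # a new low-battery run begins
            if sample.timestamp_s - run_start > min_duration_s:
                return True
        else:
            run_start = None  # battery recovered, so the run is broken
    return False


HOUR = 3600
readings = [Sample(0, 12.0), Sample(HOUR, 9.0), Sample(2 * HOUR, 8.0),
            Sample(3 * HOUR, 7.0), Sample(4 * HOUR, 6.0)]
print(low_battery_alert(readings))  # prints True
```

With hourly check-ins, the rule fires once the low-battery run spans more than two hours between readings; a single recovered reading resets the run, which avoids alerting on a device that was briefly low and then charged.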

Firmware Over the Air staged rollout

Firmware management: Here you can manage different firmware versions and select which firmware each cohort should run. You can also stage firmware rollouts, for example by updating only 10% of your device fleet at first, to mitigate the risk of introducing new issues. This makes it possible to test new features on a subset of devices and compare them with the previous version at the same time.
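One common way to implement a percentage-based staged rollout is deterministic hashing: each device is mapped to a stable bucket from 0 to 99, and it is updated if its bucket falls below the rollout percentage. The Python sketch below illustrates that general technique under stated assumptions; it is not a description of how Memfault selects devices, and the `RELEASE_PLAN` table, version strings, and helper names are hypothetical.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical release plan: cohort -> (firmware version, percent of that cohort to update).
RELEASE_PLAN: Dict[str, Tuple[str, int]] = {
    "Test": ("1.4.0", 100),       # internal units get every new build
    "Production": ("1.4.0", 10),  # staged: start with 10% of these devices
}


def rollout_bucket(device_id: str, version: str) -> int:
    """Map a device to a stable bucket in [0, 100) for a given firmware version."""
    digest = hashlib.sha256(f"{version}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def target_version(device_id: str, cohort: str, current_version: str) -> str:
    """Return the firmware version this device should run under RELEASE_PLAN."""
    version, percent = RELEASE_PLAN.get(cohort, (current_version, 0))
    if rollout_bucket(device_id, version) < percent:
        return version
    return current_version


devices = [f"thermo-{n:04d}" for n in range(1000)]
updated = sum(target_version(d, "Production", "1.3.0") == "1.4.0" for d in devices)
print(updated)  # around 100; the exact count depends on the hash
```

Because a device's bucket never changes for a given release, raising the percentage from 10 to 50 keeps every device already on the new firmware and simply adds more. Including the version in the hash means each release starts with a different subset, so the same devices are not always the first to receive new builds.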


With an understanding of the challenges of traditional device management, as illustrated by Jane's experience with her smart thermostat, let's examine how the scenario could unfold differently. By integrating real-time diagnostics tools into the process, we get a very different outcome.


Example Scenario:
Navigating Customer Issues With Real Time Diagnostic Tools


Customer Complaint: 

Imagine a customer, Jane, who has recently purchased a smart thermostat from your company. She contacts customer support to report that her thermostat sporadically resets to a default temperature, negating her preset schedules and resulting in discomfort and increased energy costs.


Initial Assessment with Real-Time Diagnostic Tools:

The customer support team acknowledges Jane's issue and tells her not to worry; they'll look into it. They no longer have to rely on Jane's memory and descriptive ability: they can simply log into the platform and search for Jane's device ID. Upon finding the device, they review the recent logs and notice several entries that correspond to the times Jane reported the thermostat resets.


Identifying the Root Cause:

The logs indicate a recurring error code that precedes each reset event. Additionally, the support team retrieves a core dump that was automatically recorded during the last incident. Customer support passes this information on to the engineering team. Good news: the engineers are already aware of the problem, as they were alerted to Jane's issue at the time it occurred. Using remote diagnostics tools, they had reviewed the core dump and error logs and determined that a firmware bug caused the thermostat's operating system to crash under a specific set of circumstances, leading to a full system reset.


Developing a Firmware Fix:

With the issue identified, they can now recreate it on their devices. They create a new firmware version that addresses the issue and thoroughly test it in a controlled environment to ensure that it resolves the problem without introducing new issues. No more relying on customers to test the firmware.


Rolling Out a Segmented Update:

Once satisfied with the stability of the new firmware, the team decides to roll out the update using the Segmented Rollout feature. They start with a small percentage of devices that exhibit issues similar to Jane's thermostat. By monitoring these devices closely after the update, they can confirm that the fix works and that no new issues have arisen.


Broad Deployment and Resolution:

With positive results from the limited rollout, the team proceeds with a full fleet update. Jane's thermostat, along with all others affected, automatically receives the update. The device applies the update during a time that minimizes disruption for the user.


Customer Follow-Up and Confirmation:

After the update is deployed, customer support reaches out to Jane to confirm the resolution of her issue. They use real-time diagnostic tools to verify that her device has received and is running the new firmware version. Jane confirms that her thermostat has maintained the set temperatures and that the resets have ceased. She is pleased with the proactive approach and the timely fix, reinforcing her trust in your company's brand.


Closing the Feedback Loop:

The engineering team uses this experience and the data collected through Metrics to improve their development process. They plan to utilize Key User Metrics and Device Health indicators more effectively to prevent similar issues in the future. The proactive, data-driven approach offered by real time diagnostic tools ensures continuous enhancement of the product and customer satisfaction.


Conclusion

In summary, the ability to respond to issues in real time, efficiently and effectively, has a huge impact on a product's success. Not only do you keep your devices working reliably in customers' hands, you also gather useful information about how your product is being used to aid future iterations.


If the concept of Real Time Diagnostic Tools resonates with you, or if you have a product that could benefit from it, do not hesitate to get in touch with Beta Solutions. Our team has experience using the platform and is ready to bring it into your product strategy, ensuring you stay ahead in the dynamic world of electronics.


