Merge pull request #595 from nightscout/command-line-replay

Updates to oref replay system to enable replaying a large number of inputs
Sam King, 5 months ago
Commit af1a9f6540

oref_swift_port_notes.md → DeveloperDocs/OrefSwift/oref_swift_port_notes.md


+ 152 - 0
DeveloperDocs/OrefSwift/replay.md

@@ -0,0 +1,152 @@
+# Replaying inputs for oref
+
+To debug and verify our Swift oref implementation, we replay inputs
+captured from real devices. This document outlines the two main use
+cases for this replay mechanism: verification and daily verification.
+It also shows how to debug an inconsistency once you find one.
+
+## Verification
+
+To verify our Swift oref implementation, we replay a large number of
+inputs that caused inconsistencies between the Swift and JS outputs in
+the past. If our Swift implementation is correct, these previously
+inconsistent runs will now match either the production JS
+implementation or our fixed JS implementation, which is present only in
+our testing bundle.
+
+To do a verification run:
+
+```bash
+# In Trio-oref, check out the latest `oref-swift` branch
+$ cd Trio-oref
+$ git checkout oref-swift
+$ git pull
+
+# In trio-oref-logs get the latest inputs
+$ cd ../trio-oref-logs
+$ ./update_trio_stats.sh # will take a long time for the first run
+
+# extract all inputs from the logs
+$ python extract_inputs.py
+
+# run the verification script
+$ python run_tests_on_existing_errors.py
+```
+
+This verification script will run through all of the inputs, separated
+by timezone, and either confirm that all inputs produce correct outputs
+or flag any timezones that had incorrect runs.
+
+## Daily verification
+
+Each day, as new logs come in, you can run through them to check for
+new inconsistencies. To do this, run:
+
+```bash
+# Fetch the latest logs incrementally
+$ ./update_trio_stats.sh
+# run through all of the inputs for a single day
+$ python run_tests_on_errors.py 2025-12-06 > 2025-12-06.txt
+```
+
+When the run finishes, the end of the output file contains a summary
+that reports any inconsistencies found. It looks something like this:
+
+```
+$ tail 2025-12-06.txt
+
+--- Summary---
+- autosens: 10 errors, Xcode tests: ✅
+- determineBasal: 11 errors, Xcode tests: ❌ Failed for: America/Los_Angeles
+- iob: 521 errors, Xcode tests: ✅
+- profile: 0 errors, Xcode tests: N/A
+- meal: 1178 errors, Xcode tests: ✅
+```
+
+This summary shows that the logged inconsistencies for `autosens`,
+`iob`, and `meal` were all consistent when replayed in the Xcode unit
+tests, `profile` had no logged inconsistencies to replay, and
+`determineBasal` had at least one replayed input in the
+America/Los_Angeles timezone that was still inconsistent.
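If you run the daily check regularly, it can help to pull the failing functions and timezones out of the report automatically. The sketch below parses summary lines in the format shown above. `parse_summary` and `SummaryEntry` are hypothetical names, and the sketch assumes that several failing timezones would be listed comma-separated after "Failed for:", which the example above does not confirm.

```python
import re
from typing import List, NamedTuple, Optional

# Matches summary lines such as
# "- determineBasal: 11 errors, Xcode tests: ❌ Failed for: America/Los_Angeles"
SUMMARY_LINE = re.compile(
    r"^- (?P<name>\w+): (?P<errors>\d+) errors, Xcode tests: (?P<status>.+)$"
)


class SummaryEntry(NamedTuple):
    name: str
    errors: int
    passed: Optional[bool]  # None when no Xcode tests ran ("N/A")
    failed_timezones: List[str]


def parse_summary(report: str) -> List[SummaryEntry]:
    """Parse the "--- Summary---" block printed at the end of a daily run."""
    entries = []
    for line in report.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match is None:
            continue  # skip headers, blank lines, and other output
        status = match["status"].strip()
        failed: List[str] = []
        if status == "N/A":
            passed = None
        elif status.startswith("✅"):
            passed = True
        else:
            passed = False
            # Assumption: several failing timezones would be comma-separated.
            _, _, zones = status.partition("Failed for:")
            failed = [zone.strip() for zone in zones.split(",") if zone.strip()]
        entries.append(SummaryEntry(match["name"], int(match["errors"]), passed, failed))
    return entries
```

Entries with `passed is False` are the ones to investigate with the debugging workflow; their `failed_timezones` tell you what to set as `REPLAY_TEST_TIMEZONE`.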
+
+## Debugging
+
+If a replay reports an inconsistency, you need to step through the code
+and debug it. I haven't found a good way to automate this yet, so it is
+a highly manual process.
+
+The replay setup has three key components. First, there is a local
+HTTP server that runs within the `trio-oref-logs` repo and serves
+inputs for replay on port 8123. We use a local HTTP server so that our
+iOS app, running in the simulator, can access a large number of input
+logs.
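For reference, `HttpFiles` in the test target expects a server on `localhost:8123` whose `/list` endpoint returns a JSON array of strings. The sketch below is a hypothetical stand-in, not `serve_errors.py`: it assumes the list entries are URL paths like `/files/<name>.json` (the form of the input name used later in this document) and that the test downloads each entry from the same server. `ReplayHandler` and `make_server` are illustrative names.

```python
import json
import os
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


class ReplayHandler(SimpleHTTPRequestHandler):
    """Serves `/list` as a JSON array of input paths and `/files/<name>` as downloads."""

    def do_GET(self):
        if self.path == "/list":
            # Assumption: every .json file in the served directory is a replay input.
            names = sorted(n for n in os.listdir(self.directory) if n.endswith(".json"))
            body = json.dumps([f"/files/{name}" for name in names]).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        elif self.path.startswith("/files/"):
            # Map /files/<name> onto <directory>/<name> and let the base class serve it.
            self.path = self.path[len("/files"):]
            super().do_GET()
        else:
            self.send_error(404)


def make_server(directory: str, port: int = 8123) -> HTTPServer:
    """Create (but do not start) a server for the replay inputs in `directory`."""
    handler = partial(ReplayHandler, directory=directory)
    return HTTPServer(("localhost", port), handler)
```

Calling `make_server("errors").serve_forever()` would serve the `errors/` directory, assuming that is where `extract_errors.sh` writes its output. Reusing `SimpleHTTPRequestHandler` keeps path sanitizing and file streaming in the standard library.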
+
+Second, there is the iOS unit test. This test downloads a list of
+files from the HTTP server, downloads the files one by one, and runs
+the appropriate function on each (e.g., `determineBasal`) to compare
+the Swift output against the production JS implementation and against
+a [JS
+implementation](https://github.com/kingst/trio-oref/tree/dev-fixes-for-swift-comparison)
+that includes the bug fixes we added to Swift. It also formats the
+inputs so they can be run against the JS implementation using mocha.
+
+Third, the fixed JS implementation (the `dev-fixes-for-swift-comparison`
+branch) includes mocha unit tests that replay the inputs formatted by
+the iOS test.
+
+With this architecture, you can debug the same input on both the JS
+and Swift implementations.
+
+Here is an example of debugging the `determineBasal` bug from the
+2025-12-06 daily verification run shown above.
+
+First, extract the `determineBasal` error inputs for that day and serve
+them with our HTTP server:
+
+```bash
+$ cd trio-oref-logs
+$ rm errors/*
+$ ./extract_errors.sh determineBasal 2025-12-06
+$ python serve_errors.py
+```
+
+Next, open Xcode and add the following to your ConfigOverride.xcconfig file:
+
+```
+ENABLE_REPLAY_TESTS = YES
+REPLAY_TEST_TIMEZONE = America/Los_Angeles
+HTTP_FILES_OFFSET = 0
+HTTP_FILES_LENGTH = 2500
+```
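The paging settings follow the logic in `HttpFiles.listFiles()` and `ReplayTests`: each value is read from an environment variable first and from ConfigOverride.xcconfig (via Info.plist) otherwise, paging applies only when both offset and length are set, the slice is clamped to the number of available files, and the test stops with `fatalError` if a run would include more than 5000 files. The Python sketch below mirrors that slicing so you can predict which files a page covers. `page` and `page_settings` are hypothetical helpers, and unlike the Swift code they assume non-negative values.

```python
from typing import Dict, List, Optional, Sequence

# HttpFiles.listFiles() calls fatalError when a run would include more files than this.
MAX_FILES_PER_RUN = 5000


def page(files: Sequence[str], offset: Optional[int], length: Optional[int]) -> List[str]:
    """Mirror the slicing in HttpFiles.listFiles() (assumes non-negative values)."""
    if offset is None or length is None:
        selected = list(files)  # paging only applies when both values are set
    else:
        start = min(offset, len(files))
        end = min(offset + length, len(files))
        selected = list(files[start:end])
    if len(selected) > MAX_FILES_PER_RUN:
        raise RuntimeError(f"too many files: {len(selected)}")
    return selected


def page_settings(total: int, length: int = 2500) -> List[Dict[str, str]]:
    """Offset/length pairs that cover `total` files in pages of `length`."""
    return [
        {"HTTP_FILES_OFFSET": str(offset), "HTTP_FILES_LENGTH": str(length)}
        for offset in range(0, total, length)
    ]
```

For example, `page_settings(6000)` yields offsets 0, 2500, and 5000, which you could apply one run at a time through ConfigOverride.xcconfig or the environment variables that `ReplayTests` reads.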
+
+Run the unit test that replays all of the error inputs:
+`DetermineBasalJsonTests.replayErrorInputs`
+
+Search the console output for the string "REPLAY ERROR". Each match
+shows what was different and which input file caused the error.
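When a run produces many errors, collecting the failing input files by hand gets tedious. This sketch pulls input paths out of a saved copy of the console output. It assumes each "REPLAY ERROR" line includes the input path in the `/files/<name>.json` form; the real log format may differ, and `failing_inputs` is a hypothetical helper.

```python
import re
from typing import List

# Assumption: input paths look like /files/<uuid>.<n>.json and appear on the
# same line as the "REPLAY ERROR" marker.
INPUT_PATH = re.compile(r"/files/[\w.-]+\.json")


def failing_inputs(console_log: str) -> List[str]:
    """Return distinct input paths from "REPLAY ERROR" lines, in first-seen order."""
    found: List[str] = []
    for line in console_log.splitlines():
        if "REPLAY ERROR" not in line:
            continue
        for path in INPUT_PATH.findall(line):
            if path not in found:
                found.append(path)
    return found
```

Each returned path is a candidate to paste into the single-input test in the next step.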
+
+Then, update the unit test that runs a single input (in our case,
+`DetermineBasalJsonTests.formatInputs`) and copy in the name of the
+input file. It will look something like this:
+`/files/f1d04efa-c39b-4f0a-9955-65ab663ff9fb.0.json`. Confirm that the
+test still fails. This run also writes the inputs in the format used by
+the JS replay tests.
+
+Search the console for the string "writing" to find where on your
+local file system the inputs for the JS replay unit test were written.
+
+In the JS repo, check out the branch with the fixed JS implementation, copy in the inputs, and start the replay test:
+
+```bash
+$ cd trio-oref
+$ git checkout dev-fixes-for-swift-comparison
+$ cp /Users/kingst/Library/Developer/CoreSimulator/Devices/98ED1614-33B5-4F12-906B-D5C092AD0EB5/data/Containers/Data/Application/F9F20EFC-128C-482B-85E3-C59A3242DDEB/tmp/determine_basal_error_inputs.json tests
+$ ./node_modules/.bin/mocha --inspect-brk -c tests/determine-basal-replay.test.js
+```
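The simulator path in the `cp` command changes with the simulator device and app install. As an alternative to copying it from the "writing" log line, this sketch finds the most recently modified inputs file under the CoreSimulator directory. The directory layout is inferred from the example path above, `newest_replay_inputs` is a hypothetical helper, and the "writing" line remains the authoritative location.

```python
import glob
import os
from typing import Optional


def newest_replay_inputs(
    simulator_root: Optional[str] = None,
    filename: str = "determine_basal_error_inputs.json",
) -> Optional[str]:
    """Return the most recently modified replay-input file, or None if there is none."""
    root = simulator_root or os.path.expanduser("~/Library/Developer/CoreSimulator")
    # Layout inferred from the example cp path:
    # Devices/<device>/data/Containers/Data/Application/<app>/tmp/<filename>
    pattern = os.path.join(
        root, "Devices", "*", "data", "Containers", "Data", "Application", "*", "tmp", filename
    )
    return max(glob.glob(pattern), key=os.path.getmtime, default=None)
```

When it returns a path, `shutil.copy(path, "tests")` run from the JS repo does the same job as the `cp` step.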
+
+The replay test now pauses until you attach a debugger. I use Visual
+Studio to debug JavaScript, but any debugger that supports the Node.js
+inspector protocol should work.
+
+At this point you can step through both the JS and Swift
+implementations on an input that causes an inconsistency and debug the
+issue.

+ 64 - 0
DeveloperDocs/OrefSwift/roadmap.md

@@ -0,0 +1,64 @@
+# Roadmap
+
+At this point, we have a complete port of the oref algorithm from
+JavaScript to Swift. At a high level, the four stages we want to go
+through are:
+
+  - Small scale testing
+  - Beta testing shadow mode
+  - Beta testing swift algorithm
+  - Release
+
+## Small scale testing
+
+At this stage, the implementation is in the `Trio-dev` repo and there
+are a small number of known testers running the algorithm. The Swift
+implementation runs in shadow mode where we execute it, compare the
+results against JS, and log any inconsistencies for further analysis.
+
+The exit criteria for this stage are:
+
+  - Ensure there are no inconsistencies across our large database
+    (200k+) of inputs.
+
+  - Fix any known bugs in the Swift implementation (all documented via
+    GitHub issues).
+
+  - Analyze the algorithm bugs we fixed in Swift to confirm that the
+    resulting changes to the algorithm are safe and within our expected
+    bounds.
+
+  - Add the ability to test fixed JS in the app before logging
+    inconsistencies to reduce the logging volume.
+
+## Beta testing shadow mode
+
+At this stage, we move the algorithm to the main `Trio` repo on the
+dev branch. The Swift implementation is still running in shadow mode
+while we collect more data.
+
+The exit criteria for this stage are:
+
+  - No inconsistencies in the algorithm for one week of operation
+
+## Beta testing swift algorithm
+
+At this stage, we move to using the Swift implementation for dosing
+decisions, but we keep the JS implementation to check for
+inconsistencies and log inputs for any inconsistent runs.
+
+The exit criteria for this stage are:
+
+  - No inconsistencies in the algorithm for one month of operation
+
+## Release
+
+At this stage, the port is complete: the Swift code makes the dosing
+decisions and we productionize the implementation.
+
+Productionization includes:
+
+  - Removing the JS implementation from the repo
+
+  - Refactoring the replay mechanism or removing it, depending on
+    whether we want to use it for other features in the future

+ 4 - 0
TrioTests/Info.plist

@@ -18,5 +18,9 @@
 	<string>$(ENABLE_REPLAY_TESTS)</string>
 	<key>ReplayTestTimezone</key>
 	<string>$(REPLAY_TEST_TIMEZONE)</string>
+	<key>HttpFilesOffset</key>
+	<string>$(HTTP_FILES_OFFSET)</string>
+	<key>HttpFilesLength</key>
+	<string>$(HTTP_FILES_LENGTH)</string>
 </dict>
 </plist>

+ 0 - 1
TrioTests/OpenAPSSwiftTests/AutosensJsonTests.swift

@@ -128,7 +128,6 @@ import Testing
                 }
                 if let str = algorithmComparison.swiftException {
                     print(str)
-                    #expect(Bool(false), "Swift exception on autosens")
                 }
                 continue
             }

+ 0 - 1
TrioTests/OpenAPSSwiftTests/DetermineBasalJsonTests.swift

@@ -27,7 +27,6 @@ import Testing
                 }
                 if let str = algorithmComparison.swiftException {
                     print(str)
-                    #expect(Bool(false), "Swift exception on determine")
                 }
                 continue
             }

+ 0 - 1
TrioTests/OpenAPSSwiftTests/IobJsonTests.swift

@@ -58,7 +58,6 @@ import Testing
                 }
                 if let str = algorithmComparison.swiftException {
                     print(str)
-                    #expect(Bool(false), "Swift exception on IoB")
                 }
                 continue
             }

+ 0 - 1
TrioTests/OpenAPSSwiftTests/MealJsonTests.swift

@@ -27,7 +27,6 @@ import Testing
                 }
                 if let str = algorithmComparison.swiftException {
                     print(str)
-                    #expect(Bool(false), "Swift exception on meal")
                 }
                 continue
             }

+ 21 - 2
TrioTests/OpenAPSSwiftTests/utils/HttpFiles.swift

@@ -2,14 +2,33 @@ import Foundation
 @testable import Trio
 
 /// Helper struct to download files from localhost via HTTP. Must have a HTTP server
-/// running on port 8123 that supports listing files and downloading files
+/// running on port 8123 that supports listing files and downloading files.
+///
+/// Set both the `HTTP_FILES_OFFSET` and `HTTP_FILES_LENGTH` replay-test variables
+/// (see `ReplayTests`) to page through the file list.
 ///
 /// This struct is only useful during testing as it is missing a number of error checks
 struct HttpFiles {
     static func listFiles() async throws -> [String] {
         let url = URL(string: "http://localhost:8123/list")!
         let (data, _) = try await URLSession.shared.data(from: url)
-        let files = try JSONDecoder().decode([String].self, from: data)
+        let allFiles = try JSONDecoder().decode([String].self, from: data)
+
+        let files: [String]
+        if let offset = ReplayTests.filesOffset, let length = ReplayTests.filesLength
+        {
+            // Both variables exist and are valid integers
+            let endIndex = min(offset + length, allFiles.count)
+            let startIndex = min(offset, allFiles.count)
+            files = Array(allFiles[startIndex ..< endIndex])
+        } else {
+            files = allFiles
+        }
+
+        if files.count > 5000 {
+            fatalError("too many files: \(files.count) \(ProcessInfo.processInfo.environment)")
+        }
+
         return files
     }
 

+ 53 - 8
TrioTests/OpenAPSSwiftTests/utils/ReplayTests.swift

@@ -1,19 +1,59 @@
 import Foundation
 
-/// Flag to enable replay tests.
-///
-/// These test are only used for debugging so normally they should be disabled. But
-/// if you're debugging the oref-swift functions they are extremely useful. To enable them
-/// add these lines to your ConfigOverride.xcconfig file:
-/// ```
-/// ENABLE_REPLAY_TESTS = YES
-/// ```
 enum ReplayTests {
+    /// Flag to enable replay tests.
+    ///
+    /// These tests are only used for debugging so normally they should be disabled. But
+    /// if you're debugging the oref-swift functions they are extremely useful. To enable them
+    /// add these lines to your ConfigOverride.xcconfig file:
+    /// ```
+    /// ENABLE_REPLAY_TESTS = YES
+    /// ```
     static var enabled: Bool {
+        let env = ProcessInfo.processInfo.environment
+        if env["ENABLE_REPLAY_TESTS"] == "YES" {
+            return true
+        }
+
         let bundle = Bundle(for: BundleReference.self)
         return bundle.object(forInfoDictionaryKey: "EnableReplayTests") as? String == "YES"
     }
 
+    /// The offset for pagination of replay input files
+    ///
+    /// Set this offset using an environment variable or the ConfigOverride.xcconfig file.
+    /// It only takes effect if `HTTP_FILES_LENGTH` is also set.
+    /// ```
+    /// HTTP_FILES_OFFSET = 2000
+    /// ```
+    static var filesOffset: Int? {
+        let env = ProcessInfo.processInfo.environment
+        if let offset = env["HTTP_FILES_OFFSET"].flatMap({ Int($0) }) {
+            return offset
+        }
+
+        let bundle = Bundle(for: BundleReference.self)
+        let offsetString = bundle.object(forInfoDictionaryKey: "HttpFilesOffset") as? String
+        return offsetString.flatMap { Int($0) }
+    }
+
+    /// Length for pagination of replay input files
+    ///
+    /// Set this length using an environment variable or the ConfigOverride.xcconfig file.
+    /// ```
+    /// HTTP_FILES_LENGTH = 3500
+    /// ```
+    static var filesLength: Int? {
+        let env = ProcessInfo.processInfo.environment
+        if let length = env["HTTP_FILES_LENGTH"].flatMap({ Int($0) }) {
+            return length
+        }
+
+        let bundle = Bundle(for: BundleReference.self)
+        let lengthString = bundle.object(forInfoDictionaryKey: "HttpFilesLength") as? String
+        return lengthString.flatMap { Int($0) }
+    }
+
     /// Timezone to use for replay tests.
     ///
     /// This is used to filter replay test files by timezone. If not set, it defaults to "America/Los_Angeles".
@@ -22,6 +62,11 @@ enum ReplayTests {
     /// REPLAY_TEST_TIMEZONE = Europe/Berlin
     /// ```
     static var timezone: String {
+        let env = ProcessInfo.processInfo.environment
+        if let timezone = env["REPLAY_TEST_TIMEZONE"], !timezone.isEmpty {
+            return timezone
+        }
+
         let bundle = Bundle(for: BundleReference.self)
         return bundle.object(forInfoDictionaryKey: "ReplayTestTimezone") as? String ?? "America/Los_Angeles"
     }