Bytecode try-catch Instrumentation for Third-Party Android Crashes | Kai

Client crash mitigation usually starts with source-level fixes. In your own code, you can add null checks, change threading, or adjust lifecycle handling; for open-source libraries, you can upgrade versions or submit patches; and for system APIs, you can branch on the OS version. In practice, however, you often encounter trickier situations: third-party SDKs without source code, issues appearing only on a few device models, high upgrade costs, slow vendor response, or exceptions originating from deep callbacks that business logic cannot bypass in the short term.

If these crashes have a large impact but are not critical errors that should force the app to crash, bytecode try-catch instrumentation can be a viable stopgap containment measure. It does not require modifying third-party source code and can cover multiple risk points centrally. The drawback, however, is that it intervenes in the build pipeline, and incorrect instrumentation might alter method semantics or even introduce new stability issues. Therefore, it must be treated as a controlled mitigation tool, not a general exception absorber.

Why `hookPoint` Configuration is Core

In a real project, bytecode protection configuration should live in one unified app-level configuration rather than being scattered across screens. The global exception handler and the list of hookPoints are configured through tryCatchExtension, and each hookPoint is precise down to “Class + Method.” This is more restrained than a global try-catch: it protects only known high-risk points, such as certain debug-entry click handlers, WorkManager internal runnables, third-party dialog dismissals, or ViewPager2 scrolling adapters.

This design has two main benefits. First, the app integrates multiple third-party SDKs, dynamic modules, WorkManager, dialog components, and complex UI containers, and some crashes occur deep inside those libraries where business logic cannot wrap them directly. Second, the project might have multiple app shells and build types. If the protection is written into business code, it is easy to miss a shell; placing it in a unified configuration applies it to all hosts.

One important boundary remains: not all methods can be instrumented. Suspend functions, for instance, compile to coroutine state machines whose control flow and stack frames are harder to keep valid under bytecode verification, so blind injection might cause a VerifyError. Instrumentation is a stopgap containment tool, not a universal solution for swallowing exceptions; the more precise the hookPoint, the more controllable the side effects.

Configuration-Driven Instrumentation Model

Each hookPoint describes a risk point and includes the target class, method, descriptor, instrumentation mode, caught exception types, return strategy, and version range:

hookPoints:
  - id: "third_party_render_guard"
    enabled: true
    owner: "com.example.thirdparty.RenderEngine"
    methodName: "render"
    descriptor: "(Landroid/view/View;Ljava/lang/Object;)Z"
    mode: "METHOD"
    catchTypes:
      - "java.lang.Exception"
    returnStrategy:
      type: "CONST_BOOLEAN"
      value: false
    versionRange:
      min: "1.0.0"
      max: "2.5.0"
    note: "Guard known non-fatal render exception on specific library versions."

  - id: "third_party_callback_guard"
    enabled: true
    owner: "com.example.thirdparty.CallbackBridge"
    methodName: "dispatch"
    descriptor: "(Ljava/lang/String;)V"
    mode: "INVOKE_AROUND"
    targetInvoke:
      owner: "com.example.thirdparty.NativeAdapter"
      methodName: "notify"
      descriptor: "(Ljava/lang/String;)V"
    catchTypes:
      - "java.lang.RuntimeException"
    returnStrategy:
      type: "RETURN_VOID"
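
The weaver code in the next section refers to HookPoint, HookMode, and HookPointMatcher without defining them. The following is a minimal sketch of what they might look like, assuming the YAML above is parsed into plain data classes. The field names mirror the YAML; `VersionRange`, `compareVersions`, and the `libraryVersion` parameter are hypothetical and only illustrate how "Class + Method + descriptor" matching and the version range could work together.

```kotlin
// Hypothetical data model mirroring the YAML fields above; not the original plugin's types.
enum class HookMode { METHOD, INVOKE_AROUND, RANGE }

// Compares dotted numeric versions segment by segment, so "2.10.0" > "2.5.0".
fun compareVersions(a: String, b: String): Int {
    val left = a.split(".").map { it.toIntOrNull() ?: 0 }
    val right = b.split(".").map { it.toIntOrNull() ?: 0 }
    for (i in 0 until maxOf(left.size, right.size)) {
        val diff = left.getOrElse(i) { 0 }.compareTo(right.getOrElse(i) { 0 })
        if (diff != 0) return diff
    }
    return 0
}

data class VersionRange(val min: String, val max: String) {
    // Inclusive on both ends; enables `version in range`.
    operator fun contains(version: String): Boolean =
        compareVersions(version, min) >= 0 && compareVersions(version, max) <= 0
}

data class HookPoint(
    val id: String,
    val enabled: Boolean,
    val owner: String,
    val methodName: String,
    val descriptor: String,
    val mode: HookMode,
    val catchTypes: List<String>,
    val versionRange: VersionRange? = null, // null is treated as "all versions"
)

class HookPointMatcher(
    private val hookPoints: List<HookPoint>,
    private val libraryVersion: String, // assumed to be resolved by the build plugin
) {
    // A hook applies only when class, method name, and descriptor all match exactly.
    // Class names are accepted in either dotted or internal (slash) form.
    fun match(className: String, methodName: String, descriptor: String): HookPoint? =
        hookPoints.firstOrNull { hook ->
            hook.enabled &&
                hook.owner == className.replace('/', '.') &&
                hook.methodName == methodName &&
                hook.descriptor == descriptor &&
                (hook.versionRange?.let { libraryVersion in it } ?: true)
        }
}
```

Matching on the full descriptor rather than the method name alone keeps overloads of the same method from being wrapped by accident.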

The three instrumentation modes have different use cases. Method-level instrumentation (METHOD) wraps the whole method body, from entry to the final return, in a try-catch block; it suits small methods with controllable side effects. Invoke-point instrumentation (INVOKE_AROUND) wraps a single known risky invoke instruction inside a third-party method and leaves the rest of the method unprotected. Range-level instrumentation (RANGE) protects a span between marked start and end points; it suits advanced scenarios but carries higher configuration complexity.
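
At the source level, the first two modes behave roughly as if the third-party methods had been written as follows. This is only an illustrative sketch: `Guard`, `riskyRender`, and `notifyNative` are hypothetical stand-ins, the real transformation happens on bytecode rather than source, and the INVOKE_AROUND handler is assumed to return immediately because the example configuration uses RETURN_VOID.

```kotlin
// Hypothetical stand-ins for the runtime guard and the third-party code.
object Guard {
    val caught = mutableListOf<String>()
    fun onCaught(hookPointId: String, error: Throwable) {
        caught += "$hookPointId:${error.javaClass.simpleName}"
    }
}

// Throws NullPointerException when input is null.
fun riskyRender(input: Any?): Boolean = input!!.toString().isNotEmpty()

fun notifyNative(event: String) {
    if (event.isEmpty()) throw IllegalStateException("empty event")
}

// METHOD mode: the whole body is protected, and a failure yields the configured default.
fun renderGuarded(input: Any?): Boolean =
    try {
        riskyRender(input)
    } catch (e: Exception) {
        Guard.onCaught("third_party_render_guard", e)
        false // CONST_BOOLEAN false
    }

// INVOKE_AROUND mode: only the single risky call sits inside the protected range.
fun dispatchGuarded(event: String, log: MutableList<String>) {
    log += "before" // outside the protected range: a failure here would still propagate
    try {
        notifyNative(event)
    } catch (e: RuntimeException) {
        Guard.onCaught("third_party_callback_guard", e)
        return // RETURN_VOID
    }
    log += "after" // runs only when the guarded call succeeded
}
```

The narrower the protected range, the fewer unrelated exceptions the guard can accidentally absorb, which is the main reason to prefer INVOKE_AROUND when the risky call is known.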

Bytecode Rewriting and Runtime Protection

The bytecode rewriting phase needs to maintain exception tables, local variables, operand stacks, and stack map frames. The injected handler passes the exception to GuardRuntime and then returns a default value chosen by the return strategy. For void methods, it can simply return after catching; for object returns, it can return null or a fallback object; for primitive types, it needs to return 0, false, or a configured default value. Constructors and static initializers must be handled with extreme caution and are generally not recommended as default targets.

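import org.objectweb.asm.tree.LabelNode
import org.objectweb.asm.tree.MethodNode
import org.objectweb.asm.tree.TryCatchBlockNode

// HookPointMatcher, HookPoint, HookMode, wrapTargetInvoke, wrapConfiguredRange,
// callGuardRuntime, buildReturnInstruction, and withRecomputedFrames are project
// helpers that are not shown here.
class TryCatchWeaver(private val matcher: HookPointMatcher) {
    fun visitMethod(className: String, method: MethodNode): MethodNode {
        // ASM's MethodNode exposes the descriptor as `desc`.
        val hook = matcher.match(className, method.name, method.desc)
            ?: return method

        return when (hook.mode) {
            HookMode.METHOD -> wrapWholeMethod(method, hook)
            HookMode.INVOKE_AROUND -> wrapTargetInvoke(method, hook)
            HookMode.RANGE -> wrapConfiguredRange(method, hook)
        }
    }

    private fun wrapWholeMethod(method: MethodNode, hook: HookPoint): MethodNode {
        // Abstract and native methods have no body to wrap.
        if (method.instructions.size() == 0) return method

        val start = LabelNode()
        val end = LabelNode()
        val handler = LabelNode()

        // Protected range: from before the first instruction to after the last one.
        method.instructions.insert(start)
        method.instructions.add(end)

        // One exception-table entry per configured type. ASM expects internal names
        // (java/lang/Exception), so dotted names from the config are converted.
        // Entries are appended last, so existing inner handlers keep priority.
        for (type in hook.catchTypes) {
            method.tryCatchBlocks.add(
                TryCatchBlockNode(start, end, handler, type.replace('.', '/'))
            )
        }

        // Handler: entered with the caught exception on the operand stack.
        method.instructions.add(handler)
        method.instructions.add(callGuardRuntime(hook.id))
        method.instructions.add(buildReturnInstruction(hook.returnStrategy))

        return method.withRecomputedFrames()
    }
}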
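
The weaver above calls buildReturnInstruction without showing it. Whatever the strategy, the return opcode has to agree with the return type in the method descriptor, or the class will fail verification. The sketch below illustrates that mapping in plain Kotlin without depending on ASM; `DefaultReturn` and `defaultReturnFor` are hypothetical names, and the strings are standard JVM opcode mnemonics for a zero-value fallback.

```kotlin
// Hypothetical helper: maps a JVM method descriptor to the constant-load and
// return opcodes a zero-value fallback handler would need.
data class DefaultReturn(val load: String?, val ret: String)

fun defaultReturnFor(descriptor: String): DefaultReturn {
    // The return type is everything after the closing parenthesis.
    val returnType = descriptor.substringAfterLast(')')
    return when (returnType.firstOrNull()) {
        'V' -> DefaultReturn(load = null, ret = "RETURN")
        // boolean, byte, char, short, and int all use the int return opcode.
        'Z', 'B', 'C', 'S', 'I' -> DefaultReturn("ICONST_0", "IRETURN")
        'J' -> DefaultReturn("LCONST_0", "LRETURN")
        'F' -> DefaultReturn("FCONST_0", "FRETURN")
        'D' -> DefaultReturn("DCONST_0", "DRETURN")
        // Object and array types fall back to null.
        'L', '[' -> DefaultReturn("ACONST_NULL", "ARETURN")
        else -> throw IllegalArgumentException("Unsupported descriptor: $descriptor")
    }
}
```

For the `render` hookPoint in the configuration, `defaultReturnFor("(Landroid/view/View;Ljava/lang/Object;)Z")` yields `ICONST_0` followed by `IRETURN`, which is how a CONST_BOOLEAN false fallback would be expressed in bytecode.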

Runtime protection logic must remain simple; complex business logic should not be placed in the exception path:

object GuardRuntime {
    fun onCaught(hookPointId: String, error: Throwable) {
        if (FatalErrorPolicy.shouldRethrow(error)) {
            throw error  // Serious errors like OutOfMemoryError should not be suppressed
        }
        CrashGuardReporter.report(
            hookPointId = hookPointId,
            errorType = error::class.java.name,
            messageHash = error.message.safeHash(),
            sampleRate = 0.1
        )
    }
}

Controlling Boundaries is More Important Than Capability

The capture scope must be conservative. By default, do not catch Throwable. Serious VM errors, memory errors, and thread termination signals should not be swallowed. Exceptions that the business logic must be aware of should be rethrown or converted into explicit errors rather than silently replaced with default values.
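
GuardRuntime delegates this decision to FatalErrorPolicy, which the article does not show. A minimal sketch of such a policy might look like the following; the exact list of types is an assumption chosen to illustrate the rule, not the author's implementation.

```kotlin
import java.util.concurrent.CancellationException

// Hypothetical policy: throwables that a guard should always rethrow.
object FatalErrorPolicy {
    fun shouldRethrow(error: Throwable): Boolean = when (error) {
        // OutOfMemoryError, StackOverflowError, InternalError
        is VirtualMachineError -> true
        // Broken class loading or verification, e.g. NoClassDefFoundError, VerifyError
        is LinkageError -> true
        // Thread interruption is a cooperative stop signal
        is InterruptedException -> true
        // Future or coroutine cancellation must reach the caller
        is CancellationException -> true
        else -> false
    }
}
```

Keeping the policy as a short allow-nothing-by-default list makes it easy to review alongside the hookPoint configuration.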

Returning default values requires caution. Returning false, 0, null, or an empty collection seems safe, but it can alter the upper-level logic. If the return value affects transactions, permissions, security decisions, or data writes, it is usually unsuitable for fallback using simple default values.

hookPoint configurations must undergo review. Key review points include: whether the target method is precise, whether the exception is non-fatal, whether the return strategy is reasonable, whether observability metrics exist, and whether a deprecation plan exists. Configurations lacking documentation and ownership are likely to become historical baggage that no one dares to delete.

An expiration mechanism must be set up. Every hookPoint should have a review date. If the issue disappears after a third-party library upgrade, the configuration should be deleted; if the exception remains frequent, a root-cause fix should be prioritized; and if instrumentation causes side effects, it should be disabled quickly. The goal of instrumentation governance is to buy time for a fix, not to permanently mask the problem.
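
One way to enforce the review date is a build-time check that lists overdue hookPoints. The sketch below assumes each hookPoint carries two extra governance fields, `maintainer` and `reviewBy`, which are hypothetical and do not appear in the configuration shown earlier.

```kotlin
import java.time.LocalDate

// Hypothetical governance metadata attached to each hookPoint.
data class HookGovernance(val id: String, val maintainer: String, val reviewBy: LocalDate)

// Returns the ids of hookPoints whose review date has passed as of `today`.
// A hookPoint whose review date is today is not yet overdue.
fun overdueHookPoints(hooks: List<HookGovernance>, today: LocalDate): List<String> =
    hooks.filter { it.reviewBy.isBefore(today) }.map { it.id }
```

A build task could call `overdueHookPoints` with `LocalDate.now()` and either warn or fail the build when the list is non-empty, which keeps stale guards from lingering unnoticed.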

Bytecode try-catch instrumentation is a sharp tool that requires restraint. It can provide temporary containment when source code is unavailable, third-party responses are slow, and online crashes have a significant impact, but it can also alter program semantics. The value of hookPoint configuration is to keep this capability within a reviewable, reversible, and observable boundary. Mature instrumentation governance does not aim for “no crashes from any exception,” but rather for “known non-fatal risks can be degraded, and unknown serious issues are not masked.” Ultimately, the root cause of third-party crashes must still be addressed through upgrades, replacement, correct API usage, or vendor fixes. Bytecode instrumentation is responsible for protecting user experience and business continuity until then.
