Inside Android Room: KSP Code Generation, Migrations, and Flow Queries
When choosing a persistence solution for Android, three questions usually come up: can SQL mistakes be caught at compile time, how does the UI get notified automatically when data changes, and how can database upgrades avoid breaking production data?
Room was designed to replace runtime reflection with compile-time code generation, while wrapping SQLite APIs in an interface that feels natural for Android development. This article starts with the KSP annotation processor and follows Room’s full execution path.
What KSP does at compile time
Room’s core capability comes from KSP, Kotlin Symbol Processing. In the older KAPT setup, the annotation processor first compiled Kotlin into Java stubs and then ran apt. That stub generation step was painfully slow. KSP works directly at the Kotlin source level and avoids the stub phase.
Room’s KSP processor scans all classes annotated with @Database, @Dao, and @Entity, then generates the corresponding implementations. Take this @Dao as an example:
@Dao
interface UserDao {
@Query("SELECT * FROM users WHERE age > :minAge")
fun getAdults(minAge: Int): Flow<List<User>>
@Insert(onConflict = OnConflictStrategy.REPLACE)
suspend fun insertUser(user: User)
}
When KSP processes this code, it first parses the SQL string in the @Query annotation. Room has an important design choice that is easy to overlook: compile-time SQL validation. The processor verifies:
- Whether the table name
usersexists in an@Entitydeclaration - Whether the selected columns can map to fields on the
Userdata class - Whether the Java type of bound parameter
:minAgematches the method signature
If the table name is wrong or the fields do not line up, the build fails at compile time. You do not have to wait for a runtime crash.
The generated implementation roughly looks like this:
// UserDao_Impl.java, generated by KSP
@Override
public Flow<List<User>> getAdults(final int minAge) {
final String _sql = "SELECT * FROM users WHERE age > ?";
final RoomSQLiteQuery _statement = RoomSQLiteQuery.acquire(_sql, 1);
_statement.bindLong(1, minAge);
return CoroutinesRoom.createFlow(__db, false,
new String[]{"users"},
new Callable<List<User>>() {
@Override
public List<User> call() throws Exception {
final Cursor _cursor = DBUtil.query(__db, _statement, false, null);
try {
final int _cursorIndexOfId = CursorUtil.getColumnIndex(_cursor, "id");
// ... extract and map each column
final List<User> _result = new ArrayList<>(_cursor.getCount());
while (_cursor.moveToNext()) {
final User _item = new User();
_item.id = _cursor.getLong(_cursorIndexOfId);
_result.add(_item);
}
return _result;
} finally {
_cursor.close();
}
}
});
}
The generated code handles SQL construction, parameter binding, and cursor mapping for us. Every DAO method you write corresponds to a complete generated database interaction.
Why the SupportSQLite layer exists
Room does not call the platform SQLiteDatabase directly. It adds a SupportSQLiteDatabase abstraction. At runtime, the call chain is:
UserDao_Impl.query()
-> RoomDatabase.query()
-> SupportSQLiteDatabase.query()
-> FrameworkSQLiteDatabase.query()
-> SQLiteDatabase.rawQuery()
This extra abstraction has three motivations:
- Testability: Robolectric or a custom in-memory database can replace the real SQLite layer
- Version compatibility: SQLite behavior differs across Android versions, and the abstraction can normalize those differences
- Room for extension: in theory, other storage engines can be integrated behind the same interface
This does not affect day-to-day usage much, but it explains why Room tests often need extra Robolectric setup: tests need to intercept SupportSQLiteDatabase for in-memory testing.
Choosing a migration strategy
Version upgrades are unavoidable in any SQLite-backed project. Room gives you three paths.
Destructive migration deletes and recreates the database, losing all data:
Room.databaseBuilder(context, AppDb::class.java, "app.db")
.fallbackToDestructiveMigration()
.build()
This is appropriate only for cache databases.
Manual Migration means writing the SQL script yourself:
val MIGRATION_1_2 = object : Migration(1, 2) {
override fun migrate(db: SupportSQLiteDatabase) {
db.execSQL("ALTER TABLE users ADD COLUMN nickname TEXT NOT NULL DEFAULT ''")
}
}
It gives you the most control, but it is also easy to introduce deadlocks or foreign-key conflicts. Pair it with MigrationTestHelper for automated validation:
helper.createDatabase(TEST_DB, 1).close()
helper.runMigrationsAndValidate(TEST_DB, 2, true, MIGRATION_1_2)
AutoMigration, introduced in Room 2.4, compares exported schema files and derives migration SQL automatically:
@Database(
entities = [User::class],
version = 2,
autoMigrations = [AutoMigration(from = 1, to = 2)]
)
abstract class AppDb : RoomDatabase()
At compile time, Room exports JSON schema files for each version. AutoMigration compares the diff between adjacent versions and generates DDL. The limitation is clear: it supports only simple DDL such as adding columns, deleting columns, and renaming columns. It cannot handle complex data migrations. In a project that split user data across multiple tables, AutoMigration was not enough, and we eventually went back to manual Migration.
My rule of thumb: use AutoMigration for small projects because it is convenient; use manual Migration plus MigrationTestHelper for business databases because it gives you stronger control.
How Flow reactive queries work
Room DAO methods can return Flow<T>, and new results are pushed automatically when data changes. The core mechanism behind this is InvalidationTracker.
@Query("SELECT * FROM users WHERE age > :minAge")
fun observeAdults(minAge: Int): Flow<List<User>>
When this Flow is collected, Room does two things:
- It runs the query immediately and emits the initial value
- It registers a table-level observer for
userswith InvalidationTracker
After that, any INSERT, UPDATE, or DELETE against users notifies InvalidationTracker. The key design is: Room does not report row-level changes. It marks the whole table as “invalid” and triggers every Flow subscribed to that table to re-query.
Several details are worth paying attention to.
Automatic background scheduling. Room-generated Flows run queries on Dispatchers.IO by default, so callers do not need to switch threads manually.
Cross-table joins. If a DAO uses a JOIN query, the generated code subscribes to every table involved. A write to any of those tables triggers the query again.
Invalidation storms. Suppose Flow<List<ChatMessage>> observes the message table and a chat screen inserts 10 new messages per second. Every INSERT can trigger a new query. Room has an internal debounce window of roughly 100 ms, but high-frequency writes can still cause unnecessary work. In practice, distinctUntilChanged helps reduce duplicate emissions:
dao.observeConversations()
.distinctUntilChanged { old, new ->
old.map { it.id to it.lastMessageTime } ==
new.map { it.id to it.lastMessageTime }
}
.catch { e -> /* Handle errors. */ }
.collect { /* Update the UI. */ }
Mental models to keep
Room’s design reveals several reusable patterns.
Compile time vs runtime. Room shifts SQL validation left into compile time. That is a relatively rare example of compile-time safety in the Android ecosystem. When designing APIs, constraints that can be expressed by the type system should not be left for runtime fallbacks.
Observer plus invalidation notification. Avoid maintaining complicated callback chains between table creation, updates, and queries. A global InvalidationTracker manages table-level invalidation events; the view layer subscribes, and the data layer notifies. The same pattern applies well to cache layers and message synchronization.
Tiered migration strategy. Destructive, manual, and automatic migration are really tradeoffs between “do we need the data” and “how much control do we need.” The same thinking applies to API version upgrades and storage-format changes.