# [Project: Android App] Shelfie (1)
## Why I Built This
Even when you know a book’s call number, you still end up scanning every spine by eye. I wanted the camera to do that for you — type in a title, point at the shelf, and let the app tell you exactly where it is.
## How It Works
1. Enter the book title on the home screen.
2. Tap Scan to open the camera.
3. Pan slowly across the shelf. OCR runs on each frame and extracts any visible text.
4. When extracted text matches the query above a similarity threshold, the app draws a bounding box around the spine and triggers a haptic buzz.
## Tech Stack
| Area | Technology |
|---|---|
| Framework | Flutter (Dart) |
| OCR | Google ML Kit Text Recognition v2 |
| State management | Riverpod (StateNotifier) |
| Camera | camera package, YUV_420_888 → NV21 conversion |
| Languages | English, Korean, Chinese, Japanese |
## Real-Time OCR Pipeline
Running OCR on every frame kills performance. The fix is straightforward: skip frames, and drop any frame that arrives while the previous one is still being processed.
```dart
static const int _frameSkipInterval = 2;

void _processFrame(CameraImage image) async {
  _frameSkipCounter++;
  if (_frameSkipCounter % _frameSkipInterval != 0) return;
  if (state.isProcessing) return;
  // ...
}
```
## YUV → NV21 Conversion
Android delivers camera frames in YUV_420_888. ML Kit wants NV21. The conversion merges three separate Y/U/V planes into a single interleaved byte array, with two code paths depending on pixelStride:
```dart
if (uvPixelStride == 2) {
  // V plane is already interleaved — copy rows directly
  for (int row = 0; row < uvHeight; row++) {
    final srcOffset = row * vPlane.bytesPerRow;
    nv21.setRange(destIndex, destIndex + uvWidth * 2,
        vPlane.bytes.buffer.asUint8List(...));
    destIndex += uvWidth * 2;
  }
} else {
  // Manually interleave V and U bytes (pixelStride == 1)
  for (int row = 0; row < uvHeight; row++) {
    for (int col = 0; col < uvWidth; col++) {
      final srcIndex = row * vPlane.bytesPerRow + col;
      nv21[destIndex++] = vPlane.bytes[srcIndex];
      nv21[destIndex++] = uPlane.bytes[srcIndex];
    }
  }
}
```
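To see the whole merge in one place, here is a self-contained sketch of the conversion written against raw plane buffers so it runs outside Flutter. The parameter names are assumptions; real code would pull these values from `CameraImage.planes`. For clarity it uses the per-pixel loop for both strides (correct either way, just slower than the row copy when `uvPixelStride == 2`):

```dart
import 'dart:typed_data';

// Sketch: merge YUV_420_888 planes into a single NV21 buffer.
// NV21 layout: all Y bytes, then interleaved V/U pairs at quarter resolution.
Uint8List yuv420ToNv21({
  required Uint8List yPlane,
  required Uint8List uPlane,
  required Uint8List vPlane,
  required int width,
  required int height,
  required int uvRowStride,
  required int uvPixelStride,
}) {
  final uvWidth = width ~/ 2, uvHeight = height ~/ 2;
  final nv21 = Uint8List(width * height + 2 * uvWidth * uvHeight);

  // Y plane: assume it is tightly packed (rowStride == width).
  nv21.setRange(0, width * height, yPlane);

  var dest = width * height;
  for (var row = 0; row < uvHeight; row++) {
    for (var col = 0; col < uvWidth; col++) {
      final src = row * uvRowStride + col * uvPixelStride;
      nv21[dest++] = vPlane[src]; // NV21 interleaves V first…
      nv21[dest++] = uPlane[src]; // …then U.
    }
  }
  return nv21;
}
```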
## Text Matching
OCR output is noisy — lighting, typefaces, and camera angle all introduce errors. Fuzzy matching handles this in three stages:
- Exact containment — if the recognized text directly contains the query, score 0.8–1.0.
- Word-level matching — for multi-word queries, check whether each word appears individually in the result.
- Levenshtein distance — if neither rule above fires, fall back to edit-distance similarity.
The minimum threshold is 0.55; anything below that is silently ignored.
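The three stages can be sketched roughly as follows. The 0.8–1.0 containment range and the 0.55 threshold come from the description above; the exact weighting inside each stage is an assumption, not the app's actual code:

```dart
// Classic dynamic-programming Levenshtein edit distance (two-row variant).
int levenshtein(String a, String b) {
  var prev = List<int>.generate(b.length + 1, (j) => j);
  for (var i = 1; i <= a.length; i++) {
    final curr = List<int>.filled(b.length + 1, 0)..[0] = i;
    for (var j = 1; j <= b.length; j++) {
      final cost = a[i - 1] == b[j - 1] ? 0 : 1;
      curr[j] = [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost]
          .reduce((x, y) => x < y ? x : y);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Three-stage scorer; scores below 0.55 would be ignored by the caller.
double matchScore(String recognized, String query) {
  final text = recognized.toLowerCase().trim();
  final q = query.toLowerCase().trim();
  if (text.isEmpty || q.isEmpty) return 0.0;

  // Stage 1: exact containment — longer matches relative to the
  // recognized text score closer to 1.0.
  if (text.contains(q)) {
    return 0.8 + 0.2 * (q.length / text.length).clamp(0.0, 1.0);
  }

  // Stage 2: word-level matching for multi-word queries (assumed weight).
  final words = q.split(RegExp(r'\s+'));
  if (words.length > 1) {
    final hits = words.where((w) => text.contains(w)).length;
    if (hits > 0) return 0.6 * hits / words.length;
  }

  // Stage 3: edit-distance similarity fallback.
  final maxLen = text.length > q.length ? text.length : q.length;
  return 1.0 - levenshtein(text, q) / maxLen;
}
```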
## Haptic Feedback
A match triggers a vibration, but vibrating on every frame while the camera is held still would be annoying. The logic is:
```
Match found → already buzzed? → No  → vibrate + set flag
                              → Yes → skip
Match gone for 5 consecutive frames → reset flag
```
Vibration intensity also scales with confidence:
| Similarity | Duration | Amplitude |
|---|---|---|
| ≥ 95% (exact) | 300 ms | max |
| ≥ 75% (strong) | 200 ms | medium |
| < 75% | 100 ms | light |
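The table above maps directly to a small lookup. Durations come from the table; the concrete amplitude values on Android's 1–255 scale are assumptions for illustration:

```dart
// Sketch: map match similarity to vibration parameters per the table.
// Amplitudes (255/160/80) are assumed stand-ins for max/medium/light.
({int durationMs, int amplitude}) vibrationFor(double similarity) {
  if (similarity >= 0.95) return (durationMs: 300, amplitude: 255);
  if (similarity >= 0.75) return (durationMs: 200, amplitude: 160);
  return (durationMs: 100, amplitude: 80);
}
```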
## Project Structure
```
lib/
├── main.dart
├── app.dart                              # theme, routing
├── models/
│   ├── search_query.dart                 # query + mode (library / store)
│   └── match_result.dart                 # match result + bounding box
├── features/
│   ├── search/
│   │   └── search_screen.dart
│   ├── scanner/
│   │   ├── scanner_screen.dart
│   │   ├── scanner_provider.dart         # state + frame handling
│   │   ├── text_recognizer_service.dart  # ML Kit wrapper
│   │   ├── text_matcher_service.dart     # fuzzy matching
│   │   └── overlay_painter.dart          # bounding box overlay
│   └── update/
│       └── ...
└── services/
    └── update_service.dart
```
Each feature directory holds its UI, business logic, and service layer side by side — no global utils/ dumping ground.
## What’s Next
It’s MVP-quality, but already genuinely useful in a real library. Things I want to add:
- Multi-query — search for several books in one scan
- Search history
- Barcode scanning as a fallback
Source is on GitHub.