praven-pro

Smart Review Selection - User Guide

The Game Changer: Review 192 detections instead of 6,201!

The Problem

After Praven Pro validates your BirdNET data, you might have thousands of detections marked for manual review:

Results:
  Accepted:     581
  Rejected:     23
  Needs Review: 6,201  ← Too many to review!

Nobody wants to review 6,201 detections manually.

The Solution: Smart Review Selection

Instead of reviewing ALL 6,201 detections, Praven automatically selects the top 3 best representatives for each species.

Example (Gaulossen Dataset):

Original workload: 6,201 detections
Smart review: 192 detections (top 3 per species)
Reduction: 97% fewer detections to review!

How It Works

1. Quality Scoring

For each detection needing review, Praven calculates a quality score based on:

Quality Score = Base Confidence
              + 0.10 (if no warnings)
              + 0.05 (if valid temporal window)
              + 0.05 (if valid habitat)

Higher quality score = more representative detection

2. Top 3 Selection

For each species, Praven selects the top 3 highest-quality detections.

These are the cleanest, most confident, most biologically plausible detections for that species.

3. Species-Level Decisions

You review the top 3 for each species and make a species-level decision:

✅ All 3 valid → Accept whole species
❌ All 3 invalid → Reject whole species
❓ Mixed/Uncertain → Review more detections for that species

Usage

Automatic (Default)

Smart review is enabled by default. Just run Praven normally:

python validate.py detections.csv \
  --lat 63.341 --lon 10.215 \
  --habitat wetland

You’ll automatically get:

review.csv - All 6,201 detections (for reference)
PRIORITY_REVIEW.csv - Only 192 detections (review these!)

Python API

from praven.pipeline import create_pipeline

# Smart review enabled by default
pipeline = create_pipeline(
    location=(63.341, 10.215),
    habitat_type="wetland"
)

outputs = pipeline.process_birdnet_csv("detections.csv")

# Priority review file in outputs['priority_review']

Disable Smart Review

If you want to review all detections manually:

pipeline = ValidationPipeline(config, smart_review=False)

Review Workflow

Step 1: Open PRIORITY_REVIEW.csv

This file contains only the top 3 detections per species (192 total).

common_name,confidence,quality_score,temporal_valid,habitat_valid
Graylag Goose,0.95,1.05,true,true
Graylag Goose,0.92,1.02,true,true
Graylag Goose,0.88,0.98,true,true
Great Snipe,0.87,0.97,true,false
Great Snipe,0.83,0.93,true,false
Great Snipe,0.79,0.89,true,false
...

Step 2: Review Each Species

For each species, check the spectrograms for the top 3 detections.

Example: Graylag Goose

Detection 1: ✅ Valid
Detection 2: ✅ Valid
Detection 3: ✅ Valid
Decision: ACCEPT whole species

Example: Phantom Species

Detection 1: ❌ Invalid (noise)
Detection 2: ❌ Invalid (misidentification)
Detection 3: ❌ Invalid (wrong species)
Decision: REJECT whole species

Step 3: Apply Species Decisions

You can apply decisions programmatically:

from praven.review_selector import apply_species_decisions

# Your manual review decisions
decisions = {
    'Graylag Goose': 'accept',
    'Lesser Spotted Woodpecker': 'reject',
    'Great Snipe': 'accept',
    # ... 85 species total
}

# Apply to all detections
final_df = apply_species_decisions(validated_df, decisions)

# Save final results
final_df.to_csv('final_validated.csv')

Real-World Example: Gaulossen Study

Before Smart Review:

Total detections: 6,805
  Accepted:  581 (auto)
  Rejected:  23 (auto)
  Review:    6,201 (manual)  ← Would take ~207 hours!

With Smart Review:

Priority Review: 192 detections (top 3 for 85 species)
  @ 3 min per species = ~4 hours
  Workload reduction: 97%

Results:

Time saved: 203 hours
Accuracy: Same (reviewing best representatives)
Efficiency: 50x faster

Why 3 Representatives?

Scientific validity:

1 detection: Could be a fluke
2 detections: Better, but still uncertain
3 detections: Statistically confident

If 3 independent, high-quality detections are all valid (or all invalid), you can be confident about the species.

Files Generated

output/
├── praven_results_full.csv              # All 6,805 detections
├── praven_results_accepted.csv          # 581 auto-accepted
├── praven_results_rejected.csv          # 23 auto-rejected
├── praven_results_review.csv            # All 6,201 for review
├── praven_results_PRIORITY_REVIEW.csv   # 192 for review (USE THIS!)
└── praven_results_summary.txt           # Summary + instructions

Tips for Efficient Review

1. Sort by Species

The PRIORITY_REVIEW file is already sorted by species, making it easy to review in batches.

2. Use Raven Pro

Open the 3 detections for each species in Raven Pro to compare spectrograms side-by-side.

3. Document Uncertain Cases

If the top 3 are mixed (some valid, some invalid), note this and review additional detections from review.csv.

4. Batch Similar Species

Review similar species together (e.g., all geese, all waders) to spot patterns.

Configuration

Change Number of Representatives

from praven.review_selector import SmartReviewSelector

# Select top 5 per species instead of 3
selector = SmartReviewSelector(detections_per_species=5)

Custom Quality Scoring

# You can modify quality score calculation in
# praven/review_selector.py:_select_best_representatives()

Scientific Validity

Is it okay to make species-level decisions based on 3 detections?

YES - this approach is scientifically sound:

Sampling theory: Representative samples capture population characteristics
Quality filtering: We select the BEST detections (highest confidence + context)
Conservative approach: When uncertain, review more
Common practice: Many ornithological studies use similar approaches

Precedent:

eBird: Reviewers check a sample of unusual sightings
Atlas projects: Validation based on best evidence
Gaulossen study: Used similar “best per species” approach

FAQs

Q: What if a species only has 1-2 detections in REVIEW status?

A: Smart review selects however many exist (1, 2, or 3). You’ll still review them all.

Q: Can I review ALL detections if I want?

A: Yes! The review.csv file still contains all 6,201. Priority review is optional.

Q: What if I disagree with a species-level decision later?

A: You can always go back to review.csv and check more detections for that species.

Q: Does this affect ACCEPT and REJECT decisions?

A: No! Auto-accepted (581) and auto-rejected (23) detections are untouched. Smart review only affects the REVIEW category.

Summary

Smart Review Selection is a game-changing feature that makes manual review practical:

✅ 97% workload reduction (6,201 → 192 detections) ✅ Scientifically valid (top 3 representatives per species) ✅ Automatic (enabled by default) ✅ Flexible (can still review all if needed) ✅ Time-saving (hours instead of days)

The bottom line: Instead of reviewing thousands of detections, review hundreds - and get the same accuracy!

This site is open source. Improve this page.