praven-pro

Praven Pro - Gaulossen Validation Results

Date: October 22, 2025
Dataset: Gaulossen Nature Reserve acoustic monitoring (October 13-15, 2025)
Validation Mode: Rule-based (habitat + temporal + weather)

Summary

Praven Pro processed all 6,805 BirdNET detections from the Gaulossen study and automatically rejected 23 of them (0.3%) as biologically implausible false positives.

Results

Metric            Count   Percentage
Total Detections  6,805   100.0%
Auto-Accepted     1,173    17.2%
Auto-Rejected        23     0.3%
Needs Review      5,609    82.4%
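The percentages follow directly from the counts. A quick check, as a minimal sketch: the counts are copied from the table, and the variable names are illustrative rather than part of Praven Pro. The three rounded shares sum to 99.9% because each is rounded independently.

```python
# Counts copied from the results table; names here are illustrative only.
counts = {
    "Auto-Accepted": 1173,
    "Auto-Rejected": 23,
    "Needs Review": 5609,
}
total = sum(counts.values())

# Share of each category, rounded to one decimal place as in the table.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} of {total} ({pct}%)")
```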

Manual Verification (Reference)

Auto-Rejected Detections (23 total)

Praven Pro identified 5 species with biologically impossible detections:

1. Lesser Spotted Woodpecker (14 rejections)

Rejection reasons:

Example rejection:

Species: Lesser Spotted Woodpecker
Time: 2025-10-13 23:19:00
Confidence: 0.78
Reason: Habitat mismatch + Temporal impossibility (diurnal species at night)

2. European Storm-Petrel (4 rejections)

Rejection reason:

3. Manx Shearwater (3 rejections)

Rejection reason:

4. Bar-headed Goose (1 rejection)

Rejection reason:

5. Western Capercaillie (1 rejection)

Rejection reason:

Validation Logic

Habitat Validation

Temporal Validation

Native Region Validation

Weather Activity Model

Effectiveness

Species-Level Results

Detection-Level Efficiency

Key Findings

1. Nocturnal Woodpeckers (10 detections)

The most important catch: 10 Lesser Spotted Woodpecker detections between 22:52 and 04:25 were automatically rejected as temporally implausible. Woodpeckers are diurnal and are not expected to vocalize at night, so nocturnal detections are almost certainly misclassifications.
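A temporal rule of this kind can be sketched as a check against a night window that wraps past midnight. This is a minimal illustration, not Praven Pro's actual implementation: the fixed 22:00-05:00 window and the function names are assumptions, and a real check would more likely derive sunset and sunrise from the date and site coordinates.

```python
from datetime import datetime, time

# Hypothetical fixed night window (assumption); real sunset/sunrise
# times depend on the date and latitude of the recording site.
NIGHT_START = time(22, 0)
NIGHT_END = time(5, 0)

def is_night(ts: datetime) -> bool:
    """Return True if ts falls inside a window that wraps past midnight."""
    t = ts.time()
    return t >= NIGHT_START or t < NIGHT_END

def temporal_violation(species_is_diurnal: bool, ts: datetime) -> bool:
    """A diurnal species detected at night is flagged as implausible."""
    return species_is_diurnal and is_night(ts)

# Timestamp taken from the example rejection in this report.
example = datetime(2025, 10, 13, 23, 19, 0)
print(temporal_violation(True, example))
```

The `or` in `is_night` is what handles the wrap: a time counts as night if it is late in the evening or early in the morning.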

2. Oceanic Birds Inland (7 detections)

Storm-Petrels and Shearwaters detected 100 m inland were rejected because they are pelagic species, which typically do not occur in inland wetlands.

3. Habitat Mismatches (18 detections)

Forest-specialist species (woodpeckers, capercaillie) detected in open wetland were flagged as habitat mismatches.

4. Non-native Species (1 detection)

Bar-headed Goose (native to the Himalayan region) was identified as non-native to Europe.
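The habitat and native-region findings above can be expressed as simple lookups against a per-species rule table. The sketch below is illustrative only: `SpeciesRule`, `RULES`, and `rule_violations` are hypothetical names, the habitat and region labels are assumptions, and the actual schema of the species database is not shown in this report.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpeciesRule:
    habitats: frozenset        # habitats where the species is expected
    native_regions: frozenset  # regions where the species occurs natively

# Illustrative entries only, loosely based on the findings in this report.
RULES = {
    "Lesser Spotted Woodpecker": SpeciesRule(frozenset({"forest"}), frozenset({"Europe"})),
    "Manx Shearwater": SpeciesRule(frozenset({"pelagic"}), frozenset({"Europe"})),
    "Bar-headed Goose": SpeciesRule(frozenset({"wetland"}), frozenset({"Asia"})),
}

def rule_violations(species: str, site_habitat: str, site_region: str) -> list:
    """Return the list of rule violations for one detection."""
    rule = RULES.get(species)
    if rule is None:
        return []  # unknown species: no rule-based verdict either way
    reasons = []
    if site_habitat not in rule.habitats:
        reasons.append("habitat mismatch")
    if site_region not in rule.native_regions:
        reasons.append("non-native region")
    return reasons

print(rule_violations("Lesser Spotted Woodpecker", "wetland", "Europe"))
print(rule_violations("Bar-headed Goose", "wetland", "Europe"))
```

Returning a list of reasons rather than a boolean lets one detection carry several reasons at once, as in the example rejection that combines a habitat mismatch with a temporal impossibility.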

Comparison to Manual Verification

The manually verified dataset (4,108 detections, 82 species) was used as the reference standard. Praven Pro’s auto-rejections represent a separate validation layer that identifies biologically implausible detections based on taxonomic rules:

Files Generated

praven-pro/
├── gaulossen_validated_fast.csv          # All 6,805 detections with validation
├── gaulossen_auto_accepted.csv           # 1,173 auto-accepted detections
├── gaulossen_auto_rejected.csv           # 23 auto-rejected false positives
└── gaulossen_needs_review.csv            # 5,609 detections for human review
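The three per-status files are subsets of the full validated file. As a rough sketch of how such a split could be done with the standard library, assuming a `status` column (the real column names in `gaulossen_validated_fast.csv` are not documented here, and the sample rows other than the woodpecker example are placeholders):

```python
import csv
import io
from collections import defaultdict

def split_by_status(csv_text: str, status_column: str = "status") -> dict:
    """Group CSV rows by the value of a (hypothetical) status column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[status_column]].append(row)
    return dict(groups)

# Placeholder sample; only the first row reflects a value from this report.
sample = """species,confidence,status
Lesser Spotted Woodpecker,0.78,rejected
Species A,0.90,accepted
Species B,0.40,review
Species C,0.35,review
"""

groups = split_by_status(sample)
for status, rows in groups.items():
    print(status, len(rows))
```

Each group could then be written back out with `csv.DictWriter` to produce one file per status.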

Usage

To reproduce these results:

cd /Users/georgeredpath/Dev/mcp-pipeline/shared/praven-pro
python3 examples/validate_gaulossen_fast.py

Next Steps

1. Get eBird API Key

2. Add More Species to Database

Current database covers 77+ species. Expand with:

Edit: praven/data/species_db.json

3. Train Custom Weather Model

Use your verified dataset to train species-specific weather-activity models:

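from praven.models import WeatherActivityModel

# gaulossen_verified_data: your manually verified detections, loaded beforehand
model = WeatherActivityModel()
# Train on the verified data and save the fitted model to disk
model.train(gaulossen_verified_data, save_path="gaulossen_weather.pkl")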

4. Integration with Analysis Pipeline

Integrate Praven Pro as Stage 1 validation:

  1. BirdNET detection → Raw detections
  2. Praven Pro → Auto-reject impossible species
  3. Human verification → Review remaining detections
  4. Final dataset → Verified detections
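The hand-off between stages 2 and 3 amounts to partitioning the raw detections into those rejected automatically and those passed on for human review. A minimal sketch, assuming detections are plain dictionaries and the rejection rule is supplied as a function; `stage1_auto_reject` and the `PELAGIC` set are hypothetical and not part of the Praven Pro API:

```python
from typing import Callable, Iterable

def stage1_auto_reject(detections: Iterable[dict],
                       is_impossible: Callable[[dict], bool]):
    """Partition detections into (kept_for_review, auto_rejected)."""
    kept, rejected = [], []
    for det in detections:
        (rejected if is_impossible(det) else kept).append(det)
    return kept, rejected

# Illustrative rule: pelagic species should not appear at an inland site.
PELAGIC = {"European Storm-Petrel", "Manx Shearwater"}

raw = [
    {"species": "Manx Shearwater", "confidence": 0.6},
    {"species": "Species A", "confidence": 0.9},  # placeholder detection
]

kept, rejected = stage1_auto_reject(raw, lambda d: d["species"] in PELAGIC)
print(len(kept), "for review;", len(rejected), "auto-rejected")
```

Passing the rule in as a function keeps the stage independent of any particular rule set, so habitat, temporal, and region checks can be combined upstream.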

Conclusion

Praven Pro identified 23 biologically implausible detections in the Gaulossen dataset using automated validation rules. The system flagged:

Development Context: This test was conducted on a single wetland study. The system requires extensive development for broader real-world deployment, including validation on multiple datasets, expanded species coverage, and testing across diverse habitats and geographic regions.


Dataset: Gaulossen Nature Reserve, Norway (63.341°N, 10.215°E)
Recording period: October 13-15, 2025 (48.8 hours)
Conditions: 80% rain coverage, heavy fog, 7-11°C
Validation: Praven Pro v1.0 (rule-based)