Singapore - Challenges and Progress in Bioacoustics

Introduction

This report synthesizes the current literature on training bird sound classification models in data-sparse regional contexts with an emphasis on Singapore or similar Southeast Asian biodiversity hotspots. Overall, the reviewed literature indicates that while many studies develop robust machine learning frameworks using deep convolutional neural networks (CNNs), transfer learning, and semi-supervised techniques for bird sound classification, only a subset of these works explicitly focus on or have been evaluated using Singapore-specific soundscape recordings.

1. General Frameworks and Methods in Data-Sparse Contexts

A number of recent studies address the common challenges inherent in bioacoustic monitoring when labeled audio data are scarce. Researchers have proposed solutions that leverage transfer learning from large global datasets, employ data augmentation techniques, and utilize weak or semi-supervised learning approaches to overcome the limited availability of high-quality regional audio samples. For example, Bellafkir et al. (2023) describe a CNN-based approach that uses data augmentation methods such as the addition of background noise and selective mixup to mitigate class imbalances that naturally arise in long-tail distributions. Their methodology—developed originally for challenging soundscapes that include overlapping environmental noises and weak labels—is broadly applicable to scenarios with limited training data, such as those encountered in Southeast Asian urban and forest environments.

Deep transfer learning has been widely adopted to address the data limitation problem. Das et al. (2023) review various CNN architectures, including ResNet, Inception-v3, VGG, MobileNet, DenseNet, and EfficientNet, and detail how pre-trained networks (often on large-scale image or audio datasets) can be fine-tuned on much smaller regional datasets. These approaches considerably reduce the need for large quantities of labeled examples and decrease the training time while improving accuracy. Although the reviewed literature does not always include explicit evaluations using Singapore-specific data, the underlying methodologies show promise for adaptation to any data-sparse, biodiversity-rich region.

In addition to transfer learning, several groups have explored semi-supervised and weakly supervised approaches. For instance, Caprioli's (2022) work on semi-supervised classification using the FixMatch algorithm demonstrates that even with as few as 11 labeled samples per class, significant accuracy improvements can be attained compared to purely supervised baselines. These techniques are particularly relevant in contexts where field-collected labeled audio is extremely limited, such as in remote or urban ecosystems.

2. Challenges in Contexts Like Singapore

Many studies echo similar challenges in limited regional training data, especially when the targeted species are either rare or difficult to record. Common issues include:

Limited Availability of High-Quality Labeled Audio Data. In the field, particularly in highly diverse tropical settings like Singapore, obtaining long-duration recordings with high signal-to-noise ratios is challenging. Critically endangered or resident species often have very sparse recordings, and collector efforts are constrained by practical limitations as well as the fine-grained differences in acoustic patterns (Bellafkir et al., 2023).
High Acoustic Similarity Among Coexisting Species. The coexistence of many species within a small geographical area results in considerable overlap in acoustic signals. This situation demands models that can differentiate subtle variations in call structure and temporal vocalization patterns, which in turn requires inventive signal processing and advanced network architectures capable of capturing both spectral and temporal features (Das et al., 2023).
Environmental Noise and Overlapping Vocalizations. Urban and forest settings in Southeast Asia, including Singapore, are characterized by background noises from both natural and anthropogenic sources. This not only complicates the audio signal but also increases the chance of false positives. Several studies address these problems through sophisticated post-processing techniques, such as class-wise threshold calibrations and confidence penalization for overly common species (Bellafkir et al., 2023).

3. Efforts Focused on Singaporean Soundscapes

When it comes to directly addressing the Singapore context, there is clear evidence of published work that has explicitly targeted Singapore's unique acoustic environment. Notably, Hexeberg et al. (2025) propose a semi-supervised classification approach for bird vocalizations that is explicitly tested using recordings from Singapore. Their work involves acoustic recordings from local sites—including the Singapore Botanic Gardens—collected over extended periods. This study demonstrates the feasibility of detecting species with overlapping vocalizations and noisy backgrounds in an urban tropical environment by achieving reasonable precision on classes even when trained on very few labeled samples. In further experiments, their classifier is shown to reflect natural diurnal and nocturnal behavioral patterns and to manage site-specific variations effectively. Such results provide strong validation that semi-supervised approaches can be adapted successfully to manage the synthesis of regional data in Singapore.

Additionally, while Bellafkir et al. (2023) do not explicitly test their model on Singapore data, they note that the efficiency of their pipeline in handling noisy, imbalanced, and weakly labeled datasets makes it well-suited for deployment in tropical urban and forest settings such as Singapore. Thus, although not every study targets Singapore directly, many proposed frameworks are clearly applicable to similar contexts in Southeast Asia.

Another indirect contribution comes from studies that have been conducted in neighboring regions—such as the SiulMalaya dataset for Malaysia—which similarly face challenges of high species diversity and background noise in lowland forests (Jamil et al., 2023). While the SiulMalaya dataset is specific to Malaysia, the underlying machine learning approaches can offer valuable insights for adapting models to the Singaporean context, given the ecological and acoustic similarities among tropical Southeast Asian habitats.

4. Mitigation Strategies in Data-Sparse Settings

Several strategies are recommended to address data scarcity in regional contexts such as Singapore:

Transfer Learning and Region-Specific Fine-Tuning. Numerous studies recommend using globally trained models and then fine-tuning them on localized, limited datasets. The use of pre-trained backbones such as EfficientNet (Ansar et al., 2024) and ResNet (Zhong et al., 2021) has been demonstrated to be beneficial when adapting to rare or underrepresented bird sounds. Although many works do not explicitly mention Singapore, the strategies described can be directly applied by incorporating local recording data during the fine-tuning stage.
Semi-Supervised and Weakly-Supervised Learning. In environments where obtaining numerous labels is impractical, semi-supervised approaches that combine small amounts of labeled data with large volumes of unlabeled data provide a promising avenue. Hexeberg et al.'s (2025) work on semi-supervised classification in Singapore demonstrates that even with minimal labeled data, reasonable performance levels can be attained. Similarly, Caprioli's (2022) study on the FixMatch algorithm shows further improvements when pseudo-labels are generated from unlabeled recordings.
Data Augmentation and Synthetic Data Generation. To supplement sparse datasets, augmentation methods such as time stretching, pitch shifting, and mixup are frequently applied. Bellafkir et al. (2023) incorporate several augmentation techniques designed to simulate the presence of background noise and overcome class imbalances. These practices are crucial in developing models that remain robust against the acoustic variability typical of urban and forested soundscapes in Singapore.

5. Gaps in the Current Literature

While several frameworks have been successfully applied in data-sparse contexts throughout Southeast Asia, there remains a relative paucity of models that have been exclusively trained and validated on Singapore-specific data. Das et al. (2023) note that many deep transfer learning-based studies rely on datasets sourced from global repositories such as Xeno-canto and BirdCLEF, and that there is no explicit reference to Singaporean bird audio recordings within these published efforts. This observation highlights a meaningful gap in the literature: the need for dedicated Singapore-specific datasets and models that account for the unique acoustic challenges, species assemblages, and environmental characteristics within Singapore. Future work in this area could benefit from developing comprehensive local datasets, combining the strengths of both supervised and unsupervised methods, and leveraging metadata (e.g., spatial-temporal information) that might further differentiate the subtle acoustic nuances among local species.

6. Conclusion

In summary, while there is a substantial body of literature that presents methodologies for training bird sound classification models in data-sparse contexts, only select studies have directly tackled the Singapore setting. The work of Hexeberg et al. (2025) clearly demonstrates that a semi-supervised classification approach using Singapore-specific recordings is feasible even in the presence of noise, overlapping calls, and limited labels. Other frameworks developed by Bellafkir et al. (2023) and those based on deep transfer learning by Das et al. (2023) offer methods that are inherently adaptable to Singapore; however, they often rely on data not collected specifically from Singapore. The collective literature emphasizes the importance of harnessing advanced signal processing, ensemble learning, and fine-tuning strategies to address key challenges such as high species diversity, overlapping acoustic signals, and environmental noise. Although Singapore-specific efforts exist and show promising results, there remains a notable gap in published research dedicated solely to Singaporean bird song classification systems. Addressing this gap through continued development of localized data repositories and the adaptation of semi-supervised and transfer learning methods will enhance the accuracy and usability of these systems for biodiversity monitoring in tropical urban and forest contexts.

This review confirms that, indeed, there is published research concerning Singapore's bird sound classification – most notably the contributions by Hexeberg et al. (2025) – while many global models described in the literature provide methodologies that can be further extended and fine-tuned using local Singapore data. Continued efforts in integrating local recordings with established global models will ultimately lead to more accurate and region-sensitive bioacoustic monitoring systems in Southeast Asia.

References

Ansar, W., Chatterjee, A., Goswami, S., & Chakrabarti, A. (2024). An efficientnet-based ensemble for bird-call recognition with enhanced noise reduction. SN Computer Science, 5, 265. https://doi.org/10.1007/s42979-023-02591-6

Bellafkir, H., Vogelbacher, M., Schneider, D., Kizik, V., Mühling, M., & Freisleben, B. (2023). Bird species recognition in soundscapes with self-supervised pre-training. Communications in Computer and Information Science, 60-74. https://doi.org/10.1007/978-3-031-46338-9_5

Caprioli, E. (2022). A semi-supervised approach to bird song classification [Master's thesis, Norwegian University of Science and Technology]. NTNU Open. https://hdl.handle.net/11250/3093609

Das, N., Padhy, N., Dey, N., Bhattacharya, S., & Tavares, J. M. R. S. (2023). Deep transfer learning-based automated identification of bird song. International Journal of Interactive Multimedia and Artificial Intelligence, 8, 33. https://doi.org/10.9781/ijimai.2023.01.003

Hexeberg, S., Chitre, M., Hoffmann‐Kuhnt, M., & Low, B. W. (2025). Semi-supervised classification of bird vocalizations. ArXiv. https://doi.org/10.48550/arxiv.2502.13440

Jamil, N., Norali, A. N., Ramli, M. I., Shah, A. K. M. K., & Mamat, I. (2023). Siulmalaya: an annotated bird audio dataset of malaysia lowland forest birds for passive acoustic monitoring. Bulletin of Electrical Engineering and Informatics, 12, 2269-2281. https://doi.org/10.11591/beei.v12i4.5243

Zhong, M., Taylor, R., Bates, N., Christey, D., Basnet, H., Flippin, J., Palkovitz, S., Dodhia, R., & Ferres, J. L. (2021). Acoustic detection of regionally rare bird species through deep convolutional neural networks. Ecological Informatics, 64, 101333. https://doi.org/10.1016/j.ecoinf.2021.101333