Fine-Grained Image Classification of World Architecture: An EfficientNetV2-S Transfer Learning Approach with Layered Regularization
Architectural Building Image Classifier
Fine-Grained Image Classification (FGIC) of world architectural buildings using CNN transfer learning with EfficientNetV2-S, enhanced with GeM Pooling, Focal Loss, Discriminative AdamW (LR), Stochastic Weight Averaging (SWA), Grad-CAM explainability, and calibration analysis.
| Property | Value |
|---|---|
| Architecture | EfficientNetV2-S + GeM Pooling + Focal Loss + SWA |
| Task | Fine-Grained Image Classification (FGIC) |
| Test Accuracy | 97.77% |
| Classes | 8 (barn, bridge, castle, mosque, skyscraper, stadium, temple, windmill) |
| Input Size | 320 × 320 pixels |
| Parameters | 23,350,633 |
| Framework | TensorFlow / Keras 3 |
| License | Apache-2.0 |
Model Description
A fine-grained image classification model for world architectural buildings. Built on EfficientNetV2-S pretrained on ImageNet, enhanced with GeM Pooling (learnable generalized mean pooling), Focal Loss, Discriminative AdamW and Stochastic Weight Averaging (SWA). Extended with Grad-CAM explainability visualization, ROC-AUC evaluation, ECE calibration analysis, and t-SNE embedding visualization.
Key architectural contributions:
- GeM Pooling (Radenovic et al., TPAMI 2018): replaces global average pooling with a generalized mean that has a learnable power parameter (p=3.0), which emphasizes high-activation features and yields stronger discriminative representations for FGIC tasks
- Focal Loss (Lin et al., ICCV 2017, gamma=2.0): down-weights well-classified examples to focus gradient updates on hard-to-classify building pairs
- DiscriminativeAdamW: extends AdamW with per-variable learning-rate scaling (×0.1 on block6) through an update_step override, combined with selective fine-tuning (block6 and top_conv unfrozen, BN frozen). block6 variables therefore train with a 10× smaller learning rate than top_conv and head variables
- SWA with BN re-estimation (Izmailov et al., UAI 2018): 10-epoch post-training weight averaging with constant LR 1e-4, followed by 100-step batch normalization statistics re-estimation
- Grad-CAM (Selvaraju et al., ICCV 2017): gradient-weighted class activation mapping for explainability, targeting top_conv (last Conv2D layer)
- ECE Calibration (Guo et al., ICML 2017): Expected Calibration Error with a 15-bin reliability diagram to assess how well prediction confidence matches accuracy
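To make the GeM pooling idea concrete, here is a minimal plain-Python sketch of the generalized mean over one channel's spatial activations. It is an illustration only, not the `GeMPooling` layer from build_model.py; the function name `gem_pool` and the sample activations are hypothetical, while p=3.0 and eps=1e-6 are the values stated in this card.

```python
def gem_pool(channel_values, p=3.0, eps=1e-6):
    """Generalized-mean pool one channel's spatial activations to a scalar."""
    clamped = [max(v, eps) for v in channel_values]  # keep the base positive
    mean_pow = sum(v ** p for v in clamped) / len(clamped)
    return mean_pow ** (1.0 / p)

# Hypothetical 2x2 activation map for a single channel, flattened.
acts = [0.1, 0.2, 0.3, 2.0]

avg = gem_pool(acts, p=1.0)    # p = 1 reduces to global average pooling
gem = gem_pool(acts, p=3.0)    # p = 3 weights the strong activation more
big = gem_pool(acts, p=50.0)   # large p approaches global max pooling
print(avg, gem, big)
```

The pooled value moves from the mean toward the maximum as p grows, which is why a learnable p lets the network decide how strongly to favor peak activations.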
Architecture
Input (320, 320, 3)
│
EfficientNetV2-S (ImageNet pretrained, 513 layers, 20.33M params)
│
Conv2D(256, 3×3, ReLU, padding=same) → 2,949,376 params
BatchNormalization → 1,024 params
MaxPooling2D(2×2) → 0 params
│
GeM Pooling(p=3.0, eps=1e-6, learnable) → 1 param
│
Dense(256, ReLU) → 65,792 params
BatchNormalization → 1,024 params
Dropout(0.4) → 0 params
│
Dense(8, Softmax) → 2,056 params
│
Output (8 classes)
| Component | Output Shape | Parameters |
|---|---|---|
| EfficientNetV2-S (Functional) | (None, 10, 10, 1280) | 20,331,360 |
| Conv2D 256 3×3 | (None, 10, 10, 256) | 2,949,376 |
| BatchNormalization | (None, 10, 10, 256) | 1,024 |
| MaxPooling2D 2×2 | (None, 5, 5, 256) | 0 |
| GeM Pooling p=3.0 | (None, 256) | 1 |
| Dense 256 ReLU | (None, 256) | 65,792 |
| BatchNormalization | (None, 256) | 1,024 |
| Dropout 0.4 | (None, 256) | 0 |
| Dense 8 Softmax | (None, 8) | 2,056 |
| Total | | 23,350,633 |
| Trainable (Phase 1) | | 3,018,249 (11.51 MB) |
| Trainable (Phase 2) | | 17,810,225 (67.94 MB) |
| Non-trainable (Phase 1) | | 20,332,384 (77.56 MB) |
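The parameter counts in the table can be reproduced from the layer shapes. The sketch below is a sanity check, assuming standard Keras counting (Conv2D and Dense include biases; BatchNormalization holds gamma, beta, moving mean, and moving variance per channel). The helper names are hypothetical; the backbone count is taken from the table.

```python
def conv2d_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out + c_out  # weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

def batchnorm_params(channels):
    return 4 * channels  # gamma, beta, moving mean, moving variance

backbone = 20_331_360  # EfficientNetV2-S, from the table above
head = {
    "conv2d_256_3x3": conv2d_params(3, 1280, 256),
    "bn_after_conv": batchnorm_params(256),
    "gem_p": 1,
    "dense_256": dense_params(256, 256),
    "bn_after_dense": batchnorm_params(256),
    "dense_8": dense_params(256, 8),
}
total = backbone + sum(head.values())

# BN moving statistics (2 per channel, two BN layers) are never trainable.
head_non_trainable = 2 * 256 + 2 * 256
phase1_trainable = sum(head.values()) - head_non_trainable
phase1_non_trainable = backbone + head_non_trainable
print(total, phase1_trainable, phase1_non_trainable)
```

With the backbone frozen in Phase 1, only the head's weights, biases, BN scale and shift, and the GeM exponent are trained.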
Performance
Overall Metrics
| Metric | Value |
|---|---|
| Test Accuracy | 97.77% |
| Validation Accuracy (SWA) | 98.36% |
| Test-Time Augmentation | 97.99% |
| Test Loss | 0.4262 |
| Overfitting Gap (Train − Test) | 2.11% |
| Macro Avg Precision | 0.9777 |
| Macro Avg Recall | 0.9777 |
| Macro Avg F1-Score | 0.9777 |
| Top-2 Accuracy | 99.26% |
| Top-3 Accuracy | 99.70% |
| Macro ROC-AUC (OvR) | 0.9985 |
| ECE (15 bins) | 0.1204 (pre-T-scaling; post-T-scaling: 0.0053, T=0.54) |
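As a rough illustration of the ECE row, the sketch below computes a 15-bin Expected Calibration Error and shows how dividing logits by a temperature T < 1 sharpens under-confident predictions. The logits are hypothetical toy values, and the function names are not taken from the repository; only the bin count (15) and T=0.54 come from this card.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature T (T < 1 sharpens)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(confidences, correct, n_bins=15):
    """Sum over equal-width bins of (bin share) * |accuracy - mean confidence|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if idx:
            acc = sum(correct[i] for i in idx) / len(idx)
            conf = sum(confidences[i] for i in idx) / len(idx)
            ece += len(idx) / n * abs(acc - conf)
    return ece

# Hypothetical logits for four samples that are all classified correctly.
logits = [[2.0, 0.5, 0.0]] * 4
correct = [1, 1, 1, 1]

def ece_at(temperature):
    confs = [max(softmax(z, temperature)) for z in logits]
    return expected_calibration_error(confs, correct)

ece_before = ece_at(1.0)   # accuracy 1.0 but confidence about 0.74
ece_after = ece_at(0.54)   # T from this card raises the confidence
print(f"{ece_before:.3f} -> {ece_after:.3f}")
```

A fitted temperature below 1 suggests the raw probabilities were under-confident, which is a common side effect of label smoothing and focal loss.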
Per-Class Results
| Class | Precision | Recall | F1-Score | AUC (OvR) | Support |
|---|---|---|---|---|---|
| barn | 0.9760 | 0.9702 | 0.9731 | 0.9950 | 168 |
| bridge | 0.9591 | 0.9762 | 0.9676 | 0.9983 | 168 |
| castle | 0.9763 | 0.9821 | 0.9792 | 0.9996 | 168 |
| mosque | 0.9763 | 0.9821 | 0.9792 | 0.9987 | 168 |
| skyscraper | 0.9940 | 0.9940 | 0.9940 | 0.9999 | 168 |
| stadium | 0.9820 | 0.9762 | 0.9791 | 0.9999 | 168 |
| temple | 0.9816 | 0.9524 | 0.9668 | 0.9976 | 168 |
| windmill | 0.9765 | 0.9881 | 0.9822 | 0.9987 | 168 |
| Macro Avg | 0.9777 | 0.9777 | 0.9777 | 0.9985 | 1,344 |
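Because the test set is balanced (168 images per class), overall accuracy equals macro recall. The short check below recovers the implied number of correct predictions per class from the recall column and rebuilds the headline accuracy; the variable names are illustrative only.

```python
recall = {
    "barn": 0.9702, "bridge": 0.9762, "castle": 0.9821, "mosque": 0.9821,
    "skyscraper": 0.9940, "stadium": 0.9762, "temple": 0.9524, "windmill": 0.9881,
}
support = 168  # test images per class

# Recall is correct / support, so rounding recovers the integer counts.
correct = {name: round(r * support) for name, r in recall.items()}
n_correct = sum(correct.values())
n_total = support * len(recall)

accuracy = n_correct / n_total
macro_recall = sum(recall.values()) / len(recall)
print(n_correct, n_total, round(accuracy, 4), round(macro_recall, 4))
```

Under this reading, 1,314 of 1,344 test images are classified correctly, and temple accounts for the largest share of the misses (8 of 30).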
Model Selection
Four candidate models were evaluated on the validation set:
| Checkpoint | Val Accuracy | Val Loss | Description |
|---|---|---|---|
| head_training.keras | 92.34% | 1.0109 | Phase 1 checkpoint (backbone frozen) |
| fine_tuning.keras | 96.28% | 0.5655 | Phase 2 checkpoint (block6+top_conv unfrozen) |
| fine_tuning_ema.keras | 93.53% | 0.6007 | Phase 2 EMA (per-step Polyak averaging) |
| fine_tuning_swa.keras | 98.36% | 0.4109 | SWA averaged weights ← SELECTED |
SWA Progression
| SWA Epoch | Val Accuracy | Val Loss |
|---|---|---|
| 1 | 95.76% | 0.5831 |
| 2 | 97.62% | 0.5116 |
| 3 | 97.69% | 0.4748 |
| 4 | 96.95% | 0.4390 |
| 5 | 97.47% | 0.4490 |
| 6 | 97.84% | 0.4416 |
| 7 | 98.14% | 0.4055 |
| 8 | 97.32% | 0.4359 |
| 9 | 97.02% | 0.4519 |
| 10 | 97.54% | 0.4226 |
| SWA + BN (final) | 98.36% | 0.4109 |
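The averaging step behind SWA can be sketched as a running equal-weight mean over weight snapshots, one per SWA epoch. This is a simplified illustration with hypothetical two-element weight vectors; the actual training code, and the batch normalization re-estimation that follows it, are not shown here.

```python
def swa_update(avg_weights, new_weights, n_averaged):
    """Fold one more weight snapshot into a running equal-weight average."""
    return [a + (w - a) / (n_averaged + 1)
            for a, w in zip(avg_weights, new_weights)]

# Hypothetical flattened weight snapshots, one per SWA epoch.
snapshots = [[1.0, 4.0], [2.0, 2.0], [3.0, 0.0]]

swa = list(snapshots[0])
for n, snap in enumerate(snapshots[1:], start=1):
    swa = swa_update(swa, snap, n)

print(swa)
```

After averaging, the BN moving statistics no longer match the new weights, which is why the card lists a separate re-estimation pass before the final evaluation.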
Training Details
Training Strategy
Two-phase progressive training with SWA post-processing:
| Phase | Description | Backbone | Optimizer | LR | Max Epochs | Actual Epochs | CutMix+Mixup | FocalLoss LS |
|---|---|---|---|---|---|---|---|---|
| Phase 1 — Feature Extraction | Train custom head only | Frozen (all) | AdamW (wd=2e-5) | 0.001 + CosineDecay + Warmup 3ep | 25 | 1 | Yes (50/50 alternation) | 0.1 |
| Phase 2 — Selective Fine-Tuning | Load head_training → fine-tune | block6 + top_conv unfrozen (BN frozen) | DiscriminativeAdamW (block6=0.1×) | 3e-4 + CosineDecay + Warmup 5ep | 50 | 1 + 10 SWA | No | 0.05 |
¹ Phase 1 stops when the `val_accuracy ≥ 85%` threshold is reached (myCallback).
² Phase 2 stops when the `val_accuracy ≥ 92%` threshold is reached (myCallback), followed by 10 SWA epochs (constant LR 1e-4).
Hyperparameters
| Parameter | Phase 1 | Phase 2 |
|---|---|---|
| Optimizer | AdamW | DiscriminativeAdamW |
| Learning Rate | 0.001 | 3×10⁻⁴ |
| LR Schedule | WarmupCosineDecay (warmup=3) | WarmupCosineDecay (warmup=5) |
| Weight Decay | 2×10⁻⁵ | 2×10⁻⁵ |
| LR Multiplier (block6) | — | 0.1× (LR scaling via update_step, truly discriminative) |
| LR Multiplier (top_conv+head) | — | 1.0× |
| Loss | FocalLoss (gamma=2.0, LS=0.1) | FocalLoss (gamma=2.0, LS=0.05) |
| Batch Size | 32 | 32 |
| Early Stopping Patience | 7 | 12 |
| myCallback Threshold | val_acc ≥ 0.85 | val_acc ≥ 0.92 |
| EMA Decay (per-step) | 0.999 | 0.999 |
| SWA Epochs | — | 10 (post-training) |
| SWA LR | — | 1×10⁻⁴ (constant) |
| BN Re-estimation Steps | — | 100 |
| CutMix (alpha=1.0) | Yes (50% batches) | No |
| Mixup (alpha=0.2) | Yes (50% batches) | No |
| Hardware | 2× Tesla T4 (MirroredStrategy) | 2× Tesla T4 (MirroredStrategy) |
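For readers unfamiliar with warmup plus cosine decay, the following sketch shows one common per-step formulation using the Phase 2 values from the table. It assumes the schedule is stepped per batch, decays to zero, and that the batch size of 32 is the global batch; none of these details are confirmed by the card, and `warmup_cosine_lr` is a hypothetical name rather than the repository's WarmupCosineDecay class.

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

steps_per_epoch = 10_752 // 32        # train images / batch size
warmup_steps = 5 * steps_per_epoch    # Phase 2: 5 warmup epochs
total_steps = 50 * steps_per_epoch    # Phase 2: 50 max epochs
base_lr = 3e-4

first_lr = warmup_cosine_lr(0, base_lr, warmup_steps, total_steps)
peak_lr = warmup_cosine_lr(warmup_steps, base_lr, warmup_steps, total_steps)
mid_lr = warmup_cosine_lr((warmup_steps + total_steps) // 2, base_lr,
                          warmup_steps, total_steps)
final_lr = warmup_cosine_lr(total_steps, base_lr, warmup_steps, total_steps)

block6_lr = 0.1 * peak_lr  # block6 multiplier from the table
print(first_lr, peak_lr, mid_lr, final_lr, block6_lr)
```

The 0.1× multiplier applies on top of whatever the schedule returns, so block6 follows the same curve at one tenth of the head's learning rate.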
Regularization Strategy
| Technique | Implementation | Reference |
|---|---|---|
| Transfer Learning | EfficientNetV2-S backbone frozen in Phase 1 | Yosinski et al., NeurIPS 2014 |
| Selective Fine-Tuning | Unfreeze block6+top_conv only, BN stays frozen | Howard & Ruder, ACL 2018 |
| Discriminative LR Scaling | block6 LR×0.1 via update_step (truly discriminative — 10× smaller updates for pretrained features) | Howard & Ruder, ACL 2018 |
| CutMix + Mixup | Alternation per batch (50/50), Phase 1 only | Yun et al., ICCV 2019; Zhang et al., ICLR 2018 |
| Focal Loss | gamma=2.0, down-weights easy examples | Lin et al., ICCV 2017 |
| Label Smoothing | 0.1 (Phase 1) → 0.05 (Phase 2) | Szegedy et al., CVPR 2016 |
| GeM Pooling | p=3.0 learnable, replaces GAP | Radenovic et al., TPAMI 2018 |
| Dropout | 0.4 after Dense(256)+BN | Srivastava et al., JMLR 2014 |
| Batch Normalization | After Conv2D and Dense; frozen during fine-tuning | Ioffe & Szegedy, arXiv 2015 |
| EMA (per-step) | Shadow weights, decay=0.999, Polyak averaging | Tarvainen & Valpola, NeurIPS 2017 |
| SWA | 10-epoch post-training, constant LR 1e-4 | Izmailov et al., UAI 2018 |
| Data Augmentation | Rotation ±15°, shift ±10%, shear ±0.1 rad, zoom ±20%, brightness 0.75–1.15, channel shift ±10.0, horizontal flip | Perez & Wang, arXiv 2017 |
| Random Erasing | p=0.5, area [0.02–0.15], aspect [0.3–3.3], applied pre-normalization | Zhong et al., AAAI 2020 |
| Test-Time Augmentation | 6 augmentation variants, averaged | Shanmugam et al., ICML 2020 |
| WarmupCosineDecay | Linear warmup + cosine annealing | Loshchilov & Hutter, ICLR 2017 (SGDR) |
| Early Stopping | Patience 7 (Phase 1) / 12 (Phase 2) | Prechelt, Neural Networks 1998 |
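The interaction of focal loss and label smoothing can be illustrated for a single sample as below. This is one common formulation (the focal factor applied per class to smoothed targets) and may differ from the `FocalLoss` class in build_model.py; the function name and the probability vectors are hypothetical.

```python
import math

def focal_loss(probs, true_idx, gamma=2.0, label_smoothing=0.0):
    """Focal loss for one sample given softmax probabilities."""
    n = len(probs)
    targets = [label_smoothing / n] * n          # spread the smoothing mass
    targets[true_idx] += 1.0 - label_smoothing   # rest goes to the true class
    return -sum(t * (1.0 - p) ** gamma * math.log(max(p, 1e-7))
                for t, p in zip(targets, probs))

easy = [0.9, 0.1]   # confident and correct
hard = [0.3, 0.7]   # true class 0 gets low probability

ce_easy = focal_loss(easy, 0, gamma=0.0)   # gamma = 0 is plain cross-entropy
ce_hard = focal_loss(hard, 0, gamma=0.0)
easy_ratio = focal_loss(easy, 0) / ce_easy
hard_ratio = focal_loss(hard, 0) / ce_hard
smoothed = focal_loss(easy, 0, label_smoothing=0.1)
print(easy_ratio, hard_ratio, smoothed)
```

With gamma=2.0 the easy example keeps about 1% of its cross-entropy loss while the hard one keeps about 49%, so hard examples dominate the gradient.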
Dataset
See the dataset curation page for World Architectural Buildings Dataset for Multi‑Class Image Classification — 13,440 images (8 classes × 1,680, balanced) sourced from Pexels with perceptual (pHash) and exact (SHA256) deduplication.
| Split | Images | Percentage |
|---|---|---|
| Train | 10,752 | 80% |
| Validation | 1,344 | 10% |
| Test | 1,344 | 10% |
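Exact deduplication by SHA-256 and the 80/10/10 split sizes can be sketched with the standard library alone. The file names and byte strings below are hypothetical stand-ins for image files, and the perceptual (pHash) pass is not shown because it needs a third-party library.

```python
import hashlib

def dedupe_exact(named_blobs):
    """Keep the first item for each distinct SHA-256 digest of the raw bytes."""
    seen, kept = set(), []
    for name, data in named_blobs:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept

# Hypothetical files: the third is a byte-for-byte copy of the first.
files = [
    ("a.jpg", b"\xff\xd8one"),
    ("b.jpg", b"\xff\xd8two"),
    ("a_copy.jpg", b"\xff\xd8one"),
]
unique = dedupe_exact(files)

total_images = 8 * 1680                 # 8 balanced classes
train = total_images * 80 // 100
val = total_images * 10 // 100
test = total_images - train - val
print(unique, total_images, train, val, test)
```

Exact hashing only catches identical files; resized or re-encoded copies are the cases a perceptual hash is meant to catch.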
Data Preprocessing
- Normalization: `preprocess_input` from `tf.keras.applications.efficientnet_v2` (ImageNet distribution)
- Input resolution: 320×320 (higher than the ImageNet default of 224×224, to capture fine-grained architectural details such as textures, ornaments, and facade patterns)
- Augmentation: Applied to the training set only; validation and test sets use clean preprocessing
- Split method: `splitfolders.ratio` from `dataset/`, seed=42
Files
| Category | Files |
|---|---|
| Model (best) | fine_tuning_swa.keras (227 MB) · .weights.h5 (158 MB) · .safetensors (157 MB) |
| Code | build_model.py (21 KB) — architecture + CLI inference |
| Config | config.json · label_mapping.json · preprocessor_config.json |
| Evaluation | calibration_data.json · model_benchmark.json · confusion_pairs.json · class_confidence_stats.json · temperature_config.json |
| Deployment | saved_model/ (183 MB) · tflite/ (88 MB) · tfjs_model/ (90 MB, 23 shards) |
| Results | results/ — 12 PNG (training curves, confusion matrix, ROC, t-SNE, Grad-CAM, etc.) |
| Archive | models_keras/ — 3 checkpoints (head_training, fine_tuning, fine_tuning_ema) |
Usage
Gradio Space
Try the live building classifier: Architecture Building Image Classifier with Space
Python — build_model.py (recommended)
build_model.py is a standalone module that provides:
- Custom class definitions (`GeMPooling`, `FocalLoss`, `DiscriminativeAdamW`) with `@register_keras_serializable`: importing the module registers all custom classes globally, so `load_model()` works without explicit `custom_objects`.
- `ArchBuildingClassifier`: high-level wrapper class with `build()`, `from_weights()`, `from_keras()`, `predict()`, `predict_batch()` methods.
- `CUSTOM_OBJECTS` dict: fallback for explicit `custom_objects=` in `load_model()`.
- `build_model()`: backward-compatible function that returns a raw `tf.keras.Model`.
Place build_model.py in the same directory as your script, or add its location to PYTHONPATH.
Note: Filenames below use `fine_tuning_swa` as an example. The actual best checkpoint filename depends on training results; check the repo for the actual `.keras`, `.weights.h5`, and `.safetensors` filenames.
from build_model import ArchBuildingClassifier
from huggingface_hub import hf_hub_download
from PIL import Image
# Download weights (clean format)
weights_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "fine_tuning_swa.weights.h5")
# Load model: architecture + weights
clf = ArchBuildingClassifier.from_weights(weights_path)
# Inference: returns the top label, its confidence, and the top-3 (class, probability) pairs
label, confidence, top3 = clf.predict(Image.open("skyscraper_00000.jpg"))
print(f"Predicted: {label} ({confidence:.1%})")
for cls, prob in top3:
    print(f"  {cls}: {prob:.1%}")
Python — TF-Lite (fastest inference)
import json
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from PIL import Image
try:
    from tensorflow.keras.applications.efficientnet_v2 import preprocess_input
except ImportError:
    from tensorflow.keras.applications.efficientnet import preprocess_input
# Download the TFLite model and the label mapping
model_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "tflite/model.tflite")
labels_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "label_mapping.json")
with open(labels_path) as f:
    LABELS = json.load(f)["labels"]
# Set up the interpreter
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Preprocess: RGB, 320×320, float32, batch dimension
img = Image.open("skyscraper_00000.jpg").convert("RGB").resize((320, 320))
arr = np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0)
# Run inference and print the top-3 classes
interpreter.set_tensor(input_details[0]["index"], arr)
interpreter.invoke()
preds = interpreter.get_tensor(output_details[0]["index"])[0]
top3_idx = np.argsort(preds)[::-1][:3]
for i in top3_idx:
    print(f"  {LABELS[i]}: {preds[i]*100:.1f}%")
Python — Keras (convenient)
import json
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from PIL import Image
import build_model  # registers custom classes via @register_keras_serializable
try:
    from tensorflow.keras.applications.efficientnet_v2 import preprocess_input
except ImportError:
    from tensorflow.keras.applications.efficientnet import preprocess_input
model_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "fine_tuning_swa.keras")
labels_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "label_mapping.json")
model = tf.keras.models.load_model(model_path, compile=False)  # custom_objects not needed
with open(labels_path) as f:
    LABELS = json.load(f)["labels"]
img = Image.open("skyscraper_00000.jpg").convert("RGB").resize((320, 320))
arr = np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0)
preds = model.predict(arr, verbose=0)[0]
print(f"Predicted: {LABELS[np.argmax(preds)]} ({np.max(preds)*100:.1f}%)")
Python — SavedModel (TF Serving)
from huggingface_hub import snapshot_download
import tensorflow as tf
import numpy as np
from PIL import Image
try:
    from tensorflow.keras.applications.efficientnet_v2 import preprocess_input
except ImportError:
    from tensorflow.keras.applications.efficientnet import preprocess_input
# Download only the saved_model/ folder into the current directory
snapshot_download("0xgr3y/Arch-Building-Image-Classification", allow_patterns=["saved_model/*"], local_dir=".")
# Load SavedModel (created via model.export(): inference-only, no custom_objects needed)
loaded = tf.saved_model.load("saved_model")
img = Image.open("skyscraper_00000.jpg").convert("RGB").resize((320, 320))
arr = tf.constant(np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0))
# Artifacts written by model.export() expose inference through the `serve` endpoint
preds = loaded.serve(arr).numpy()[0]
top3_idx = np.argsort(preds)[::-1][:3]
for i in top3_idx:
    print(f"  Class {i}: {preds[i]*100:.1f}%")
Python — safetensors (HF standard, cross-framework)
Note: safetensors stores raw weight tensors without architecture metadata. To load, reconstruct the architecture with `build_model.py` first, then map tensors manually. For most use cases, `.weights.h5` (via `ArchBuildingClassifier.from_weights()`) is simpler and equally clean.
from safetensors.numpy import load_file
from build_model import ArchBuildingClassifier
from PIL import Image
# Reconstruct architecture
clf = ArchBuildingClassifier.build()
# Load safetensors tensors
tensors = load_file("fine_tuning_swa.safetensors")
# Map tensors to model weights (iterate layers, not .variables; Keras 3 compatible)
assigned = 0
for layer in clf.keras_model.layers:
    for w in layer.weights:
        name = w.name.replace(':', '_').replace('/', '_')
        if name in tensors:
            w.assign(tensors[name])
            assigned += 1
# A low count means the name mapping did not match and those weights were left unchanged
print(f"Assigned {assigned} of {len(tensors)} tensors")
# Inference
label, confidence, top3 = clf.predict(Image.open("skyscraper_00000.jpg"))
Inference Verification
Keras vs TFLite consistency was verified on 8 random test samples (1 per class):
| Metric | Result |
|---|---|
| Keras correct | 7/8 (88%) |
| TFLite correct | 7/8 (88%) |
| Keras vs TFLite match | 8/8 (100%) — identical predictions |
| Keras inference speed | 358.0 ms |
| TFLite inference speed | 170.0 ms |
The single misclassification (castle→barn, 65% confidence) is plausible given the 97.77% test accuracy, although 8 samples are too few to estimate accuracy. The 8/8 match shows that TFLite conversion preserved the predicted class on every sample checked.
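A consistency check of this kind can be sketched as a comparison of two sets of probability vectors: the share of samples with the same top class, plus the largest absolute probability difference. The helper names and the sample outputs below are hypothetical and are not the values from the verification run.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)

def agreement_rate(preds_a, preds_b):
    """Fraction of samples where both runtimes pick the same top class."""
    matches = sum(argmax(a) == argmax(b) for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)

def max_abs_diff(preds_a, preds_b):
    """Largest element-wise probability gap between the two runtimes."""
    return max(abs(x - y)
               for a, b in zip(preds_a, preds_b)
               for x, y in zip(a, b))

# Hypothetical outputs for three samples and three classes.
keras_preds = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.30, 0.30, 0.40]]
tflite_preds = [[0.89, 0.06, 0.05], [0.10, 0.79, 0.11], [0.31, 0.30, 0.39]]

rate = agreement_rate(keras_preds, tflite_preds)
gap = max_abs_diff(keras_preds, tflite_preds)
print(rate, gap)
```

Reporting the probability gap alongside the match rate helps distinguish identical outputs from outputs that merely share the same top class.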
Security Notice (PAIT-KERAS-301)
The .keras files in this repository are flagged "Unsafe" by Protect AI Guardian (threat: PAIT-KERAS-301). This is a structural false positive, not a malware detection:
- What the scanner checks: String-matching of `class_name` fields in the Keras v3 config against a whitelist of built-in Keras layers.
- Why flagged: The model contains a custom layer (`GeMPooling`); a non-standard class name triggers the flag.
- What it does NOT check: The scanner does not analyze the Python code of the custom class, does not look for `eval()`/`exec()`/`os.system()`, and does not detect actual malware.
- Other scanners: VirusTotal, JFrog, and HF Picklescan all report clean. Only Protect AI flags this file.
The custom classes are safe and open source:
- `GeMPooling`: Generalized Mean Pooling (Radenovic et al., TPAMI 2018). Pure tensor ops: `tf.pow`, `tf.reduce_mean`, `tf.maximum`.
- `FocalLoss`: Focal Loss (Lin et al., ICCV 2017). Pure tensor ops.
- `DiscriminativeAdamW`: AdamW subclass with per-variable learning-rate scaling. No file I/O, no network calls, no arbitrary code.
Full source code for all custom classes is available in build_model.py and the training notebook for public audit.
Multi-Format Deployment Guide
The model is provided in multiple formats to suit different deployment scenarios. Formats marked ✓ are not flagged by Protect AI (no custom class serialization).
| Format | File | Size | Protect AI | Inference Speed | Best For |
|---|---|---|---|---|---|
| TF-Lite ✓ | tflite/model.tflite | ~88 MB | ✓ Safe | 170.0 ms (fastest) | Mobile, edge, embedded, HF Space |
| SavedModel ✓ | saved_model/ | ~183 MB | ✓ Safe | — | TensorFlow Serving, cloud backend |
| TFJS ✓ | tfjs_model/ | ~90 MB | ✓ Safe | — | Browser, Node.js (no backend) |
| Weights H5 ✓ | fine_tuning_swa.weights.h5 | ~158 MB | ✓ Safe | — | Programmatic load via build_model.py |
| safetensors ✓ | fine_tuning_swa.safetensors | ~157 MB | ✓ Safe | — | HF standard, cross-framework |
| Build Script ✓ | build_model.py | ~21 KB | ✓ Safe | — | Architecture reconstruction + load_weights() |
| Keras ⚠️ | fine_tuning_swa.keras | ~227 MB | ⚠️ Flagged | 358.0 ms | Developer reference, fine-tuning |
Load Examples
See Usage section above for complete load + inference examples for each format.
Intended Use
- Architectural style classification from building photographs
- Educational tool for architecture recognition
- Research baseline for fine-grained image classification (FGIC)
- Transfer learning experiments on architectural imagery
Limitations
- Trained on Pexels stock photography — performance may differ on user-generated or field photographs
- Limited to 8 architectural classes (barn, bridge, castle, mosque, skyscraper, stadium, temple, windmill)
- Confusion pair analysis found 0 significant pairs (threshold >5%), so all 8 classes are well-distinguished by the model; see `confusion_pairs.json` for details
- Barn and windmill share 3 cross-class duplicates (0.02% of dataset), left as-is due to negligible impact
- Inference confidence can be low on atypical examples
Ethical Considerations
- All training images sourced from Pexels.com under the Pexels License (free for commercial use, no attribution required). No copyrighted or personally identifiable images were used.
- The dataset contains only photographs of buildings and structures — no people, faces, or private property are the subject of classification.
- The model reflects the visual distribution of Pexels stock photography, which may over-represent Western and iconic architectural styles and under-represent vernacular or regional architecture.
- The 8 class categories are broad and do not capture the full diversity of world architecture. Results should not be used to make definitive claims about architectural categorization.
- URL pattern filtering during dataset collection explicitly excluded AI-generated art, illustrations, and non-photographic content to ensure authenticity.
Links
- Gradio Space (Live): arch-building-classifier Space
- Dataset Studio: 0xgr3y/arch-building-dataset
- GitHub Repo: arcxteam/building-architectural-image-classifier
References
- Tan, M., & Le, Q. V. (2021). EfficientNetV2: Smaller Models and Faster Training. ICML 2021. arXiv:2104.00298
- Radenovic, F., Tolias, G., & Chum, O. (2018). Fine-Tuning CNN Image Retrieval with No Human Annotation. IEEE TPAMI. arXiv:1711.02512
- Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollar, P. (2017). Focal Loss for Dense Object Detection. ICCV 2017. arXiv:1708.02002
- Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., & Wilson, A. G. (2018). Averaging Weights Leads to Wider Optima and Better Generalization. UAI 2018. arXiv:1803.05407
- Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization. ICLR 2018. arXiv:1710.09412
- Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., & Yoo, Y. (2019). CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. ICCV 2019. arXiv:1905.04899
- Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception Architecture for Computer Vision. CVPR 2016. arXiv:1512.00567
- Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How Transferable Are Features in Deep Neural Networks? NeurIPS 2014. arXiv:1411.1792
- Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. ACL 2018. arXiv:1801.06146
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR, 15(56), 1929–1958. http://jmlr.org/papers/v15/srivastava14a.html
- Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv preprint. arXiv:1502.03167
- Tarvainen, A., & Valpola, H. (2017). Mean Teachers are Better Role Models: Weight-averaged Consistency Targets Improve Semi-supervised Deep Learning Results. NeurIPS 2017. arXiv:1703.01780
- Perez, L., & Wang, J. (2017). The Effectiveness of Data Augmentation in Image Classification using Deep Learning. arXiv preprint. arXiv:1712.04621
- Shanmugam, D., Blalock, D., Balakrishnan, G., Guttag, J., & Sarma, A. (2020). Towards Principled Test-Time Augmentation. ICML 2020. PDF
- Loshchilov, I., & Hutter, F. (2017). SGDR: Stochastic Gradient Descent with Warm Restarts. ICLR 2017. arXiv:1608.03983
- Prechelt, L. (1998). Automatic Early Stopping Using Cross Validation: Quantifying the Criteria. Neural Networks, 11(4), 761–767. https://doi.org/10.1016/S0893-6080(98)00010-0
- Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. ICML 2017. arXiv:1706.04599
- Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. ICCV 2017. arXiv:1610.02391
- van der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. JMLR, 9(Nov), 2579–2605. http://jmlr.org/papers/v9/vandermaaten08a.html
- Hand, D. J., & Till, R. J. (2001). A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning, 45(2), 171–186. https://doi.org/10.1023/A:1010920819831
- Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. IJCV, 115(3), 211–252. arXiv:1409.0575
- Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. NeurIPS 2017. arXiv:1612.01474
Citation
@misc{saugani2026_arch_building,
title={Fine-Grained Image Classification of World Architecture:
An EfficientNetV2-S Transfer Learning Approach with Layered Regularization},
author={Saugani},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/0xgr3y/Arch-Building-Image-Classification}
}