文章

kaggle-imc25 记录

第一次打kaggle比赛因为没什么经验,中间浪费了很多时间和提交次数来了解数据、测试流程和模型部署。

虽然没拿到牌子, 还是学到了不少东西。下次继续努力 :)

总结

  1. 相比于自己搭建框架, 优先去复刻往年金牌的思路,注意其中提到的技巧以及baseline(其实写论文也是如此,站在巨人的肩膀上,不要因为非我发明综合征而去重复造轮子)

  2. 本地score上升并不意味着在test集上也一定会提升

  3. 本地测试时,尤其是聚类带来的score提升往往不那么有效

  4. 在最终deadline来临前再去合并team

尝试过的方案

以下score为提交后结果, 没有固定提交后的随机种子(LB score有一定的随机性)

需要match_exhaustive来解决问题

Q:是如何发现这个问题的

ResultLB scoreStrategy-demo结果及分析是否继续测试
效果最好33.92DINOv2+BLIP,PCA后线性拼接特征进行聚类BLIP的引入能够带来提升 目前的线性融合可以改进(短板目前不在聚类这里):x:
失败N/Allava描述图像, 提取全局特征以提升聚类效果7B模型识别图像能力很差:x:
失败18.36mst聚类(训练集聚类效果提升明显, 但是测试集效果很差,且总得分掉落问题不在聚类这边, 匹配精度不够导致总结得分下降:x:
失败N/AMASt3R精度不够:x:
失败N/AMASt3R + BundleAdjustmentout of disk:heavy_check_mark:
失败N/AVGGT精度不够:x:
失败N/AMASt3R/VGGT + glomap BA问题应该在输入colmap的特征部分,其精度不够:x:
失败15.13主动降低(2)的聚类精度以图提升指标。生成置信度, dropout匹配/BA置信度低的图像可修改drop condition(失败, 置信度和随机dropout camera都不会有提升):x:
效果第二好32.64PCA model 维度从192 -> 384可调整参数测试(调参已经到极限:引入其他特征/匹配方案/修改colmap参数?):heavy_check_mark:
失败20.29superpoint + kdtree检索 + romacolmap失败:x:

过程记录

roma colmap failed

添加了roma后经常遇到的问题:

一组聚类n张图像,最后仅估计出一张图像的位姿。

一个场景有m组聚类结果,那么仅得到m张图像的位姿。

这是roma到colmap这个阶段出现了问题。

调整nn_thresh后,部分数据集能解决这个问题,但是其他数据集也会出现。

例如nn_thresh=5可以解决ETs的问题,但是staris不行。

待验证思路

初期花费了大量的时间聚焦与聚类的改进,但事实证明这部分其实并不重要。(主要是减少计算量方面的提升)

llava in kaggle notebook

llava使用了一个

使用特定的transformers版本

1
!pip install transformers==4.36.2

https://www.kaggle.com/competitions/chaii-hindi-and-tamil-question-answering/discussion/267694#1488794

必须在kaggle notebook的第一个cell中执行。如果需要离线安装,参考:https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/113195

引入llava

1
2
3
4
5
6
7
8
9
10
11
/llava-model-checkpoints
  506  mkdir -p llava-packages
  507  pip download   transformers accelerate sentencepiece bitsandbytes   -d llava-packages
  512  git clone https://github.com/haotian-liu/LLaVA.git llava-code/LLaVA
  513  cd llava-code/LLaVA
  514  rm -rf .git  # 删除 .git 避免浪费空间
  518  mkdir -p
  520  mkdir -p llava-model-checkpoints/
  521  ls
  522  cd llava-model-checkpoints/
下载huggingface的所有内容https://huggingface.co/liuhaotian/llava-v1.5-7b/tree/main

结构如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
llava-offline/
├── llava-packages/
│   ├── transformers-4.51.3-py3-none-any.whl
│   └── ...
├── llava-code/
│   └── LLaVA/
│       ├── llava/
│       ├── setup.py
│       └── ...
└── llava-model-checkpoints/
    └── snapshots/
        └── 1234567890abcdef/
            ├── config.json
            ├── pytorch_model-00001-of-00002.bin
            └── ...

将llava-offline压缩为一个单独的zip文件, 上传为一个单独的kaggle Dataset.

llava测试结果

官方提供的P100和T4*2都不能满足运行条件,T4×2可以分布到两张卡上运行,但是我拉取的最新镜像有bug,双卡llava会运行失败。

而且7B的图像理解能力基本处于一个无法使用的状态

1.appear to be engaged in conversation or listening to a speaker. The room has a dining table in the background, which occupies a significant portion of the space. There are also two chairs visible in the room, one near the center and the other towards the right side. The overall atmosphere of the image suggests a collaborative and social environment.

2.further back. The individuals are of varying heights and positions, indicating that they might be of different ages or genders. They all appear to be focused on the same direction, possibly anticipating an announcement or event. The overall atmosphere of the image suggests that the people are organized and orderly, waiting patiently in line for whatever is about to happen.

Baseline 测试

测试OCTAVI GRAU提供的baseline

DINOv2 提取全局描述符,筛选图像对,shortlisting

ALKIED 检测局部特征点,使用LightGlue进行局部特征匹配。

RANSAC+PnP(pycolmap)计算相对旋转和位移,估计三维姿态

目前的部分问题

  1. 加载图像太慢, 主要问题出在图像对的构建上$C^2_n$的复杂度太高了。在kaggle notebook上如果尝试加载所有的train数据甚至会导致超过内存上限。

能否利用memory减小复杂度? 但速度不是最关键的问题

  1. 聚类精度不够

  2. pose estimation 精度不够

待验证的解决思路

使用dinov2的global desc和BLIP进行聚类

将聚类结果临时保存,按小的dataset加载并使用memory机制进行匹配。

MUSt3R提到使用xformer快速计算注意力?

指标记录

baseline

1
2
3
4
5
6
7
8
9
10
11
12
imc2023_haiper: score=0.00% (mAA=0.00%, clusterness=0.00%)
imc2023_heritage: score=0.00% (mAA=0.00%, clusterness=0.00%)
imc2023_theather_imc2024_church: score=52.24% (mAA=35.36%, clusterness=100.00%)
imc2024_dioscuri_baalshamin: score=62.98% (mAA=48.61%, clusterness=89.41%)imc2024_lizard_pond: score=0.00% (mAA=0.00%, clusterness=0.00%)
pt_brandenburg_british_buckingham: score=0.00% (mAA=0.00%, clusterness=0.00%)
pt_piazzasanmarco_grandplace: score=0.00% (mAA=0.00%, clusterness=0.00%)
pt_sacrecoeur_trevi_tajmahal: score=0.00% (mAA=0.00%, clusterness=0.00%)
pt_stpeters_stpauls: score=0.00% (mAA=0.00%, clusterness=0.00%)
amy_gardens: score=30.90% (mAA=18.27%, clusterness=100.00%)
fbk_vineyard: score=28.03% (mAA=24.19%, clusterness=33.33%)
ETs: score=31.67% (mAA=25.00%, clusterness=43.18%)
stairs: score=6.25% (mAA=3.33%, clusterness=50.00%)

DINO+BLIP 共用test集拟合PCA模型

1
2
3
4
amy_gardens: score=33.23% (mAA=19.92%, clusterness=100.00%)
fbk_vineyard: score=29.50% (mAA=26.46%, clusterness=33.33%)
ETs: score=31.67% (mAA=25.00%, clusterness=43.18%)
stairs: score=5.26% (mAA=2.78%, clusterness=50.00%)

DINO+BLIP fusion(normalize+独立PCA model)

1
2
amy_gardens: score=30.54% (mAA=18.02%, clusterness=100.00%)
ETs: score=36.63% (mAA=30.77%, clusterness=45.24%)

很奇怪, amy_gardens的聚类一直是100%, 属于非常简单的那种, 但是匹配的精度反而降低了。

Test data load

理解错误了.

这种隐藏数据集的比赛提交后的流程时,你提交的notebook先加载他提供的公开test数据进行测试。运行成功并能得到score后,然后才会去真正的测试集上运行并评分。第二阶段的所有输出和打印信息你全部无法看到。

lightglue baseline + PCA 192

1
2
3
4
5
6
7
8
9
10
11
12
13
14
Dataset "ETs" -> Registered 21 / 22 images with 1 clusters
Dataset "amy_gardens" -> Failed!
Dataset "fbk_vineyard" -> Failed!
Dataset "imc2023_haiper" -> Failed!
Dataset "imc2023_heritage" -> Failed!
Dataset "imc2023_theather_imc2024_church" -> Failed!
Dataset "imc2024_dioscuri_baalshamin" -> Failed!
Dataset "imc2024_lizard_pond" -> Failed!
Dataset "pt_brandenburg_british_buckingham" -> Failed!
Dataset "pt_piazzasanmarco_grandplace" -> Failed!
Dataset "pt_sacrecoeur_trevi_tajmahal" -> Failed!
Dataset "pt_stpeters_stpauls" -> Failed!
Dataset "stairs" -> Registered 66 / 51 images with 3 clusters

mst + cluster dropout + too far from center remove

1
2
Dataset ETs -> Registered 19 / 22 images with 2 clusters
Dataset stairs -> Registered 37 / 51 images with 3 clusters

做个实验验证一下我的猜想,重写is_train为False时的数据处理,不再加载sample_test_submission.csv. 而是使用os扫描目录, 然后生成新的csv文件。

结构如官方sample_submission.csv一致

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
if is_train:
    sample_submission_csv = os.path.join(data_dir, 'train_labels.csv')
else:
    # 注释掉原本的加载sample_submission.csv的代码
    # sample_submission_csv = os.path.join(data_dir, 'sample_submission.csv')
    # 使用os扫描data_dir/test/下的所有文件,
    # 结构为
    # |-data_dir
    #   |-test
    #     |-dataset1
    #       |-image1.jpg/png
    #       |-image2.jpg/png
    #       ...
    #     |-dataset2
    #       |-image1.jpg/png
    # 将结果生成一个my_test_submission.csv文件
    # csv第一行写入5列的名字: image_id	dataset	scene	image	rotation_matrix	translation_vector
    # 之后每一行写入 dataset名(dataset1, dataset2), dataset(聚类结果,默认cluster0), image为图像名, rotation_matrix默认为 nan,nan,nan,nan,nan,nan,nan,nan,nan translation_vector默认为nan,nan,nan
    test_dir = os.path.join(data_dir, "test")
    sample_submission_csv = os.path.join(workdir, "my_test_submission.csv")

    rows = []
    for dataset_name in sorted(os.listdir(test_dir)):
        dataset_path = os.path.join(test_dir, dataset_name)
        if not os.path.isdir(dataset_path):
            continue  # 跳过非目录文件
        # 只保留常见的图像后缀
        img_files = [
            f for f in os.listdir(dataset_path)
            if f.lower().endswith((".jpg", ".jpeg", ".png"))
        ]
        for img in sorted(img_files):
            rows.append(
                {
                    "image_id": np.nan,                      # 测试集隐藏,无需填写
                    "dataset": dataset_name,                 # 数据集名称(=子目录名)
                    "scene": "cluster0",                     # 默认占位,可后续替换为真实聚类
                    "image": img,                            # 单纯的文件名(不含路径)
                    "rotation_matrix": " ".join(["nan"] * 9),   # 9 × nan
                    "translation_vector": " ".join(["nan"] * 3) # 3 × nan
                }
            )

    # 按官方列顺序保存
    df = pd.DataFrame(
        rows,
        columns=[
            "image_id",
            "dataset",
            "scene",
            "image",
            "rotation_matrix",
            "translation_vector",
        ],
    )
    df.to_csv(sample_submission_csv, index=False)
    print(f"Generated test submission file with {len(df)} rows → {sample_submission_csv}")

可视化

可视化代码 :arrow_right: CODE

CSV 文件中保存的是世界坐标系到相机坐标系的变换(World2Cam),在可视化时(乘以最优匹配前)需要转换为相机坐标系到世界坐标系的变换(Cam2World)。

\[\mathbf{R}_{w2c}, \mathbf{t}_{w2c} = \text{Load}(\texttt{submission.csv})\] \[\mathbf{R}_{c2w} = \mathbf{R}_{w2c}^\top\] \[\mathbf{t}_{c2w} = -\mathbf{R}_{w2c}^\top \mathbf{t}_{w2c}\] \[[\mathbf{R}_{\text{aligned}} \mid \mathbf{t}_{\text{aligned}}] = \mathbf{T}_{\text{best}} [\mathbf{R}_{c2w} \mid \mathbf{t}_{c2w}]\]

关于Metric的一些验证与思考

单个scene的mAA的计算方式如下:

其中的top_dec默认为3, 得分最高的3个camera被排除


选择了ETs数据集(在3种pipeline下的cluster精度都是100%,排除聚类的干扰)

最优匹配变换transf来自score函数. 保证正确性。

DINOv2+ALIKED+Lightglue Baseline

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
ETs: score=44.78% (mAA=28.85%, clusterness=100.00%)

📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| another_et_another_et001.png|  ❌     ❌     ❌     ❌     ❌     ✅   |   0.04 |  16.67 |
| another_et_another_et002.png|  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| another_et_another_et003.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.06 |   0.00 |
| another_et_another_et004.png|  ❌     ❌     ❌     ❌     ✅     ✅   |   0.02 |  33.33 |
| another_et_another_et005.png|  ❌     ❌     ❌     ✅     ✅     ✅   |   0.01 |  50.00 |
| another_et_another_et006.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.09 |   0.00 |
| another_et_another_et007.png|  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| another_et_another_et008.png|  ❌     ❌     ✅     ✅     ✅     ✅   |   0.01 |  66.67 |
| another_et_another_et009.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.06 |   0.00 |
| another_et_another_et010.png|  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
✅ Scene mAA for ETs/another_ET: 35.71%

📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| et_et000.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.07 |   0.00 |
| et_et001.png            |  ❌     ❌     ✅     ✅     ✅     ✅   |   0.01 |  66.67 | ✅ Removed
| et_et002.png            |  ❌     ❌     ✅     ✅     ✅     ✅   |   0.01 |  66.67 | ✅ Removed
| et_et003.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.12 |   0.00 |
| et_et004.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.07 |   0.00 |
| et_et005.png            |  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| et_et006.png            |  ❌     ❌     ❌     ❌     ❌     ✅   |   0.04 |  16.67 |
| et_et007.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.10 |   0.00 |
| et_et008.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.08 |   0.00 |
✅ Scene mAA for ETs/ET: 4.17%

MASt3R

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
ETs: score=40.00% (mAA=25.00%, clusterness=100.00%)

📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| another_et_another_et001.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.05 |   0.00 |
| another_et_another_et002.png|  ❌     ❌     ❌     ❌     ✅     ✅   |   0.02 |  33.33 |
| another_et_another_et003.png|  ❌     ❌     ❌     ❌     ❌     ✅   |   0.03 |  16.67 |
| another_et_another_et004.png|  ❌     ❌     ❌     ❌     ✅     ✅   |   0.02 |  33.33 |
| another_et_another_et005.png|  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| another_et_another_et006.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.13 |   0.00 |
| another_et_another_et007.png|  ❌     ❌     ❌     ❌     ✅     ✅   |   0.01 |  33.33 | ✅ Removed
| another_et_another_et008.png|  ❌     ✅     ✅     ✅     ✅     ✅   |   0.00 |  83.33 | ✅ Removed
| another_et_another_et009.png|  ❌     ❌     ❌     ❌     ❌     ✅   |   0.03 |  16.67 |
| another_et_another_et010.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.15 |   0.00 |
✅ Scene mAA for ETs/another_ET: 21.43%

📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| et_et000.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.08 |   0.00 |
| et_et001.png            |  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| et_et002.png            |  ❌     ❌     ❌     ❌     ❌     ✅   |   0.03 |  16.67 |
| et_et003.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.06 |   0.00 |
| et_et004.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.06 |   0.00 |
| et_et005.png            |  ❌     ❌     ❌     ❌     ❌     ✅   |   0.02 |  16.67 |
| et_et006.png            |  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| et_et007.png            |  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| et_et008.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.10 |   0.00 |
✅ Scene mAA for ETs/ET: 8.33%

MASt3R+BA

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
ETs: score=42.42% (mAA=26.92%, clusterness=100.00%)

📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| another_et_another_et001.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.05 |   0.00 |
| another_et_another_et002.png|  ❌     ❌     ❌     ✅     ✅     ✅   |   0.01 |  50.00 | ✅ Removed
| another_et_another_et003.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.04 |   0.00 |
| another_et_another_et004.png|  ❌     ❌     ❌     ❌     ❌     ✅   |   0.04 |  16.67 |
| another_et_another_et005.png|  ❌     ❌     ❌     ✅     ✅     ✅   |   0.01 |  50.00 | ✅ Removed
| another_et_another_et006.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.13 |   0.00 |
| another_et_another_et007.png|  ❌     ❌     ❌     ❌     ✅     ✅   |   0.01 |  33.33 |
| another_et_another_et008.png|  ❌     ❌     ❌     ❌     ✅     ✅   |   0.02 |  33.33 |
| another_et_another_et009.png|  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| another_et_another_et010.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.12 |   0.00 |
✅ Scene mAA for ETs/another_ET: 17.86%

 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| et_et000.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.07 |   0.00 |
| et_et001.png            |  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| et_et002.png            |  ❌     ❌     ❌     ❌     ❌     ✅   |   0.03 |  16.67 |
| et_et003.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.06 |   0.00 |
| et_et004.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.06 |   0.00 |
| et_et005.png            |  ❌     ❌     ❌     ❌     ❌     ✅   |   0.02 |  16.67 |
| et_et006.png            |  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| et_et007.png            |  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| et_et008.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.10 |   0.00 |
✅ Scene mAA for ETs/ET: 8.33%

MASt3R + BA + shared_intrinsics=Fase

ETs: score=7.41% (mAA=3.85%, clusterness=100.00%)

VGGT

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
ETs: score=14.29% (mAA=7.69%, clusterness=100.00%)

📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| another_et_another_et001.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.10 |   0.00 |
| another_et_another_et002.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.07 |   0.00 |
| another_et_another_et003.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.04 |   0.00 |
| another_et_another_et004.png|  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| another_et_another_et005.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.10 |   0.00 |
| another_et_another_et006.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.16 |   0.00 |
| another_et_another_et007.png|  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| another_et_another_et008.png|  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
| another_et_another_et009.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.05 |   0.00 |
| another_et_another_et010.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   0.14 |   0.00 |
✅ Scene mAA for ETs/another_ET: 0.00%

📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| et_et000.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.14 |   0.00 |
| et_et001.png            |  ❌     ❌     ✅     ✅     ✅     ✅   |   0.01 |  66.67 | ✅ Removed
| et_et002.png            |  ❌     ❌     ❌     ✅     ✅     ✅   |   0.01 |  50.00 | ✅ Removed
| et_et003.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.19 |   0.00 |
| et_et004.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.12 |   0.00 |
| et_et005.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.07 |   0.00 |
| et_et006.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.11 |   0.00 |
| et_et007.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   0.06 |   0.00 |
| et_et008.png            |  ✅     ✅     ✅     ✅     ✅     ✅   |   0.00 | 100.00 | ✅ Removed
✅ Scene mAA for ETs/ET: 0.00%

无三角化BA

记录对比,以检查是否有提升

1
2
3
4
5
6
7
8
9
10
11
12
13
📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| another_et_another_et001.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   4.38 |   0.00 |
| another_et_another_et002.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   3.13 |   0.00 |
| another_et_another_et004.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   3.13 |   0.00 |
| another_et_another_et005.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   4.14 |   0.00 |
| another_et_another_et006.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   1.32 |   0.00 | ✅ Removed
| another_et_another_et007.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   1.36 |   0.00 | ✅ Removed
| another_et_another_et008.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   3.25 |   0.00 |
| another_et_another_et009.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   5.04 |   0.00 |
| another_et_another_et010.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   7.41 |   0.00 |
✅ Scene mAA for ETs/another_ET: 0.00%
1
2
3
4
5
6
7
8
9
10
11
12
13
📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| et_et000.png            |  ❌     ❌     ❌     ❌     ✅     ✅   |   0.01 |  33.33 | ✅ Removed
| et_et001.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   7.32 |   0.00 |
| et_et002.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   4.45 |   0.00 |
| et_et003.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   9.96 |   0.00 |
| et_et004.png            |  ❌     ❌     ❌     ❌     ✅     ✅   |   0.02 |  33.33 | ✅ Removed
| et_et005.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   7.35 |   0.00 |
| et_et006.png            |  ❌     ❌     ✅     ✅     ✅     ✅   |   0.01 |  66.67 | ✅ Removed
| et_et007.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   2.35 |   0.00 |
| et_et008.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   9.09 |   0.00 |
✅ Scene mAA for ETs/ET: 0.00%

修正三角化BA

1
2
3
4
5
6
7
8
9
10
11
12
13
14
📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| another_et_another_et001.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   4.38 |   0.00 |
| another_et_another_et002.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   3.13 |   0.00 |
| another_et_another_et003.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   2.97 |   0.00 | ✅ Removed
| another_et_another_et004.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   3.13 |   0.00 |
| another_et_another_et005.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   4.14 |   0.00 |
| another_et_another_et006.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   1.32 |   0.00 | ✅ Removed
| another_et_another_et007.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   1.36 |   0.00 | ✅ Removed
| another_et_another_et008.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   3.25 |   0.00 |
| another_et_another_et009.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   5.04 |   0.00 |
| another_et_another_et010.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   7.41 |   0.00 |
✅ Scene mAA for ETs/another_ET: 0.00%
1
2
3
4
5
6
7
8
9
10
11
12
13
📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| et_et000.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   4.18 |   0.00 |
| et_et001.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   3.13 |   0.00 | ✅ Removed
| et_et002.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   2.61 |   0.00 | ✅ Removed
| et_et003.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   4.68 |   0.00 |
| et_et004.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   2.38 |   0.00 | ✅ Removed
| et_et005.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   4.33 |   0.00 |
| et_et006.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   4.54 |   0.00 |
| et_et007.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   5.39 |   0.00 |
| et_et008.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   7.13 |   0.00 |
✅ Scene mAA for ETs/ET: 0.00%

BA 无效三角化 inv(对保存的RT求逆)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| another_et_another_et001.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   4.38 |   0.00 |
| another_et_another_et002.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   3.13 |   0.00 |
| another_et_another_et003.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   2.97 |   0.00 | ✅ Removed
| another_et_another_et004.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   3.13 |   0.00 |
| another_et_another_et005.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   4.14 |   0.00 |
| another_et_another_et006.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   1.32 |   0.00 | ✅ Removed
| another_et_another_et007.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   1.36 |   0.00 | ✅ Removed
| another_et_another_et008.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   3.25 |   0.00 |
| another_et_another_et009.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   5.04 |   0.00 |
| another_et_another_et010.png|  ❌     ❌     ❌     ❌     ❌     ❌   |   7.41 |   0.00 |
✅ Scene mAA for ETs/another_ET: 0.00%
1
2
3
4
5
6
7
8
9
10
11
12
13
📊 Per-image registration table:
| Image Name              |  0.00   0.01   0.01   0.01   0.02   0.04 | Error  | mAA (%) | Removed?
--------------------------------------------------------------------------------------------------
| et_et000.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   9.78 |   0.00 |
| et_et001.png            |  ❌     ❌     ❌     ❌     ❌     ✅   |   0.02 |  16.67 | ✅ Removed
| et_et002.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   5.99 |   0.00 |
| et_et003.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   4.31 |   0.00 |
| et_et004.png            |  ❌     ❌     ❌     ❌     ❌     ✅   |   0.03 |  16.67 | ✅ Removed
| et_et005.png            |  ❌     ❌     ❌     ❌     ✅     ✅   |   0.01 |  33.33 | ✅ Removed
| et_et006.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   4.57 |   0.00 |
| et_et007.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   7.92 |   0.00 |
| et_et008.png            |  ❌     ❌     ❌     ❌     ❌     ❌   |   3.69 |   0.00 |
✅ Scene mAA for ETs/ET: 0.00%

坐标系

在验证坐标系的正确性时也耗费了大量的时间,在引入baseline前应该添加可视化验证。

关于cv.findFundamentalMat

https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html

关于score中的随机性

来自于colmap

1
2
It was pointed out that this randomness comes from the reconstruction process. You need to set "num_threads": 1 in pycolmap.IncrementalPipelineOptions to stabilize the result for submission.
Ref: https://www.kaggle.com/competitions/image-matching-challenge-2024/discussion/494921

关于mast3r和VGGT baseline的一些分析

只能恢复相对尺度,缺少绝对距离约束。

metric中的Horn +RANSAC能够做一次相似变换,该变换只应用到相机的中心,并且对极端的outlier敏感。

local BA的参数不够?

为什么baseline可以?是因为三角化?

一些代码方面的问题和tips

区分本地和kaggle环境

添加该代码, 自动区分本地和kaggle环境,防止扰乱本地环境。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import os

# 自动判断是否在Kaggle环境运行
IN_KAGGLE = os.path.exists("/kaggle/input/")

# 设置本地或Kaggle的包路径
if IN_KAGGLE:
    ROOT_DIR = "/kaggle/"
else:
    ROOT_DIR = "./kaggle/"  # 本地相对路径
    os.environ['TORCH_HOME'] = './.cache/torch'  # 不影响系统级路径
    os.environ["CUDA_VISIBLE_DEVICES"] = "5"
    
ROOT_DIR = "/kaggle" if IN_KAGGLE else "./kaggle"
PKG_DIR = f"{ROOT_DIR}/input/imc2024-packages-lightglue-rerun-kornia"
ALIKED_MODEL = f"{ROOT_DIR}/input/aliked/pytorch/aliked-n16/1/aliked-n16.pth"
LIGHTGLUE_MODEL = f"{ROOT_DIR}/input/lightglue/pytorch/aliked/1/aliked_lightglue.pth"

ROMA_DIR = f"{ROOT_DIR}/input/roma-utils/roma-utils/RoMa"

# 离线模型路径
SUPERPOINT_MODEL = f"{ROOT_DIR}/input/superpoint-v1-checkpoints/superpoint_v1.pth"

ROMA_MODEL_PATH = f"{ROOT_DIR}/input/roma-utils/roma-utils/checkpoints"

TORCH_CACHE = "/root/.cache/torch/hub/checkpoints" if IN_KAGGLE else "./.cache/torch/hub/checkpoints"
1
2
3
4
5
6
7
8
!pip install --no-index {PKG_DIR}/* --no-deps
!mkdir -p {TORCH_CACHE}
!cp {ALIKED_MODEL} {TORCH_CACHE}/aliked-n16.pth
!cp {LIGHTGLUE_MODEL} {TORCH_CACHE}/aliked_lightglue.pth
!cp {LIGHTGLUE_MODEL} {TORCH_CACHE}/aliked_lightglue_v0-1_arxiv-pth

# Superpoint
!cp {SUPERPOINT_MODEL} {TORCH_CACHE}/superpoint_v1.pth

kaggle “no internet” 下pip安装依赖

预先将依赖的包下载好,上传为dataset

使用

1
pip download <package_name> --dest <directory>

会在”directory”目录下得到所有需要的whl包, 在kaggle中

1
!pip install --no-index {PKG_DIR}/* --no-deps

会自动安装其下的所有包和对应的依赖包

使用kaggle命令行工具查看submit

1
2
3
4
5
6
7
8
9
10
kaggle competitions submissions -c image-matching-challenge-2025
fileName        date                        description                                                                                    status                     publicScore  privateScore
--------------  --------------------------  ---------------------------------------------------------------------------------------------  -------------------------  -----------  ------------
submission.csv  2025-05-29 09:02:24.747000  Notebook MASt3R-Baseline+localBA | Version 4                                                   SubmissionStatus.COMPLETE
submission.csv  2025-05-29 07:17:25.817000  Notebook mst+DBSCAN_cluster_improve | Version 8                                                SubmissionStatus.COMPLETE  13.05
submission.csv  2025-05-29 07:01:17.557000  Notebook Baseline2-param + cluster improve | Version 1                                         SubmissionStatus.PENDING
submission.csv  2025-05-29 06:35:05.563000  Notebook MASt3R-Baseline+localBA | Version 3                                                   SubmissionStatus.COMPLETE
submission.csv  2025-05-28 13:27:01         Notebook MASt3R-Baseline+localBA | Version 2                                                   SubmissionStatus.COMPLETE
submission.csv  2025-05-28 09:08:02         Notebook Baseline-Test:DINO+ALIKED+LightGLUE a667f0 | Version 1                                SubmissionStatus.COMPLETE  26.84
submission.csv  2025-05-28 08:46:47         Notebook MASt3R-Baseline+localBA | Version 1                                                   SubmissionStatus.COMPLETE
本文由作者按照 CC BY 4.0 进行授权