[Paper Review] Source-Free Domain Adaptation via Distribution Estimation

Paper Study 2023. 6. 22. 17:21

A blog that is strictly my own guesswork

Paper link: https://arxiv.org/pdf/2204.11257.pdf

Motivation

Domain adaptation is the task of applying a model trained on a source dataset to an unlabeled target dataset. It is getting attention because the source dataset differs from the real world, so a model trained on the source dataset is hard to use in general. Domain adaptation therefore amounts to solving the domain shift problem, i.e. the mismatch between the distribution of the source dataset and that of the target dataset.

Within this setting, source-free domain adaptation (SFDA) is the task of adapting with only the weights of a model trained on the source dataset, without accessing the source dataset itself. This paper approaches it by estimating the distribution of the source domain.

First, the notation needed for the rest of the post.

Let the source dataset be $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, the target dataset $D_t = \{x_i^t\}_{i=1}^{n_t}$, the linear classifier $G(\cdot)$, the CNN feature extractor $F(\cdot)$, the m-dimensional feature representation $f = F(x) \in \mathbb{R}^m$, and the trained model $G(F(\cdot))$. The weights learned by $G$ are written $w^G$, and the $k$-th weight vector of $w^G$ is written $w_k^G$.

The class label predicted by the linear classifier $G$ can therefore be written as follows.

$\hat{y}_i = \arg\max_{k} f_i^{\top} w_k^G$

Each candidate score is the dot product of the feature with a weight vector, and data of the k-th class yields a feature representation that activates the k-th weight vector of $G$. So $w_k^G$ can be seen as an anchor that represents the k-th class.
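To make the prediction rule concrete for myself, here is a minimal NumPy sketch. The function name `predict_labels`, the toy arrays, and the absence of a bias term are my own assumptions for illustration, not something taken from the paper's code.

```python
import numpy as np

def predict_labels(features, classifier_weights):
    """Return argmax_k f_i^T w_k^G for each row of `features`.

    features: (n, m) array of feature vectors f_i.
    classifier_weights: (K, m) array whose k-th row is the anchor w_k^G.
    Assumption: the linear classifier has no bias term.
    """
    logits = features @ classifier_weights.T  # (n, K) dot products
    return np.argmax(logits, axis=1)

# Toy example: 2 classes in a 2-dimensional feature space (made-up numbers).
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
feats = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = predict_labels(feats, anchors)
```

Each feature is assigned to the anchor it has the largest dot product with, which is why the rows of the classifier weight matrix can be read as class anchors.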

Method

1) Pseudo-labeling by exploiting anchors

The paper clusters the target data with spherical k-means and assigns pseudo labels. The initial centers are the anchors: $A_k^{(0)} = w_k^G$. With $\hat{y}_i^t = \arg\min_{k} \mathrm{Dist}(A_k^{(m)}, f_i^t)$, the $k$ that minimizes the cosine distance between the cluster center at the m-th iteration and $f_i^t$ becomes the pseudo label of the target sample. The centers start at the anchors, but each center $A_k$ then moves toward the features $f_i^t$ for which $\hat{y}_i^t = k$.
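The pseudo-labeling step can be sketched roughly as below. This is my own reading, not the authors' implementation: the center update (the normalized sum of the assigned features) is the standard spherical k-means update and is an assumption here, and the names `l2_normalize`, `spherical_kmeans_pseudo_labels`, and `n_iters` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def spherical_kmeans_pseudo_labels(target_feats, anchors, n_iters=10):
    """Cluster target features by cosine distance, starting from the anchors."""
    feats = l2_normalize(target_feats)
    centers = l2_normalize(anchors)  # A_k^(0) = w_k^G (direction only)
    for _ in range(n_iters):
        cos_dist = 1.0 - feats @ centers.T       # (n, K) cosine distances
        labels = np.argmin(cos_dist, axis=1)     # pseudo labels at this iteration
        for k in range(centers.shape[0]):
            members = feats[labels == k]
            if len(members) > 0:                 # keep the old center if a cluster is empty
                centers[k] = l2_normalize(members.sum(axis=0))
    labels = np.argmin(1.0 - feats @ centers.T, axis=1)  # final assignment
    return labels, centers

# Toy example with made-up numbers.
toy_anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_target = np.array([[2.0, 0.5], [1.5, 0.2], [0.3, 1.0], [0.1, 2.0]])
pseudo_labels, centers = spherical_kmeans_pseudo_labels(toy_target, toy_anchors)
```

Because everything is normalized to unit length, minimizing the cosine distance is the same as maximizing the dot product with the current center.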

2) Source distribution estimation

Unlike traditional DA, the SFDA task has no access to the distribution of the source data, so the paper estimates the source feature distribution. First, it assumes that the feature representations of the source data follow a class-conditioned multivariate Gaussian distribution: $f_{i,k}^{s} \sim N_k^s(\mu_k^s, \Sigma_k^s)$, where $f_{i,k}^{s} = F(x_i^s \mid y_i^s = k)$. Here $\mu_k^s$ is the center of the feature representation, and the covariance matrix is said to capture the variation of the features and rich semantic information. I follow the feature variation part, but I don't really get the rich semantic information part.

Anyway, the source distribution $N_k^s$ is approximated by a surrogate distribution $N_k^{sur}(\hat{\mu}_k^s, \hat{\Sigma}_k^s)$. Using the target feature mean directly as $\hat{\mu}_k^s$ would not account for the domain distribution shift, so the anchors are used to calibrate the mean estimator of the surrogate source distribution.

$\hat{\mu}_k^s = \| \bar{f}_k^t \|_2 \cdot \frac{w_k^G}{\| w_k^G \|_2}$
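A small sketch of this calibrated mean, under my own assumptions: `calibrated_source_means` is a hypothetical name, the class-wise target mean $\bar{f}_k^t$ is computed from pseudo-labeled target features, and classes with no pseudo-labeled samples are simply left at zero.

```python
import numpy as np

def calibrated_source_means(target_feats, pseudo_labels, anchors):
    """Estimate mu_hat_k^s = ||mean of class-k target features||_2 * w_k^G / ||w_k^G||_2."""
    means = np.zeros_like(anchors, dtype=float)
    for k in range(anchors.shape[0]):
        members = target_feats[pseudo_labels == k]
        if len(members) == 0:
            continue  # assumption: leave classes without pseudo-labeled samples at zero
        scale = np.linalg.norm(members.mean(axis=0))          # norm of the target class mean
        direction = anchors[k] / np.linalg.norm(anchors[k])   # unit vector along the anchor
        means[k] = scale * direction
    return means

# Toy example with made-up numbers.
toy_anchors = np.array([[2.0, 0.0], [0.0, 3.0]])
toy_feats = np.array([[3.0, 4.0], [3.0, 4.0], [0.0, 1.0]])
toy_labels = np.array([0, 0, 1])
mu_hat = calibrated_source_means(toy_feats, toy_labels, toy_anchors)
```

In the toy data the class-0 target mean is $(3, 4)$ with norm 5, so the estimate is a vector of length 5 pointing along the class-0 anchor.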

In other words, the estimated source feature mean has the same scale as the target feature mean and the same direction as the anchor. As for the covariance matrix, previous work is said to have shown that the class-conditioned covariance captures the activated semantic directions and the correlations between different feature channels.. ;; I should read those papers. Why would that be? It doesn't click intuitively.

The paper assumes that the intra-class semantic information of the target features is roughly consistent with that of the source.

So the estimator of the source covariance is computed from the statistics of the target features.

$\hat{\Sigma}_k^s = \gamma \cdot \Sigma_k^t = \gamma \cdot \frac{f_k^t {f_k^t}^{\top}}{\sum_i \mathbb{1}(\hat{y}_i^t = k)}$

where $f_k^t = [f_{1,k}^t - \bar{f}_k^t, \ldots, f_{i,k}^t - \bar{f}_k^t, \ldots]$ is the matrix whose columns are the centered target features of the k-th class in $D_t'$.

Anyway, $\gamma$ is a controlling coefficient that is said to adjust the sampling range and the semantic diversity of the sampled surrogate features.
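The covariance estimator can be sketched in the same way. This is only my interpretation: `surrogate_covariances` is a hypothetical name, features are stored as rows (so the matrix product is transposed relative to the column convention in the formula), and the default `gamma=1.0` is a placeholder rather than a value from the paper.

```python
import numpy as np

def surrogate_covariances(target_feats, pseudo_labels, num_classes, gamma=1.0):
    """Estimate Sigma_hat_k^s = gamma * (centered class-k features)^T (centered) / n_k."""
    m = target_feats.shape[1]
    covs = np.zeros((num_classes, m, m))
    for k in range(num_classes):
        members = target_feats[pseudo_labels == k]
        if len(members) == 0:
            continue  # assumption: leave classes without pseudo-labeled samples at zero
        centered = members - members.mean(axis=0)   # rows are f_{i,k}^t minus the class mean
        covs[k] = gamma * (centered.T @ centered) / len(members)
    return covs

# Toy example with made-up numbers.
toy_feats = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
toy_labels = np.array([0, 0, 1, 1])
sigma_hat = surrogate_covariances(toy_feats, toy_labels, num_classes=2, gamma=2.0)
```

A larger `gamma` inflates every entry of the covariance, so samples drawn later spread further from the calibrated mean, which is how I read the "sampling range" remark.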

Using the anchors and the target features, we derive $K$ class-conditioned surrogate source distributions

$N_k^{sur}\left(\| \bar{f}_k^t \|_2 \frac{w_k^G}{\| w_k^G \|_2}, \frac{\gamma \cdot f_k^t {f_k^t}^{\top}}{\sum_i \mathbb{1}(\hat{y}_i^t = k)}\right), \quad k \in C$

from which we can sample surrogate features $f_k^{sur} \sim N_k^{sur}(\hat{\mu}_k^s, \hat{\Sigma}_k^s)$.
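Sampling from the surrogate distributions could look like the sketch below. The function name `sample_surrogate_features`, the fixed seed, and the toy means and covariances are assumptions for illustration; the paper's actual sampling schedule may differ.

```python
import numpy as np

def sample_surrogate_features(means, covs, n_per_class, seed=0):
    """Draw n_per_class samples from N(means[k], covs[k]) for every class k."""
    rng = np.random.default_rng(seed)
    feats, labels = [], []
    for k, (mu, cov) in enumerate(zip(means, covs)):
        feats.append(rng.multivariate_normal(mu, cov, size=n_per_class))
        labels.append(np.full(n_per_class, k))   # surrogate features come with their class label
    return np.concatenate(feats), np.concatenate(labels)

# Toy example with made-up parameters for two classes.
toy_means = np.array([[5.0, 0.0], [0.0, 1.0]])
toy_covs = np.stack([0.01 * np.eye(2), 0.01 * np.eye(2)])
surrogate_feats, surrogate_labels = sample_surrogate_features(toy_means, toy_covs, n_per_class=1000)
```

Unlike the target data, the sampled surrogate features are labeled by construction, since each one is drawn from a known class-conditioned distribution.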

3) Source-free domain adaptation

In the previous section, the source distribution was estimated without accessing the source data, using the domain knowledge preserved in the pretrained model. By sampling data from the estimated distribution as surrogate source data, the SFDA problem now turns into a traditional DA problem. Here Contrastive Domain Discrepancy (CDD) is used to explicitly align the target domain with the estimated source distribution.
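For intuition only, here is a heavily simplified sketch of a CDD-style quantity as I understand it: the average same-class discrepancy minus the average different-class discrepancy, each measured with a kernel MMD. The single fixed-bandwidth RBF kernel, the biased MMD estimate, and all function names are my assumptions and not the formulation used in the paper.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd2(a, b, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    return (rbf_kernel(a, a, bandwidth).mean()
            + rbf_kernel(b, b, bandwidth).mean()
            - 2.0 * rbf_kernel(a, b, bandwidth).mean())

def contrastive_domain_discrepancy(src_feats, src_labels, tgt_feats, tgt_labels,
                                   num_classes, bandwidth=1.0):
    """Mean intra-class MMD^2 minus mean inter-class MMD^2 (simplified)."""
    intra, inter = [], []
    for c_s in range(num_classes):
        for c_t in range(num_classes):
            s = src_feats[src_labels == c_s]
            t = tgt_feats[tgt_labels == c_t]
            if len(s) == 0 or len(t) == 0:
                continue  # skip class pairs without samples
            (intra if c_s == c_t else inter).append(mmd2(s, t, bandwidth))
    intra_term = np.mean(intra) if intra else 0.0
    inter_term = np.mean(inter) if inter else 0.0
    return intra_term - inter_term

# Toy example: identical clusters, with matching and then swapped target labels.
src = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
tgt = src.copy()
src_y = np.array([0, 0, 1, 1])
aligned = contrastive_domain_discrepancy(src, src_y, tgt, np.array([0, 0, 1, 1]), 2)
swapped = contrastive_domain_discrepancy(src, src_y, tgt, np.array([1, 1, 0, 0]), 2)
```

The value is low when same-class features from the two domains overlap and different-class features stay apart, so minimizing it pulls the target features of class k toward the surrogate source features of class k. In the SFDA setting the source side would be the sampled surrogate features and the target labels would be the pseudo labels.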
