The GCPANet consists of four parts
Combine low-level and high-level features so that each compensates for the other's weaknesses.
Additionally, use global context information to help understand the relationship between different objects (a ping-pong ball, for example), which helps generate a more complete and accurate saliency map.
What's more, global context information helps alleviate the effect of feature dilution.
To fully integrate the three mentioned features.
To better fuse up-sampled high-level features with low-level features, the paper suggests we should use multiplication instead of concatenation, which helps to strengthen the response of salient objects and to suppress the background noise.
To be specific, here is what the paper tells us:
\[\mathbf W^t_h = \mathrm{upsample}(\mathrm{conv}_2(\mathbf f^t_h)) \]
\[\mathbf f^t_{hl}=\delta(\mathbf W^t_h\odot \widetilde{\mathbf f}_l^t) \]
\[\mathbf W_l^t = \mathrm{conv}_3(\widetilde{\mathbf f}_l^t) \]
\[\mathbf f_{lh}^t = \delta(\mathbf W_l^t\odot \mathrm{upsample}(\mathbf f_h^t)) \]
Introduce the global context features \(\mathbf f_{g}^t\) at each stage.
\[\mathbf W_g^t=\mathrm{upsample}(\mathrm{conv}_4(\mathbf f_g^t)) \]
\[\mathbf f_{gl}^t=\delta(\mathbf W_g^t \odot \widetilde{\mathbf f}_l^t) \]
Concatenate the three features and pass them through a \(3\times 3\) convolution layer to obtain the output.
\[\mathbf f_a^t = \mathrm{conv}_5(\mathrm{concat}(\mathbf f_{hl}^t,\mathbf f_{lh}^t,\mathbf f_{gl}^t)) \]
To reduce the contradictory responses of different layers.
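The feature aggregation equations above (the three gated products and the final fusion) could be sketched in PyTorch roughly as follows. This is a minimal illustration, not the authors' implementation. The class name `FIASketch`, the channel sizes, the 1×1 convolution that produces \(\widetilde{\mathbf f}_l^t\), the 3×3 kernels and the bilinear up-sampling are all assumptions. It also assumes that \(\mathbf f_h^t\) and \(\mathbf f_g^t\) already have the same number of channels as the compressed low-level feature, so that the element-wise products are well defined.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FIASketch(nn.Module):
    """Hypothetical sketch of the aggregation equations; not the official code."""

    def __init__(self, low_ch, channels=256):
        super().__init__()
        # Assumption: a 1x1 conv compresses the low-level feature into f~_l.
        self.conv1 = nn.Conv2d(low_ch, channels, kernel_size=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)  # produces W_h
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)  # produces W_l
        self.conv4 = nn.Conv2d(channels, channels, 3, padding=1)  # produces W_g
        self.conv5 = nn.Conv2d(3 * channels, channels, 3, padding=1)  # final 3x3 fusion

    def forward(self, f_l, f_h, f_g):
        size = f_l.shape[2:]  # spatial size of the low-level feature

        def up(x):
            # Assumption: bilinear interpolation is used for up-sampling.
            return F.interpolate(x, size=size, mode="bilinear", align_corners=False)

        f_l_t = F.relu(self.conv1(f_l))           # compressed low-level feature
        f_hl = F.relu(up(self.conv2(f_h)) * f_l_t)  # high-level mask gates low-level
        f_lh = F.relu(self.conv3(f_l_t) * up(f_h))  # low-level mask gates high-level
        f_gl = F.relu(up(self.conv4(f_g)) * f_l_t)  # global mask gates low-level
        # Multiplication is used inside each branch; concatenation only at the end.
        return self.conv5(torch.cat([f_hl, f_lh, f_gl], dim=1))
```

The output keeps the spatial size of the low-level input, since both the high-level and the global features are up-sampled to it before the products are taken.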
To select important and representative features from the output of the top layers, which usually contains much redundant information.
As mentioned above, it is placed right after the top layers and processes their output.
Apply a convolution layer to the input feature maps \(\mathbf F\) to obtain a compressed feature representation \(\mathbf{\widetilde F}\) with 256 channels.
Generate a mask \(\mathbf W\) and a bias \(\mathbf{b}\), then we get
\[\mathbf F_1 = \delta(\mathbf W\odot \widetilde{\mathbf F}+\mathbf b) \]where \(\delta\) denotes the ReLU activation function.
Use average pooling to down-sample \(\mathbf F\) into a channel-wise feature vector \(\mathbf f\).
Apply two successive fully connected layers to \(\mathbf f\) and get an output vector \(\mathbf y\).
Get the final output \(\mathbf F_{out} = \mathbf F_1 \odot \mathbf y\).
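The head attention steps listed above could look roughly like this in PyTorch. Again this is only a sketch under stated assumptions, not the authors' code. The class name `HeadAttentionSketch`, the kernel sizes, the single convolution that produces both the mask and the bias, and the sigmoid at the end of the fully connected layers are my assumptions. The pooling is applied to the input \(\mathbf F\), as written in the steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeadAttentionSketch(nn.Module):
    """Hypothetical sketch of the spatial mask plus channel re-weighting steps."""

    def __init__(self, in_ch, mid_ch=256):
        super().__init__()
        self.compress = nn.Conv2d(in_ch, mid_ch, 3, padding=1)  # F -> F~
        # Assumption: one conv predicts the mask W and the bias b together.
        self.mask_bias = nn.Conv2d(mid_ch, 2 * mid_ch, 3, padding=1)
        self.fc1 = nn.Linear(in_ch, mid_ch)
        self.fc2 = nn.Linear(mid_ch, mid_ch)

    def forward(self, feat):
        f_tilde = F.relu(self.compress(feat))
        w, b = torch.chunk(self.mask_bias(f_tilde), 2, dim=1)
        f1 = F.relu(w * f_tilde + b)              # F_1 = relu(W * F~ + b)
        f = feat.mean(dim=(2, 3))                 # global average pooling -> (N, in_ch)
        # Assumption: sigmoid squashes the channel weights into (0, 1).
        y = torch.sigmoid(self.fc2(F.relu(self.fc1(f))))
        return f1 * y[:, :, None, None]           # F_out = F_1 * y, broadcast over H and W
```

The vector \(\mathbf y\) is broadcast over the spatial dimensions, so every channel of \(\mathbf F_1\) is scaled by a single learned weight.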
To better understand the relationship between different salient objects, and to alleviate the effect of feature dilution.
Outperforms 12 other state-of-the-art methods on 6 benchmark datasets.
Performs an ablation study to demonstrate the effectiveness of the four main parts of GCPANet.
I used the BJTU HPC platform to run the code.
Had trouble trying to SSH to the server
sol: The platform supports WinSCP, which can pass the password to PuTTY, so I can SSH to the server indirectly.
Failed to pass parameters to test.py due to the restrictions of the BJTU HPC platform.
sol: Replace
sys.argv[1]
with the parameter I'm trying to pass. A better solution would be writing a start.py.
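One possible shape for such a start.py is sketched below. It sets `sys.argv` and then runs the target script as if it had been launched from the command line, so test.py itself would not need to be edited. The helper name `run_with_args` and the example argument are hypothetical, and I have not verified this on the BJTU platform.

```python
import runpy
import sys


def run_with_args(script_path, args):
    """Run script_path as __main__ with sys.argv set to [script_path, *args]."""
    old_argv = sys.argv
    sys.argv = [script_path, *args]
    try:
        # run_path executes the file and returns its resulting globals dict.
        return runpy.run_path(script_path, run_name="__main__")
    finally:
        sys.argv = old_argv  # restore the caller's argv even if the script fails


# Hypothetical usage inside start.py (the argument value is a placeholder):
# run_with_args("test.py", ["<the parameter to pass>"])
```

Because the script runs under `run_name="__main__"`, any `if __name__ == "__main__":` guard in test.py still fires.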
Failed to locate files since the working directory is redirected to "jobs/xxx"
sol: Add
os.chdir('/data/home/u20281202/SOD/GCPANet-master/')
at the beginning of test.py.
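A variant that avoids the hard-coded path is sketched below. It assumes the platform only changes the working directory and still runs test.py from its original location; if the script is copied into the job folder, this would not help. The helper name `chdir_to_script_dir` is hypothetical.

```python
import os


def chdir_to_script_dir(script_file):
    """Change the working directory to the folder that contains script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    os.chdir(script_dir)
    return script_dir


# Hypothetical usage at the top of test.py:
# chdir_to_script_dir(__file__)
```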
Failed to load the pretrained ResNet50
sol: Upload resnet50-19c8e357.pth to the model folder and modify the initialize function.
def initialize(self):
    self.load_state_dict(torch.load('./model/resnet50-19c8e357.pth'), strict=False)
(At first I missed the leading dot in the path. I thought the run had succeeded, only to find the MAE was way too large.)
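Since `strict=False` does not complain about keys that fail to match, a small check like the following might catch this kind of problem earlier. It is a sketch, not part of the GCPANet code: the helper name `load_pretrained` is hypothetical, and it relies only on the fact that `load_state_dict` returns the lists of missing and unexpected keys.

```python
import os

import torch


def load_pretrained(model, path):
    """Load weights non-strictly, but fail loudly on a bad path and report key mismatches."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"checkpoint not found: {os.path.abspath(path)}")
    state = torch.load(path, map_location="cpu")
    result = model.load_state_dict(state, strict=False)
    # Keys the model expects but the file lacks, and keys in the file the model ignores.
    return result.missing_keys, result.unexpected_keys
```

Printing the two returned lists after loading shows which layers kept their random initialization, which would be one explanation for an unexpectedly large MAE.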