A Problem Raised by VC's Virtual Function Layout

Technology 2022-05-19

I came across a very popular thread online that asked the following question:

#include <cstdio>
#include <iostream>
using namespace std;

class Base {
public:
    int m_base;
    virtual void f() { cout << "Base::f" << endl; }
    virtual void g() { cout << "Base::g" << endl; }
};

class Derive : public Base {
    int m_derived;
};

typedef void (*Fun)(void);

// Relies on 32-bit VC behavior: sizeof(int) == sizeof(void*) and vfptr at offset 0.
int main()
{
    Derive d;
    Fun pFun = (Fun)*((int*)*(int*)(&d)+0);   // first entry of d's virtual table
    printf("&(Base::f): 0x%x \n", &(Base::f));
    printf("&(Base::g): 0x%x \n", &(Base::g));
    printf("pFun: 0x%x \n", pFun);
    pFun();
    return 0;
}

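When the values are printed, pFun and &(Base::f) turn out to be different, which looks very strange. After some digging I finally worked out why. The analysis below goes through it step by step.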

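According to VC's virtual function layout, the object d is laid out as follows: the virtual table pointer vfptr comes first, followed by m_base and m_derived, and the table that vfptr points to holds the address of Base::f in its first entry and the address of Base::g in its second.

Now let us look closely at how pFun is obtained: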

Fun pFun = (Fun)*((int*)*(int*)(&d)+0);

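(&d) is the address of the object d. On a 32-bit machine a pointer is 4 bytes, the same size as an int, so *(int*)(&d) reads vfptr, the address of the virtual table. Therefore *((int*)*(int*)(&d)+0) is the first entry of the virtual table, that is, the address of Base::f(). Running the program confirms this: the call through pFun executes Base::f.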

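This shows that the first entry of the virtual table really is the address of the virtual function, and that the VC layout described above is correct.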

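That raises a question, however: why do &(Base::f) and pFun have different values? If pFun holds the address of the virtual function f, then what is &(Base::f)? To find out, we disassembled the program.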

printf("&(Base::f): 0x%x \n", &(Base::f));
00401068 mov edi,dword ptr [__imp__printf (4020D4h)]
0040106E push offset Base::`vcall'{0}' (4013A0h)
00401073 push offset string "&(Base::f): 0x%x \n" (40214Ch)
00401078 call edi

printf("&(Base::g): 0x%x \n", &(Base::g));
0040107A push offset Base::`vcall'{4}' (4013B0h)
0040107F push offset string "&(Base::g): 0x%x \n" (402160h)
00401084 call edi

From the above we can see clearly that:

Base::f corresponds to Base::`vcall'{0}' (4013A0h)

Base::g corresponds to Base::`vcall'{4}' (4013B0h)

So what exactly are Base::`vcall'{0}' and Base::`vcall'{4}'? Disassembling further:

Base::`vcall'{0}':
004013A0 mov eax,dword ptr [ecx]
004013A2 jmp dword ptr [eax]

......

Base::`vcall'{4}':
004013B0 mov eax,dword ptr [ecx]
004013B2 jmp dword ptr [eax+4]

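In the first instruction, ecx holds the this pointer, and in VC the virtual table pointer is normally the first member of the object, so the instruction loads vfptr, the address of the virtual table, into eax. The second instruction jumps through one entry of that table. &(Base::f) evaluates to Base::`vcall'{0}', which jumps through the first entry; &(Base::g) evaluates to Base::`vcall'{4}', which jumps through the second entry (offset 4, because each entry is 4 bytes). Either way the correct virtual function is reached.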

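So VC's solution is for the compiler to add a set of internal "vcall" functions. Each virtual function of a class has a corresponding vcall function, and that vcall function jumps through one specific entry of the virtual function table.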

To take the discussion one step further, consider polymorphism and change the code as follows:

#include <cstdio>
#include <iostream>
using namespace std;

class Base {
public:
    virtual void f() { cout << "Base::f" << endl; }
    virtual void g() { cout << "Base::g" << endl; }
};

class Derive : public Base {
public:
    virtual void f() { cout << "Derive::f" << endl; }
    virtual void g() { cout << "Derive::g" << endl; }
};

typedef void (*Fun)(void);

// Relies on 32-bit VC behavior: sizeof(int) == sizeof(void*) and vfptr at offset 0.
int main()
{
    Derive d;
    Fun pFun = (Fun)*((int*)*(int*)(&d)+0);   // first entry of Derive's virtual table
    printf("&(Base::f): 0x%x \n", &(Base::f));
    printf("&(Base::g): 0x%x \n", &(Base::g));
    printf("&(Derive::f): 0x%x \n", &(Derive::f));
    printf("&(Derive::g): 0x%x \n", &(Derive::g));
    printf("pFun: 0x%x \n", pFun);
    pFun();
    return 0;
}

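The output now shows polymorphic behavior: the call through pFun reaches Derive's version of f.

The reason is as follows:

The entries of class Derive's virtual function table have been overridden. The entry that used to point to Base::f() now points to Derive::f(), and the entry that used to point to Base::g() now points to Derive::g().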

The disassembly is as follows:

printf("&(Derive::f): 0x%x \n", &(Derive::f));
00401086 push offset Base::`vcall'{0}' (4013B0h)
0040108B push offset string "&(Derive::f): 0x%x \n" (40217Ch)
00401090 call esi

printf("&(Derive::g): 0x%x \n", &(Derive::g));
00401092 push offset Base::`vcall'{4}' (4013C0h)
00401097 push offset string "&(Derive::g): 0x%x \n" (402194h)
0040109C call esi

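So although Derive::f still corresponds to Base::`vcall'{0}' and Derive::g still corresponds to Base::`vcall'{4}', each class has its own virtual function table, so the jump goes through a different table. And because the entries were overridden, the values stored in the slots are different as well.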
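Standard C++ shows the same effect without looking at any addresses. The following is a small sketch, assuming simplified Base and Derive classes that return strings instead of printing, plus a hypothetical helper named call_through. A pointer to member formed from Base::f does not name one fixed function body; the virtual lookup happens at the moment the pointer is called, using the table of the object it is applied to.

```cpp
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string f() { return "Base::f"; }
    virtual std::string g() { return "Base::g"; }
};

struct Derive : Base {
    std::string f() override { return "Derive::f"; }
    std::string g() override { return "Derive::g"; }
};

// Hypothetical helper: calls obj through a pointer to member that was
// formed from a name in Base.
std::string call_through(Base& obj, std::string (Base::*pmf)()) {
    return (obj.*pmf)();   // virtual dispatch happens here, at the call
}
```

Given Base b and Derive d, call_through(b, &Base::f) returns "Base::f" while call_through(d, &Base::f) returns "Derive::f". The pointer value is the same in both calls; only the object, and therefore the table consulted, differs.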

This article comes from the blog: http://blog.csdn.net/zhanglei8893/archive/2011/04/19/6333751.aspx
