Creating Process 1

Technology, 2024-07-17

6 The Post-start_kernel Era

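At this point, start_kernel() has finished initializing the Linux kernel. Almost every kernel component is initialized by this function, so let's review the most important steps: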

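● Calls setup_arch() to set up the system for the processor and hardware platform: it parses the Linux command line, sets up init_mm (the memory descriptor of process 0), initializes memory management, collects and registers system resources, and performs other initialization.

● Calls sched_init() to initialize the scheduler.

● Calls build_all_zonelists() to initialize the memory zone lists.

● Calls mem_init() to initialize the buddy system allocator.

● Calls kmem_cache_init() to initialize the slab allocator.

● Calls vmalloc_init() to initialize the noncontiguous memory areas.

● Calls trap_init() and init_IRQ() to finish initializing the IDT.

● Calls softirq_init() to initialize the TASKLET_SOFTIRQ and HI_SOFTIRQ softirqs.

● Calls time_init() to initialize the system date and time.

● Calls calibrate_delay() to calibrate the delay loop, which gives an estimate of the CPU speed.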

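As an aside, the code of start_kernel() is placed in the .init.text section (named .text.init in older kernels), which the linker script brackets with __init_begin and __init_end. Like a module initialization function, it runs only once. In the final stage, init_post() reclaims the memory occupied by this section (through free_initmem(), which on x86 calls free_init_pages()); we will come back to that later.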

6.1 Creating Process 1

Once start_kernel has initialized the kernel, its last line of code is:

/* Do the rest non-__init'ed, we're now alive */

rest_init();

So the last thing start_kernel does is call rest_init to carry out the remaining initialization. It comes from init/main.c:

424 static noinline void __init_refok rest_init(void)

425 __releases(kernel_lock)

426{

427 int pid;

428

429 rcu_scheduler_starting();

430 kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);

431 numa_default_policy();

432 pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);

433 rcu_read_lock();

434 kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);

435 rcu_read_unlock();

436 unlock_kernel();

437

438 /*

439 * The boot idle thread must execute schedule()

440 * at least once to get things moving:

441 */

442 init_idle_bootup_task(current);

443 preempt_enable_no_resched();

444 schedule();

445 preempt_disable();

446

447 /* Call into cpu_idle with preempt disabled */

448 cpu_idle();

449}

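Line 430 first calls kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND) to create the process with pid 1. It is a kernel thread that executes the kernel_init function. For background on kernel threads, see the blog post "内核线程" (Kernel Threads): http://blog.csdn.net/yunsongice/archive/2010/04/23/5522012.aspx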

Line 432 calls kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES) to start the kernel thread kthreadd, whose job is to run the kthreads queued on the global list kthread_create_list.

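Note that many kernels are built without kernel preemption (CONFIG_PREEMPT unset); in that case lines 443 and 445 are no-ops. Line 444 calls schedule(), which picks a ready process from the run queue (rq) and lets it run first. When nothing else is runnable, execution continues to line 448 and enters cpu_idle():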

void cpu_idle(void)

{

int cpu = smp_processor_id();

boot_init_stack_canary();

current_thread_info()->status |= TS_POLLING;

/* endless idle loop with no priority at all */

while (1) {

tick_nohz_stop_sched_tick(1);

while (!need_resched()) {

check_pgt_cache();

rmb();

if (cpu_is_offline(cpu))

play_dead();

local_irq_disable();

/* Don't trace irqs off for idle */

stop_critical_timings();

pm_idle();

start_critical_timings();

}

tick_nohz_restart_sched_tick();

preempt_enable_no_resched();

schedule();

preempt_disable();

}

}
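The hand-off from the boot thread to a newly created process can be pictured with a small user-space model. This is only a sketch under stated assumptions: toy_rq, toy_kernel_thread, toy_schedule and toy_exit are hypothetical names invented for illustration, and the real scheduler selects tasks by policy rather than by scanning an array.

```c
#include <assert.h>

/* Hypothetical toy model, not kernel code: slot 0 stands for the idle task. */
enum toy_state { TOY_UNUSED, TOY_RUNNABLE, TOY_DONE };

struct toy_task {
    int pid;
    enum toy_state state;
};

#define TOY_MAX_TASKS 4

static struct toy_task toy_rq[TOY_MAX_TASKS]; /* zero-initialized: all TOY_UNUSED */

/* Stand-in for kernel_thread(): make a new task runnable and return its pid. */
static int toy_kernel_thread(void)
{
    for (int i = 1; i < TOY_MAX_TASKS; i++) {
        if (toy_rq[i].state == TOY_UNUSED) {
            toy_rq[i].pid = i;
            toy_rq[i].state = TOY_RUNNABLE;
            return i;
        }
    }
    return -1; /* no free slot */
}

/* Stand-in for schedule(): pid of the next runnable task, or 0 for idle. */
static int toy_schedule(void)
{
    for (int i = 1; i < TOY_MAX_TASKS; i++)
        if (toy_rq[i].state == TOY_RUNNABLE)
            return toy_rq[i].pid;
    return 0;
}

/* Mark a task as finished so that it is never picked again. */
static void toy_exit(int pid)
{
    assert(pid > 0 && pid < TOY_MAX_TASKS);
    toy_rq[pid].state = TOY_DONE;
}
```

In this model, the first toy_kernel_thread() call yields pid 1, toy_schedule() then prefers it over idle, and idle (pid 0) is chosen only when no task is runnable.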

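cpu_idle() basically saves the CPU's energy: it sits in the idle loop, using up otherwise idle time, and hands the CPU to whoever needs it, so we won't discuss it further. Following the flow: once process 1 has been created by do_fork, schedule() selects it from the run queue (rq) and runs its code, namely the kernel_init function, which comes from the same file: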

854 static int __init kernel_init(void * unused)

855{

856 lock_kernel();

857

858 /*

859 * init can allocate pages on any node

860 */

861 set_mems_allowed(node_states[N_HIGH_MEMORY]);

862 /*

863 * init can run on any cpu.

864 */

865 set_cpus_allowed_ptr(current, cpu_all_mask);

866 /*

867 * Tell the world that we're going to be the grim

868 * reaper of innocent orphaned children.

869 *

870 * We don't want people to have to make incorrect

871 * assumptions about where in the task array this

872 * can be found.

873 */

874 init_pid_ns.child_reaper = current;

875

876 cad_pid = task_pid(current);

877

878 smp_prepare_cpus(setup_max_cpus);

879

880 do_pre_smp_initcalls();

881 start_boot_trace();

882

883 smp_init();

884 sched_init_smp();

885

886 do_basic_setup();

887

888 /* Open the /dev/console on the rootfs, this should never fail */

889 if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)

890 printk(KERN_WARNING "Warning: unable to open an initial console.\n");

891

892 (void) sys_dup(0);

893 (void) sys_dup(0);

894 /*

895 * check if there is an early userspace init. If yes, let it do all

896 * the work

897 */

898

899 if (!ramdisk_execute_command)

900 ramdisk_execute_command = "/init";

901

902 if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {

903 ramdisk_execute_command = NULL;

904 prepare_namespace();

905 }

906

907 /*

908 * Ok, we have completed the initial bootup, and

909 * we're essentially up and running. Get rid of the

910 * initmem segments and start the user-mode stuff..

911 */

912

913 init_post();

914 return 0;

915}

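Lines 856 to 884 are a series of initializations. If CONFIG_CPUSETS is not configured, set_mems_allowed on line 861 is an empty function. set_cpus_allowed_ptr on line 865 changes the CPU affinity of a process.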


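kernel_init starts by calling lock_kernel(): if CONFIG_LOCK_KERNEL was selected at build time it takes the big kernel lock, otherwise it does nothing. Shortly afterwards it calls set_cpus_allowed_ptr. Since these functions do affect how the init process is brought up, let's go through them one by one so that nothing is missed. The definition examined here only checks CPU 0, so it appears to be the uniprocessor variant: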

static inline int set_cpus_allowed_ptr(struct task_struct *p,

const cpumask_t *new_mask)

{

if (!cpu_isset(0, *new_mask))

return -EINVAL;

return 0;

}

This function really just invokes the cpu_isset macro, defined in include/linux/cpumask.h as follows:

#define cpu_isset(cpu, cpumask) test_bit((cpu), (cpumask).bits)

Now look at the type of the second parameter of set_cpus_allowed_ptr, also defined in include/linux/cpumask.h:

typedef struct { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;

Following the DECLARE_BITMAP macro leads to include/linux/types.h, where it is defined as:

#define DECLARE_BITMAP(name,bits) \
unsigned long name[BITS_TO_LONGS(bits)]

The BITS_TO_LONGS macro is defined in include/linux/bitops.h:

#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))

The DIV_ROUND_UP macro is defined in include/linux/kernel.h and the BITS_PER_BYTE macro in include/linux/bitops.h:

#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))

#define BITS_PER_BYTE 8

So when NR_CPUS is between 1 and 32 (with a 32-bit long, so that BITS_TO_LONGS yields 1), the cpumask_t type is:

struct {

unsigned long bits[1];

}
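The rounding arithmetic behind the size of bits[] can be checked in user space. The sketch below copies the shape of the macros quoted above under an EX_ prefix to avoid clashing with real headers; ex_mask_words is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same arithmetic as the kernel macros quoted above, renamed for user space. */
#define EX_BITS_PER_BYTE 8
#define EX_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define EX_BITS_TO_LONGS(nr) EX_DIV_ROUND_UP(nr, EX_BITS_PER_BYTE * sizeof(long))

/* Number of unsigned longs needed to hold one bit per CPU. */
static size_t ex_mask_words(size_t nr_cpus)
{
    assert(nr_cpus > 0);
    return EX_BITS_TO_LONGS(nr_cpus);
}
```

On a machine where long is 64 bits, one word covers up to 64 CPUs and a 65th CPU needs a second word; the range of 1 to 32 corresponds to a 32-bit long.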

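Next, look at the CPU_MASK_ALL_PTR macro in the call set_cpus_allowed_ptr(current, CPU_MASK_ALL_PTR); (the kernel_init listing above passes cpu_all_mask in the same position). It is defined in include/linux/cpumask.h: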

#define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)

The CPU_MASK_ALL macro is also defined in include/linux/cpumask.h:

#define CPU_MASK_ALL \
(cpumask_t) { { \
[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD \
} }

The NR_CPUS macro is defined in include/linux/threads.h:

#ifdef CONFIG_SMP

#define NR_CPUS CONFIG_NR_CPUS

#else

#define NR_CPUS 1

#endif

The CPU_MASK_LAST_WORD macro is defined in include/linux/cpumask.h:

#define CPU_MASK_LAST_WORD BITMAP_LAST_WORD_MASK(NR_CPUS)

The BITMAP_LAST_WORD_MASK(NR_CPUS) macro is defined in include/linux/bitmap.h:

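#define BITMAP_LAST_WORD_MASK(nbits) \
( \
((nbits) % BITS_PER_LONG) ? \
(1UL<<((nbits) % BITS_PER_LONG))-1 : ~0UL \
)

With these definitions, the check inside set_cpus_allowed_ptr expands as follows:

cpu_isset(0, *CPU_MASK_ALL_PTR) --> test_bit(0, (*CPU_MASK_ALL_PTR).bits)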

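So when NR_CPUS is n (with n no larger than BITS_PER_LONG), the low n bits of unsigned long bits[0], that is bits 0 through n-1, are set to 1: one bit per possible CPU. This matches the comment in the source: init can run on any CPU.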
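The mask arithmetic is easy to verify outside the kernel. The following is a minimal sketch that mirrors BITMAP_LAST_WORD_MASK for a single word; ex_last_word_mask and ex_cpu_isset are hypothetical user-space helpers, not kernel functions.

```c
#include <assert.h>
#include <limits.h>

#define EX_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/*
 * Mask with the low (nbits % BITS_PER_LONG) bits set, or all ones
 * when nbits is an exact multiple of the word size.
 */
static unsigned long ex_last_word_mask(unsigned int nbits)
{
    unsigned int rem = (unsigned int)(nbits % EX_BITS_PER_LONG);

    return rem ? (1UL << rem) - 1 : ~0UL;
}

/* Test whether bit `cpu` is set in a single-word mask. */
static int ex_cpu_isset(unsigned int cpu, unsigned long mask)
{
    assert(cpu < EX_BITS_PER_LONG);
    return (int)((mask >> cpu) & 1UL);
}
```

For example, with four CPUs the mask is 0xF: bits 0 to 3 are set and bit 4 is clear, so the test on CPU 0 succeeds.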

That completes the analysis of set_cpus_allowed_ptr(current, CPU_MASK_ALL_PTR); in kernel_init. Moving on, the next statement is init_pid_ns.child_reaper = current;

init_pid_ns is defined in kernel/pid.c:

struct pid_namespace init_pid_ns = {

.kref = {

.refcount = ATOMIC_INIT(2),

},

.pidmap = {

[ 0 ... PIDMAP_ENTRIES-1] = { ATOMIC_INIT(BITS_PER_PAGE), NULL }

},

.last_pid = 0,

.level = 0,

.child_reaper = &init_task,

};

It is a variable of type struct pid_namespace. That structure is defined in include/linux/pid_namespace.h as follows:

struct pid_namespace {

struct kref kref;

struct pidmap pidmap[PIDMAP_ENTRIES];

int last_pid;

struct task_struct *child_reaper;

struct kmem_cache *pid_cachep;

unsigned int level;

struct pid_namespace *parent;

#ifdef CONFIG_PROC_FS

struct vfsmount *proc_mnt;

#endif

};

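In other words, the current process becomes the one that adopts orphaned processes. Next, the struct pid of the current process is saved in cad_pid (the process that is signalled on Ctrl-Alt-Del):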

cad_pid = task_pid(current);
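What being the child_reaper means can be shown with a toy model. This is a sketch under stated assumptions: ex_task, ex_pid_ns and ex_reparent_orphans are hypothetical simplifications, and the kernel's real reparenting logic handles many more cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for task_struct and pid_namespace. */
struct ex_task {
    int pid;
    struct ex_task *parent;
};

struct ex_pid_ns {
    struct ex_task *child_reaper;
};

/* When `dying` exits, hand each of its children to the namespace's reaper. */
static int ex_reparent_orphans(struct ex_pid_ns *ns, struct ex_task *dying,
                               struct ex_task *tasks[], size_t ntasks)
{
    int moved = 0;

    assert(ns->child_reaper != NULL);
    for (size_t i = 0; i < ntasks; i++) {
        if (tasks[i]->parent == dying && tasks[i] != ns->child_reaper) {
            tasks[i]->parent = ns->child_reaper;
            moved++;
        }
    }
    return moved;
}

/* Scenario: init (pid 1) -> shell (pid 100) -> job (pid 101); the shell exits. */
static int ex_demo_orphan_parent_pid(void)
{
    struct ex_task init_t = { 1, NULL };
    struct ex_task shell = { 100, &init_t };
    struct ex_task job = { 101, &shell };
    struct ex_task *all[] = { &init_t, &shell, &job };
    struct ex_pid_ns ns = { &init_t };

    ex_reparent_orphans(&ns, &shell, all, 3);
    return job.parent->pid; /* the orphaned job now belongs to init */
}
```

In the demo scenario the orphaned job ends up with pid 1 as its parent, which is the role that kernel_init claims for the current process.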

Next comes smp_prepare_cpus(setup_max_cpus); if CONFIG_SMP was not selected at build time, it does nothing. After that, do_pre_smp_initcalls() is called. It is defined in init/main.c:

static void __init do_pre_smp_initcalls(void)

{

extern int spawn_ksoftirqd(void);

migration_init();

spawn_ksoftirqd();

if (!nosoftlockup)

spawn_softlockup_task();

}

migration_init() is declared in include/linux/sched.h as follows:

#ifdef CONFIG_SMP

void migration_init(void);

#else

static inline void migration_init(void)

{

}

#endif
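The pattern above, a real function under one configuration and an empty inline stub under the other, keeps #ifdef out of the callers. A minimal user-space sketch of the same idea follows; EX_CONFIG_SMP, ex_migration_init and ex_pre_smp_initcalls are hypothetical names, and the return values are invented so that the effect is observable.

```c
#include <assert.h>

/* Build with -DEX_CONFIG_SMP to select the "real" variant (hypothetical switch). */
#ifdef EX_CONFIG_SMP
static int ex_migration_init(void) { return 1; }        /* pretend work was done */
#else
static inline int ex_migration_init(void) { return 0; } /* stub: nothing to do */
#endif

/* The caller is identical in both configurations: no #ifdef at the call site. */
static int ex_pre_smp_initcalls(void)
{
    int started = ex_migration_init();

    assert(started >= 0);
    return started;
}
```

Built without the switch, the stub is used and the compiler can optimize the call away entirely.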

In a non-SMP build, migration_init() does nothing. Next, spawn_ksoftirqd() is called; it is defined in kernel/softirq.c:

__init int spawn_ksoftirqd(void)

{

void *cpu = (void *)(long)smp_processor_id();

int err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);

BUG_ON(err == NOTIFY_BAD);

cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);

register_cpu_notifier(&cpu_nfb);

return 0;

}

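This function first calls smp_processor_id() to get the ID of the current CPU and stores it in the variable cpu. It then passes cpu, together with &cpu_nfb and CPU_UP_PREPARE, to cpu_callback. Here are the first lines of cpu_callback: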

static int __cpuinit cpu_callback(struct notifier_block *nfb,

unsigned long action,

void *hcpu)

{

int hotcpu = (unsigned long)hcpu;

struct task_struct *p;

switch (action) {

case CPU_UP_PREPARE:

case CPU_UP_PREPARE_FROZEN:

p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu);

if (IS_ERR(p)) {

printk("ksoftirqd for %i failed\n", hotcpu);

return NOTIFY_BAD;

}

kthread_bind(p, hotcpu);

per_cpu(ksoftirqd, hotcpu) = p;

break;

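As the code above shows, when action is CPU_UP_PREPARE a kernel thread is created and assigned to p. The thread runs the function ksoftirqd with hcpu as its argument, and the "ksoftirqd/%d", hotcpu arguments that follow give the thread its name; this is the thread we see when running ps -ef | grep ksoftirqd in a terminal. If creation fails, an error message is printed; otherwise kthread_bind(p, hotcpu) binds the new thread p to the current CPU. The next few lines are: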

case CPU_ONLINE:

case CPU_ONLINE_FROZEN:

wake_up_process(per_cpu(ksoftirqd, hotcpu));

break;

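That is, for the call cpu_callback(&cpu_nfb, CPU_ONLINE, cpu); in spawn_ksoftirqd, action is CPU_ONLINE, so wake_up_process is called to wake the ksoftirqd thread of the current CPU. Finally, register_cpu_notifier(&cpu_nfb); is called. In the configuration examined here it does little more than return 0. Back in do_pre_smp_initcalls, the next lines are: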

if (!nosoftlockup)

spawn_softlockup_task();

spawn_softlockup_task() is defined in include/linux/sched.h; in this configuration it is an empty function.

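That completes do_pre_smp_initcalls. Its main job is to create the ksoftirqd thread, bind it to the current CPU, and wake it. On an SMP kernel, the cpu_nfb notifier it registers lets the same callback create and wake a ksoftirqd thread for each further CPU that is brought up. These are the threads shown by ps -ef | grep ksoftirqd: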

root 4 2 0 08:30 ? 00:00:03 [ksoftirqd/0]

root 7 2 0 08:30 ? 00:00:02 [ksoftirqd/1]
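The two-step protocol used by cpu_callback (create on CPU_UP_PREPARE, wake on CPU_ONLINE) can be modelled in a few lines of user-space C. This is only an illustration: the EX_ names, the state enum and the fixed-size table are assumptions, and no real threads are created.

```c
#include <assert.h>
#include <stdio.h>

enum ex_action { EX_CPU_UP_PREPARE, EX_CPU_ONLINE };
enum ex_state { EX_NONE, EX_CREATED, EX_RUNNING };

#define EX_NR_CPUS 4

struct ex_thread {
    char name[24];
    enum ex_state state;
};

static struct ex_thread ex_ksoftirqd[EX_NR_CPUS]; /* one slot per CPU */

/* Returns 0 on success, -1 on a bad CPU number or an out-of-order action. */
static int ex_cpu_callback(enum ex_action action, int cpu)
{
    if (cpu < 0 || cpu >= EX_NR_CPUS)
        return -1;

    switch (action) {
    case EX_CPU_UP_PREPARE: /* create the per-CPU thread, not yet running */
        snprintf(ex_ksoftirqd[cpu].name, sizeof(ex_ksoftirqd[cpu].name),
                 "ksoftirqd/%d", cpu);
        ex_ksoftirqd[cpu].state = EX_CREATED;
        return 0;
    case EX_CPU_ONLINE:     /* wake the thread created earlier */
        if (ex_ksoftirqd[cpu].state == EX_NONE)
            return -1;
        ex_ksoftirqd[cpu].state = EX_RUNNING;
        return 0;
    }
    return -1;
}

/* Mirrors the shape of spawn_ksoftirqd() for a single CPU. */
static int ex_spawn(int cpu)
{
    if (ex_cpu_callback(EX_CPU_UP_PREPARE, cpu) != 0)
        return -1;
    return ex_cpu_callback(EX_CPU_ONLINE, cpu);
}
```

Calling ex_spawn(0) and ex_spawn(1) leaves two running entries named after their CPUs, analogous to the two lines of ps output above.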

The revolution is not yet won, so keep at it, comrades! Let's carry on and enjoy the rest.

Now we have reached smp_init(); in kernel_init.

If CONFIG_SMP was not selected at build time, smp_init() calls APIC_init_uniprocessor() when CONFIG_X86_LOCAL_APIC is defined and does nothing otherwise. The code is in init/main.c:

#ifndef CONFIG_SMP

#ifdef CONFIG_X86_LOCAL_APIC

static void __init smp_init(void)

{

APIC_init_uniprocessor();

}

#else

#define smp_init() do { } while (0)

#endif

If CONFIG_SMP was selected at build time, the implementation is as follows:

/* Called by boot processor to activate the rest. */

static void __init smp_init(void)

{

unsigned int cpu;

/* FIXME: This should be done in userspace --RR */

for_each_present_cpu(cpu) {

if (num_online_cpus() >= setup_max_cpus)

break;

if (!cpu_online(cpu))

cpu_up(cpu);

}

/* Any cleanup work */

printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());

smp_cpus_done(setup_max_cpus);

}

Looking at this function, the for_each_present_cpu(cpu) macro is implemented in include/linux/cpumask.h:

#define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map)

The for_each_cpu_mask(cpu, mask) macro is also implemented in include/linux/cpumask.h:

#if NR_CPUS > 1

#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = first_cpu(mask); \
(cpu) < NR_CPUS; \
(cpu) = next_cpu((cpu), (mask)))

#else /* NR_CPUS == 1 */

#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

#endif /* NR_CPUS */

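In other words, the statements in the braces run for every present CPU: any CPU that is not yet online is brought up, until setup_max_cpus CPUs are online. The function then prints some CPU information, such as the number of CPUs now online.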
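The loop in smp_init() can be sketched with plain bit masks. The model below is an assumption-laden simplification: ex_smp_init and ex_weight are hypothetical helpers, masks are limited to one word, and bringing a CPU up is reduced to setting a bit that always succeeds.

```c
#include <assert.h>

#define EX_NR_CPUS 8

/* Count the set bits among the low EX_NR_CPUS bits of a mask. */
static unsigned int ex_weight(unsigned long mask)
{
    unsigned int n = 0;

    for (unsigned int cpu = 0; cpu < EX_NR_CPUS; cpu++)
        n += (unsigned int)((mask >> cpu) & 1UL);
    return n;
}

/*
 * Bring up present-but-offline CPUs until max_cpus are online.
 * Returns the resulting online mask.
 */
static unsigned long ex_smp_init(unsigned long present, unsigned long online,
                                 unsigned int max_cpus)
{
    for (unsigned int cpu = 0; cpu < EX_NR_CPUS; cpu++) {
        if (!((present >> cpu) & 1UL))
            continue;             /* the present-CPU iteration skips absent CPUs */
        if (ex_weight(online) >= max_cpus)
            break;                /* limit reached, as with setup_max_cpus */
        if (!((online >> cpu) & 1UL))
            online |= 1UL << cpu; /* stands in for cpu_up(cpu) */
    }
    return online;
}
```

With four present CPUs, only the boot CPU online and a limit of two, the loop brings up exactly one more CPU and then stops.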

In kernel_init, smp_init() is followed by sched_init_smp() and do_basic_setup(), and then by the last function, init_post(), which starts the init process. There is a lot to cover, so that analysis is left for next time.

After that:

1. do_basic_setup is called.

2. The init process then starts executing.

3. Finally, the kernel settles into continuously waiting to serve requests.
