QEMU Source Code Walkthrough: Memory Virtualization (15)

Continued from the previous article: QEMU Source Code Walkthrough: Memory Virtualization (14)

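This article draws on the following references:

《趣谈Linux操作系统》, Liu Chao, Geek Time (极客时间)

《QEMU/KVM源码解析与应用》, Li Qiang, China Machine Press (机械工业出版社)

QEMU Memory Management Model

A Brief Look at QEMU Memory Region and Address Space

[QEMU System Analysis, Worked Examples (7)] - CSDN Blog

QEMU Memory Analysis (1): Key Structures in Memory Virtualization - Edver - cnblogs (博客园)

Many thanks to the authors.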

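2. QEMU virtual machine memory initialization

The previous article began a detailed look at render_memory_region(), the core function of QEMU memory flattening. This article continues that analysis. For ease of reference, here again is the code of render_memory_region, which lives in softmmu/memory.c: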

/* Render a memory region into the global view.  Ranges in @view obscure
 * ranges in @mr.
 */
static void render_memory_region(FlatView *view,
                                 MemoryRegion *mr,
                                 Int128 base,
                                 AddrRange clip,
                                 bool readonly,
                                 bool nonvolatile)
{
    MemoryRegion *subregion;
    unsigned i;
    hwaddr offset_in_region;
    Int128 remain;
    Int128 now;
    FlatRange fr;
    AddrRange tmp;
 
    if (!mr->enabled) {
        return;
    }
 
    int128_addto(&base, int128_make64(mr->addr));
    readonly |= mr->readonly;
    nonvolatile |= mr->nonvolatile;
 
    tmp = addrrange_make(base, mr->size);
 
    if (!addrrange_intersects(tmp, clip)) {
        return;
    }
 
    clip = addrrange_intersection(tmp, clip);
 
    if (mr->alias) {
        int128_subfrom(&base, int128_make64(mr->alias->addr));
        int128_subfrom(&base, int128_make64(mr->alias_offset));
        render_memory_region(view, mr->alias, base, clip,
                             readonly, nonvolatile);
        return;
    }
 
    /* Render subregions in priority order. */
    QTAILQ_FOREACH(subregion, &mr->subregions, subregions_link) {
        render_memory_region(view, subregion, base, clip,
                             readonly, nonvolatile);
    }
 
    if (!mr->terminates) {
        return;
    }
 
    offset_in_region = int128_get64(int128_sub(clip.start, base));
    base = clip.start;
    remain = clip.size;
 
    fr.mr = mr;
    fr.dirty_log_mask = memory_region_get_dirty_log_mask(mr);
    fr.romd_mode = mr->romd_mode;
    fr.readonly = readonly;
    fr.nonvolatile = nonvolatile;
 
    /* Render the region itself into any gaps left by the current view. */
    for (i = 0; i < view->nr && int128_nz(remain); ++i) {
        if (int128_ge(base, addrrange_end(view->ranges[i].addr))) {
            continue;
        }
        if (int128_lt(base, view->ranges[i].addr.start)) {
            now = int128_min(remain,
                             int128_sub(view->ranges[i].addr.start, base));
            fr.offset_in_region = offset_in_region;
            fr.addr = addrrange_make(base, now);
            flatview_insert(view, i, &fr);
            ++i;
            int128_addto(&base, now);
            offset_in_region += int128_get64(now);
            int128_subfrom(&remain, now);
        }
        now = int128_sub(int128_min(int128_add(base, remain),
                                    addrrange_end(view->ranges[i].addr)),
                         base);
        int128_addto(&base, now);
        offset_in_region += int128_get64(now);
        int128_subfrom(&remain, now);
    }
    if (int128_nz(remain)) {
        fr.offset_in_region = offset_in_region;
        fr.addr = addrrange_make(base, remain);
        flatview_insert(view, i, &fr);
    }
}

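The previous article left off at this point: read on its own, the code is abstract and hard to follow. A concrete example makes it much easier to understand.

Suppose the root MemoryRegion to be flattened is MemoryRegion 1 in the figure below, and regions 2 to 5 are its descendants (4 and 5 are grandchildren of MemoryRegion 1).

MemoryRegion 1 is first passed as the second argument of render_memory_region, which then calls itself recursively to render sub-regions 2 to 5: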

    /* Render subregions in priority order. */
    QTAILQ_FOREACH(subregion, &mr->subregions, subregions_link) {
        render_memory_region(view, subregion, base, clip,
                             readonly, nonvolatile);
    }

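The call relationships are shown in the figure below:

The third render_memory_region call can return only after the bottom-level regions 4 and 5 have been rendered. Likewise, the first render_memory_region call can return only after regions 2 and 3 have been rendered.

Because it has to handle many different cases, render_memory_region is a complicated function. To make it easier to follow, the rest of the analysis works through an example. Suppose there is a MemoryRegion mr1 and a virtual machine address space. mr1 has one child MemoryRegion, mr2, and two FlatRanges, fr1 and fr2, have already been rendered into the virtual machine's physical address space, as shown in the figure below:

To flatten mr1, the function first computes the intersection of the address range covered by mr1 with clip. Assuming clip starts out as (0, UINT64_MAX), the result is a new clip that is exactly the range covered by mr1. The corresponding code is: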
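The return order described above can be illustrated with a small sketch. It is not QEMU code: Region, render_sketch and render_order_example are hypothetical names, and the tree shape (1 has children 2 and 3, and 3 has children 4 and 5) is an assumption, since only the figure states the exact layout. Each call records its region id at the moment it would return, after all of its children have been handled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for MemoryRegion: only an id and child links. */
typedef struct Region {
    int id;
    const struct Region *children[MAX_CHILDREN];
    size_t nr_children;
} Region;

/* Recurse into every child first, then record this region's id,
 * i.e. the point at which this call is able to return. */
static void render_sketch(const Region *r, int *order, size_t *count)
{
    for (size_t i = 0; i < r->nr_children; i++) {
        render_sketch(r->children[i], order, count);
    }
    order[(*count)++] = r->id;
}

/* Build the assumed tree 1 -> {2, 3}, 3 -> {4, 5} and fill order[]
 * (the caller must provide room for 5 entries). Returns the entry count. */
static size_t render_order_example(int *order)
{
    Region r4 = { .id = 4 };
    Region r5 = { .id = 5 };
    Region r2 = { .id = 2 };
    Region r3 = { .id = 3, .children = { &r4, &r5 }, .nr_children = 2 };
    Region r1 = { .id = 1, .children = { &r2, &r3 }, .nr_children = 2 };
    size_t count = 0;

    render_sketch(&r1, order, &count);
    return count;
}
```

Under the assumed tree, the calls complete in the order 2, 4, 5, 3, 1: the call for region 3 finishes only after 4 and 5, and the call for region 1 finishes last.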

    int128_addto(&base, int128_make64(mr->addr));
    readonly |= mr->readonly;
    nonvolatile |= mr->nonvolatile;
 
    tmp = addrrange_make(base, mr->size);
 
    if (!addrrange_intersects(tmp, clip)) {
        return;
    }
 
    clip = addrrange_intersection(tmp, clip);
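To make the clipping step concrete, here is a simplified sketch of the same logic. It is an illustration under assumptions, not the QEMU implementation: it uses plain uint64_t instead of Int128, assumes start + size never overflows, and the names Range, range_end, range_contains, range_intersects, range_intersection and clip_region are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified address range: [start, start + size). */
typedef struct Range {
    uint64_t start;
    uint64_t size;
} Range;

static uint64_t range_end(Range r)
{
    return r.start + r.size;   /* assumed not to overflow in this sketch */
}

static bool range_contains(Range r, uint64_t addr)
{
    return addr >= r.start && addr < range_end(r);
}

/* Two ranges overlap if either one contains the other's start. */
static bool range_intersects(Range a, Range b)
{
    return range_contains(a, b.start) || range_contains(b, a.start);
}

/* Overlap of two ranges; only meaningful when they intersect. */
static Range range_intersection(Range a, Range b)
{
    uint64_t start = a.start > b.start ? a.start : b.start;
    uint64_t end = range_end(a) < range_end(b) ? range_end(a) : range_end(b);
    Range r = { start, end - start };
    return r;
}

/* Mirrors the clipping step: shrink *clip to the part covered by the
 * region at base + mr_addr; return false if nothing is visible. */
static bool clip_region(uint64_t base, uint64_t mr_addr, uint64_t mr_size,
                        Range *clip)
{
    Range tmp = { base + mr_addr, mr_size };

    if (!range_intersects(tmp, *clip)) {
        return false;
    }
    *clip = range_intersection(tmp, *clip);
    return true;
}
```

With a wide initial clip such as (0, 0x100000000) and a region at 0x1000 of size 0x2000, the new clip is (0x1000, 0x2000), which is simply the region's own range. With a narrower clip, only the overlapping part survives, and a region that lies entirely outside clip is skipped.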

Read this alongside the definition of MemoryRegion and the role of the clip parameter of render_memory_region:

/** MemoryRegion:
 *
 * A struct representing a memory region.
 */
struct MemoryRegion {
    Object parent_obj;
 
    /* private: */
 
    /* The following fields should fit in a cache line */
    bool romd_mode;
    bool ram;
    bool subpage;
    bool readonly; /* For RAM regions */
    bool nonvolatile;
    bool rom_device;
    bool flush_coalesced_mmio;
    uint8_t dirty_log_mask;
    bool is_iommu;
    RAMBlock *ram_block;
    Object *owner;
    /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath */
    DeviceState *dev;
 
    const MemoryRegionOps *ops;
    void *opaque;
    MemoryRegion *container;
    int mapped_via_alias; /* Mapped via an alias, container might be NULL */
    Int128 size;
    hwaddr addr;
    void (*destructor)(MemoryRegion *mr);
    uint64_t align;
    bool terminates;
    bool ram_device;
    bool enabled;
    bool warning_printed; /* For reservations */
    uint8_t vga_logging_count;
    MemoryRegion *alias;
    hwaddr alias_offset;
    int32_t priority;
    QTAILQ_HEAD(, MemoryRegion) subregions;
    QTAILQ_ENTRY(MemoryRegion) subregions_link;
    QTAILQ_HEAD(, CoalescedMemoryRange) coalesced;
    const char *name;
    unsigned ioeventfd_nb;
    MemoryRegionIoeventfd *ioeventfds;
    RamDiscardManager *rdm; /* Only for RAM */
 
    /* For devices designed to perform re-entrant IO into their own IO MRs */
    bool disable_reentrancy_guard;
};

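Flattening operates on leaf nodes, so if mr is an alias, the function has to resolve it to the actual MemoryRegion it refers to. The corresponding code is: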

    if (mr->alias) {
        int128_subfrom(&base, int128_make64(mr->alias->addr));
        int128_subfrom(&base, int128_make64(mr->alias_offset));
        render_memory_region(view, mr->alias, base, clip,
                             readonly, nonvolatile);
        return;
    }
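The two subtractions are easier to follow with numbers. The sketch below is a worked example under assumptions, not QEMU code: it uses int64_t in place of Int128, the helper names alias_recursion_base and alias_offset_in_target are hypothetical, and it assumes clip.start equals the alias's absolute start address (that is, the alias is not clipped further by an enclosing region).

```c
#include <assert.h>
#include <stdint.h>

/* Base handed to the recursive call for the alias target:
 * base - mr->alias->addr - mr->alias_offset. */
static int64_t alias_recursion_base(int64_t alias_abs_base,
                                    int64_t target_addr,
                                    int64_t alias_offset)
{
    return alias_abs_base - target_addr - alias_offset;
}

/* Offset inside the target at which rendering starts. */
static int64_t alias_offset_in_target(int64_t alias_abs_base,
                                      int64_t target_addr,
                                      int64_t alias_offset)
{
    int64_t base = alias_recursion_base(alias_abs_base, target_addr,
                                        alias_offset);

    /* First step inside the recursive call: base += mr->addr,
     * where mr is now the alias target. */
    base += target_addr;

    /* Later: offset_in_region = clip.start - base, with clip.start
     * assumed equal to the alias's absolute start address. */
    return alias_abs_base - base;
}
```

For an alias whose absolute start is 0x100000, pointing at offset 0x20000 of a target whose own addr is 0x80000000, the base passed down is 0xE0000 - 0x80000000, a negative value (a plausible reason for the function to carry base as a signed 128-bit quantity). After the recursive call adds the target's addr back, the rendered range starts at offset 0x20000 inside the target, which is exactly alias_offset.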

The analysis of render_memory_region continues in the next article.