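Continued from the previous article: QEMU Source Code Explained: Memory Virtualization (3)
This article draws on the following references:
QEMU/KVM Source Code Analysis and Application (《QEMU/KVM源码解析与应用》), Li Qiang, China Machine Press
A Brief Discussion of QEMU Memory Region and Address Space (浅谈QEMU Memory Region 与 Address Space)
With thanks to the authors.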
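QEMU memory initialization
1. Basic structures
The previous installment covered MemoryRegion, the second of QEMU's memory-related data structures, in depth. This installment covers the third one, RAMBlock.
(3) RAMBlock
The most fundamental part of memory management is physical memory itself; beyond that there are the MMIO space and the I/O port address space. The RAMBlock structure represents a memory module: one RAMBlock corresponds to one memory module in the virtual machine. The RAMBlock type name is declared in include/qemu/typedefs.h as follows: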
typedef struct RAMBlock RAMBlock;
struct RAMBlock itself is defined in include/exec/ramblock.h:
- struct RAMBlock {
- struct rcu_head rcu;
- struct MemoryRegion *mr;
- uint8_t *host;
- uint8_t *colo_cache; /* For colo, VM's ram cache */
- ram_addr_t offset;
- ram_addr_t used_length;
- ram_addr_t max_length;
- void (*resized)(const char*, uint64_t length, void *host);
- uint32_t flags;
- /* Protected by iothread lock. */
- char idstr[256];
- /* RCU-enabled, writes protected by the ramlist lock */
- QLIST_ENTRY(RAMBlock) next;
- QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
- int fd;
- uint64_t fd_offset;
- size_t page_size;
- /* dirty bitmap used during migration */
- unsigned long *bmap;
- /* bitmap of already received pages in postcopy */
- unsigned long *receivedmap;
-
- /*
- * bitmap to track already cleared dirty bitmap. When the bit is
- * set, it means the corresponding memory chunk needs a log-clear.
- * Set this up to non-NULL to enable the capability to postpone
- * and split clearing of dirty bitmap on the remote node (e.g.,
- * KVM). The bitmap will be set only when doing global sync.
- *
- * It is only used during src side of ram migration, and it is
- * protected by the global ram_state.bitmap_mutex.
- *
- * NOTE: this bitmap is different comparing to the other bitmaps
- * in that one bit can represent multiple guest pages (which is
- * decided by the `clear_bmap_shift' variable below). On
- * destination side, this should always be NULL, and the variable
- * `clear_bmap_shift' is meaningless.
- */
- unsigned long *clear_bmap;
- uint8_t clear_bmap_shift;
-
- /*
- * RAM block length that corresponds to the used_length on the migration
- * source (after RAM block sizes were synchronized). Especially, after
- * starting to run the guest, used_length and postcopy_length can differ.
- * Used to register/unregister uffd handlers and as the size of the received
- * bitmap. Receiving any page beyond this length will bail out, as it
- * could not have been valid on the source.
- */
- ram_addr_t postcopy_length;
- };
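As noted above, RAMBlock represents a memory module in the virtual machine, one RAMBlock per module. It records basic information about that module: the MemoryRegion that owns it (struct MemoryRegion *mr); the file descriptor of the backing file, if the block is file-backed (int fd); the page size of the memory backing the block (size_t page_size), which can be larger than the base system page size when the backing uses huge pages; the length currently in use (ram_addr_t used_length); and the block's offset within the virtual machine's overall RAM (ram_addr_t offset).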
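The comments in the struct describe two kinds of bitmap: bmap, with one bit per page, and clear_bmap, where one bit covers several pages as decided by clear_bmap_shift. The indexing arithmetic can be sketched as below. This is a minimal illustration, not QEMU's code: all `sketch_` names are hypothetical, the page granularity is passed in as a parameter, and QEMU's migration code uses its own bitmap helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits held by one bitmap word. */
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: set bit `nr` in a bitmap stored as unsigned longs. */
static void sketch_set_bit(unsigned long *map, size_t nr)
{
    map[nr / SKETCH_BITS_PER_WORD] |= 1UL << (nr % SKETCH_BITS_PER_WORD);
}

/* Hypothetical helper: return 1 if bit `nr` is set, 0 otherwise. */
static int sketch_test_bit(const unsigned long *map, size_t nr)
{
    return (int)((map[nr / SKETCH_BITS_PER_WORD] >>
                  (nr % SKETCH_BITS_PER_WORD)) & 1UL);
}

/* Index into a per-page bitmap (the role bmap plays): one bit per page. */
static size_t sketch_page_index(uint64_t offset_in_block, size_t page_size)
{
    assert(page_size != 0);
    return (size_t)(offset_in_block / page_size);
}

/*
 * Index into a coarser bitmap (the role clear_bmap plays):
 * one bit covers 2^shift consecutive pages.
 */
static size_t sketch_chunk_index(size_t page_index, uint8_t shift)
{
    return page_index >> shift;
}
```

With a shift of 6, for instance, pages 0 through 63 share chunk bit 0 and page 64 starts chunk bit 1, which is why the coarse bitmap is much smaller than the per-page one.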
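Every MemoryRegion contains a RAMBlock pointer, but not every MemoryRegion has a RAMBlock behind it. For physical memory, the concrete MemoryRegion points to a concrete RAMBlock. As a reminder, here is the RAMBlock-related member of the MemoryRegion structure in include/exec/memory.h: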
- /** MemoryRegion:
- *
- * A struct representing a memory region.
- */
- struct MemoryRegion {
- Object parent_obj;
-
- /* private: */
-
- /* The following fields should fit in a cache line */
- bool romd_mode;
- bool ram;
- bool subpage;
- bool readonly; /* For RAM regions */
- bool nonvolatile;
- bool rom_device;
- bool flush_coalesced_mmio;
- uint8_t dirty_log_mask;
- bool is_iommu;
- RAMBlock *ram_block;
- ……
- };
- RAMBlock *ram_block
ram_block points to the RAMBlock that holds the memory actually allocated for this region.
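Back to the definition of struct RAMBlock. The two fields on the main path are offset (ram_addr_t offset) and host (uint8_t *host). host is the host virtual address (HVA) at which the block's memory is mapped in the QEMU process. offset is the block's start in QEMU's ram_addr_t space; the source describes it loosely as the GPA, but as the comment on the ram_addr_t typedef says, it is an address in the RAM and differs from a physical address, so the guest physical address at which the block appears is determined by where its MemoryRegion is mapped. bmap (unsigned long *bmap) and receivedmap (unsigned long *receivedmap) are used by live migration: bmap is the dirty-page bitmap, and receivedmap records the pages already received during postcopy.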
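The relationship between offset and host can be sketched as simple pointer arithmetic: an address in the ram_addr_t space that falls inside a block maps to host plus the distance from the block's offset. The following is a minimal sketch under stated assumptions; `SketchRAMBlock` and the `sketch_` functions are hypothetical stand-ins that keep only four of the real fields, and bounding the lookup by used_length is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ram_addr_t in the non-Xen configuration. */
typedef uintptr_t sketch_ram_addr_t;

/* Hypothetical, heavily reduced version of struct RAMBlock. */
struct SketchRAMBlock {
    uint8_t *host;                 /* HVA where the block is mapped */
    sketch_ram_addr_t offset;      /* start of the block in ram_addr_t space */
    sketch_ram_addr_t used_length; /* bytes currently in use */
    sketch_ram_addr_t max_length;  /* bytes the block may grow to */
};

/* Return 1 if addr lies inside the used part of the block. */
static int sketch_block_contains(const struct SketchRAMBlock *b,
                                 sketch_ram_addr_t addr)
{
    assert(b != NULL);
    return addr >= b->offset && addr - b->offset < b->used_length;
}

/* Translate a ram_addr_t into a host pointer, or NULL if out of range. */
static uint8_t *sketch_block_host_ptr(const struct SketchRAMBlock *b,
                                      sketch_ram_addr_t addr)
{
    if (b->host == NULL || !sketch_block_contains(b, addr)) {
        return NULL;
    }
    return b->host + (addr - b->offset);
}
```

The subtraction `addr - b->offset` is done only after checking `addr >= b->offset`, so the unsigned arithmetic cannot wrap around.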
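In addition, all RAMBlocks are linked into a list through the next field (QLIST_ENTRY(RAMBlock) next); the head of the list is the global variable ram_list.blocks.
While we are here, a word on the ram_addr_t type used in struct RAMBlock. ram_addr_t is defined in include/exec/cpu-common.h: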
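Walking such a list to find the block that covers a given address can be sketched as follows. This is an illustration only: it uses a plain `next` pointer instead of QEMU's QLIST macros, ignores RCU and locking entirely, and every `Demo`/`demo_` name, as well as the block names in the usage note, is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced block: just enough to be linked and searched. */
struct DemoBlock {
    const char *idstr;      /* name of the block */
    uintptr_t offset;       /* start in the ram_addr_t space */
    uintptr_t max_length;   /* size of the range the block covers */
    struct DemoBlock *next; /* plain singly linked list, not QLIST */
};

/* Hypothetical stand-in for the global list head. */
struct DemoRamList {
    struct DemoBlock *blocks;
};

/* Insert a block at the head of the list. */
static void demo_list_insert_head(struct DemoRamList *list,
                                  struct DemoBlock *b)
{
    assert(list != NULL && b != NULL);
    b->next = list->blocks;
    list->blocks = b;
}

/* Return the block whose range contains addr, or NULL if none does. */
static struct DemoBlock *demo_find_block(const struct DemoRamList *list,
                                         uintptr_t addr)
{
    for (struct DemoBlock *b = list->blocks; b != NULL; b = b->next) {
        if (addr >= b->offset && addr - b->offset < b->max_length) {
            return b;
        }
    }
    return NULL;
}
```

For example, with one block covering [0x0, 0x1000) and another covering [0x1000, 0x1800), looking up 0x17ff returns the second block and looking up 0x1800 returns NULL.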
- /* address in the RAM (different from a physical address) */
- #if defined(CONFIG_XEN_BACKEND)
- typedef uint64_t ram_addr_t;
- # define RAM_ADDR_MAX UINT64_MAX
- # define RAM_ADDR_FMT "%" PRIx64
- #else
- typedef uintptr_t ram_addr_t;
- # define RAM_ADDR_MAX UINTPTR_MAX
- # define RAM_ADDR_FMT "%" PRIxPTR
- #endif
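The point of pairing the typedef with a format macro is that the printf conversion always matches the width of the type. A minimal sketch of the non-Xen branch follows; the `demo_`/`DEMO_` names are hypothetical copies made so the example compiles on its own, and `demo_format_ram_addr` is not a QEMU function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies mirroring the non-Xen branch of the definition above. */
typedef uintptr_t demo_ram_addr_t;
#define DEMO_RAM_ADDR_MAX UINTPTR_MAX
#define DEMO_RAM_ADDR_FMT "%" PRIxPTR

/*
 * Format an address as hexadecimal with a "0x" prefix.
 * Returns the value returned by snprintf.
 */
static int demo_format_ram_addr(char *buf, size_t len, demo_ram_addr_t addr)
{
    assert(buf != NULL && len > 0);
    /* Adjacent string literals concatenate, so this expands to "0x%" PRIxPTR. */
    return snprintf(buf, len, "0x" DEMO_RAM_ADDR_FMT, addr);
}
```

Because the macro expands to a string literal, it can be spliced into a larger format string by plain literal concatenation, which is how RAM_ADDR_FMT is meant to be used.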
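In QEMU proper, uint64_t and uintptr_t are the standard fixed-width and pointer-sized integer types provided by the host C library's <stdint.h>. The source tree also bundles the OpenSBI firmware, which carries its own definitions in roms/opensbi/include/sbi/sbi_types.h; those apply to the RISC-V firmware build rather than to QEMU itself, and are shown here for reference: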
- #if __riscv_xlen == 64
- typedef long s64;
- typedef unsigned long u64;
- typedef long int64_t;
- typedef unsigned long uint64_t;
- #define PRILX "016lx"
- #elif __riscv_xlen == 32
- typedef long long s64;
- typedef unsigned long long u64;
- typedef long long int64_t;
- typedef unsigned long long uint64_t;
- #define PRILX "08lx"
- #else
- #error "Unexpected __riscv_xlen"
- #endif
typedef unsigned long uintptr_t;
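That completes the three basic memory-related structures in QEMU: struct AddressSpace, struct MemoryRegion, and struct RAMBlock.
Let us review the three of them once more:
- AddressSpace (struct AddressSpace)
The AddressSpace structure represents all the physical addresses that a virtual machine or a virtual CPU can access. struct AddressSpace is defined in include/exec/memory.h: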
- /**
- * struct AddressSpace: describes a mapping of addresses to #MemoryRegion objects
- */
- struct AddressSpace {
- /* private: */
- struct rcu_head rcu;
- char *name;
- MemoryRegion *root;
-
- /* Accessed via RCU. */
- struct FlatView *current_map;
-
- int ioeventfd_nb;
- struct MemoryRegionIoeventfd *ioeventfds;
- QTAILQ_HEAD(, MemoryListener) listeners;
- QTAILQ_ENTRY(AddressSpace) address_spaces_link;
- };
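- MemoryRegion (struct MemoryRegion)
MemoryRegion represents a region of memory in the virtual machine. It is the core structure of memory emulation: all memory emulation is carried out through a directed acyclic graph of MemoryRegions. The leaves of the graph are the RAM actually allocated to the virtual machine or MMIO regions; the intermediate nodes represent memory buses, memory controllers, or aliases of other MemoryRegions.
struct MemoryRegion is also defined in include/exec/memory.h: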
- /** MemoryRegion:
- *
- * A struct representing a memory region.
- */
- struct MemoryRegion {
- Object parent_obj;
-
- /* private: */
-
- /* The following fields should fit in a cache line */
- bool romd_mode;
- bool ram;
- bool subpage;
- bool readonly; /* For RAM regions */
- bool nonvolatile;
- bool rom_device;
- bool flush_coalesced_mmio;
- uint8_t dirty_log_mask;
- bool is_iommu;
- RAMBlock *ram_block;
- Object *owner;
- /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath */
- DeviceState *dev;
-
- const MemoryRegionOps *ops;
- void *opaque;
- MemoryRegion *container;
- int mapped_via_alias; /* Mapped via an alias, container might be NULL */
- Int128 size;
- hwaddr addr;
- void (*destructor)(MemoryRegion *mr);
- uint64_t align;
- bool terminates;
- bool ram_device;
- bool enabled;
- bool warning_printed; /* For reservations */
- uint8_t vga_logging_count;
- MemoryRegion *alias;
- hwaddr alias_offset;
- int32_t priority;
- QTAILQ_HEAD(, MemoryRegion) subregions;
- QTAILQ_ENTRY(MemoryRegion) subregions_link;
- QTAILQ_HEAD(, CoalescedMemoryRange) coalesced;
- const char *name;
- unsigned ioeventfd_nb;
- MemoryRegionIoeventfd *ioeventfds;
- RamDiscardManager *rdm; /* Only for RAM */
-
- /* For devices designed to perform re-entrant IO into their own IO MRs */
- bool disable_reentrancy_guard;
- };
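The container, addr, alias, and alias_offset fields are what tie regions into a graph. A minimal sketch of how they might be used follows: summing addr up the container chain gives a region's absolute address, and following alias while accumulating alias_offset resolves an alias to its target. The `DemoRegion` type and `demo_` functions are hypothetical, sizes use uint64_t rather than Int128, and QEMU's real address resolution goes through flattened views rather than this direct walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced region keeping only the linkage fields. */
struct DemoRegion {
    const char *name;
    uint64_t addr;                /* offset within the container */
    uint64_t size;
    struct DemoRegion *container; /* parent region, NULL for the root */
    struct DemoRegion *alias;     /* non-NULL: window onto another region */
    uint64_t alias_offset;        /* where the window starts in the target */
};

/* Absolute address: sum of addr along the container chain up to the root. */
static uint64_t demo_region_abs_addr(const struct DemoRegion *mr)
{
    uint64_t abs = 0;
    for (; mr != NULL; mr = mr->container) {
        abs += mr->addr;
    }
    return abs;
}

/*
 * Follow the alias chain. On return, *offset has been adjusted to be
 * relative to the returned (non-alias) target region.
 */
static const struct DemoRegion *demo_resolve_alias(const struct DemoRegion *mr,
                                                   uint64_t *offset)
{
    assert(mr != NULL && offset != NULL);
    while (mr->alias != NULL) {
        *offset += mr->alias_offset;
        mr = mr->alias;
    }
    return mr;
}
```

In this sketch a leaf mapped at 0x1000 inside a bus region that sits at 0xC0000000 in the root ends up at absolute address 0xC0001000, and an access at offset 0x10 of an alias with alias_offset 0xA0000 lands at offset 0xA0010 of the target region.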
- RAMBlock (struct RAMBlock)
The RAMBlock structure represents a memory module in the virtual machine, one RAMBlock per module, and records basic information about that module. struct RAMBlock is defined in include/exec/ramblock.h:
- struct RAMBlock {
- struct rcu_head rcu;
- struct MemoryRegion *mr;
- uint8_t *host;
- uint8_t *colo_cache; /* For colo, VM's ram cache */
- ram_addr_t offset;
- ram_addr_t used_length;
- ram_addr_t max_length;
- void (*resized)(const char*, uint64_t length, void *host);
- uint32_t flags;
- /* Protected by iothread lock. */
- char idstr[256];
- /* RCU-enabled, writes protected by the ramlist lock */
- QLIST_ENTRY(RAMBlock) next;
- QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
- int fd;
- uint64_t fd_offset;
- size_t page_size;
- /* dirty bitmap used during migration */
- unsigned long *bmap;
- /* bitmap of already received pages in postcopy */
- unsigned long *receivedmap;
-
- /*
- * bitmap to track already cleared dirty bitmap. When the bit is
- * set, it means the corresponding memory chunk needs a log-clear.
- * Set this up to non-NULL to enable the capability to postpone
- * and split clearing of dirty bitmap on the remote node (e.g.,
- * KVM). The bitmap will be set only when doing global sync.
- *
- * It is only used during src side of ram migration, and it is
- * protected by the global ram_state.bitmap_mutex.
- *
- * NOTE: this bitmap is different comparing to the other bitmaps
- * in that one bit can represent multiple guest pages (which is
- * decided by the `clear_bmap_shift' variable below). On
- * destination side, this should always be NULL, and the variable
- * `clear_bmap_shift' is meaningless.
- */
- unsigned long *clear_bmap;
- uint8_t clear_bmap_shift;
-
- /*
- * RAM block length that corresponds to the used_length on the migration
- * source (after RAM block sizes were synchronized). Especially, after
- * starting to run the guest, used_length and postcopy_length can differ.
- * Used to register/unregister uffd handlers and as the size of the received
- * bitmap. Receiving any page beyond this length will bail out, as it
- * could not have been valid on the source.
- */
- ram_addr_t postcopy_length;
- };
With the groundwork in place, the next installment will start on the finer details.