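This article continues from the previous installment, QEMU Source Code Explained: Block Device Virtualization (22).
Reference for this article:
《QEMU/KVM源码解析与应用》, Li Qiang (李强), China Machine Press (机械工业出版社)
With thanks!
Block device virtualization during QEMU startup
The previous installment analyzed qcow2_open, the open function for the qcow2 format. This installment analyzes raw_open, the open function for the raw format.
raw_open is defined in block/raw-format.c: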
static int raw_open(BlockDriverState *bs, QDict *options, int flags,
                    Error **errp)
{
    BDRVRawState *s = bs->opaque;
    AioContext *ctx;
    bool has_size;
    uint64_t offset, size;
    BdrvChildRole file_role;
    int ret;

    ret = raw_read_options(options, &offset, &has_size, &size, errp);
    if (ret < 0) {
        return ret;
    }

    /*
     * Without offset and a size limit, this driver behaves very much
     * like a filter. With any such limit, it does not.
     */
    if (offset || has_size) {
        file_role = BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY;
    } else {
        file_role = BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY;
    }

    bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                    file_role, false, errp);
    if (!bs->file) {
        return -EINVAL;
    }

    bs->sg = bdrv_is_sg(bs->file->bs);
    bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
        (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
    bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
        ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
            bs->file->bs->supported_zero_flags);
    bs->supported_truncate_flags = bs->file->bs->supported_truncate_flags &
                                   BDRV_REQ_ZERO_WRITE;

    if (bs->probed && !bdrv_is_read_only(bs)) {
        bdrv_refresh_filename(bs->file->bs);
        fprintf(stderr,
                "WARNING: Image format was not specified for '%s' and probing "
                "guessed raw.\n"
                " Automatically detecting the format is dangerous for "
                "raw images, write operations on block 0 will be restricted.\n"
                " Specify the 'raw' format explicitly to remove the "
                "restrictions.\n",
                bs->file->bs->filename);
    }

    ctx = bdrv_get_aio_context(bs);
    aio_context_acquire(ctx);
    ret = raw_apply_options(bs, s, offset, has_size, size, errp);
    aio_context_release(ctx);

    if (ret < 0) {
        return ret;
    }

    if (bdrv_is_sg(bs) && (s->offset || s->has_size)) {
        error_setg(errp, "Cannot use offset/size with SCSI generic devices");
        return -EINVAL;
    }

    return 0;
}
raw_open works differently from qcow2_open: it does not use a coroutine.
The core code is:
    bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                    file_role, false, errp);
    if (!bs->file) {
        return -EINVAL;
    }
bdrv_open_child is defined in block.c:
/*
 * Opens a disk image whose options are given as BlockdevRef in another block
 * device's options.
 *
 * If allow_none is true, no image will be opened if filename is false and no
 * BlockdevRef is given. NULL will be returned, but errp remains unset.
 *
 * bdref_key specifies the key for the image's BlockdevRef in the options QDict.
 * That QDict has to be flattened; therefore, if the BlockdevRef is a QDict
 * itself, all options starting with "${bdref_key}." are considered part of the
 * BlockdevRef.
 *
 * The BlockdevRef will be removed from the options QDict.
 *
 * The caller must hold the lock of the main AioContext and no other AioContext.
 * @parent can move to a different AioContext in this function. Callers must
 * make sure that their AioContext locking is still correct after this.
 */
BdrvChild *bdrv_open_child(const char *filename,
                           QDict *options, const char *bdref_key,
                           BlockDriverState *parent,
                           const BdrvChildClass *child_class,
                           BdrvChildRole child_role,
                           bool allow_none, Error **errp)
{
    BlockDriverState *bs;
    BdrvChild *child;
    AioContext *ctx;

    GLOBAL_STATE_CODE();

    bs = bdrv_open_child_bs(filename, options, bdref_key, parent, child_class,
                            child_role, allow_none, errp);
    if (bs == NULL) {
        return NULL;
    }

    ctx = bdrv_get_aio_context(bs);
    aio_context_acquire(ctx);
    child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
                              errp);
    aio_context_release(ctx);

    return child;
}
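Read the function comment carefully:
Purpose:
Opens a disk image whose options are given as a BlockdevRef in another block device's options.
Parameters:
- bool allow_none
If allow_none is true, no image is opened when filename is NULL and no BlockdevRef is given. The function then returns NULL, but errp is left unset.
- const char *bdref_key
bdref_key specifies the key for the image's BlockdevRef in the options QDict.
That QDict has to be flattened. Therefore, if the BlockdevRef is itself a QDict, all options starting with "${bdref_key}." are considered part of the BlockdevRef.
The BlockdevRef is removed from the options QDict.
Locking:
The caller must hold the lock of the main AioContext and no other AioContext.
The parent node (@parent) can move to a different AioContext inside this function. Callers must make sure that their AioContext locking is still correct afterwards.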
As an aside: bdrv_open_file_child, which qcow2_open called in the previous installment, is just a thin wrapper around bdrv_open_child.
bdrv_open_child_bs is also defined in block.c:
static BlockDriverState *
bdrv_open_child_bs(const char *filename, QDict *options, const char *bdref_key,
                   BlockDriverState *parent, const BdrvChildClass *child_class,
                   BdrvChildRole child_role, bool allow_none, Error **errp)
{
    BlockDriverState *bs = NULL;
    QDict *image_options;
    char *bdref_key_dot;
    const char *reference;

    assert(child_class != NULL);

    bdref_key_dot = g_strdup_printf("%s.", bdref_key);
    qdict_extract_subqdict(options, &image_options, bdref_key_dot);
    g_free(bdref_key_dot);

    /*
     * Caution: while qdict_get_try_str() is fine, getting non-string
     * types would require more care. When @options come from
     * -blockdev or blockdev_add, its members are typed according to
     * the QAPI schema, but when they come from -drive, they're all
     * QString.
     */
    reference = qdict_get_try_str(options, bdref_key);
    if (!filename && !reference && !qdict_size(image_options)) {
        if (!allow_none) {
            error_setg(errp, "A block device must be specified for \"%s\"",
                       bdref_key);
        }
        qobject_unref(image_options);
        goto done;
    }

    bs = bdrv_open_inherit(filename, reference, image_options, 0,
                           parent, child_class, child_role, errp);
    if (!bs) {
        goto done;
    }

done:
    qdict_del(options, bdref_key);
    return bs;
}
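bdrv_open_child_bs in turn calls bdrv_open_inherit, which was covered earlier (see QEMU Source Code Explained: Block Device Virtualization (21)). This looks like a cycle: bdrv_open_inherit reaches a format driver's open function, which calls bdrv_open_child, which leads back to bdrv_open_inherit. I have not yet fully traced the termination condition. The recursion presumably ends at a protocol-level driver (such as file), whose open function opens the host file directly instead of opening another child. A detailed explanation will follow once this has been worked through completely.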
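That largely completes the walk-through of block device virtualization during QEMU startup. To recap the flow:
1) Storage-related parameters (-drive, -device, and so on) are set on the QEMU command line;
2) qemu_create_early_backends (called from qemu_init) calls configure_blockdevs, which parses the -drive options from the command line and calls drive_init_func to initialize each drive;
3) drive_init_func calls drive_new to create a drive;
4) drive_new also parses the -device startup parameter and sets the driver to virtio-blk-pci; it parses the file parameter as well;
5) drive_new then calls blockdev_init, which initializes the drive according to those parameters;
6) blockdev_init calls blk_new_open to open the disk file on the host and returns a BlockBackend, that is, the virtio back end;
7) blk_new_open follows the call chain bdrv_open() -> bdrv_open_inherit() -> bdrv_open_common() to obtain the BlockDriver that matches the disk file's format. Each format opens the file differently: for qcow2 the open function is qcow2_open, and for raw it is raw_open;
8) Finally, drive_new creates a DriveInfo to manage the opened device.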
The next installment begins analyzing the complete process by which a process inside the virtual machine writes a file.