QEMU Source Code Walkthrough: Block Device Virtualization (23)

Continued from the previous article: QEMU Source Code Walkthrough: Block Device Virtualization (22)

This article draws on:

《趣谈Linux操作系统》, 刘超 (Liu Chao), 极客时间 (Geek Time)

《QEMU/KVM源码解析与应用》, 李强 (Li Qiang), 机械工业出版社 (China Machine Press)

With thanks!

Block device virtualization during QEMU startup

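The previous installment walked through qcow2_open, the open function for the qcow2 format. This one walks through raw_open, the open function for the raw format.

raw_open is in block/raw-format.c. The code is as follows: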

static int raw_open(BlockDriverState *bs, QDict *options, int flags,
                    Error **errp)
{
    BDRVRawState *s = bs->opaque;
    AioContext *ctx;
    bool has_size;
    uint64_t offset, size;
    BdrvChildRole file_role;
    int ret;

    ret = raw_read_options(options, &offset, &has_size, &size, errp);
    if (ret < 0) {
        return ret;
    }

    /*
     * Without offset and a size limit, this driver behaves very much
     * like a filter.  With any such limit, it does not.
     */
    if (offset || has_size) {
        file_role = BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY;
    } else {
        file_role = BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY;
    }

    bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                    file_role, false, errp);
    if (!bs->file) {
        return -EINVAL;
    }

    bs->sg = bdrv_is_sg(bs->file->bs);
    bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
        (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
    bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
        ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
            bs->file->bs->supported_zero_flags);
    bs->supported_truncate_flags = bs->file->bs->supported_truncate_flags &
                                   BDRV_REQ_ZERO_WRITE;

    if (bs->probed && !bdrv_is_read_only(bs)) {
        bdrv_refresh_filename(bs->file->bs);
        fprintf(stderr,
                "WARNING: Image format was not specified for '%s' and probing "
                "guessed raw.\n"
                "         Automatically detecting the format is dangerous for "
                "raw images, write operations on block 0 will be restricted.\n"
                "         Specify the 'raw' format explicitly to remove the "
                "restrictions.\n",
                bs->file->bs->filename);
    }

    ctx = bdrv_get_aio_context(bs);
    aio_context_acquire(ctx);
    ret = raw_apply_options(bs, s, offset, has_size, size, errp);
    aio_context_release(ctx);

    if (ret < 0) {
        return ret;
    }

    if (bdrv_is_sg(bs) && (s->offset || s->has_size)) {
        error_setg(errp, "Cannot use offset/size with SCSI generic devices");
        return -EINVAL;
    }

    return 0;
}

raw_open works differently from qcow2_open: it does not use a coroutine.

The core code segment is:

    bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                    file_role, false, errp);
    if (!bs->file) {
        return -EINVAL;
    }

bdrv_open_child is in block.c. The code is as follows:

/*
 * Opens a disk image whose options are given as BlockdevRef in another block
 * device's options.
 *
 * If allow_none is true, no image will be opened if filename is false and no
 * BlockdevRef is given. NULL will be returned, but errp remains unset.
 *
 * bdref_key specifies the key for the image's BlockdevRef in the options QDict.
 * That QDict has to be flattened; therefore, if the BlockdevRef is a QDict
 * itself, all options starting with "${bdref_key}." are considered part of the
 * BlockdevRef.
 *
 * The BlockdevRef will be removed from the options QDict.
 *
 * The caller must hold the lock of the main AioContext and no other AioContext.
 * @parent can move to a different AioContext in this function. Callers must
 * make sure that their AioContext locking is still correct after this.
 */
BdrvChild *bdrv_open_child(const char *filename,
                           QDict *options, const char *bdref_key,
                           BlockDriverState *parent,
                           const BdrvChildClass *child_class,
                           BdrvChildRole child_role,
                           bool allow_none, Error **errp)
{
    BlockDriverState *bs;
    BdrvChild *child;
    AioContext *ctx;

    GLOBAL_STATE_CODE();

    bs = bdrv_open_child_bs(filename, options, bdref_key, parent, child_class,
                            child_role, allow_none, errp);
    if (bs == NULL) {
        return NULL;
    }

    ctx = bdrv_get_aio_context(bs);
    aio_context_acquire(ctx);
    child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
                              errp);
    aio_context_release(ctx);

    return child;
}

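Read the function comment carefully:

What the function does:

It opens a disk image whose options are given as a BlockdevRef inside another block device's options.

Parameters:

  • bool allow_none

If allow_none is true, no image is opened when filename is not set and no BlockdevRef is given. NULL is returned, but errp remains unset.

  • const char *bdref_key

bdref_key specifies the key under which the image's BlockdevRef is found in the options QDict.

That QDict has to be flattened. So if the BlockdevRef is itself a QDict, all options starting with "${bdref_key}." are considered part of the BlockdevRef.

The BlockdevRef is removed from the options QDict. The caller must hold the lock of the main AioContext and of no other AioContext.

The parent node (@parent) can move to a different AioContext in this function. Callers must make sure their AioContext locking is still correct afterwards.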

A side note: bdrv_open_file_child, which qcow2_open called in the previous installment, is simply bdrv_open_child with one more layer of wrapping.

bdrv_open_child_bs is also in block.c. The code is as follows:

static BlockDriverState *
bdrv_open_child_bs(const char *filename, QDict *options, const char *bdref_key,
                   BlockDriverState *parent, const BdrvChildClass *child_class,
                   BdrvChildRole child_role, bool allow_none, Error **errp)
{
    BlockDriverState *bs = NULL;
    QDict *image_options;
    char *bdref_key_dot;
    const char *reference;

    assert(child_class != NULL);

    bdref_key_dot = g_strdup_printf("%s.", bdref_key);
    qdict_extract_subqdict(options, &image_options, bdref_key_dot);
    g_free(bdref_key_dot);

    /*
     * Caution: while qdict_get_try_str() is fine, getting non-string
     * types would require more care.  When @options come from
     * -blockdev or blockdev_add, its members are typed according to
     * the QAPI schema, but when they come from -drive, they're all
     * QString.
     */
    reference = qdict_get_try_str(options, bdref_key);
    if (!filename && !reference && !qdict_size(image_options)) {
        if (!allow_none) {
            error_setg(errp, "A block device must be specified for \"%s\"",
                       bdref_key);
        }
        qobject_unref(image_options);
        goto done;
    }

    bs = bdrv_open_inherit(filename, reference, image_options, 0,
                           parent, child_class, child_role, errp);
    if (!bs) {
        goto done;
    }

done:
    qdict_del(options, bdref_key);
    return bs;
}

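bdrv_open_child_bs in turn calls bdrv_open_inherit, which was covered earlier (see QEMU Source Code Walkthrough: Block Device Virtualization (21)). This looks like a loop, and at my current level of understanding I have not yet worked out its termination condition. A detailed explanation will follow once I have fully traced it.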

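That largely completes the walkthrough of block device virtualization during QEMU startup. To recap the flow:

1) Storage-related parameters (-drive, -device and so on) are set on the QEMU command line;

2) qemu_create_early_backends (called from qemu_init) calls configure_blockdev, which parses the -drive startup parameters from the QEMU command line; initializing each such device goes through drive_init_func;

3) drive_init_func calls drive_new to create a device;

4) drive_new also handles the QEMU startup parameter -device, setting the driver to virtio-blk-pci; it parses the file parameter as well;

5) drive_new then calls blockdev_init, which performs initialization according to the parameters;

6) blockdev_init calls blk_new_open, which opens the disk file on the host and returns a BlockBackend, that is, the virtio backend;

7) blk_new_open goes through the call chain bdrv_open() -> bdrv_open_inherit() -> bdrv_open_common() and obtains the BlockDriver that matches the disk file's format. Different formats open the file differently: for qcow2 the open function is qcow2_open, while for raw it is raw_open;

8) Finally, drive_new creates a DriveInfo to manage the opened device.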

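Starting with the next installment, we will walk through the complete path taken when a process inside the virtual machine writes to a file.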