-
Notifications
You must be signed in to change notification settings - Fork 30
Make clusterwide config upload a separate stage #1266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d1edaea
to
037e24b
Compare
bef2018
to
94f3001
Compare
43d6481
to
9321d8e
Compare
16bdca7
to
d03611e
Compare
Recently we've encountered aweird test behavior: a test (unit.upload from #1266) succeded when run alone, but failed in conjunction with unit.cfg_errors test. The reason is in monkeypatching and reload of lua modules used there. With this patch all test functions are moved to a simple empty server and run over net.box (remote-control)
Recently we've encountered a weird test behavior: a test (`unit.upload` from #1266) succeded when run alone, but failed in conjunction with `unit.cfg_errors` test. The reason is in monkeypatching and reloading of lua modules used there. With this patch, all test functions are moved to a simple empty server and run over netbox (remote-control).
Recently we've encountered a weird test behavior: a test (`unit.upload` from #1266) succeded when run alone, but failed in conjunction with `unit.cfg_errors` test. The reason is in monkeypatching and reloading of lua modules used there. With this patch, all test functions are moved to a simple empty server and run over netbox (remote-control).
|
||
local upload_path = get_upload_path(upload_id) | ||
local ok, err = utils.mktree(fio.dirname(upload_path)) | ||
fiber.testcancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testcancel
raises. You won't perform vars.upload_fibers[upload_id] = nil
if this function raises an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok. In this case the upload_fibers
will be cleaned by the cleanup
stage which cancelled the fiber.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but it's not so obvious, please add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, in fact we can't guarantee that cleanup stage will be executed someday in case of temporary problems with network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but it's not so obvious, please add a comment.
Done
However, in fact we can't guarantee that cleanup stage will be executed someday in case of temporary problems with network.
The necessity of persisting upload_fibers
is to avoid a race between begin
and cleanup
. The necessity of cleaning up the upload_fibers
is to avoid canceling an innocent fiber that belongs to the iproto fiber pool and has already been reused by that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To sum up, if the cleanup stage wasn't called for some reason, the garbage will remain in /tmp
and in lua context. And it's ok for now.
local upload_path = get_upload_path(upload_id) | ||
local payload_path = fio.pathjoin(upload_path, 'payload') | ||
local ok, err = utils.file_write(payload_path, payload) | ||
fiber.testcancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testcancel
raises. You won't perform vars.upload_fibers[upload_id] = nil
if this function raises an error.
cartridge/upload.lua
Outdated
local random_path = utils.randomize_path(upload_path) | ||
|
||
local ok = fio.rename(upload_path, random_path) | ||
local _errno = errno() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's redundant action to call errno here. It's not needed if ok is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main danger of errno is that it may be overridden by some other consequent fio operation. I've changed this place, but in another way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local _errno
local ok, err = fio.some_operation()
if not ok then
_errno = errno()
end
No redefinition, to excess actions.
|
||
local ok = fio.rmtree(random_path) | ||
local _errno = errno() | ||
if not ok then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's redundant action to call errno here. It's not needed if ok is true.
_G.__cartridge_upload_begin = function(...) return errors.pcall('E', upload_begin, ...) end | ||
_G.__cartridge_upload_transmit = function(...) return errors.pcall('E', upload_transmit, ...) end | ||
_G.__cartridge_upload_finish = function(...) return errors.pcall('E', upload_finish, ...) end | ||
_G.__cartridge_upload_cleanup = function(...) return errors.pcall('E', upload_cleanup, ...) end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't actually need to pass ...
here because you exactly know a number of arguments - one upload_id.
varargs are bit slower than usual function call with determined number of arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok to sacrifice that performance for the sake of code brevity. Hope it's not too much.
{env = {TARANTOOL_UPLOAD_PREFIX = g.upload_path_1}}, | ||
{env = {TARANTOOL_UPLOAD_PREFIX = g.upload_path_1}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's reasonable to increase number of servers up to 5 or even more. In case of problems with concurrency and shared resource problems we have more chances to catch a problem in CI.
For historical reasons, there were two issues in the
patch_clusterwide
logics:It was OK when the config was small and there were only a few instances in the cluster, but it's a problem for big installations nowadays.
This patch introduces a new
patch_clusterwide
stage - upload (consolidated in a separate module). It spreads the data across instances in a network-efficient manner.Also in this patch:
issues_test.lua
and remove unnecessary checks from cypresssigstop.spec.js
because the SIGStOP doesn't result in config lock and mismatch anymore.I didn't forget about
Close #1145