US20190311298
(Abstract)
Systems and methods are provided for training (*訓練する "to train"; or 「学習させる」 "to have it learn"?) a machine learned model (*機械学習モデル "machine learning model") on a large number of devices, each device acquiring a local set of training data without sharing data sets across devices. The devices train the model on the respective device's set of training data. The devices communicate a parameter vector from the trained model asynchronously with a parameter server. The parameter server updates a master parameter vector and transmits the master parameter vector to the respective device.
- BACKGROUND
- [0002]
Massive volumes of data are collected every day by a large number of devices such as smartphones and navigation devices. The data includes everything from user habits to images to speech and beyond. Analysis of the data could improve learning models (*学習モデル "learning model") and user experiences. For example, language models can improve speech recognition and text entry, and image models can help automatically identify photos.
- [0003]
The complex problem of training these models could be solved by large-scale distributed computing, taking advantage of the resources (storage, computing power, cycles, content, and bandwidth) of participating devices available at the edges of a network. In such a distributed machine learning (*機械学習 "machine learning") scenario, the dataset is transmitted to or stored among multiple edge devices. The devices solve a distributed optimization problem to collectively learn (*学習する "to learn") the underlying model. For distributed computing, similar (or identical) datasets may be allocated to multiple devices that are then able to solve a problem in parallel.
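The asynchronous parameter-server scheme the abstract describes can be pictured with a short simulation. The sketch below is a minimal illustration, not the patent's implementation: `ParameterServer`, `device_loop`, the mixing weight `mix`, and the toy least-squares objective are all assumptions, and Python threads stand in for edge devices. Each simulated device trains only on its own local data and exchanges only a parameter vector with the server, matching the no-data-sharing constraint above.

```python
import threading
import numpy as np

class ParameterServer:
    """Holds the master parameter vector; devices push and pull asynchronously."""
    def __init__(self, dim, mix=0.5):
        self.master = np.zeros(dim)
        self.mix = mix                 # weight given to an incoming device vector (assumed rule)
        self.lock = threading.Lock()

    def exchange(self, device_params):
        # Update the master vector with one device's parameters, then return
        # the new master for that device to continue training from.
        with self.lock:
            self.master = (1.0 - self.mix) * self.master + self.mix * device_params
            return self.master.copy()

def device_loop(server, X, y, rounds=10, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(rounds):
        grad = X.T @ (X @ w - y) / len(y)   # toy local least-squares gradient
        w -= lr * grad                       # local training: data never leaves the device
        w = server.exchange(w)               # asynchronous push/pull of the parameter vector

rng = np.random.default_rng(0)
true_w = rng.normal(size=3)
server = ParameterServer(dim=3)
threads = []
for _ in range(4):                           # four simulated edge devices
    X = rng.normal(size=(50, 3))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    t = threading.Thread(target=device_loop, args=(server, X, y))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print("master vector:", server.master, "true:", true_w)
```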
US20200042873
- [0037]
In act 20, training data (*訓練データ "training data") is obtained. The training data is directed to the purpose of the machine training. For example, the machine is to learn (*学習する "to learn") a model to relate input MR values to multiple MR parameters in MR fingerprinting. MR data in the image domain (i.e., after transform from k-space) is to be input to the model, and the model outputs values for two or more parameters. In the examples below, the machine is to train (*訓練する "to train") a model for MR fingerprinting to output T1 and T2 values, but additional and/or different MR parameters may be used. In other embodiments, the model is trained for other types of MR imaging. MR is naturally complex-valued. The phase information may contain important information, such as in Dixon imaging, phase-sensitive inversion recovery, MR fingerprinting, or MR spectroscopy. The output may be a phase image, such as in flow encoding, MR elastography, MR temperature mapping, partial Fourier, off-resonance correction, k-space processing, or other phase-sensitive processing. Alternatively, the model is trained for ultrasound or other types of medical imaging. In yet other alternatives, the model is trained for operating on other types of measurements, such as in an electrical power supply system.
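As a concrete picture of complex-valued training data of the kind described (signal evolutions in, T1 and T2 values out), the following sketch builds toy complex tensors in PyTorch. The exponential signal model, the value ranges, and the array shapes are illustrative assumptions only, not the patent's MR fingerprinting dictionary.

```python
import torch

n_samples, n_timepoints = 1000, 64
t1 = torch.rand(n_samples) * 2.0 + 0.2       # seconds; illustrative range, not from the patent
t2 = torch.rand(n_samples) * 0.3 + 0.02

# Toy complex signal evolutions: a decaying magnitude plus a phase term,
# standing in for real MR fingerprinting data; complex64 keeps the phase.
t = torch.linspace(0.01, 1.0, n_timepoints)
mag = torch.exp(-t[None, :] / t2[:, None])
phase = 2 * torch.pi * t[None, :] / t1[:, None]
signals = (mag * torch.exp(1j * phase)).to(torch.complex64)

targets = torch.stack([t1, t2], dim=1)        # the model regresses T1 and T2 values
print(signals.shape, signals.dtype, targets.shape)
```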
-
Different trainable complex-valued activation functions are provided, along with complex-valued linear operations. The complex-valued activation functions are trained, such as training for MR fingerprinting regression. The non-linearities are extended to complex values either by adapting them from the real domain to the complex domain or by adding customizable parameters to their definition. Learnable (*学習可能 "learnable") parameters are included in the definition of the non-linearities. The terms "learning," "learned," and "learnable" refer to the process of backpropagation used to train (*訓練する "to train") the neural networks or another machine-learned network. The shape of the different non-linearities for each layer or neuron is learned from the complex-valued data.
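One published example of a trainable complex-valued non-linearity with a learnable parameter in its definition is modReLU (Arjovsky et al., 2016), sketched below in PyTorch. The patent's own activation functions are not reproduced here, so treat this only as an instance of the general idea: the shape of the non-linearity is learned by backpropagation.

```python
import torch
import torch.nn as nn

class ModReLU(nn.Module):
    """modReLU: f(z) = ReLU(|z| + b) * z / |z|, with bias b learned per feature.

    The magnitude is gated by a learnable threshold while the phase of z is
    preserved, so backpropagation shapes the non-linearity from the data."""
    def __init__(self, num_features):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, z):
        mag = torch.abs(z)
        # Keep the original phase; rescale the magnitude through ReLU.
        scale = torch.relu(mag + self.bias) / (mag + 1e-8)
        return z * scale

act = ModReLU(num_features=8)
z = torch.randn(4, 8, dtype=torch.complex64)
out = act(z)                     # complex in, complex out, phase preserved
out.abs().sum().backward()       # the learnable bias receives gradients via autograd
print(out.dtype, act.bias.grad.shape)
```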
- [0030]
Separable real activation functions are less expressive once in the two-dimensional (2D) complex domain. 2D activation functions with trainable parameters in a complex-valued neural network provide improvement over non-trainable versions. The complex-domain grid shift, rotation in complex-value space, and/or variance across the complex-domain grid may be learned in the complex representation of the activation function. The grid in the complex-value domain and/or the covariance may or may not be used as learnable parameters. Complex kernel non-linearities provide a parameterization to learn (*学習する "to learn") the shape of the activation function. These learnable parameters for complex values may be learned and bring benefits by using more of the information in the input complex values. Complex non-linearities (activation functions) are designed to fully exploit complex-valued information. The learned non-linearities may maintain the important phase information in the data.
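A hedged sketch of such a kernel parameterization follows, assuming a sum of Gaussian kernels over a grid in the complex plane, with the grid shift and the variance as learnable parameters (rotation is omitted for brevity). The class name and every design choice here are assumptions for illustration, not the patent's parameterization.

```python
import torch
import torch.nn as nn

class ComplexKernelActivation(nn.Module):
    def __init__(self, grid_size=5, span=2.0):
        super().__init__()
        # Fixed base grid of kernel centers covering [-span, span]^2 in the
        # complex plane; a learnable shift moves the whole grid.
        xs = torch.linspace(-span, span, grid_size)
        re, im = torch.meshgrid(xs, xs, indexing="ij")
        self.register_buffer("centers", (re + 1j * im).reshape(-1))
        k = grid_size * grid_size
        self.weights = nn.Parameter(torch.randn(k, dtype=torch.complex64) * 0.1)
        self.shift = nn.Parameter(torch.zeros(2))       # learnable grid shift
        self.log_sigma = nn.Parameter(torch.zeros(1))   # learnable variance (log-scale)

    def forward(self, z):
        centers = self.centers + (self.shift[0] + 1j * self.shift[1])
        # Squared distance of each complex input to each (shifted) center.
        d2 = torch.abs(z[..., None] - centers) ** 2
        k = torch.exp(-d2 / (2 * torch.exp(self.log_sigma) ** 2))
        return k.to(self.weights.dtype) @ self.weights  # complex-valued output

act = ComplexKernelActivation()
z = torch.randn(16, dtype=torch.complex64)
print(act(z).shape)   # the activation's shape is learned from complex-valued data
```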
- [0031]
FIGS. 1 and 8 show methods related to complex-valued machine modeling. The method may be a method to machine learn (*機械学習する "to machine-learn") or train (*訓練する "to train") a complex-valued model, or may be a method for application of a machine-learned complex-valued model. FIG. 1 is directed to machine training of the model (*モデルの機械訓練、機械学習 "machine training of the model; machine learning"). FIG. 8 is directed to application of the machine-learned model (*機械学習したモデル "machine-learned model"), such as a complex-valued neural network.
train a model: モデルを訓練する "train a model"; 機械学習させる "have it machine-learn"
learn a model: モデルを学習する "learn a model"
learned model, trained model: 学習したモデル "model that has learned"; 学習済みモデル "model whose learning is complete"
teach: 教示 "teaching, instruction"; e.g., タスクをマニピュレータに教示 "teach a task to a manipulator"