First, we need to install the fastaudio module.
reticulate::py_install('fastaudio',pip = TRUE)
Grab data from TensorFlow Speech Commands (2.3 GB):
commands_path = "SPEECHCOMMANDS"
audio_files = get_audio_files(commands_path)
length(audio_files$items)
# [1] 105835
Prepare the dataset and put it into a data loader:
DBMelSpec = SpectrogramTransformer(mel=TRUE, to_db=TRUE)
a2s = DBMelSpec()
crop_4000ms = ResizeSignal(4000)
tfms = list(crop_4000ms, a2s)
auds = DataBlock(blocks = list(AudioBlock(), CategoryBlock()),
                 get_items = get_audio_files,
                 splitter = RandomSplitter(),
                 item_tfms = tfms,
                 get_y = parent_label)
audio_dbunch = auds %>% dataloaders(commands_path, item_tfms = tfms, bs = 20)
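The `get_y = parent_label` argument labels each clip with the name of the folder that contains it. A minimal base-R sketch of that idea follows; the file paths and the helper name `label_from_parent` are hypothetical and are not part of fastai:

```r
# Hypothetical paths that mimic a one-folder-per-word layout
paths <- c("SPEECHCOMMANDS/yes/0a7c2a8d_nohash_0.wav",
           "SPEECHCOMMANDS/no/0a9f9af7_nohash_0.wav",
           "SPEECHCOMMANDS/yes/0b09edd3_nohash_0.wav")

# Hypothetical helper: the label is the name of the file's parent directory
label_from_parent <- function(path) basename(dirname(path))

labels <- label_from_parent(paths)

# Count how many clips fall under each label
table(labels)
```

Because the label comes only from the directory name, files must stay in their word folders for the labels to be correct.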
See batch:
audio_dbunch %>% show_batch(figsize = c(15, 8.5), nrows = 3, ncols = 3, max_n = 9, dpi = 180)
Before fitting, change the model's first layer from 3 input channels to 1, because a spectrogram has a single channel:
torch = torch()
nn = nn()

# audio_dbunch is the data loader created above
learn = Learner(audio_dbunch, xresnet18(pretrained = FALSE), nn$CrossEntropyLoss(), metrics=accuracy)

# channel from 3 to 1
learn$model[0][0][['in_channels']] %f% 1L

# reshape: keep one input channel of the first layer's weights
new_weight_shape <- torch$nn$parameter$Parameter(
  (learn$model[0][0]$weight %>% narrow('[:,1,:,:]'))$unsqueeze(1L))

# assign with %f%
learn$model[0][0][['weight']] %f% new_weight_shape
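To see what the weight reshaping does to the tensor shape, here is a minimal base-R sketch in which a plain array stands in for the convolution weights. The sizes (32 filters, 3x3 kernels) are assumptions for illustration only. R indexes from 1, so Python's channel `1` corresponds to R's channel `2`:

```r
# Stand-in for conv weights: (out_channels, in_channels, kernel_h, kernel_w)
# The sizes here are assumed for illustration
w <- array(rnorm(32 * 3 * 3 * 3), dim = c(32, 3, 3, 3))

# Keep only the second input channel; drop = FALSE preserves the channel axis,
# which mirrors selecting [:,1,:,:] and then calling unsqueeze(1)
w_1ch <- w[, 2, , , drop = FALSE]

# The channel axis now has length 1 while the other axes are unchanged
dim(w_1ch)
```

The result keeps four dimensions, which is what a convolution layer with one input channel expects.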
Weights and biases can be saved and visualized on wandb.ai:
# login for the 1st time then remove it
login("API_key_from_wandb_dot_ai")
init(project='R')
wandb: Currently logged in as: henry090 (use `wandb login --relogin` to force relogin)
wandb: Tracking run with wandb version 0.10.8
wandb: Syncing run macabre-zombie-2
wandb: ⭐️ View project at https://wandb.ai/henry090/speech_recognition_from_R
wandb: 🚀 View run at https://wandb.ai/henry090/speech_recognition_from_R/runs/2sjw3juv
wandb: Run data is saved locally in wandb/run-20201030_224503-2sjw3juv
wandb: Run `wandb off` to turn off syncing.
Now we can train our model:
learn %>% fit_one_cycle(3, lr_max=slice(1e-2), cbs = list(WandbCallback()))
epoch train_loss valid_loss accuracy time
------ ----------- ----------- --------- -----
WandbCallback requires use of "SaveModelCallback" to log best model
0 0.590236 0.728817 0.787121 04:18
WandbCallback was not able to get prediction samples -> wandb.log must be passed a dictionary
1 0.288492 0.310335 0.908490 04:19
2 0.182899 0.196792 0.941088 04:10
See the dashboard here:
https://wandb.ai/henry090/speech_recognition_from_R/runs/2sjw3juv?workspace=user-henry090