Once we have this “skeleton” quickly created, we can add the individual “match outputs”, like “Cardinals” for “ARI”, “Falcons” for “ATL”, and so on, having saved ourselves from typing out much of the repeated “skeleton” of the case_when().

While the above method is pretty quick to create and understand, we still had to type quite a bit (less if you use multiple cursors), and in many situations a true join is more efficient in terms of typing, “speed of thought”, and execution time.

```r
library(dplyr)

case_when_expr <- function(df) {
  df %>%
    mutate(
      team_name = case_when(
        team == "ARI" ~ "Cardinals",
        # ... one branch per team, through "WSH" ...
        TRUE ~ NA_character_
      )
    ) %>%
    slice(1)
}

join_expr <- function(df) {
  df %>%
    left_join(team_join, by = "team") %>%
    slice(1)
}
```

We can then compare their execution multiple times with the bench package.

This will vary by execution, but with 3 iterations and 100,000 rows, I have seen about a 10x speed improvement for left_join() over case_when(). Note that in most cases both are still pretty much instantaneous in “human time”.

Next, after loading dtplyr with library(dtplyr), we can write the same lookup several ways across dplyr, dtplyr, and native data.table. The plain dplyr join, for example:

```r
join_dplyr <- function(df) {
  df %>%
    left_join(team_join, by = "team") %>%
    select(team, stat, team_name) %>%
    as_tibble()
}
```

The other variants (case_when_dplyr(), case_when_dtplyr(), fcase_dplyr(), fcase_dt_native(), and join_dt_native()) follow the same pattern, swapping in case_when(), data.table's fcase() with NA_character_ as the fallback, or a native data.table join, and returning a tibble with as_tibble() where needed.

Finally, we can check the timing/memory usage for all of the combos.

Overall, I’m really happy that we can mix and match functions from various awesome packages depending on the problem we’re trying to solve! In short, join methods are fastest and use the least memory; fcase(), whether in native data.table or inside dplyr, is a bit slower and uses a bit more memory than a join but is still ~5x faster and more memory-efficient than case_when(); and case_when() is the slowest and most memory-hungry option (though it translates into SQL if needed).
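As a rough illustration of the “skeleton” idea for case_when(), here is a minimal base R sketch that writes out the repeated `team == "..." ~ "..."` branches from a vector of abbreviations. The helper `make_case_when_skeleton()` is hypothetical (not from the post, which may have built its skeleton differently); it only generates text to paste into a script, leaving the match outputs empty until they are filled in.

```r
# Hypothetical helper: build the repetitive "skeleton" of a case_when() call
# so that only the match outputs (e.g. "Cardinals") need to be typed by hand.
make_case_when_skeleton <- function(keys, column = "team", outputs = NULL) {
  # Outputs default to empty strings, to be filled in manually later
  if (is.null(outputs)) outputs <- rep("", length(keys))
  stopifnot(length(outputs) == length(keys))

  # One branch per key; sprintf() is vectorised over keys and outputs
  branches <- sprintf('    %s == "%s" ~ "%s",', column, keys, outputs)

  # Wrap the branches with the opening call, a fallback, and the closing paren
  paste(
    c("case_when(", branches, "    TRUE ~ NA_character_", ")"),
    collapse = "\n"
  )
}

skeleton <- make_case_when_skeleton(c("ARI", "ATL", "WSH"))
cat(skeleton, "\n")
```

Pasting the printed text into a script leaves only the quoted outputs to type; passing `outputs` fills them in directly, although at that point a lookup table plus a join is usually less typing.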
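To try the case_when() vs fcase() vs join comparison described above without the original data, here is a minimal sketch on simulated data. The 100,000-row `sim_df`, the four-team lookup table, and the `with_*()` function names are hypothetical stand-ins, not the post's code; the real APIs used are dplyr's case_when() and left_join(), data.table's fcase() and its `on =` join, and bench::mark(), which only runs if bench is installed. Because bench::mark() is called with `check = FALSE`, it is worth confirming separately that every approach returns the same `team_name` column.

```r
library(dplyr)
library(data.table)

# Simulated stand-in data (an assumption, not the post's data set):
# 100,000 rows of team abbreviations plus a numeric `stat` column
set.seed(42)
teams <- c("ARI", "ATL", "BAL", "BUF")
sim_df <- tibble(
  team = sample(teams, 100000, replace = TRUE),
  stat = runif(100000)
)

# Lookup table used by both join approaches
team_join <- tibble(
  team = teams,
  team_name = c("Cardinals", "Falcons", "Ravens", "Bills")
)

# 1. dplyr::case_when(): one branch per team, NA for anything unmatched
with_case_when <- function(df) {
  df %>%
    mutate(team_name = case_when(
      team == "ARI" ~ "Cardinals",
      team == "ATL" ~ "Falcons",
      team == "BAL" ~ "Ravens",
      team == "BUF" ~ "Bills",
      TRUE ~ NA_character_
    ))
}

# 2. data.table::fcase() inside dplyr::mutate(); `default` covers unmatched rows
with_fcase <- function(df) {
  df %>%
    mutate(team_name = fcase(
      team == "ARI", "Cardinals",
      team == "ATL", "Falcons",
      team == "BAL", "Ravens",
      team == "BUF", "Bills",
      default = NA_character_
    ))
}

# 3. dplyr::left_join() against the lookup table (keeps the row order of df)
with_join <- function(df) {
  df %>% left_join(team_join, by = "team")
}

# 4. Native data.table update join on a copy; `i.team_name` is the lookup's column
with_dt_join <- function(df) {
  dt <- as.data.table(df)
  lookup <- as.data.table(team_join)
  dt[lookup, team_name := i.team_name, on = "team"]
  as_tibble(dt)
}

# Time each approach over 3 iterations (results are compared separately)
if (requireNamespace("bench", quietly = TRUE)) {
  timings <- bench::mark(
    case_when = with_case_when(sim_df),
    fcase = with_fcase(sim_df),
    left_join = with_join(sim_df),
    dt_join = with_dt_join(sim_df),
    iterations = 3,
    check = FALSE
  )
  print(timings[, c("expression", "median", "mem_alloc")])
}
```

Timings and memory figures will vary by machine and data size, so treat any single run as indicative only.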