Automatic modular rightsizing of Azure VM’s with special focus on Azure Virtual Desktop

It has long annoyed me that all the scaling options in Azure just add and remove hosts. They never target the host itself. Hosts are either under or overutilized in 84% of the case.

And this is especially relevant for AVD personal hostpools where users each have their own personal “VDI”.

So I’m releasing a custom PowerShell module called “ADDRS” (Azure Data Driven Right Sizing) that grabs mem/cpu performance of the VM or all VM’s in a resource group you tell it to check. It will then do some smart voodoo magic to determine what size out of an allowlist best fits.

Instructions / Example:

  • Use -WhatIf if you don’t want it to resize the VM
  • Use -Force if you want to resize a VM even if it is online (which will cause it to be shut down!)
  • Use -Boot if you want the VM to be started after resizing (by default it will stay deallocated)
  • Use -domain with your domain if your VM is domain joined
  • Use -region if your region is not westeurope
  • Use -Verbose if you want the full output incl financial projection
  • Use -Report if you want to output data to csv. Can be used together with -WhatIf
  • Modify minMemoryGB, maxMemoryGB, minvCPUs, maxvCPUs as desired for your usecase
  • You can adjust the preconfigured allowedVMTypes array to only allow specific VM types, by default it contains “Standard_D2ds_v4″,”Standard_D4ds_v4″,”Standard_D8ds_v4″,”Standard_D2ds_v5″,”Standard_D4ds_v5″,”Standard_D8ds_v5″,”Standard_E2ds_v4″,”Standard_E4ds_v4″,”Standard_E8ds_v4″,”Standard_E2ds_v5″,”Standard_E4ds_v5″,”Standard_E8ds_v5”. Overwrite it by using the following parameter:
    -allowedVMTypes @(“Standard_D4ds_v4″,”Standard_D8ds_v4”)
  • use -maintenanceWindowStartHour, -maintenanceWindowLengthInHours and –maintenanceWindowDay if you want to ignore performance data during a maintenance window (e.g. for patching) as that isn’t representative
  • Set an Azure Tag called LCRightSizeConfig with the value disabled on machines you want to ignore
  • Set an Azure Tag called LCRightSizeConfig with a machine type value (e.g. “Standard_D4ds_v4“) if you want to lock a specific size for that machine, this can be useful if you want the script to resize from current to target automatically when it runs while the VM has been deallocated.

Example -Verbose output of two VM’s being resized:

Requirements:

The module requires that you’ve added the % Processor Time and Available MBytes performance counters to Log Analytics:

and that your host(s) have the Azure Monitor agent installed.

The module will check if there is sufficient data about the machine in Azure Monitor, if not, no action will be taken. You can determine how far back the function looks by modifying $measurePeriodHours

If you’re using the more recent Azure Monitoring agent, add the perf counters here:

Required access

Virtual Machine Contributor to the resource group(s) containing your VM’s and Log Analytics Reader on your log analytics workspace.

Download / Installation

Option 1: Install-Module ADDRS

Option 2: get relevant functions/code from Git

and run the set-vmRightSize or set-rsgRightSize function, e.g.:

set-vmRightSize -targetVMName azvm01 -workspaceId 7ccd0949-2fd4-414e-b58c-c013cc6e445d

set-vmRightSize -targetVMName azvm01 -workspaceId 7ccd0949-2fd4-414e-b58c-c013cc6e445d -allowedVMTypes (“Standard_E8ds_v4″,”Standard_E2ds_v5″,”Standard_E4ds_v5″,”Standard_E8ds_v5”)

set-rsgRightSize -targetRSG rg-avd-we-01 -workspaceId 7ccd0949-2fd4-414e-b58c-c013cc6e445d

Scheduling

If you wish to run this automatically on a schedule, I recommend either using an Azure DevOps pipeline or Automation account. I’ve compiled a small guide on how to use ADDRS in an Azure Automation Account.

Right Sizing Frequency

It is recommended to match job schedules to the lookback period, or at least not run multiple times in the same lookback period. Otherwise, the data that is being used for sizing may not be representative if the machine had already been resized in an earlier run.

Issues / notes

  • Make sure you’ve got enough data in Log Analytics
  • Make sure the allowedVMTypes list contains only VM types that you can actually upgrade to. If e.g. your VM has an ephemeral disk, and your allowList has types that do not, the resize will fail with an error message (but no harm will be done to the existing VM)
  • I’ve only tested the maintenance window parameters using UTC time, if you’re using different timezones your results in excluding data generated during the maintenance window may vary from mine
  • Spot and Low Priority Azure pricing is excluded by default

Keyvault RBAC model ARM role assignment

Yes, using ARM, not Bicep, I know, it’s bad!

Ran into a whole bunch of constrains and issue trying to assign an array of principals vs roles on keyvault using the RBAC access method, so sharing my working solution here as I couldn’t find a single good example on google:

        {
            "type": "Microsoft.KeyVault/vaults/providers/roleAssignments",
            "apiVersion": "2018-09-01-preview",
            "copy": {
                "name": "rbac-access-policy-loop",
                "count": "[length(parameters('accessPolicies'))]"
            },            
            "name": "[concat(variables('vaultName'),'/Microsoft.Authorization/',guid(concat(variables('vaultName'), parameters('accessPolicies')[copyIndex('rbac-access-policy-loop')].objectId, parameters('accessPolicies')[copyIndex('rbac-access-policy-loop')].roleId)))]",
            "dependsOn": [
                "[resourceId('Microsoft.KeyVault/vaults', variables('vaultName'))]"
            ],
            "properties": {
                "roleDefinitionId": "[concat('/providers/Microsoft.Authorization/roledefinitions/',parameters('accessPolicies')[copyIndex('rbac-access-policy-loop')].roleId)]",
                "principalId": "[parameters('accessPolicies')[copyIndex('rbac-access-policy-loop')].objectId]",
                "scope": "[resourceId('Microsoft.KeyVault/vaults', variables('vaultName'))]",
                "principalType": "Group"
            }
        }   

An example param would then look like this:

        "accessPolicies": {
            "value": [
                {
                    "roleId": "b86a8fe4-44ce-4948-aee5-eccb2c155cd7",
                    "objectId": "2d9cbd23-20b1-4921-a8e4-54b55161ad04"
                }                
            ]
        }  

ODBC SQL driver error

If you know how to clear the ODBC Driver 17’s token cache, please comment 🙂


Ran into the following error today and putting it here for searches as literally nothing came up:

Run-time error '-2147467259 (80004005)':
[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]User option must be specified, if Authentication option is 'ActiveDirectoryInteractive'.

This was through an excel Macro, connecting to an Azure PaaS database. All other users were just getting prompted, but one kept having this issue.

We managed to ‘solve’ the issue by going into the macro properties and adding User ID=xxx@xxx.com to the connection string where xxx@xxx.com was the user’s actual UPN.

Unsharing Orphaned Onedrive for Business sites with active sharing links

When a user leaves the organization, their Onedrive folders/files remain until either the user is permanently deleted or the retention policy covering their data expires.

Many organizations have set up a retention policy in Office 365 to retain data in Onedrive for several years, sometimes even indefinitely.

Few know, that as long as you retain a user’s onedrive, the files and folders that were shared, by default, remain shared and accessible by those they were shared with, including externals.

This is often undesirable, and can easily be remediated by running a very simple unshare-orphanedOnedriveForBusinessSites.ps1 PowerShell function I’m hereby sharing with you 🙂

Above function detects all Onedrive Sites that no longer have an active associated user, and disables any sharing links on them.

The actual line that unshares an individual site could also be used directly if you have an automatic offboarding process.

(re) configuring hidden VPN Profile properties

Using MEM (Intune) we can automatically deploy VPN profiles to our users’ managed devices directly.

The set of parameters that can be configured in MEM is extremely limited compared to what actually ends up on the rasphone.pbk file (VPN Profile) on a Windows client.

Example of a .pbk file for an Azure P2S VPN connection with Conditional Access/cert based SSO:

[AzureVirtualNetwork]
Encoding=1
PBVersion=6
Type=4
AutoLogon=0
UseRasCredentials=1
LowDateTime=-1117351264
HighDateTime=30942358
DialParamsUID=927022140
Guid=AABC7C8342FD91458105A961BE471F8E
VpnStrategy=7
ExcludedProtocols=8
LcpExtensions=1
DataEncryption=256
SwCompression=1
NegotiateMultilinkAlways=1
SkipDoubleDialDialog=0
DialMode=0
OverridePref=15
RedialAttempts=0
RedialSeconds=0
IdleDisconnectSeconds=0
RedialOnLinkFailure=0
CallbackMode=0
CustomDialDll=
CustomDialFunc=
CustomRasDialDll=%windir%\system32\cmdial32.dll
ForceSecureCompartment=0
DisableIKENameEkuCheck=0
AuthenticateServer=0
ShareMsFilePrint=1
BindMsNetClient=1
SharedPhoneNumbers=0
GlobalDeviceSettings=0
PrerequisiteEntry=
PrerequisitePbk=
PreferredPort=VPN2-0
PreferredDevice=WAN Miniport (IKEv2)
PreferredBps=0
PreferredHwFlow=0
PreferredProtocol=0
PreferredCompression=0
PreferredSpeaker=0
PreferredMdmProtocol=0
PreviewUserPw=0
PreviewDomain=0
PreviewPhoneNumber=0
ShowDialingProgress=0
ShowMonitorIconInTaskBar=1
CustomAuthKey=13
CustomAuthData=314442430D000405C000000020005005C0000001500000014000000A8985D3A65E5E5C4B2D7D66D40C6DD2FB19C5436020001001230FE0006000100FCD02C00
CustomAuthData=3BCB684FDAE6ED1B763A3EDEB989B12C95EFFAFFD330281E75F1C671B03CDD800FF0844797977764005000500
AuthRestrictions=128
IpPrioritizeRemote=0
IpInterfaceMetric=1
IpHeaderCompression=1
IpAddress=0.0.0.0
IpDnsAddress=172.1.230.4
IpDns2Address=172.1.230.5
IpWinsAddress=0.0.0.0
IpWins2Address=0.0.0.0
IpAssign=1
IpNameAssign=2
IpDnsFlags=0
IpNBTFlags=1
TcpWindowSize=0
UseFlags=2
IpSecFlags=0
IpDnsSuffix=
Ipv6Assign=1
Ipv6Address=::
Ipv6PrefixLength=0
Ipv6PrioritizeRemote=1
Ipv6InterfaceMetric=0
Ipv6NameAssign=1
Ipv6DnsAddress=::
Ipv6Dns2Address=::
Ipv6Prefix=0000000000000000
Ipv6InterfaceId=0000000000000000
DisableClassBasedDefaultRoute=1
DisableMobility=0
NetworkOutageTime=0
IDI=
IDR=
ImsConfig=0
IdiType=0
IdrType=0
ProvisionType=0
PreSharedKey=
CacheCredentials=0
NumCustomPolicy=0
NumEku=0
UseMachineRootCert=0
Disable_IKEv2_Fragmentation=0
PlumbIKEv2TSAsRoutes=0
NumServers=0
RouteVersion=1
NumRoutes=0
NumNrptRules=0
AutoTiggerCapable=0
NumAppIds=0
NumClassicAppIds=0
SecurityDescriptor=
ApnInfoProviderId=
ApnInfoUsername=
ApnInfoPassword=
ApnInfoAccessPoint=
ApnInfoAuthentication=1
ApnInfoCompression=0
DeviceComplianceEnabled=0
DeviceComplianceSsoEnabled=0
DeviceComplianceSsoEku=
DeviceComplianceSsoIssuer=
WebAuthEnabled=0
WebAuthClientId=
FlagsSet=0
Options=0
DisableDefaultDnsSuffixes=0
NumTrustedNetworks=0
NumDnsSearchSuffixes=0
PowershellCreatedProfile=0
ProxyFlags=0
ProxySettingsModified=0
ProvisioningAuthority=
AuthTypeOTP=0
GREKeyDefined=0
NumPerAppTrafficFilters=0
AlwaysOnCapable=0
DeviceTunnel=0
PrivateNetwork=0

NETCOMPONENTS=
ms_msclient=1
ms_server=1

MEDIA=rastapi
Port=VPN2-0
Device=WAN Miniport (IKEv2)

DEVICE=vpn
PhoneNumber=azuregateway-12341ef-4922-4edc-a492-589b3e547c58-1ba19cb9ae52.vpn.azure.com
AreaCode=
CountryCode=0
CountryID=0
UseDialingRules=0
Comment=
FriendlyName=
LastSelectedPhone=0
PromoteAlternates=0
TryNextAlternateOnFail=1

Modifying VPN Profile settings

To allow admins further customization of these settings, I’ve written a Proactive Remediation script that can customize any VPN profile property to any value you specify.

In our case, we used it to set IpInterfaceMetric, which defaults to 0, causing ambiguously routed traffic to never prefer the VPN connection (since this is a split tunnel connection). Setting it to 1 resolved our DNS/routing issues to certain private endpoints in our Azure environment.

Code / git link: https://gitlab.com/Lieben/assortedFunctions/-/blob/master/set-vpnConnectionInterfaceMetric.ps1

Microsoft 365, Azure, Automation & Code