You are basically right. A semaphore is a SW API, but it needs special hardware support to work properly. The various software implementations of semaphores are all built on some kind of HW instruction that is guaranteed to be atomic.
Implementing a semaphore requires atomicity in HW. Typically, a sequence of HW instructions is not atomic.
To elaborate: you implement a semaphore by reading and writing a piece of shared memory that is visible to more than one processor. Reading and then writing that shared memory is not an atomic operation as a whole: for example, if you read and then write, other instructions may be scheduled between the read and the write.
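To make this concrete, here is a minimal sketch of a counting semaphore built on an atomic compare-and-swap, assuming C++11 `std::atomic`. The class name `SpinSemaphore` and its methods are hypothetical, chosen only for illustration, and this is not how any particular OS or library implements its semaphores. The point is that the "read the counter, then write the decremented value" step is done by one atomic read-modify-write operation, which compilers typically lower to an atomic HW instruction (for example `lock cmpxchg` on x86).

```cpp
#include <atomic>
#include <thread>

// Hypothetical counting semaphore, for illustration only.
class SpinSemaphore {
public:
    explicit SpinSemaphore(int permits) : count_(permits) {}

    // Take one permit if one is available; never blocks.
    bool try_acquire() {
        int seen = count_.load(std::memory_order_relaxed);
        while (seen > 0) {
            // Atomic read-modify-write: the store of (seen - 1) happens only
            // if count_ still equals seen. If another thread changed it in
            // between, the call fails and refreshes seen with the new value.
            if (count_.compare_exchange_weak(seen, seen - 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;  // no permits left
    }

    // Busy-wait until a permit is available (a real semaphore would sleep).
    void acquire() {
        while (!try_acquire()) {
            std::this_thread::yield();
        }
    }

    // Return one permit.
    void release() { count_.fetch_add(1, std::memory_order_release); }

    // Snapshot of the current number of free permits.
    int available() const { return count_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> count_;
};
```

If `try_acquire` were written with a plain `int`, as a separate read followed by a separate write, two processors could both read the same value and both believe they took the last permit. The compare-and-swap closes exactly that gap.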