How to effectively normalize import data?

Question

How to effectively normalize import data?

I have a flat file containing denormalized data. For good reason, I cannot change this. I need to bring it to normalized linked tables for use in LightSwitch . The data does not contain identifier column values for any of the source tables. I have four columns:

Division    Branch    Position    Location

Normalized data schema: Sections contain branches. Branches contain line items. Positions and locations are many-to-many relationships through the PositionLocationMappings table.

I am doing a BULK INSERT to get denormalized data into a table. Then I process it line by line, calling the stored procedure for each line. The source file has about 16,000 lines, and it takes 27 seconds, which seems a bit slow. Is there any way to do this faster?

This is in my PostDeployment.sql script:

DECLARE @division nvarchar(240)
DECLARE @branch nvarchar(240)
DECLARE @position nvarchar(240)
DECLARE @location nvarchar(60)

DECLARE myCursor CURSOR LOCAL FOR
    SELECT DISTINCT Division,Branch,Position,Location
        FROM [staging].BranchPositions

OPEN myCursor

FETCH NEXT FROM myCursor INTO @division, @branch, @position, @location

WHILE @@FETCH_STATUS = 0 BEGIN
    EXECUTE [dbo].[usp_InsertBranchPositions] @division,@branch,@position,@location

    FETCH NEXT FROM myCursor INTO @division, @branch, @position, @location
END

CLOSE myCursor
DEALLOCATE myCursor

And here is stored proc:

ALTER PROCEDURE [dbo].[usp_InsertBranchPositions] 
    @division nvarchar(240),
    @branch   nvarchar(240),
    @position nvarchar(240),
    @location nvarchar(60)
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION

    DECLARE @divisionTable TABLE (InsertedDivisionId int)
    DECLARE @branchTable   TABLE (InsertedBranchId   int)
    DECLARE @positionTable TABLE (InsertedPositionId int)
    DECLARE @locationTable TABLE (InsertedLocationid int)

    DECLARE @divisionId int
    DECLARE @branchId   int
    DECLARE @positionId int
    DECLARE @locationId int

    SELECT @divisionId = [Id] FROM [dbo].[Divisions]
     WHERE DivisionName = @division

    IF @divisionId IS NULL
    BEGIN
        INSERT INTO [dbo].[Divisions] (DivisionName, IsDivisionActive)
             VALUES (@division, 1)
        SELECT @divisionId = SCOPE_IDENTITY()
    END

    SELECT @branchId = [Id] FROM [dbo].[Branches]
     WHERE BranchName = @branch

    IF @branchId IS NULL
    BEGIN
       INSERT INTO [dbo].[Branches] (BranchName, IsBranchActive, DivisionId)
             VALUES (@branch, 1, @divisionId)
        SELECT @branchId = SCOPE_IDENTITY()
    END

    SELECT @positionId = [Id] FROM [dbo].[Positions]
     WHERE PositionName = @position

    IF @positionId IS NULL
    BEGIN
        INSERT INTO [dbo].[Positions] (PositionName, IsPositionActive, BranchId)
             VALUES (@position, 1, @branchId)
        SELECT @positionId = SCOPE_IDENTITY()
    END

    SELECT @locationId = [Id] FROM [dbo].[Locations]
     WHERE LocationName = @location

    IF @locationId IS NULL
    BEGIN
        INSERT INTO [dbo].[Locations] (LocationName, IsLocationActive)
             VALUES (@location, 1)
        SELECT @locationId = SCOPE_IDENTITY()
    END

    INSERT INTO [dbo].[PositionLocationMappings] (PositionId, LocationId)
         VALUES (@positionId, @locationId)

    COMMIT TRANSACTION
END

+3

sql-server sql-server-2012

user2871239 Feb 24 '14 at 10:04

source share

1 answer

Andomar · Accepted Answer · 2014-02-24T10:17:00+0000

Instead of calling a procedure for each row, you can import data using set-based operations.

For example, you can change the work that this piece does:

IF @divisionId IS NULL
BEGIN
    INSERT INTO [dbo].[Divisions] (DivisionName, IsDivisionActive)
         VALUES (@division, 1)
    SELECT @divisionId = SCOPE_IDENTITY()
END

To:

insert  Divisions 
        (DivisionName, IsDivisionActive)
select  distinct DivisionName
,       1
from    BranchPositions

Then for Branchesyou should use jointo search DivisionId:

insert  Branches
        (BranchName, IsBranchActive, DivisionId)
select  distinct BranchName
,       1
,       d.Id
from    BranchPositions bp
join    Divisions d
on      bp.DivisionName = d.DivisionName

Etc. It should be much faster, I used it to import billions of rows per minute.

How to effectively normalize import data?

More articles: