Formatting space-separated data as CSV

For some time now I have been trying to convert space-separated data into CSV.

Starting point

The source data looks like this:

Dr. Arun Raykar MBBS, MS - ENT 9 years experience Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE    Malleswaram, Bangalore INR 250 MON-SAT7:00PM-9:00PM Book Appointment   
Dr. Hema Sanath C BHMS, CFN 0 years experience Homeopath Sankirana Homeopathic Clinic    Kalyan Nagar, Bangalore INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM Book Appointment   
Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center     Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment

It contains a lot of extra whitespace and some information I don't need. The fields are laid out like this:

Doctor name | Degree | Years of experience | Specialization | Hospital name | Address | Fees | Schedule | and an unnecessary book appointment field.

I want to convert it to the following format:

Doctor name,Specialization,Hospital name,Address,Fees,Schedule

So, for the sample above, the result should look like this:

 Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist,SHAKTHI E.N.T CARE,Malleswaram,INR 250,MON-SAT7:00PM-9:00PM
 Dr. Hema Sanath,Homeopath,Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250,MON-SAT10:00AM-2:00PM6:30PM-8:00PM   
 Dr. Hema Ahuja,Dentist,V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM

So far, I have managed to remove the "Book Appointment" field.
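The question does not show how that text was removed. One possible way, assuming the phrase only ever appears at the very end of a record (possibly followed by trailing spaces), is a single sed substitution. strip_booking is a hypothetical helper name used only for illustration:

```shell
# Hypothetical helper: delete the trailing "Book Appointment" text together
# with any whitespace around it.
# Assumption: the phrase only appears at the end of a line.
strip_booking() {
    sed 's/[[:space:]]*Book Appointment[[:space:]]*$//'
}

# Usage on one sample record (note the trailing spaces after the phrase)
printf '%s\n' 'Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment   ' | strip_booking
```

For a whole file, the same helper can be used as strip_booking < file.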

Problem

However, I am having trouble separating out the hospital name, because the amount of whitespace around it varies a lot. Is there a way to handle this?

EDIT

The output of cat -A file is as follows:

 Dr. Arun Raykar MBBS, MS - ENT 9 years experience Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE ^I Malleswaram, Bangalore INR 250 MON-SAT7:00PM-9:00PM Book Appointment $
 Dr. Hema Sanath C BHMS, CFN 0 years experience Homeopath Sankirana Homeopathic Clinic ^I Kalyan Nagar, Bangalore INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM Book Appointment $
 Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center ^I Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment
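In cat -A output, ^I marks a tab character, and here it appears once per line, right after the hospital name. A minimal sketch of how that helps, assuming every record contains exactly one tab in that position: split on the tab first, so the variable-width part (name through hospital) is isolated from the address, fees and schedule. split_on_tab is a hypothetical helper name:

```shell
# Hypothetical helper: split each record on its single tab (^I in cat -A).
# Assumption: every line has exactly one tab, right after the hospital name.
split_on_tab() {
    awk -F'\t' '{
        left = $1; right = $2
        sub(/ +$/, "", left)     # drop the spaces just before the tab
        sub(/^ +/, "", right)    # drop the spaces just after the tab
        print "LEFT:  " left
        print "RIGHT: " right
    }'
}

printf 'Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center \t Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment\n' | split_on_tab
```

Everything after the tab can then be split on ", " and on spaces, while only the left half needs a heuristic to separate specialization from hospital name.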

Answer 1

One way is a single perl substitution:

perl -pe 's/^(\S+\s+\S+\s+\S+).+experience\s([^\t]+?)\s+(\b[A-Z0-9]{2}[^\t]+?|(?:(?!\b[A-Z0-9]{2})[^\t])*)\s+\t\s+([^,]+,).+?(INR\s+\d+)\s+(\S+?PM)\s+.*/\1,\2,\3,\4\5,\6/' file

Output:

Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist,SHAKTHI E.N.T CARE,Malleswaram,INR 250,MON-SAT7:00PM-9:00PM
Dr. Hema Sanath,Homeopath,Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250,MON-SAT10:00AM-2:00PM6:30PM-8:00PM
Dr. Hema Ahuja,Dentist,V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM

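The regex can be pasted into regex101 to see what each part matches.

It relies on two assumptions about where the specialization ends and the hospital name begins:

  • the hospital name starts at the first word (after the first one) whose first two characters are capital letters or digits (SHAKTHI, V2);
  • if there is no such word, the specialization is taken to be a single word (Homeopath) and everything after it is the hospital name.

If a record breaks these assumptions (for example, a specialization containing a word like ENT without parentheses), the split will land in the wrong place.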

Answer 2

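Another option is to pick the fields out by position with gawk (version 4.0 or later; 3.x does not support \s). This version does not separate the specialization from the hospital name; both end up in one field: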

$ awk -F" \t " -v OFS="," -v S=" " '
{
    # FS is space-tab-space: $1 holds name..hospital, $2 holds address..end
    sub(/\s+$/, "");                         # strip trailing whitespace
    split($2, Data, /[ ,]{2,}/);             # first address part ends at ", "
    Address  = Data[1];
    split($2, Data, / +/);                   # count words from the line end
    nData    = length(Data);
    Schedule = Data[nData - 2];              # last two words: Book Appointment
    Fees     = Data[nData - 4] S Data[nData - 3];
    split($1, Data, / +/);
    Name     = Data[1] S Data[2] S Data[3]; # assume all names are Dr. Xxx Xxx only
    match($1, /[0-9]+ years experience /);
    SpecializationHospital = substr($1, RSTART + RLENGTH);  # everything after "experience"
    print Name, SpecializationHospital, Address, Fees, Schedule;
} ' data.txt
Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE,Malleswaram,INR 250,MON-SAT7:00PM-9:00PM
Dr. Hema Sanath,Homeopath Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250,MON-SAT10:00AM-2:00PM6:30PM-8:00PM
Dr. Hema Ahuja,Dentist V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM
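The awk output above still joins specialization and hospital name in one field. If the set of specializations is small and known in advance, one option (an assumption, not something either answer does) is to peel a known specialization off the front of that field in a second pass. The list below contains only the three specializations from the sample, and split_specialization is a hypothetical helper name:

```shell
# Hypothetical helper: split field 2 ("Specialization Hospital") of the awk
# output using a hand-maintained list of known specializations.
# Assumption: the list covers your data; unknown specializations stay unsplit.
split_specialization() {
    awk -F',' -v OFS=',' '
    BEGIN {
        Known[1] = "Ear-Nose-Throat (ENT) Specialist"
        Known[2] = "Homeopath"
        Known[3] = "Dentist"
        n = 3
    }
    {
        for (i = 1; i <= n; i++) {
            k = Known[i]
            # index() is a literal prefix test, so "(" needs no escaping;
            # the trailing space stops "Homeopath" matching "Homeopathic"
            if (index($2, k " ") == 1) {
                $2 = k OFS substr($2, length(k) + 2)
                break
            }
        }
        print
    }'
}

printf '%s\n' 'Dr. Hema Ahuja,Dentist V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM' | split_specialization
# prints: Dr. Hema Ahuja,Dentist,V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM
```

Unlike the regex heuristic, this never guesses from capitalization, but it has to be updated whenever a new specialization shows up.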