Convert CSV to SequenceFile

I have a CSV file that I would like to convert to a SequenceFile, which I would eventually use to create NamedVectors for use in a clustering job. I used the seqdirectory command to try to create a SequenceFile, and then passed that output to seq2sparse with the -nv option to create NamedVectors. It seems like this gives one big vector as output, but in the end I want each line of my CSV to become a NamedVector. Where am I mistaken?

1 answer

The seqdirectory command treats each file as one document, so you effectively have only one document and therefore get only one vector. For it to work the way you want, you would have to turn each line of your CSV file into a file of its own, where the document key is the file name and the value is the file's contents. However, this is quite impractical if your dataset is large, since all the disk reads and writes can become painfully slow.
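As a rough illustration of the one-file-per-line approach described above, here is a minimal Python sketch that splits a CSV into one small text file per row. The function name `split_csv_into_documents`, the `row_00000000.txt` naming scheme, and the choice to join fields with spaces are all assumptions made for this example, not anything Mahout prescribes.

```python
import csv
from pathlib import Path


def split_csv_into_documents(csv_path, out_dir, has_header=False):
    """Write each CSV row to its own text file; return how many files were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row if there is one
        for row in reader:
            if not row:
                continue  # ignore blank lines
            # Hypothetical naming scheme: the file name later becomes the document key.
            doc_path = out / f"row_{count:08d}.txt"
            # Assumption: joining fields with spaces lets a tokenizer see them as separate terms.
            doc_path.write_text(" ".join(row), encoding="utf-8")
            count += 1
    return count
```

The output directory could then be given to seqdirectory as its input, so that each row ends up as a separate document. As noted above, this creates one file per row, so it is only reasonable for small inputs.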

In practice, you are better off following the links that I shared in this comment.

