I downloaded and installed the Cloudera 4.4 VM to play with Hadoop. I already have a cluster on a platform at work, so I know a little about how Hadoop works. I therefore think my problem comes from my misunderstanding of Linux and its users and groups.
With Hive:
I created a Hive table from the shell, and that works. The table is in /user/hive/warehouse/test, which is owned by user cloudera of group cloudera.
I have some data files (.txt) in HDFS under /user/cloudera (user: cloudera, group: hive) that I load into my Hive table with:
LOAD DATA INPATH '/user/cloudera/*.txt' INTO TABLE test;
This is what I got:
hive> LOAD DATA INPATH '/user/cloudera/jeuDeTest/*.txt' INTO TABLE test;
Loading data to table default.test
chgrp: changing ownership of '/user/hive/warehouse/test/_log24310.txt': User does not belong to hive
chgrp: changing ownership of '/user/hive/warehouse/test/_log24311.txt': User does not belong to hive
Table default.test stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 10161843, raw_data_size: 0]
OK
Time taken: 2.472 seconds
I had never seen this error message before, but the files were moved. However, SELECT * on the table returns no results.
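The chgrp message says that the user running the load is not a member of the hive group. A minimal Python sketch of that check on the local Linux side follows, assuming HDFS resolves groups from the operating system accounts of the NameNode host (the usual shell-based mapping; this VM's actual configuration is not shown here). The helper names groups_of and in_group are hypothetical.

```python
import grp
import os
import pwd


def groups_of(user):
    """Return the sorted names of all local groups the user belongs to."""
    primary_gid = pwd.getpwnam(user).pw_gid  # raises KeyError for an unknown user
    names = set()
    for gid in os.getgrouplist(user, primary_gid):
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # gid without an entry in the group database: keep the number
            names.add(str(gid))
    return sorted(names)


def in_group(user, group):
    """True if the user is a member of the named group."""
    return group in groups_of(user)


if __name__ == "__main__":
    # "cloudera" and "hive" are the user and group named in the error above.
    for user in ("cloudera",):
        try:
            print(user, groups_of(user), "in hive:", in_group(user, "hive"))
        except KeyError:
            print(user, "does not exist on this machine")
```

If hive is missing from the list, adding the account to the group on the host (for example with usermod -a -G hive cloudera) is one way to address the message. Whether that also explains the empty SELECT is a separate question.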
With HBase:
I ran importTSV:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
-Dimporttsv.columns=HBASE_ROW_KEY,cf:nl,ch:nt,cf:ti,cf:ip,cf:cr,cf:am,cf:op,cf:mr,cf:ct
'-Dimporttsv.separator=|' testhbase -Dimporttsv.skip.bad.lines=false
/user/cloudera/jeuDeTest/*.txt
This is what I got:
ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE)
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist:
hdfs://localhost.localdomain:8020/user/cloudera/jeuDeTest/_logGeneral_C_24310_SO.txt
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist:
hdfs://localhost.localdomain:8020/user/cloudera/jeuDeTest/_logGeneral_C_24310_SO.txt
Angelik
cloudera . , .
hive> LOAD DATA INPATH '/user/cloudera/jeuDeTest/*.txt' INTO TABLE test;
Loading data to table default.test
Table default.test stats: [num_partitions: 0, num_files: 10, num_rows: 0, total_size: 10161843, raw_data_size: 0]
OK
Time taken: 0.486 seconds
hive> select * from test limit 20;
OK
Time taken: 0.303 seconds
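The second load finishes without the chgrp message, yet the query still returns no rows, so group membership alone does not seem to explain the empty result. One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: Hadoop's FileInputFormat skips input files whose names begin with _ or ., and the files listed in the first load (_log24310.txt, _log24311.txt) start with an underscore. The Python sketch below only imitates that naming rule; is_hidden and visible_files are hypothetical helpers, not Hadoop or Hive APIs.

```python
import posixpath


def is_hidden(path):
    """True if the file name starts with '_' or '.', the prefixes skipped as hidden."""
    name = posixpath.basename(path)
    return name.startswith(("_", "."))


def visible_files(paths):
    """Keep only the paths a job would read under that naming rule."""
    return [p for p in paths if not is_hidden(p)]


if __name__ == "__main__":
    # The first two names come from the chgrp messages; the third is a made-up
    # name without the leading underscore, for contrast.
    files = [
        "/user/hive/warehouse/test/_log24310.txt",
        "/user/hive/warehouse/test/_log24311.txt",
        "/user/hive/warehouse/test/log24310.txt",
    ]
    print(visible_files(files))
```

If this is the cause, renaming the files so that they no longer start with an underscore before loading them should make the rows visible.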