SORTARRAY 函数声明如下。ARRAYsortarray(ARRAYT) 用途:对给定数组中的数据排序。 参数说明:ARRAY,ARRAY类型的数据。数组中的数据可为任意类型。 返回值:ARRAY类型。 示例如下。建表CREATETABLEsortarray(c1ARRAYSTRING,c2ARRAYINT);装载数据INSERTOVERWRITETABLEsortarraySELECTarray(d,c,b,a)ASc1,array(4,3,2,1)ASc2;查询SELECTsortarray(c1),sortarray(c2)FROM结果〔a,b,c,d〕〔1,2,3,4〕分析函数 基本语法analyticfunctionname(〔argumentlist〕)OVER(〔PARTITIONBYpartitionexpression,〕〔ORDERBYsortexpression,〔ASCDESC〕〕)analyticfunctionname:函数名称比如RANK(),SUM(),FIRST()等等partitionexpression:分区列sortexpression:排序列案例数据准备CREATETABLEordersnew(ordernumStringCOMMENT订单号,orderamountDECIMAL(12,2)COMMENT订单金额,advanceamountDECIMAL(12,2)COMMENT预付款,orderdatestringCOMMENT订单日期,custcodestringCOMMENT客户,agentcodestringCOMMENT代理商);INSERTINTOordersVALUES(200100,1000。00,600。00,20200801,C00013,A003);INSERTINTOordersVALUES(200110,3000。00,500。00,20200415,C00019,A010);INSERTINTOordersVALUES(200107,4500。00,900。00,20200830,C00007,A010);INSERTINTOordersVALUES(200112,2000。00,400。00,20200530,C00016,A007);INSERTINTOordersVALUES(200113,4000。00,600。00,20200610,C00022,A002);INSERTINTOordersVALUES(200102,2000。00,300。00,20200525,C00012,A012);INSERTINTOordersVALUES(200114,3500。00,2000。00,20200815,C00002,A008);INSERTINTOordersVALUES(200122,2500。00,400。00,20200916,C00003,A004);INSERTINTOordersVALUES(200118,500。00,100。00,20200720,C00023,A006);INSERTINTOordersVALUES(200119,4000。00,700。00,20200916,C00007,A010);INSERTINTOordersVALUES(200121,1500。00,600。00,20200923,C00008,A004);INSERTINTOordersVALUES(200130,2500。00,400。00,20200730,C00025,A011);INSERTINTOordersVALUES(200134,4200。00,1800。00,20200925,C00004,A005);INSERTINTOordersVALUES(200108,4000。00,600。00,20200215,C00008,A004);INSERTINTOordersVALUES(200103,1500。00,700。00,20200515,C00021,A005);INSERTINTOordersVALUES(200105,2500。00,500。00,20200718,C00025,A011);INSERTINTOordersVALUES(200109,3500。00,800。00,20200730,C00011,A010);INSERTINTOordersVALUES(200101,3000。00,1000。00,20200715,C00001,A008);INSERTINTOordersVALUES(200111,1000。00,300。00,20200710,C00020,A008);INSERTINTOordersVALUES(200104,1500。00,500。00,20200313,C00006,A004);INSERTINTOordersVALUES(200106,2500。00,700。00,20200420,C00005,A002);INSERTINTOordersVALUES(200125,2000。00,600。00,20201001,C00018,A005);INSERTINTOordersVALUES(200117,800。00,200。00,20201020,C00014,A001);INSERTINTOordersVALUES(200123,500。00,100。00,20200916,C00022,A002);INSERTINTOordersVALUES(200120,500。00,100。00,20200720,C00009,A002);INSERTINTOordersVALUES(200116,500。00,100。00,20200713,C00010,A009);INSERTINTOordersVALUES(200124,500。00,100。00,20200620,C00017,A007);INSERTINTOordersVALUES(200126,500。00,100。00,20200624,C00022,A002);INSERTINTOordersVALUES(200129,2500。00,500。00,20200720,C00024,A006);INSERTINTOordersVALUES(200127,2500。00,400。00,20200720,C00015,A003);INSERTINTOordersVALUES(200128,3500。00,1500。00,20200720,C00009,A002);INSERTINTOordersVALUES(200135,2000。00,800。00,20200916,C00007,A010);INSERTINTOordersVALUES(200131,900。00,150。00,20200826,C00012,A012);INSERTINTOordersVALUES(200133,1200。00,400。00,20200629,C00009,A002);排序累加SELECTagentcode,orderdate,orderamount,SUM(orderamount)OVER(PARTITIONBYagentcodeORDERBYorderdatedescrowsBETWEENunboundedprecedingANDcurrentrow)totalrevFROMordersnewWHEREorderdate20200701ANDorderdate20200930;结果A00220200916500。00500。00A002202007203500。004000。00A00220200720500。004500。00A003202008011000。001000。00A003202007202500。003500。00A004202009231500。001500。00A004202009162500。004000。00A005202009254200。004200。00A006202007202500。002500。00A00620200720500。003000。00A008202008153500。003500。00A008202007153000。006500。00A008202007101000。007500。00A00920200713500。00500。00A010202009162000。002000。00A010202009164000。006000。00A010202008304500。0010500。00A010202007303500。0014000。00A011202007302500。002500。00A011202007182500。005000。00A01220200826900。00900。00AVG()和SUM()需求描述: 第三季度每个代理商的移动平均收入和总收入SELECTagentcode,orderdate,AVG(orderamount)OVER(PARTITIONBYagentcodeORDERBYorderdate)avgrev,SUM(orderamount)OVER(PARTITIONBYagentcodeORDERBYorderdate)totalrevFROMordersWHEREorderdate20200701ANDorderdate20200930;结果输出A0022020072020004000A0022020072020004000A0022020091615004500A0032020072025002500A0032020080117503500A0042020091625002500A0042020092320004000A0052020092542004200A0062020072015003000A0062020072015003000A0082020071010001000A0082020071520004000A0082020081525007500A00920200713500500A0102020073035003500A0102020083040008000A01020200916350014000A01020200916350014000A0112020071825002500A0112020073025005000A01220200826900900FIRSTVALUE()和LASTVALUE()firstvalue:取分组内排序后,截止到当前行,第一个值lastvalue:取分组内排序后,截止到当前行,最后一个值需求描述 客户首次购买后多少天才进行下一次购买SELECTcustcode,orderdate,datediff(orderdate,FIRSTVALUE(orderdate)OVER(PARTITIONBYcustcodeORDERBYorderdate))nextordergapFROMordersorderbycustcode,nextordergap结果输出C00001202007150C00002202008150C00003202009160C00004202009250C00005202004200C00006202003130C00007202008300C000072020091617C000072020091617C00008202002150C0000820200923221C00009202006290C000092020072021C000092020072021C00010202007130C00011202007300C00012202005250C000122020082693C00013202008010C00014202010200C00015202007200C00016202005300C00017202006200C00018202010010C00019202004150C00020202007100C00021202005150C00022202006100C000222020062414C000222020091698C00023202007200C00024202007200C00025202007180C000252020073012LEAD()和LAG()lead(valueexpr〔,offset〔,default〕〕):用于统计窗口内往下第n行值。第一个参数为列名,第二个参数为往下第n行(可选,默认为1),第三个参数为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULLlag(valueexpr〔,offset〔,default〕〕):与lead相反,用于统计窗口内往上第n行值。第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上第n行为NULL时候,取默认值,如不指定,则为NULL)需求描述 代理商最近一次出售的最高订单金额是多少?SELECTagentcode,orderamount,LAG(orderamount,1)OVER(PARTITIONBYagentcodeORDERBYorderamountDESC)lasthighestamountFROMordersORDERBYagentcode,orderamountDESC;结果输出A001800NULLA0024000NULLA00235004000A00225003500A00212002500A0025001200A002500500A002500500A0032500NULLA00310002500A0044000NULLA00425004000A00415002500A00415001500A0054200NULLA00520004200A00515002000A0062500NULLA0065002500A0072000NULLA0075002000A0083500NULLA00830003500A00810003000A009500NULLA0104500NULLA01040004500A01035004000A01030003500A01020003000A0112500NULLA01125002500A0122000NULLA0129002000RANK()和DENSERANK() rank:对组中的数据进行排名,如果名次相同,则排名也相同,但是下一个名次的排名序号会出现不连续。比如查找具体条件的topN行。RANK()排序为(1,2,2,4) denserank:denserank函数的功能与rank函数类似,denserank函数在生成序号时是连续的,而rank函数生成的序号有可能不连续。当出现名次相同时,则排名序号也相同。而下一个排名的序号与上一个排名序号是连续的。 DENSERANK()排序为(1,2,2,3)需求描述 每月第二高的订单金额是多少?SELECTordernum,orderdate,orderamount,ordermonthFROM(SELECTordernum,orderdate,orderamount,DATEFORMAT(orderdate,YYYYMM)ASordermonth,DENSERANK()OVER(PARTITIONBYDATEFORMAT(orderdate,YYYYMM)ORDERBYorderamountDESC)orderrankFROMorders)tWHEREorderrank2ORDERBY结果输出20010620200420250020200420010320200515150020200520013320200629120020200620010120200715300020200720011420200815350020200820011920200916400020200920011720201020800202010REGEXPEXTRACT命令格式stringregexpextract(string,string〔,bigint〕)li命令说明将字符串source按照pattern的规则拆分为组,返回第occurrence个组的字符串。参数说明source:必填。STRING类型,待拆分的字符串。pattern:必填。STRING类型常量或正则表达式。待匹配的模型。occurrence:可选。BIGINT类型常量,必须大于等于0。返回值说明返回STRING类型。返回规则如下:如果pattern为空串或pattern中没有分组,返回报错。occurrence非BIGINT类型或小于0时,返回报错。不指定时默认为1,表示返回第一个组。如果occurrence等于0,则返回满足整个pattern的子串。source、pattern或occurrence值为NULL时,返回NULL。示例selectregexpextract(foothebar,(foo)(。?)(bar),0);返回foothebar selectregexpextract(foothebar,(foo)(。?)(bar),1);返回foo selectregexpextract(foothebar,(foo)(。?)(bar),2);返回the selectregexpextract(foothebar,(foo)(。?)(bar),3);返回barul多行数据合并为一行数据WMCONCAT命令格式stringwmconcat(string,string)li命令说明用指定的separator做分隔符,连接colname中的值。参数说明separator:必填。STRING类型常量,分隔符。colname:必填。STRING类型。如果输入为BIGINT、DOUBLE或DATETIME类型,会隐式转换为STRING类型后参与运算。返回值说明返回STRING类型。返回规则如下:separator非STRING类型常量时,返回报错。colname非STRING、BIGINT、DOUBLE或DATETIME类型时,返回报错。colname值为NULL时,该行不会参与计算。示例ul建表CREATETABLEstu(classSTRING,genderSTRING,nameSTRING);装载数据INSERTINTOTABLEstuSELECT1,M,INSERTINTOTABLEstuSELECT1,F,INSERTINTOTABLEstuSELECT1,M,INSERTINTOTABLEstuSELECT1,M,INSERTINTOTABLEstuSELECT2,F,INSERTINTOTABLEstuSELECT2,M,查询SELECTclass,wmconcat(distinct,,name)FROMstuGROUPBYKEYVALUE命令格式keyvalue(string,〔string,string,〕string) keyvalue(string,string)li命令说明将字符串str按照split1分成KeyValue对,并按split2将KeyValue对分开,返回key所对应的Value。参数说明即默认的分隔符是;,KV之间的分割是:key:必填。STRING类型。将字符串按照split1和split2拆分后,返回key值对应的Value。str:必填。STRING类型。待拆分的字符串。split1、split2:可选。STRING类型。用于作为分隔符的字符串,按照指定的两个分隔符拆分源字符串。如果表达式中没有指定这两项,默认split1为;,split2为:。当某个被split1拆分后的字符串中有多个split2时,返回结果未定义。返回值说明返回STRING类型。返回规则如下:split1或split2值为NULL时,返回NULL。str或key值为NULL或没有匹配的key时,返回NULL。如果有多个KeyValue匹配,返回第一个匹配上的key对应的Value。ulselectkeyvalue(0:1;1:2,1);返回2selectkeyvalue(spm123。qwe,cpn101,act890,,,,spm)返回123。qwe优化相关distributebysortbyV。Sorderbyorderby将结果按某字段全局排序,这会导致所有map端数据都进入一个reducer中,在数据量大时可能会长时间计算不完distributeby用于控制map端数据分配到reducer的key,sortby会视情况启动多个reducer进行排序,并且保证每个reducer内局部有序groupbyV。Scount(distinct)当数据量级很大,用groupby,可以启动多个job数据集很小或者key的倾斜比较明显时,用count(distinct),少量的reduce就可以处理mapjoinHive会将buildtable和probetable在map端直接完成join过程,消灭了reduce,效率很高sethive。auto。convent。MAPJOIN(t1,t3,t4)